Releasing RecGen: a collaboration between @ToyotaResearch, @toyota_europe, and @UvA_Amsterdam tackling a core 3D vision challenge: reconstructing complete multi-object scenes (parts, poses, textures, even occluded geometry) from just 1 to a few RGB-D views.
Trained purely on synthetic data, RecGen achieves SOTA on real-world robotics and 6D pose benchmarks, handling occlusions, symmetry, and complex interactions.
A step toward scalable, high-fidelity digital twins for robotics, and better evaluation and training of generalist policies.
reconstruction-by-generation.github.io
I’d previously thought that single-view reconstruction would be tough with only synthetic data, but it turns out it’s not! Check out this very cool work applying procedural 3D data to *full* reconstruction.
@holoday_ The baselines we use are wider than that (>4 cm), but you can always change the code to generate your own. You should definitely check out @_ilya_c's great work on this (though they consider the unsupervised setting).
arxiv.org/abs/2212.12324
Stereo depth is important in robotics, and relies heavily on synthetic data. But what actually makes for good synthetic data?
In WMGStereo, we study dataset design and discover a powerful data recipe: just 500 samples of our data can match 40k Sceneflow samples! 🧵[1/7]
By collecting the best design choices from our study, we create a full-scale dataset, WMGStereo-150k. Our data is super sample efficient and scales well! [6/7]
It's time to systematically teach VLMs to see with synthetic images!
We built VisionFoundry, a simple, intuitive framework that generates synthetic image datasets from only a task name.
10k synthetic samples → over 10% improvement on visual perception benchmarks 👀
Surprisingly, video models can solve mazes, but only inconsistently. We understand little about how they reason, which makes these abilities hard to use.
We investigate the denoising process and find models commit to a plan early, letting us screen far more candidates for better perf.
🧵
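The thread does not spell out its procedure, so what follows is only a minimal sketch of the generic "screen early, finish few" pattern that early plan commitment allows. It assumes each candidate is a noise seed, that a hypothetical `partial_denoise` runs only the first few denoising steps, that a hypothetical `score` rates the early plan (for example, whether a maze path already looks valid), and that a hypothetical `finish` completes denoising from that partial state. A small budget helper shows the arithmetic behind screening "far more candidates".

```python
import heapq
from typing import Any, Callable, Iterable, List


def screen_then_finish(
    seeds: Iterable[Any],
    partial_denoise: Callable[[Any], Any],  # hypothetical: runs only the early steps
    score: Callable[[Any], float],          # hypothetical: rates the committed plan
    finish: Callable[[Any], Any],           # hypothetical: resumes to a full sample
    keep: int,
) -> List[Any]:
    """Partially denoise every seed, keep the `keep` best-scoring states,
    and spend full denoising compute only on those survivors."""
    screened = []
    for i, seed in enumerate(seeds):
        state = partial_denoise(seed)
        # The index i breaks score ties so states themselves are never compared.
        screened.append((score(state), i, state))
    best = heapq.nlargest(keep, screened)
    return [finish(state) for _, _, state in best]


def candidates_under_budget(budget_steps: int, total_steps: int,
                            screen_steps: int, keep: int) -> int:
    """Seeds N that satisfy N * screen_steps + keep * (total_steps - screen_steps) <= budget."""
    remaining = budget_steps - keep * (total_steps - screen_steps)
    return max(remaining // screen_steps, 0)
```

For example, with a 500-step budget, 50-step trajectories, screening after 5 steps, and one survivor, `candidates_under_budget(500, 50, 5, 1)` allows 91 candidates, versus 10 if every candidate were denoised fully.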
ML interview question: You’re training a 72B MoE MNIST classifier. Layer 53 MLP expert 7 destabilizes when the ones in the dataset are turned upside down. What happened?
Stereo depth is highly useful for robots. Meet WAFT-Stereo: #1 on ETH3D (BP-0.5), Middlebury (RMSE), and KITTI (all metrics); 61% less zero-shot ETH3D BP-0.5 error; 1.8-6.7x faster than prior SOTA. Key idea: classify disparity into bins, then refine with iterative high-res warping.🧵1/2
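The tweet only names the two ingredients. Below is a minimal NumPy sketch of each, under stated assumptions. Per-pixel logits over a fixed set of disparity bins are turned into a disparity with a softmax expectation (one common choice, not necessarily WAFT-Stereo's). Warping follows the standard rectified-stereo convention that left pixel x matches right pixel x - d. Both function names are hypothetical.

```python
import numpy as np


def bin_logits_to_disparity(logits: np.ndarray, bin_centers: np.ndarray) -> np.ndarray:
    """logits: (H, W, B) scores over B disparity bins -> (H, W) expected disparity."""
    z = logits - logits.max(axis=-1, keepdims=True)  # subtract max for a stable softmax
    p = np.exp(z)
    p /= p.sum(axis=-1, keepdims=True)
    return p @ bin_centers  # probability-weighted average of the bin centers


def warp_right_to_left(right: np.ndarray, disp: np.ndarray) -> np.ndarray:
    """Resample the right image at x - d for every left pixel.

    Uses linear interpolation along each row and clamps at the image border.
    """
    H, W = right.shape
    xs = np.clip(np.arange(W)[None, :] - disp, 0, W - 1)  # source x in the right image
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, W - 1)
    w = xs - x0  # fractional offset between the two neighbouring samples
    rows = np.arange(H)[:, None]
    return (1 - w) * right[rows, x0] + w * right[rows, x1]
```

An iterative refiner would warp the right image (or its features) with the current estimate, compare the result against the left view, and predict a disparity update; that learned update step is not shown here.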
We made Muon run up to 2x faster for free!
Introducing Gram Newton-Schulz: a mathematically equivalent but computationally faster Newton-Schulz algorithm for polar decomposition.
Gram Newton-Schulz rewrites Newton-Schulz such that instead of iterating on the expensive rectangular X matrix, we iterate on the small, square, symmetric XX^T Gram matrix to reduce FLOPs. This allows us to make more use of fast symmetric GEMM kernels on Hopper and Blackwell, halving the FLOPs of each of those GEMMs.
Gram Newton-Schulz is a drop-in replacement for Newton-Schulz in your Muon use case: we see validation perplexity preserved within 0.01, and share our (long!) journey stabilizing this algorithm and ensuring that training quality is preserved above all else.
This was a super fun project with @noahamsel, @berlinchen, and @tri_dao that spanned theory, numerical analysis, and ML systems! Blog and codebase linked below 🧵
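A minimal NumPy sketch of the Gram rewrite described above, assuming the classic cubic Newton-Schulz step X ← 1.5X - 0.5(XXᵀ)X rather than Muon's tuned quintic. It is an illustration of the idea, not the authors' implementation, and it omits the numerical stabilization they mention. Because each update has the form X_{k+1} = P_k X_k with P_k = 1.5I - 0.5A_k and A_k = X_k X_kᵀ, the Gram matrix obeys A_{k+1} = P_k A_k P_k. The loop can therefore run entirely on m×m matrices, with one rectangular GEMM before it and one after.

```python
import numpy as np


def _wide_normalized(G):
    # Work on the orientation with m <= n so the Gram matrix is the small one.
    wide = G.shape[0] <= G.shape[1]
    X = G if wide else G.T
    # Frobenius scaling puts every singular value in (0, 1], inside the
    # (0, sqrt(3)) region where the cubic iteration converges.
    return X / np.linalg.norm(X), wide


def newton_schulz(G, steps=40):
    """Classic cubic Newton-Schulz iteration toward the polar factor U V^T."""
    X, wide = _wide_normalized(G)
    for _ in range(steps):
        A = X @ X.T                # m x m Gram matrix: an m*m*n GEMM every step
        X = 1.5 * X - 0.5 * A @ X  # a second m*m*n GEMM every step
    return X if wide else X.T


def gram_newton_schulz(G, steps=40):
    """Same iterates, but the loop only updates square m x m matrices.

    X_{k+1} = P_k X_k with P_k = 1.5 I - 0.5 A_k, so
    A_{k+1} = P_k A_k P_k and X_K = (P_{K-1} ... P_0) X_0.
    """
    X0, wide = _wide_normalized(G)
    I = np.eye(X0.shape[0])
    A = X0 @ X0.T  # one rectangular GEMM before the loop
    Q = I.copy()   # running product P_{k} ... P_0
    for _ in range(steps):
        P = 1.5 * I - 0.5 * A
        A = P @ A @ P  # symmetric Gram update, square GEMMs only
        Q = P @ Q
    X = Q @ X0     # one rectangular GEMM after the loop
    return X if wide else X.T
```

In this sketch each loop step costs about 3m³ multiply-adds instead of 2m²n, so it only pays off when n is comfortably larger than 1.5m; the extra saving from symmetric GEMM kernels mentioned above is not modeled. Muon's usual quintic step also has the form X ← (aI + bA + cA²)X, so the same restructuring applies to it.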