5/5 Results on ImageNet-512: competitive FID of 1.4 with high reconstruction quality (PSNR: 25.7). On Kinetics-600 video generation: we set a new state-of-the-art FVD of 1.3. Even our small model hits 1.7 FVD. Finally, we scale to text-to-image with strong perceptual quality.
4/6 This gives you a simple knob to control the reconstruction vs. modeling trade-off. Higher bitrate = better reconstruction but harder to model. Lower bitrate = easier to model but you lose fine details.
1/6 Introducing Unified Latents: what if your diffusion model's latents were measured in bits? Instead of relying on dimensionality reduction, we learn a latent AE with explicit bitrate control.
Paper: arxiv.org/abs/2602.17270 @emiel_hoogeboom, @TimSalimans
We have a new distillation method that actually *improves* upon its teacher.
Moment Matching distillation (arxiv.org/abs/2406.04103) creates fast stochastic samplers by matching data expectations between teacher and student.
Work with @emiel_hoogeboom @JonathanHeek @tejmensin.
1/4
Fast sampling with 'Multistep Consistency Models': We get 1.6 FID on ImageNet64 in 4 steps and scale to text-to-image models, generating 256x256 images in 16 steps.
Guess which row is distilled?
With @emiel_hoogeboom @TimSalimans
Arxiv: arxiv.org/abs/2403.06807
If diffusion models are so great, why do they require modifications to work well? Like latent diffusion and superres diffusion?
Introducing "simple diffusion": a single straightforward diffusion model for high res images (arxiv.org/abs/2301.11093). w/ @JonathanHeek @TimSalimans
🥳 It is now super easy to fine-tune EfficientNet in Flax! We open sourced a Flax version of all official EfficientNet checkpoints as a by-product of our last paper: github.com/google-researc…
JAX on Cloud TPUs is getting a big upgrade!
Come to our NeurIPS demo Tue. Dec. 8 at 11AM PT/19 GMT to see it in action, plus catch a sneak peek of a new Flax-based library for language research on TPU pods.
Link: neurips.cc/ExpoConference… (neurips.cc/Register2 is still open!)
I’d like to share the new JAX/Flax PixelCNN++ (using the new Flax ‘linen’ API github.com/google/flax/tr…), a performant baseline AR image model, built as part of my internship at Google Brain Amsterdam. github.com/google/flax/tr… 👇
@LazyOp @NalKalchbrenner Thanks for spotting that. You are correct, those terms are missing from the pseudo-code. I will make sure that this gets fixed in the revision.
@duane_rocks @avitaloliver @DeepSpiker Actually it's both. There's uncertainty in the model outputs and uncertainty about the model parameters. Sampling is used to marginalize over the uncertainty in the model parameters to obtain predictive uncertainty.
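A minimal sketch of the idea in the reply above, not the sampler or code from the paper: class probabilities are averaged over several parameter samples, and the entropy of that average is split into an output part and a parameter part. The logits, the helper names (`softmax`, `predictive`, `entropy`), and the entropy-based split are all assumptions made for illustration.

```python
import math

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predictive(logit_samples):
    # Monte Carlo marginalization: average the class probabilities
    # produced under each sampled parameter setting.
    probs = [softmax(logits) for logits in logit_samples]
    n = len(probs)
    return [sum(p[k] for p in probs) / n for k in range(len(probs[0]))]

def entropy(p):
    # Shannon entropy in nats.
    return -sum(q * math.log(q) for q in p if q > 0)

# Hypothetical logits for one input under three sampled parameter settings.
logit_samples = [[4.0, 0.0], [0.0, 4.0], [4.0, 0.0]]

mean_probs = predictive(logit_samples)

# Total predictive uncertainty: entropy of the averaged prediction.
total_uncertainty = entropy(mean_probs)

# Uncertainty in the model outputs: average entropy of each sample's prediction.
output_uncertainty = sum(entropy(softmax(l)) for l in logit_samples) / len(logit_samples)

# The remainder is attributed to uncertainty about the parameters.
parameter_uncertainty = total_uncertainty - output_uncertainty
```

Each sampled model here is individually confident, so most of the total uncertainty comes from disagreement between the parameter samples.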
@goodfellow_ian @NalKalchbrenner There's definitely reason to believe that a "Bayesian discriminator" will result in a better-behaved estimate of D*. The predictions will be less saturated, potentially resulting in a better signal for the generator. An ensemble of discriminators could improve robustness further.
Announcing exciting progress in Bayesian deep learning: the new ATMC sampler achieves first-of-its-kind Bayesian inference results on ImageNet.
Check out the results and the paper 👇
Heek et al: arxiv.org/abs/1908.03491