Jacob Helwig @JacobHelwig
Joined February 2022
Tweets: 59
Followers: 323
Following: 1K
Likes: 346
What if an autoregressive LM (ARLM) could teach itself to become a diffusion LM (DLM)? Meet OPDLM. We convert ARLMs into masked diffusion LMs via On-Policy Self-Distillation. No separate DLM teacher, no pretraining from scratch, just post-training an ARLM.
@yoavgo @TacoCohen Then sample a bunch of latents and marginalize over them via the Bayes estimator for a given risk. LLM evals use 0-1 risk, in which case the Bayes estimator is the posterior mode. This is exactly the self-consistency=majority@N strategy
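The claim above (under 0-1 risk, the Bayes estimator is the posterior mode, which is what majority@N estimates from samples) can be sketched in a few lines. This is a minimal illustration, not anyone's evaluation code: the function name `majority_at_n` and the sample answers are hypothetical, and the sampled answers are assumed to be already extracted and normalized strings.

```python
from collections import Counter


def majority_at_n(answers):
    """Return the most frequent answer among N sampled answers.

    Under 0-1 loss the Bayes estimator is the posterior mode; the empirical
    mode of the samples is a Monte Carlo estimate of that mode.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(answers)
    # most_common(1) returns [(answer, count)]; ties go to the first-seen answer.
    return counts.most_common(1)[0][0]


# Hypothetical final answers from five sampled reasoning chains (the latents).
samples = ["42", "41", "42", "7", "42"]
print(majority_at_n(samples))
```

Marginalizing over the latents here just means discarding the reasoning chains and counting final answers.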
Excited to share that Learnability-Informed Fine-Tuning of Diffusion Language Models (LIFT) has been accepted at ICML 2026! 🎉 paper: arxiv.org/abs/2605.22939 code: github.com/divelab/LIFT
@hi_tysam So you reduce exposure bias by adding random noise to model inputs. I wonder what would happen if noise more closely followed inference-time noise, eg by replacing input tokens with tokens sampled from model by forwarding model once before each train step
LIFT is the SFT recipe for dLLMs that actually understands the masking dynamics. Vanilla SFT on dLLMs often HURTS performance, and they finally pin down why. Their analysis: vanilla SFT overlooks learnability. Rare tokens are difficult to learn when most of the input is masked because the model has nothing to ground them in. Common tokens are easy and of little value to learn when most of the input is unmasked because the answer is essentially already given. LIFT aligns training with the information available at different diffusion time steps. Learn easy tokens when most of the input is masked (build up basic vocabulary at the noisy end), and learn hard tokens when more context is available (let the model use that context). The schedule matches the difficulty of each token to the moment the model is best positioned to absorb it. Learnability-Informed Fine-Tuning of Diffusion Language Models Paper: arxiv.org/abs/2605.22939 Code: github.com/divelab/LIFT
@StefanGliga @kalomaze Although maybe not realistic, the MDLM indep assumption is explicit
@kalomaze This paper arxiv.org/abs/2604.11035 guarantees parallel joint distribution sampling using spec decode-style verification/rejection. The idea (fig 9) is: in the same forward pass, decode multiple tokens AND vfy tokens from last fwd. Diffusion is in the title, but closer to MTP IMO
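For readers unfamiliar with "spec decode-style verification/rejection", here is a sketch of the standard speculative-sampling acceptance rule for a single token. It is an assumption that this generic rule is close to what the linked paper uses; the function `verify_token` and the toy distributions are hypothetical and are not taken from the paper.

```python
import random


def verify_token(token, p_target, q_draft, rng=random):
    """Verify one drafted token against the target distribution.

    Accept the token with probability min(1, p/q). On rejection, resample
    from the normalized residual max(0, p - q), so the returned token is
    distributed according to p_target. Assumes q_draft[token] > 0, which
    holds whenever the token was actually sampled from q_draft.
    """
    p, q = p_target[token], q_draft[token]
    if rng.random() < min(1.0, p / q):
        return token, True
    # Rejection implies p < q at this token, so the residual mass is positive.
    residual = [max(0.0, pt - qd) for pt, qd in zip(p_target, q_draft)]
    total = sum(residual)
    weights = [r / total for r in residual]
    return rng.choices(range(len(p_target)), weights=weights, k=1)[0], False


# The draft proposes token 0, but the target puts all of its mass on token 1.
print(verify_token(0, [0.0, 1.0], [1.0, 0.0]))
```

When the draft and target distributions agree, every drafted token is accepted, which is where the parallel speedup comes from.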
@marikgoldstein torchCFM has good code: github.com/atong01/condit…
@gabriberton @LucaAmb Isn’t the main point of GQA and MLA to reduce KV cache size?
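A back-of-the-envelope calculation shows the KV cache effect of GQA. The model shape below (32 layers, head dimension 128, 4096-token context, 16-bit cache) is a hypothetical configuration chosen for illustration, and `kv_cache_bytes` is an illustrative helper. MLA is not modeled here, since it caches a compressed latent rather than per-head keys and values.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len, batch=1, bytes_per_elem=2):
    """Size of the KV cache in bytes for a standard attention stack."""
    # Factor of 2: one tensor for keys and one for values in every layer.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem


# Hypothetical model: 32 query heads, sharing 8 KV heads under GQA.
mha = kv_cache_bytes(n_layers=32, n_kv_heads=32, head_dim=128, seq_len=4096)
gqa = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=4096)
print(mha / 2**30, gqa / 2**30, mha // gqa)  # prints: 2.0 0.5 4
```

The cache shrinks by the ratio of query heads to KV heads, independent of sequence length or batch size.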
@novasarc01 @teortaxesTex Yeah, multi-teacher OPD is super cool and was also used by GLM-5 and MiMo-V2-Flash. Remarkably, some insanely-cracked dev already shipped MOPD in VeRL: github.com/verl-project/v…
@eigenron I think TTRL did it better (more thorough experiments) arxiv.org/abs/2504.16084
@giffmana @francoisfleuret If in "pick up a good direction", "good" == "*locally* good", then I don't think your experiments contradict (A)
@sasuke___420 Here are some good examples of torchrec: - BERT: github.com/meta-pytorch/t… - Decoder-only: github.com/meta-recsys/ge…
@sasuke___420 As mentioned by another commenter, torch has sparse Adam, but nccl doesn't support sparse collectives, so I think it will only work with gloo backend. torchrec/fbgemm have some nice sparse optimizers, although they probably won't work with deepspeed/FSDP/Megatron
@kalomaze @sasuke___420 Is it possible that people have reached the conclusion that WD on embeddings is bad due to the gradient sparsity issue? (ie, by applying WD on token embeddings that don't appear in the current batch)
@nofreewill42 @tendies @TheVixhal @Yuchenj_UW @karpathy I think he’s aware, since vixhal is the one karpathy said it to
(7/7) The two supersonic flow datasets we generated for evaluating ShockCast are available on HuggingFace: huggingface.co/divelab Paper: arxiv.org/pdf/2506.07969 Code: github.com/divelab/AIRS/t…
(6/n) We explore several physical priors to better align the neural CFL model with the classical CFL condition. We also introduce timestep conditioning strategies inspired by neural ODE and Mixture of Experts.
We recently developed ShockCast, a deep learning framework for modeling high-speed flows using adaptive time-stepping (1/n)
Vignesh Kothapalli @kvignesh1420
179 Followers 146 Following PhD Student @Stanford CS. Prev @NYU_Courant, @IITGuwahati
Stathis V. @techabilly
555 Followers 2K Following ml @stripe - but actually a 🤖 in disguise - ( )( )( )
AIEnthusiast26 @AIEnthusiast26x
82 Followers 543 Following The AI Field Guide. Frontier research summarized for builders — not press releases. arXiv + HuggingFace.
Kaak @Kaaksaeb
220 Followers 4K Following
Stefan @StefanGliga
149 Followers 903 Following AI compiler engineer and an aspiring AI researcher. I live and breathe AI papers and exotic hardware.
Emily Calvet @iamemilycalvet
68 Followers 510 Following Let us have a world of ordinary and original people
Ousman Ceesay @CeesayOusm52449
35 Followers 6K Following May there always be work for your hands to do. May your purse always hold a coin or two. May the sun always shine upon your windowpane….. 🪟 🙏👋😇
λux @novasarc01
22K Followers 3K Following tensor shepherd in a non-euclidean pasture | grazing on cuda cores
Michael Bronstein @mmbronstein
58K Followers 8K Following #DeepMind Professor of #AI @UniofOxford / Director #AITHYRA / Chief Scientist @proximabio / https://t.co/kZpGpDAw4t (opinions are mine) 🤖🧪🧬🎶🐎
Nathalie @Nathalie_Dik
188 Followers 7K Following Happy life is being able to recognize opportunities
Kristine Socall @kristinesocall
1K Followers 4K Following Igniting hope. Building community. MBA Economic Development. Founder - Socall C Group FinTech Founder - Gifted Dreamers 501(c)(3)
Adeline Whitmore @AdelineWhi73885
0 Followers 14 Following
bethany @Elena9857860688
5 Followers 731 Following
Mamta Saini @mamtapc003
1 Followers 111 Following
JHU CLSP @jhuclsp
8K Followers 7K Following Center for Language and Speech Processing at @JohnsHopkins #NLProc #MachineLearning #AI https://t.co/6IXR5OSQtw @[email protected]
Yaroslav Golubev @areyde
949 Followers 4K Following Research Administrator @JetBrains Research. Love writing papers and poetry, history, languages, and literally everything else. 以卮言為曼衍。🏆
VegetaAvatar @VeGeTaX29
20 Followers 7K Following
Yuecheng Li @Yuecheng_Lee
282 Followers 1K Following Researcher @ Kuaishou Technology; CS MSc @ SYSU Prev @alibaba_cloud @NetEaseGames_EN Work on #LLM (LLM4Rec, Reasoning) and #Trustworthy_AI (LLM Eval, Privacy)
Joe Stacey @_joestacey_
2K Followers 2K Following NLP postdoc at @SheffieldNLP Ex @Imperial_NLP PhD, @Apple AI/ML Scholar, @UCL MSc Model robustness and now uncertainty quantification
Dileep Kalathil @DileepKalathil
150 Followers 591 Following Associate Professor, Texas A&M University (TAMU). Areas of interest: Reinforcement Learning, Machine Learning
Irqiqars @Irqiqars966244
36 Followers 732 Following
Andreas Kirsch ... @BlackHC
16K Followers 6K Following My opinions only here. 👨🔬 RS DeepMind, Midjourney 1y 🧑🎓 DPhil AIMS 4.5y 🧙♂️ RE DeepMind 1y 📺 SWE Google 3y 🎓 TUM 👤 @nwspk
Daniel Dobriy (Q11227... @semantic42
2K Followers 2K Following Knowledge Graphs Researcher @wuvienna & Programme Manager @KMA_HQ | #digitalhumanism #knowledgegraphs #ai | https://t.co/xoERSGOyvN
Rafsanjani @RafsanjaniHub
349 Followers 3K Following 📎 Learning, researching, and designing “proteins” using backpropagation and structural biology. 📎 Former student researcher at Drexel.
Anand @Anand44719958
29 Followers 5K Following
Yixin Wang @yixinwang_
682 Followers 5K Following
Subramanyam Sahoo @iamwsubramanyam
320 Followers 5K Following AGI Safety researcher (Independent), MARS Fellow(Cambridge AI Safety Hub), BASIS Fellow @UCBerkeley, Get Published or Die Trying.
T1nt1n @t1nt1nsn0wy
681 Followers 5K Following Noobie H4CK3R and researcher at @qualys. Prev @pwc. Views are my own :)
Shivam Mishra @ShivamM56397308
1 Followers 91 Following
小明 @xiomng637457
3 Followers 65 Following
JoycePullman @iFr6IgT3xK1rcm
89 Followers 2K Following
Ryan Zhang @RZ5I2
8 Followers 4 Following
Paper Copilot @ ICLR ... @papercopilot
1K Followers 1K Following Tracking what's happening in AI / ML via open statistics (13.39M+ global impressions) Author: @jingyangcarl
@iainsightss @iainsigths
12 Followers 285 Following Your daily source of artificial intelligence! Secret tools, powerful hacks, and everything you need to master AI before everyone else.
Zhaocong Yang @allen_yang8
525 Followers 4K Following CS PhD student @ UNCC Data Storytelling; Viz Recommendation; RAG; HCI
Pietro Sittoni @CasselHank
42 Followers 1K Following
Dr. Kevin Gazzara @doctorkevin
49K Followers 23K Following 🅲🅴🅾 Magna Leadership & https://t.co/DumyOxk4uJ #Author Professional #Keynote #Speaker #ExecutiveCoach #Management 🅴🆇🅿🅴🆁🆃 #Professor #Leadership
raja akbar yusuf @raja_akbar60883
69 Followers 1K Following
Mert @merterayy
25 Followers 80 Following
Nazila Damghani @DamghaniNazila
9 Followers 97 Following
Samarth Sinha @_sam_sinha_
6K Followers 318 Following Scaling World Models @1x_tech Prev: Founding Team @LumaLabsAI, PhD (Dropout 😶🌫️) @UofT, @AIatMeta, @Mila_Quebec 🇨🇦
Clive Chan @itsclivetime
28K Followers 3K Following perplexity per picojoule @anthropicai // prev @openai @tesla
Inflection AI @inflectionAI
49K Followers 3 Following We are an AI studio creating a personal AI for everyone. Our first is @pi, a supportive and empathetic conversational AI.
Luca Soldaini 🎀 @soldni
13K Followers 1K Following data mines are my passion ⛏️ mts @MicrosoftAI / ex co-lead Olmo @allen_ai / pfp @YanhongLi2062 / thoughts are mine, leave my employer alone / 🌈
slime @slime_framework
1K Followers 11 Following The LLM post-training framework for RL Scaling. https://t.co/4ILpx8hfKN
vLLM @vllm_project
41K Followers 36 Following A high-throughput and memory-efficient inference and serving engine for LLMs. Join https://t.co/lxJ0SfX5pJ to discuss together with the community!
Red Hat AI @RedHat_AI
11K Followers 2K Following Accelerating AI innovation with open platforms and community. The future of AI is open.
🎭 @deepfates
62K Followers 6K Following deepfates is an open-source AI project, developer, and publication focused on AI agent frameworks, large language models, and autonomous multi-agent systems.
David @DavidSHolz
102K Followers 10K Following founder @midjourney, previously founded leap motion, before that was at nasa and max planck - vibeposting @davidvibesonly
The OpenAI Foundation @FoundationOAI
7K Followers 0 Following OpenAI was founded in 2015 as a nonprofit; its mission is to ensure artificial general intelligence benefits all of humanity.
Fern @hi_tysam
3K Followers 218 Following Neural network speedrunner and community-funded open source researcher. Set the CIFAR-10 record several times. Say hi!
Tim Dettmers @Tim_Dettmers
45K Followers 903 Following Creator of bitsandbytes. Professor @CarnegieMellon and Research Scientist @allen_ai . I blog about deep learning and PhD life at https://t.co/Y78KDJJFE7.
Discrete Diffusion Re... @diffusion_llms
2K Followers 0 Following 📚 Journal club on discrete diffusion models 🎥 Replays available on YouTube! Contact: [email protected] Hosted by @ssahoo_, @jdeschena, @zhihanyang_
Peter Schmidt-Nielsen @ptrschmdtnlsn
11K Followers 894 Following Making an FPGA accelerated server at https://t.co/QVWSIUFPg1. Feel free to talk to me: https://t.co/wdWQN6fp87
Paul Graham @paulg
3.2M Followers 791 Following
Jackson Kernion @JacksonKernion
7K Followers 2K Following Now: finetuning at @AnthropicAI. Before: MIT postdoc, UC Berkeley philosophy PhD. Built https://t.co/3PWzczTzu4. Views my own.
Shiwei Liu @Shiwei_Liu66
2K Followers 582 Following Hi, I am a PI at ELLIS Institute Tübingen and MPI-IS. Was RS NIF @UniofOxford, JRF @SomervilleOx, postdoc @UTAustin, and PhD @Data_AI_TUe.
Quentin Gallouédec @QGallouedec
4K Followers 799 Following PhD - Post-training @huggingface 🤗 TRL lead maintainer 🇫🇷 in 🇨🇦
adaption @adaption_ai
9K Followers 5 Following Building extremely efficient intelligence that evolves with our world.
Epoch AI @EpochAIResearch
46K Followers 0 Following Investigating the trajectory of AI for the benefit of society.
Mehrdad Farajtabar @MFarajtabar
10K Followers 240 Following Research Scientist at @Apple, prev @DeepMind, prev @GeorgiaTech
Emad @EMostaque
325K Followers 113 Following Building first principles, sovereign AI @ii_posts. Founder @StabilityAI. Consistent inference is possible.
Every 📧 @every
51K Followers 82 Following The only subscription you need to stay at the edge of AI. Ideas and apps: @TrySpiral @CoraComputer @SparkleApp @usemonologue
Dan Shipper 📧 @danshipper
111K Followers 2K Following ceo @every | the only subscription you need to stay at the edge of AI
Goodfire @GoodfireAI
24K Followers 29 Following Using interpretability to understand, learn from, and design AI.
Llion Jones @YesThisIsLion
8K Followers 771 Following https://t.co/lqleZpqX5J 🐠🐟 Welsh Artificial Intelligence Researcher living in Tokyo. #AIAYN
Leo Gao @nabla_theta
13K Followers 580 Following working on AGI alignment. prev: GPT-Neo, the Pile, LM evals, RL overoptimization, scaling SAEs to GPT-4, interp via circuit sparsity. EleutherAI cofounder.
will depue @willdepue
66K Followers 2K Following dei ex machina ex-@openai (sora 1 & 2, posttraining o3/4o, pretraining moonshots)
Jackmin @jackminong
2K Followers 923 Following making sand smarter @PrimeIntellect 🇺🇸 Previously @JinaAI_ 🇩🇪 @MoneyLion 🇲🇾
Toby Pohlen @TobyPhln
142K Followers 610 Following Sleeping. Previously founding team @xAI, engineer @GoogleDeepMind. @RWTH alumnus.
kache @yacineMTB
300K Followers 6K Following reinforcement learning, robots. prev eng @ x, stripe. 6'3 (height) first person to solve 6 pendulums (in the future) subscribe to read my blog!