Kwangjun Ahn @KwangjunA
Researcher at NVIDIA // ex-Researcher at Microsoft, PhD from MIT EECS kjahn.mit.edu Cambridge, MA Joined February 2020
Tweets 73
Followers 729
Following 316
Likes 91
@JohnCLangford Updates on the Dion codebase (github.com/microsoft/dion), please check them out! - Dion2 (arxiv.org/abs/2512.16928), which has much simpler math than Dion. - NorMuon (arxiv.org/abs/2510.05491) thanks to @li_zichong.
scaled up to 12.7B dense, 5.5T tokens. - polynorm (optimized kernel) - grouped diff attn (their work) - parallel muonclip (adopt alltoall like mainhorse, essential, dion) - 80M batch it's still non-reasoning, also not moe though... keep pushing guys! arxiv.org/abs/2511.07464
they also dropped fsdp2 optimized muon. though they don't use muon for 2.6b dense model, i think it's just beginning and they are preparing larger one. they pipeline muon's comm-comp with calc flops and the code is neat. not sure if it's existing method. huggingface.co/Motif-Technolo…
@ionutmodo Thanks! Looks great, let me read through it
New improvement in Dion leads to a speedup that makes orthonormal updates (e.g., Muon) more scalable for larger matrices. The trick: carefully using Newton-Schulz (on smaller matrices) as Dion's backend. Updates to our microsoft/dion codebase are coming soon---stay tuned!
Join us on Sept 24 at 8 AM PT for Microsoft Research Forum Season 2 – a virtual series highlighting purposeful research and its real-world impact, from fundamental exploration to advancing AI responsibly, scaling innovation through products and open source, and driving positive change for society. Register now: msft.it/6011scy27
@jxbz love the repo! clean code, good practices but still not overly over-engineered, triton kernels, well documented, simple reference implementations alongside optimized code. nice
Lot, lot of alpha here
I had wondered why there was no official Dion implementation by the authors... I guess now we know. This repository looks dope: FSDP Muon and Dion implementations, triton kernels for Newton-Schulz, and lots of practical advice (1/2)
@eliebakouch @jxbz @JohnCLangford @GagMagakyan Thank you!! Glad to hear it
@jxbz @JohnCLangford @GagMagakyan Thank you for the kind words! Hope this proves useful to the community!
Looks like extremely exciting and useful work by @KwangjunA, Byron Xu, Natalie Abreu, @JohnCLangford and @GagMagakyan github.com/microsoft/dion/ (2/2)
[1/6] Curious about Muon, but not sure where to start? I wrote a 3-part blog series called “Understanding Muon” designed to get you up to speed—with The Matrix references, annotated source code, and thoughts on where Muon might be going.
Apparently Dion is now being worked on for Torch Titan: github.com/pytorch/torcht… :-)
Since nobody asked :-), here is my list of papers not to be missed from ICML: 1) Dion: distributed orthonormalized updates (well, technically not at ICML, but everyone's talking about it). 2) MARS: Unleashing the Power of Variance Reduction for Training Large Models 3) ...
@MParakhin Thanks for advertising Dion! :)
@orvieto_antonio @micahgoldblum @teodorasrec @jonasgeiping Nice results! One question: wouldn’t a large (global) batch size be more practical for distributed training? Does that mean SGD is still not effective at large scale?
@seungwookh @jxbz Go Jeremy and Laker!!
But actually this is the og way of doing it, and you should stop by E-2103 to see @jxbz and Laker Newhouse whiteboard the whole paper.
Laker and I are presenting this work in an hour at ICML poster E-2103. It’s on a theoretical framework and language (modula) for optimizers that are fast (like Shampoo) and scalable (like muP). You can think of modula as Muon extended to general layer types and network topologies
@konstmish @aaron_defazio Thanks Konstantin!
Schedule-Free methods, which forgo cosine/linear schedulers by averaging iterates and computing gradients at interpolated points, yield smoother training curves. It's still unclear why they work well, and this paper explains the phenomenon through the river-valley loss landscape.
Pocket @PocketPriors
1K Followers 3K Following
📗 @__the__human__
26 Followers 4K Following
Limes Inferior @Waldi_Ben
186 Followers 3K Following Freedom of speech is the most important human right.
RJ Skerry-Ryan @rustyryan
1K Followers 1K Following 🌮🤖 Speech and language modeling researcher. Principal SWE @ Google Deepmind. ♊🌊 Gemini Audio and Astra core team.
Kelvin 🦖🤓 @kelvinhan
90 Followers 2K Following #NLProc PhD-ed at @labo_Loria. Currently Research Fellow at @singaporetech. Questions generator. https://t.co/3mUSCnSHTf
Pratyusha Sharma @pratyusha_PS
5K Followers 471 Following Science ⇌ Deep Learning. Incoming Asst. Professor at NYU (@NYU_Courant & @NYUDataScience). Sr Researcher at @Microsoft. PhD @MIT_CSAIL.
Burny - Effective Cur... @burny_tech
19K Followers 10K Following On the quest to understand the fundamental mathematics of intelligence and of the universe with curiosity. https://t.co/mMchI2d4pg Upskilling @StanfordOnline
sean lee @infinitefun_
1K Followers 5K Following synthetic libidinology | prev cofounded @websim_ai @southpkcommons @google
Uyên Lê Phương @uyenlp02
0 Followers 7 Following
Honglin Chen @HonglinChen_
685 Followers 1K Following PhD student in computer graphics @Columbia; previously MSc student @UofT and undergrad @ZJU_China. She/her
Louis @Louis9687221579
95 Followers 4K Following Mainline Economics | Idea page | ramblings of a schizo
Ali Naeimi @Ali_NT99
12 Followers 67 Following AI Research Engineer | Protein and Genome LM | Perfetto enjoyer | Distributed Pretraining optimization
Ivan M @med_1v
1 Followers 4K Following
Chang Shi @sshchang
187 Followers 947 Following PhD student @UTAustin. Interning @MSFTResearch NYC. Towards general-purpose robots 🤖. Previously @CMU_Robotics @Amazon Robotics @NECLabsAmerica
Mike pompeo @mikepompeo345
7 Followers 729 Following Fmr U.S. SecState & CIA Director | Christian, husband, father, Army vet, Kansan | @CAV_PAC Chair | Never Give An Inch
Sameera Lanka @samspacelanka
105 Followers 723 Following Machine Learning Scientist @Microsoft Previously @MSFTResearch, @NCState
Daniel @rn_xiv01
88 Followers 4K Following
Chirag Lakhani @cmlakhan
517 Followers 4K Following Currently: Staff Scientist @nygenome in the @david_a_knowles lab. Previously: @HarvardDBMI Postdoctoral Research Fellow in @chiragjp lab https://t.co/APTLHsivM2
Arman Adibi @arman_adibi23
703 Followers 3K Following Assistant Professor, @AUG_Cyber |Postdoc @Princeton | Ph.D. from @Penn, @WarrenCntrPenn | Studying machine learning and optimization.
Yorgos Pantis @yorgos_pantis
112 Followers 1K Following Ph.D. studies @uoaofficial and @ArchimedesUnit. Ex-research @MIT, @dtutweet, @athenaRICinfo, @studyatctu, @mpiMathSci. Ex-studies @UniofOxford, @upatras.
Zheyang Xiong @zheyangxiong
50 Followers 280 Following
Michal Wolski @michalwols
1K Followers 2K Following CV / multimodal eng prev: video gen @character_ai, founder at biteai (acq by MFP), 1st employee @clairifai, CS @columbia
Neil Tenenholtz @ntenenz
964 Followers 1K Following Multimodal model training for biology / healthcare at MSR
Gautam Goel @gautamcgoel
4K Followers 710 Following Machine learning postdoc at the Simons Institute.
Jyoti Aneja @JyotiAneja
595 Followers 687 Following Member of Technical Staff @MSFTResearch | Developing Phi models | PhD from the University of Illinois, Urbana-Champaign 👩🏻‍💻
Junhoo Lee @junhoo98
0 Followers 11 Following
F_{un} @DmodBunG
248 Followers 2K Following This is a hobby account. I can't guarantee any expertise. I mostly post about math, physics, anime, languages, philosophy, and occasionally everyday life. Learning is Fun!!
최기원 @ckw1140
3 Followers 694 Following
jshan @jisu_han__
36 Followers 454 Following PhD at @SeoulNatlUni interested in robotics, embodied ai
Govind K @t2govind
2K Followers 7K Following Research Engineer @Microsoft. Developing experiences that people want
Kaustubh Ponkshe @KausP11
42 Followers 236 Following AI PhD at MLO Lab, EPFL | IIT Bombay, MBZUAI Currently interested in everything related to pretraining and data
dew @GenericHoneydew
16 Followers 2K Following
Alex Hägele @haeggee
1K Followers 702 Following PhD Student in ML @ICepfl MLO. MSc/BSc from @ETH_en. Previously: Fellow @AnthropicAI, Student Researcher @Apple MLR.
Minxin Zhang @zhang_minxin
2 Followers 43 Following
Aditya Sinha @adityaasinha
1K Followers 4K Following Research @Netflix, MS CS at UIUC | Previously @GoogleAI, @MSFTResearch | BITS Pilani, Goa.
Liliang Ren @liliang_ren
4K Followers 721 Following Pretraining @thinkymachines | Prev. Microsoft Superintelligence | UIUC CS PhD | Scaling Efficient LLM | NLP
Ji-Ha @Ji_Ha_Kim
5K Followers 122 Following
Atli Kosson @AtliKosson
471 Followers 535 Following ML Researcher (PhD, EPFL) — optimization & training dynamics of large neural networks | previously at Tesla, Cerebras, Amazon
Tony S.F. @tonysilveti
1K Followers 388 Following Ass. Prof. (maître de conférences) at @CentraleSupelec
Seunghyun Seo @SeunghyunSEO7
3K Followers 946 Following deep learning enjoyer. from speech to llm @ naver, now exploring image space @midjourney
Ionut-Vlad Modoranu @ionutmodo
48 Followers 261 Following PhD student @ ISTA 🇦🇹 Former Research Intern @ Together AI 🇳🇱 Studied Computer Engineering & Machine Learning 🇷🇴
Thibaut Boissin @ThibautBoissin
296 Followers 257 Following
Jessica Mastronardi @JessMastronardi
725 Followers 777 Following Microsoft Research Global Programs. She/her. Technology optimist. Made in Canada. Follow @MSFTResearch for the goods.
Jacob Austin @jacobaustin132
8K Followers 927 Following I sometimes do AI research. I also play piano and climb. NYC. Previously @GoogleDeepMind, @Google Brain. Opinions my own
Manan Tomar @manan_tomar
575 Followers 586 Following Postdoc @MSFTResearch NYC. Previously @rlai_lab, @berkeley_ai, @AIatMeta, @iitmadras. Opinions, if you find any, are my dog’s.
leloy! @leloykun
8K Followers 5K Following Math @ AdMU • NanoGPT speedrunner • Muon fan 🤍 • prev ML @ XPD • 2x IOI & 2x ICPC WF • https://t.co/nfO038itfn
Dinghuai Zhang 张鼎... @zdhnarsil
5K Followers 2K Following coding RL bigrun @xAI. Prev: @MSFTResearch / @Apple MLR / FAIR Labs @MetaAI, PhD at @Mila_Quebec, math undergraduate at @PKU1898.
Riashat Islam @riashatislam
2K Followers 1K Following Research Scientist @ms_aifrontiers @MSFTResearch NYC; Ex @HUMAIN @DreamFoldAI PhD @Mila_Quebec, intern @MSRNYC @AppleMLR; RL, Reasoning and LLMs; WorldModels
JingyuanLiu @JingyuanLiu123
4K Followers 571 Following https://t.co/D7zLeTZRMh is all you need | I am not Jianlin, just love his work... | Opinions are my own
elie @eliebakouch
19K Followers 4K Following training llm @PrimeIntellect (prev: @huggingface) anon feedback: https://t.co/JmMh7Sg3mL
Samet Oymak @SametOymac
1K Followers 318 Following Professor @UMich EECS | Visiting Faculty @Google. Research on the Foundations of ML+RL+LLM
Less Wright @lessw2020
201 Followers 19 Following @PyTorch, Large Scale Distributed AI Training, Object Detection, Optimizers, Stock Indexes
Xidulu @xidulu
638 Followers 709 Following Xi Wang, Full-stack Bayesian, ECNU, UMass CICS, JHU CS, Fan of U-Shape. Previously MSR Cambridge, Netflix Research
Micah Goldblum @micahgoldblum
9K Followers 759 Following 🤖Prof at Columbia University 🏙️. All things machine learning.🤖
EleutherAI @AiEleuther
28K Followers 103 Following A non-profit research lab focused on interpretability, alignment, and ethics of AI. Creators of Pythia, VQGAN-CLIP, and using SAEs for interp
Konstantin Mishchenko @konstmish
8K Followers 730 Following Research Scientist @AIatMeta Previously Researcher @ Samsung AI Outstanding Paper Award @icmlconf 2023 Action Editor @TmlrOrg I tweet about ML papers and math
Songlin Yang @SonglinYang4
18K Followers 3K Following pretraining @thinkymachines. Prev. PhD @MIT_CSAIL. INTP 🐱. she/her/hers.
Benjamin Thérien @ M... @benjamintherien
530 Followers 609 Following Ph.D. student at UdeM & Mila | RS Intern @ Meta | Distributed training & creating learned optimizers that generalize
Percy Liang @percyliang
108K Followers 425 Following professor of computer science @Stanford @stanfordnlp, co-founder of @togethercompute, creator of https://t.co/7R5THVogW2, co-founder of @simile_ai, pianist
Tyler LaBonte @tmlabonte
892 Followers 698 Following ML PhD student @GeorgiaTech, Math BS @USC. Deep learning theory, generalization, robustness.
Ashish Vaswani @ashVaswani
32K Followers 2K Following
Allen Nie (🇺🇦... @allenainie
3K Followers 2K Following Gemini training @GoogleDeepMind. Working on RL agents. Co-creator of Trace. Prev: RL PhD @StanfordAILab, @MSFTResearch, @DeepMind, @AWS Neuron
Jihoon Tack @jihoontack
809 Followers 664 Following Senior Researcher @MSFTResearch | Ph.D. @kaist_ai
Namhoon Lee @namhoonlee09
359 Followers 487 Following Assistant Professor of Computer Science and Engineering at POSTECH
Seonho Kim @seonhokim1005
1 Followers 20 Following
Arvid Frydenlund @ArvidFrydenlund
90 Followers 500 Following Ph.D. in Machine Learning from the University of Toronto. Now with Manulife Applied AI Research.
Wing Lian (caseus) @winglian
11K Followers 2K Following @axolotl_ai OSS maintainer. Axolotl AI founder. AI/ML tinkerer. Building tools for everyone.
Kevin K. Yang 楊凱... @KevinKaichuang
23K Followers 6K Following Principal Researcher in BioML @MSFTResearch (@MSRNE). He/him/他. 🇹🇼
Nishanth Dikkala @NishanthDikkala
422 Followers 290 Following Research Scientist @ Google Research, Ph.D. Computer Science, MIT.
Edward Hu @edward_s_hu
1K Followers 383 Following cs phd @penn, prev @MSFTResearch. investigating ai / rl / intelligence.
Nguyen Anh Duc @Anh_Duc_Nguyen_
18 Followers 26 Following Incoming CS PhD @SCSatCMU https://t.co/GkJdPgHlgu