Seroze @stringray3
Obsession is far greater than Discipline
Joined May 2018
Tweets: 3K
Followers: 116
Following: 3K
Likes: 30K
Sharing some learnings from attending MLSys '26. There were a lot of interesting papers presented in distributed training and inference. Overall, I could capture the following themes:
1. Distributed training has a lot of knobs, which are really tough to manage and tune. A ton of work is being done to make this easier to manage.
2. As training gets larger, reliability matters more. It was not surprising to see many industry talks focus on training reliability.
3. Ultra-long context lengths are getting huge mindshare for both training and inference.
4. Heterogeneous compute (multi-region, multi-accelerator) is on the rise and is probably the next frontier of inference optimization.
5. Distributed inference still needs better auto-tuning for finding the best configs at large scale.
6. KV cache optimization, attention optimization, and quantization were already on the radar, so the number of papers on these topics was not a surprise.
7. IMO, from a skills perspective, the best thing to learn is GPU communication and networking. Learn everything around inter- and intra-rack communication, NCCL, and UCCL. Lots of improvements in the coming years will come from optimizing communication between GPUs via better kernels and frameworks.
For a list of interesting papers and their summaries: r0m1t.com/learnings-from…
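Halfway through it. Damn, how come nobody ever told me RL was this fun?
I spent some time writing an RL tutorial, Hands-On Modern RL. The route starts with CartPole + PPO, then moves on to LLM post-training (RLHF, DPO, GRPO) and Agentic RL. Code comes first; formulas are used to explain the phenomena. The English version will be updated soon. This is currently a draft; the RLHF and Agentic RL parts are under local review. PRs, Issues & GPU support are welcome: github.com/walkinglabs/ha…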
Had such a blast working with @erictang000, @charlie_ruan, @sumanthhegde, and @pcmoritz on enabling multi-LoRA RL training in SkyRL! We observed ~3x higher experiment throughput compared to running experiments in the traditional single-tenant fashion. One of my favorite parts of this collaboration is that all this code is open source so you can play with it yourself :) Here's the technical deep dive 🧵
🏹5 Days of Trajectory. Day 3 - An Open Source Training Stack for Continual Learning Building the platform for continual learning requires both partnering with pioneering AI companies, as we showed on Day 2 with Harvey, and working toward frontier research, which we are
I broke my record on the LLM I'm training by switching from constant learning rate to warm-up + decay (scheduled) learning rate. Learning Rate Scheduling - Beginner Tutorial + Record LLM Speedrun Full video - youtu.be/PvLWxUobSoo At the start of training, the weights are far from optimal, so we use a high learning rate to make large updates and learn quickly. Later, as the model approaches a good solution, we reduce the learning rate to make smaller, more precise adjustments and avoid overshooting the optimum. Speedrun our LLM - github.com/vukrosic/unive… Become AI researcher: skool.com/become-ai-rese… (funds our compute)
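A minimal sketch of what a warm-up + decay schedule can look like, assuming a linear warm-up followed by cosine decay. The tweet does not say which decay shape was used, so the cosine curve, the function name `lr_at_step`, and every default value below are illustrative assumptions, not the settings from the speedrun repo.

```python
import math


def lr_at_step(step, max_lr=3e-4, min_lr=3e-5, warmup_steps=100, total_steps=1000):
    """Learning rate for a given optimizer step (all defaults are hypothetical).

    Phase 1: linear warm-up from max_lr / warmup_steps up to max_lr.
    Phase 2: cosine decay from max_lr down to min_lr (assumed decay shape).
    """
    if step < warmup_steps:
        # Ramp up linearly so the earliest updates are small.
        return max_lr * (step + 1) / warmup_steps
    if step >= total_steps:
        # Past the planned run length, hold the floor value.
        return min_lr
    # Fraction of the decay phase completed, in [0, 1).
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (max_lr - min_lr) * cosine


if __name__ == "__main__":
    for s in (0, 50, 99, 100, 550, 999, 1000):
        print(s, lr_at_step(s))
```

In a training loop, the returned value would be written into the optimizer's learning rate before each step; a constant-rate baseline corresponds to always returning `max_lr`.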
One more checkmark and I feel more excited about the upcoming work in building my LLM inference server. So far, it feels great implementing core techniques like separate prefill and decode, KV cache, prefix caching, etc. The upcoming things are more interesting: Torch Compile, CUDA Graphs, SD, Quantization, and Distributed inference. Since I know these theoretically, implementing them one by one will be fun. I recently completed my study of prefix caching, which covers block hash-based and radix tree-based approaches. I have also run some benchmarks with vLLM and SGLang and will make them public soon.
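To make the block hash-based approach concrete, here is a toy sketch. It assumes each block's hash is chained with the hash of the block before it, so a block only matches when the entire prefix leading up to it matches. `BLOCK_SIZE`, `block_hashes`, and `PrefixCache` are hypothetical names for illustration; this is not the vLLM or SGLang implementation, and the cached values are placeholder strings rather than real KV tensors.

```python
import hashlib

BLOCK_SIZE = 4  # tokens per block (hypothetical; chosen small for readability)


def block_hashes(token_ids, block_size=BLOCK_SIZE):
    """Return one chained hash per full block of tokens.

    Each hash covers the block's tokens plus the previous block's hash,
    so equal hashes imply an equal prefix up to and including that block.
    """
    hashes = []
    prev = b""
    for i in range(len(token_ids) // block_size):
        block = token_ids[i * block_size:(i + 1) * block_size]
        payload = prev + b"|" + ",".join(map(str, block)).encode()
        prev = hashlib.sha256(payload).digest()
        hashes.append(prev)
    return hashes


class PrefixCache:
    """Maps chained block hashes to placeholder KV block handles."""

    def __init__(self):
        self._blocks = {}

    def insert(self, token_ids):
        # Register every full block of this sequence; partial tail is skipped.
        for h in block_hashes(token_ids):
            self._blocks.setdefault(h, f"kv-block-{len(self._blocks)}")

    def match(self, token_ids):
        """Number of leading tokens whose KV blocks are already cached."""
        matched = 0
        for h in block_hashes(token_ids):
            if h not in self._blocks:
                break
            matched += BLOCK_SIZE
        return matched


if __name__ == "__main__":
    cache = PrefixCache()
    cache.insert([1, 2, 3, 4, 5, 6, 7, 8, 9])
    print(cache.match([1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13]))
```

A radix tree-based cache answers the same "longest cached prefix" question by walking token edges instead of comparing per-block hashes, which allows matches at finer granularity than a fixed block.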
We just launched a new project that teaches you how to build Flash Attention with CUDA, step by step. By the end, you’ll have a working Flash Attention kernel built from the ground up. The project covers:
- CUDA primitives warm-up
- Matrix operations
- Naive attention baseline
- Online softmax math
- Tiled attention building blocks
- Fused Flash Attention kernel
- Causal Flash Attention
It will be open to everyone for the first 2 weeks, then it will become part of our premium projects.
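For the "online softmax math" item, a rough pure-Python sketch may help show the idea before any CUDA is involved. It assumes scalar values and a non-empty score list, and the function names are made up for illustration; it is not taken from the project itself. The point is that a softmax-weighted sum can be computed tile by tile by keeping a running max, a running denominator, and a running numerator, rescaling the last two whenever the max grows.

```python
import math


def online_softmax_weighted_sum(scores, values, tile=2):
    """Compute sum_i softmax(scores)_i * values_i while seeing one tile at a time."""
    m = float("-inf")  # running max of all scores seen so far
    denom = 0.0        # running sum of exp(score - m)
    acc = 0.0          # running sum of exp(score - m) * value
    for start in range(0, len(scores), tile):
        s_tile = scores[start:start + tile]
        v_tile = values[start:start + tile]
        m_new = max(m, max(s_tile))
        # Rescale old partial sums to the new max (exp(-inf) is 0.0 on the first tile).
        scale = math.exp(m - m_new)
        denom = denom * scale + sum(math.exp(s - m_new) for s in s_tile)
        acc = acc * scale + sum(math.exp(s - m_new) * v for s, v in zip(s_tile, v_tile))
        m = m_new
    return acc / denom


def reference_weighted_sum(scores, values):
    """Standard two-pass softmax-weighted sum, used only for comparison."""
    mx = max(scores)
    weights = [math.exp(s - mx) for s in scores]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


if __name__ == "__main__":
    scores = [0.5, 2.0, -1.0, 3.5, 1.0]
    values = [1.0, 2.0, 3.0, 4.0, 5.0]
    print(online_softmax_weighted_sum(scores, values), reference_weighted_sum(scores, values))
```

In an attention kernel the same rescaling would apply per query row, with vector-valued `values`, so the full score matrix never has to be materialized.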
github.com/seroze/leetgpu… working on learning cuda
I released MazuNIX on Mazu’s birthday. Unlike many educational operating systems that avoid SMP and real-time (RT) topics, Mazu delivers SMP, multicore RT scheduling, and practical POSIX Threads support. Full source code is available: github.com/MazuNIX/mazu
github.com/databases-sero… B-tree SQL engine implementation in Python
I have joined @GoogleDeepMind! I'll be training VLMs And I'll still keep posting about latest developments on AI, Computer Vision and LLMs So no more posts on PyTorch tricks. I might post about JAX. Stay tuned...
In 72 hours I got over 100k of value:
1. Lambda gave me $5,000 in compute credits
2. Nvidia offered me 8x H100s on the cloud ($20/h), idk for how long, but assuming 2 weeks that'd be ~$5,000
3. TNG technology offered me 2 weeks of B200s, which is something like $12,000 in compute
4. A kind person offered me 100k in GCP credits (enough to train a 27B if you do it right)
5. Framework offered to mail me a desktop computer
6. We got $14,000 in donations, which will go to buying 2x RTX Pro 6000s (bringing me up to 384GB VRAM)
7. I got over 6M impressions, which based on my RPM would be $1,500 over my usual ~$500 per pay period
8. I have gained ~17,000 followers, more than doubling my follower count
9. 17 subscribers on X + 700 on youtube.
The total value of all this is at minimum ~$50,000, and closer to $150,000 if I leverage it all.

What I'll be doing with all this:
Eric is an incredibly driven researcher I have been bouncing ideas off of over the last month. He and I have been tackling the idea of getting massive models to fit on relatively cheap memory. The idea is to take advantage of different forms of memory, in combination with expert saliency scoring, to offload specific expert groupings to different memory tiers. For the MoEs I've tested over my entire AI session history, about 37.5% of the model is responsible for 95% of token routing. So we can offload 62.5% of an LLM onto SSD/NVMe/CPU/cheap VRAM; this should theoretically add minimal latency if we can select the right experts. We can combine this with paged swapping to further accelerate prompt processing. If done right, we are looking at very decent performance for massive unquantised & unpruned LLMs. You can get DeepSeek-v3.2-speciale at full intelligence with decent tokens/s as long as you have enough VRAM to host the core 20-40% of the model and enough RAM or SSD to host the rest. Add quantisation to the mix and you can basically have decent speeds and intelligence with just 5-10% of the model's size in VRAM (+ you need some for context). The funds will be used to push this to its limits.

There's also tons of research showing that you can quantise a model drastically, then distill from the original BF16 or make a LoRA to mostly align it back to the original. This will be added to the pipeline too.

All this will be built out here: github.com/0xSero/moe-com… You will be able to take any MoE, shove it in here, and compress it down with only 24GB and enough RAM/NVMe. It'll be slow as hell, but it will work with little tinkering.

Lastly, I will be looking into either a full training run from scratch or just post-training on an open AMERICAN base model:
- a research model
- an openclaw/nanoclaw/hermes model
- a browser-use model
To prove that this can be done.

I will be bad at all of it, and doubt I will get beyond the best small models from 6 months ago, but I want to prove to everyone who says otherwise that it's no boogeyman impossible task.

By the end of the year:
1. I will have 1 model I trained in some capacity in the top 5 at either pinchbench, browseruse, or research.
2. My github will have a master repo which combines all my work into reusable generalised scripts to help you do the same.
3. The largest public comparative dataset for all MoE quantisations, prunes, benchmarks, costs, hardware requirements.

A lot of this will be led by Eric, who I will tag in the next post. I want to say thank you to everyone who has supported me. I have gotten a lot of comments stating:
1. I'm crazy, stupid, or both
2. I'm wasting my time, no one cares about this
3. This is not a real issue
I believe the amount of interest and support I've received says it all. donate.sybilsolutions.ai
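The expert-selection idea above can be sketched roughly as follows, assuming "saliency" is approximated by how many tokens were routed to each expert. The function name `split_hot_cold`, the 95% target as a parameter, and the routing counts in the example are all invented for illustration; they are not measurements and not code from the linked repo.

```python
def split_hot_cold(route_counts, coverage=0.95):
    """Split experts into a hot set (keep in fast memory) and a cold set (offload).

    route_counts: dict mapping expert id -> number of tokens routed to it.
    The hot set is the smallest group of most-used experts whose combined
    routing count reaches `coverage` of all routed tokens.
    """
    total = sum(route_counts.values())
    ranked = sorted(route_counts.items(), key=lambda kv: kv[1], reverse=True)
    hot, covered = [], 0
    for expert_id, count in ranked:
        if covered >= coverage * total:
            break
        hot.append(expert_id)
        covered += count
    cold = [expert_id for expert_id, _ in ranked[len(hot):]]
    return hot, cold


if __name__ == "__main__":
    # Made-up routing counts for an 8-expert layer.
    counts = {0: 50, 1: 30, 2: 16, 3: 2, 4: 1, 5: 1, 6: 0, 7: 0}
    hot, cold = split_hot_cold(counts)
    print("hot:", hot, "cold:", cold, "hot fraction:", len(hot) / len(counts))
```

A real pipeline would presumably score experts per layer and assign the cold set to several tiers (CPU RAM, NVMe, SSD) rather than a single bucket.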
See this Instagram video by @ instagram.com/reel/DU_UF-Bj4…
Life has to be eventful, otherwise where's the fun.
if you don't actively shape the life you desire each day, you'll eventually wake up to find that years have slipped by unnoticed
moritake @moritake04
594 Followers 684 Following Software/ML Engineer | Computer Vision | Kaggle Master (🥇x 1, 🥈x 3, 🥉x 1) https://t.co/5rKFokZVWe
Gaspard kirira @g_kirira
175 Followers 590 Following Founder of @Softadastra | Creator of Vix.cpp | Building offline-first systems that work without reliable internet
J Rosser @jrosseruk
766 Followers 1K Following Intern @Cohere | DPhil @UniofOxford @j_foerst | MATS 10.0 @NeelNanda5
bowwowforeach @bowwowforeach
962 Followers 540 Following I do competitive programming, mainly heuristics/marathon/game AI. (CodinGame/AtCoder/TopCoder MM)
Sankurero @S4nkurero
99 Followers 81 Following LoL ’25 Diamond | Kaggle Master| KDD Cup ’24–’25 Prize Winner | RecSys Challenge ’25 🥉
Floriana @Aja1755853
30 Followers 950 Following I’m learning to love the sound of my feet walking away from things not meant for me.
EthelDulles @2TLkV0po5W6uT4
90 Followers 2K Following
尺八 @Syaku83
649 Followers 617 Following Meiji University M2 (second-year master's) | AtCoder cyan | beginner competitive programmer https://t.co/r6DHjTMwza | practice log: @108memomemo
Mary @marybennett79
2K Followers 4K Following Everyone Wants Love But Nobody Wants To Be Vulnerable.
Sagir Ahmed สกิ... @SagirAhmed98
2K Followers 686 Following Insta: @sagir_langs Eastern Indo-Aryan, Assamese, Sylheti, Bodo-Garo, Tai & other languages, History, Linguistics, Scripts (esp Brahmic) 🇮🇳 he/সি/ꠢꠦ/बि/উৱাঁই
-SIX-Reina @six_reina6
875 Followers 2K Following I'm Reina, a staff member of @SIX_tsukuba 🐱🎶 I can't reply to DMs 😣🙏💦 I'm a University of Tsukuba student! ’03/ITF22. Art and Design
Dan Saunders @djsaunde
874 Followers 3K Following research eng / perf at @LumaLabsAI prev @UnslothAI @axolotl_ai @awscloud small homestead owner 🐑🐓
Proofig AI @Proofig
2K Followers 2K Following Proofig AI: Image Proofing & Integrity software for scientific publications. Automatic detection of manipulation, duplication, plagiarism & AI-generated images.
Stéphane Deny @StphTphsn1
4K Followers 7K Following Neuroscience & ML Researcher. Posting about various topics on here. I retweet papers to increase their visibility (I do not read all), tag me for a retweet.
Aakanksha Chowdhery @achowdhery
13K Followers 6K Following @Stanford @reflection_ai // Previously @GoogleDeepMind :: PaLM, Gemini // @MSFTResearch, @Princeton // views my own and subject to change
JoanMoses @pZOLE6L60XRT8U4
118 Followers 3K Following
inady / イナディ ... @_inady_
125 Followers 110 Following SRE/AI promotion lead at PROGRIT | ← atama plus ← ASKUL | Hobbies: learning English, cameras | Podcast https://t.co/PiJGZT9fqI
Vishal S Pandey @its_vayishu
3K Followers 697 Following Computational Neuroscience. Independent Researcher. ex-@AIatMeta (optimizers)
Jianyang Gao @gaoj0017
3K Followers 884 Following Author of the RaBitQ quantization algorithm; Postdoc at @ETH on AI, ML System, Vector Database; prev. PhD @NTUsg; ICPC World Final;
senthil kumar @senthilsam28
22 Followers 5K Following
Lazy Hippo @AArjunAArmy
10 Followers 233 Following
Nicole odell @Micoleodell
77 Followers 600 Following Transform your Trading Experience- Compete for $10k_$100k - Expert Guidance and Support- Blockchain Solutions for Success- Insights for Profitable Trading.
kmt @kmtverse
377 Followers 3K Following kaushlendra mani tripathi | everything i touch starts from the first principles
おふとんらぼ@... @MohutonLab_comp
2K Followers 2K Following AtCoder blue/Codeforces blue/PAST Advanced/Kaggle Expert (1 silver, 1 bronze). For Love Live!/baseball/DDR and everything else, see (@ohuton_labo)
hotman @hotmanww
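2K Followers 3K Following mis.w54/traP24M | Asano → Waseda (Mechanical/Aerospace) → Institute of Science Tokyo (formerly Tokyo Tech), Mathematical and Computing Science → working adult | Aspiring to master data structures | AtCoder, codeforces: hotman78 (bouncing between yellow and orange...) | Icon: @yuchemisw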
אגי-e/acc @murage_kibicho
3K Followers 5K Following Statistics @Yale | @LeetArxiv - Leetcode for implementing Arxiv papers
ろん |「マテリ... @mipypf
1K Followers 1K Following AI for Science / "Materials Informatics Practical Handbook" (「マテリアルズ・インフォマティクス 実践ハンドブック」) now on sale! https://t.co/DrbV0NUSEE Kaggle Master / PhD (Engineering) / Please DM for speaking or writing requests
Haru @52ng8PZ
292 Followers 459 Following Mechanical engineering (master's), Kaggle Expert, 🥇×0 🥈×1 🥉×2, good at getting shaken down, Chitose Hokkaido
Shriram Balaji @shrirambalaji
2K Followers 6K Following building distributed systems @microsoft • ⚭ @swetha__raman • tinkering with systems, databases and web • blogs at https://t.co/acjkgbBqNO
rihanneko @rihanpiggy
265 Followers 129 Following ML Engineer at PFN | ex-Accenture | ex-NRI | Kaggle Grandmaster 🥇8🥈4🥉2 | World rank (peak) 16 | https://t.co/ZY82oakpSf | Image recognition | LLM | Statistics | All opinions are my own
Evance Soumaoro @evanxg852000
639 Followers 708 Following SWE & ex @Quickwit_Inc @eHealth_africa | @recursecenter W2‘23 | Consulting @WorldBank | System programming enthusiast & speaks: 🦀, Go, ⚡, 🐍, JS & Java
RJ Barman @rjthepigeon
664 Followers 1K Following Taking some time off to study databases and systems programming
Adaptatron @adaptatron
156 Followers 15 Following
Lodnoez @Lodnoez4Oj
47 Followers 1K Following
levi @levidiamode
5K Followers 657 Following 365 days of GPU programming ▓▓▓▓▓▓▓▓▓▓▓▓▓░░░░░░░░░░░░░░░░░░░░░░░░ 146/365
Prateek Dhawan @PattyInnovate
13K Followers 104 Following That Indian jet engine guy. Broke founder @ DGPropulsion. Jet engines in India. Propelling India forward.
Yuval Tassa @yuvaltassa
2K Followers 74 Following Robot Simulation team, DeepMind. MuJoCo developer. https://t.co/KSUnjj2ELr
Vikash Kumar @Vikashplus
7K Followers 873 Following Building Human-embodied Intelligence. CEO @MyoLabAI | Sr. research scientist @OpenAI @GoogleAI @AIatMeta | @berkeley_ai @UWcse #MuJoCo | Ad. Prof. @CMU_Robotic
Tony Zhao @tonyzzhao
67K Followers 956 Following Co-founder and CEO @sundayrobotics. Stanford PhD dropout, ex Deepmind, Tesla, GoogleX
Sai Surya Duvvuri @dvsaisurya
1K Followers 363 Following Automating AI research @CoreAutoAI | CS PhD student at UT Austin. Prev SR at Google, Meta and RF at MSR | CS IIT KGP https://t.co/0slDWFFxAE
Tilde @tilderesearch
5K Followers 10 Following We build foundational understanding of models to advance the frontier of intelligence.
Object Zero @Object_Zero_
37K Followers 981 Following Doer of the difficult. Champion for talent. Inventor of things. Builder of Machines. North Sea O&G, Nuclear Power, Subsea, Heavy Manufacturing.
Joe Sluis @jyobo10
4K Followers 702 Following software engineer @microsoft | databasemaxxing | ssh https://t.co/stYIgDkOem
Jaydev @JaydevTonde
328 Followers 897 Following Senior Data Scientist at Wolters Kluwer, LLM Inference, Kaggle Competitions Expert
DeepLearning.AI @DeepLearningAI
336K Followers 114 Following We are an education technology company with the mission to grow and connect the global AI community.
Thomas Simonini @ThomasSimonini
10K Followers 859 Following Solo Indie Game Dev Making The Lighthouse Investigation: a detective game where you solve a 1907 cold case. Ex-@huggingface
AssemblyAI @AssemblyAI
46K Followers 411 Following Access powerful AI models to transcribe and understand speech via a simple API. Try our no-code playground for free 👉 https://t.co/YPCK9mq5Qy
Darek Kłeczek @dk21
4K Followers 2K Following Machine Learning, Kaggle and occasional pictures from Poland. LLM/AI Research at Snowflake. 4x Kaggle Grandmaster. Personal stuff only.
Turing Post @TheTuringPost
86K Followers 9K Following On X we surface the AI research that matters and explain the ideas behind it. In the newsletter, we connect the dots between AI’s past, present, and future ⬇️
vicki @vboykis
58K Followers 1K Following I move vectors to different machines sometimes. Founding ml engineer in recsys/search. building ✨I like Nutella.
kirsten lum @kirsten_lum_
11K Followers 918 Following 🌲🌲 follower the way | applied AI, data science | wife of @wingyewlum, mom of 2 | cofounder and cto building https://t.co/TBEWa1IYzH
Rabotni(kuma|熊) @analokmaus
4K Followers 1K Following 👨💻 | (Mountaineering|Running)-type (AI engineer|public health researcher) | @SakanaAILabs | Stealth SU | UTokyo | @Kaggle Grandmaster | @GoogleDevExpert (AI) | 🗣 | 🇯🇵🇬🇧🇨🇳🇷🇺
JFPuget 🇫🇷... @JFPuget
20K Followers 2K Following Machine Learning at @Nvidia, 6x Kaggle Grandmaster CPMP. Arc Prize winner. ML PhD. Ex ENS Ulm, ILOG CPLEX, IBM. Views are my own.
Steven Enamakel @senamakel
11K Followers 1K Following Chef Buildoooor at @tinyhumansai - Focused on building open-source & taking over the world with AI - Ex-DeFi Managed over 300mn$+
Viv @Vtrivedy10
13K Followers 2K Following applied research @LangChain, prev @awscloud, phd cs @templeuniv
Emir Atli @emiratli_
6K Followers 694 Following Co-founder at HockeyStack. Revenue Agents for the Enterprise. Raised $50M+ from YC, Bessemer, GC, Soma, and Uncorrelated.
Jess Li @jessicafeiyali
2K Followers 288 Following applied research @PrimeIntellect / prev @cartesia, @beaconsai, @Soma_Capital, @Harvard / dog walker @sfspca and distance runner (8x half and full marathons)
AskLivermore @asklivermore
130K Followers 119 Following Singapore's #1 trader now on X. My posts and replies are NOT financial advice - I’m a trader, not a licensed advisor.
Wulfie Bain @wulfie_bain_
4K Followers 141 Following @OpenAI Applied AI International Lead, Startups. Prev CTO/founder, @BCG, @UniofOxford. Small sparks ✨ & just working things out
witcheer @witcheer
10K Followers 1K Following Growth @YariFinance · sovereign compute advocate · ex @KPMG
Jim Huang @jserv
12K Followers 8K Following "A hacker, a lecturer, a father" // Adjunct faculty at @NCKU_official
wang @wangzhr4
3K Followers 939 Following Creating the computing engine for next era. Cloud | Backend | Frontend | AI Infra
Yash Patil @ypatil125
9K Followers 599 Following Co-Founder, CEO @appliedcompute prev: @OpenAI, @Stanford
Venkat — inference ... @venkat_systems
729 Followers 2K Following distributed systems, low latency, inference | 🦀 | hobbies: ⛷️ 🏊🏽♂️ 📷
Pingbang Hu 🇹🇼 @PingbangHu
3K Followers 369 Following I work on, with, and for data. Ph.D. candidate @UofIllinois. Fellows @AnthropicAI. Interns @ SIG @amazon @jouhouken. Alumni @Umich @SJTU1896.
Nihal Pasham @npashi
1K Followers 119 Following 🦀 Rust Tech | @Nvidia | Make general purpose GPU programming accessible 🖖 Disclaimer: The views, opinions expressed are my own (not my employer's)
Namjae Jeon @JeonNamjae
54 Followers 27 Following Linux ksmbd kernel server maintainer. Linux exfat filesystem(linux kernel)/exfatprogs(userspace utilities) maintainer.
Xiuyu Li @sheriyuo
9K Followers 1K Following Undergrad @RUC1937 | RL, Optimization, dLLM | Intern @JD_Corporate | Applying for Fall 27 PhD | Friend @dviolettchan
Reiner Pope @reinerpope
19K Followers 458 Following CEO and founder, @MatXComputing, developing high throughput chips tailored for LLMs





































