Alex Peng @alexpeng
posttraining @xai, formerly @cognition, @windsurf, @stanford alexpeng.me San Francisco, CA Joined August 2009
Tweets 92
Followers 529
Following 524
Likes 414
@ShashwatGoel7 @dwarkesh_sp Have you considered using FutureSim/evals like it where you predict unseen events as a training data pipeline? ie RLing on rejection sampled traces where you filter for rollouts with the most surprise/miscalibration, then reevaluate with fresh events
@ShashwatGoel7 This is such a cool benchmark! Great work
I worked on this! It was a true end to end effort experimenting with data mixes, training at scale, and identifying and climbing the right hills to create this model
Proud to see Grok 4.3 doing well on @Designarena’s Agentic Slides Arena. Surprising to see the power of a small model! I co-lead our Office Agent effort for Grok 4.3, and it’s been lots of fun building this with the team.
So grateful for the chance to contribute and learn from the team across posttraining, data, and evals for this model release. More to come!
xAI has launched Grok 4.3, achieving 53 on the Artificial Analysis Intelligence Index with improved agentic performance, ~40% lower input price, and ~60% lower output price than Grok 4.20 The release of Grok 4.3 places @xai just above Muse Spark and Claude Sonnet 4.6 on the
@si_pbc @sonyatweetybird @MikowaiA @YasminRazavi @karpathy @tszzl @_milankovac_ Congrats on the raise guys!
@UziObi Coming soon
The standard for frontier coding evals is changing with model maturity. We now recommend reporting SWE-bench Pro and are sharing more detail on why we’re no longer reporting SWE-bench Verified as we work with the industry to establish stronger coding eval standards. SWE-bench Verified was a strong benchmark, but we’ve found evidence it is now saturated due to test-design issues and contamination from public repositories. openai.com/index/why-we-n…
Seems like a lot of people are taking this as gospel—when we say the measurement is extremely noisy, we really mean it. Concretely, if the task distribution we're using here was just a tiny bit different, we could've measured a time horizon of 8 hours, or 20 hours.
We estimate that Claude Opus 4.6 has a 50%-time-horizon of around 14.5 hours (95% CI of 6 hrs to 98 hrs) on software tasks. While this is the highest point estimate we’ve reported, this measurement is extremely noisy because our current task suite is nearly saturated.
Ad Astra 🚀
SpaceX has acquired xAI, forming one of the most ambitious, vertically integrated innovation engines on (and off) Earth → spacex.com/updates#xai-jo…
cool :)
so after 24 hours we tallied early returns (from people koding on Saturdays mind you): @xai Grok is currently #3 coding model in the world by early voters (after 1 day and thousands of full agent votes). it's really interesting to see the order shaken up, and there’s a reason
@TheGregYang Take care Greg - office won’t be the same without being the LED sign guy every time candidates come around ❤️
Doesn’t seem beneficial for frontier labs to facilitate their own commoditization by improving along this dimension too (see: opus 4.5 in CC vibes vs in other harnesses)
How are people thinking about harness/scaffold resilient coding models? Seems implausible to me that it will be important in the long term as labs develop their own harnesses and buyers consolidate around those products (codex, claude code etc).
Minimax casually dropping that they’re training world models that solve the halting problem to make better coding models
I led eval process + deployment for frontier coding agents at multiple F500 companies at @windsurf, then @cognition. The only eval enterprises care about is whether the deployment saves them money (time/headcount). Upstream of that, eval metrics are extremely rudimentary: mostly still just acceptance rate, PR cycle review time improvements, etc. Qualitative feedback from principal/staff engs dominates. Working on code evals @xai now, and the difference between what SWE agent/coding teams in frontier labs care about vs. the median enterprise buyer is huge.
has anyone seen an agentic deployment in a large enterprise work without an eval?
Blake Jenkins @jblakej5
842 Followers 4K Following
Ofir Press @OfirPress
18K Followers 8K Following I push the AI frontier by building tough benchmarks with amazing people. SWE-bench, SWE-agent, SciCode, AlgoTune. Postdoc @Princeton. PhD @nlpnoah @UW.
Nelo Puchades @nelopuchades
835 Followers 376 Following burning ai tokens 24/7 • creating • learning
SANTIAGO @VolcanicKing
10 Followers 908 Following
λux @novasarc01
22K Followers 3K Following tensor shepherd in a non-euclidean pasture | grazing on cuda cores
Grace Kay @graceihle
2K Followers 2K Following Reporter @TheInformation | I write about all things Tesla and xAI | Bylines @businessinsider, business, @forbes, @michigandaily | Tips via Signal at gracekay.11
zshuanyan @SageTale1
6 Followers 3K Following
waaaw.eth🦇🔊 @waaaw_eth
909 Followers 5K Following 🧪Recruiter at Snair // Operating Partner VCrypto // Crypto Anarchist since 2019 // Community Lead. @ourbit / #EMEA prev: @ https://t.co/8RXLjqLoA4 , #btc #xmr #nock
Jakson @marcusseen
0 Followers 157 Following
Testlabor @testerlabor
14K Followers 777 Following Tech and AI Tester & researcher - Grok user since 1.5 - Grok Build since 0.2.2 - News, Updates, Results and more.
Rylan Barnes @schmylan
287 Followers 938 Following Engineer. Designer. Software bushwhacker. @web4_dev
Michael Hu @michahu8
1K Followers 669 Following NLP, language models | PhD @NYU | continual learning @NVIDIA | views my own @NSF GRFP fellow | prev @microsoft, @princeton_nlp, @cocosci_lab.
Wenhao Chai @ CVPR 20... @wenhaocha1
4K Followers 2K Following PhD student @PrincetonCS. @GoogleDeepMind. opinions are my own.
Chenyu (Monica) Wang @ChenyuW64562111
1K Followers 709 Following PhD @MIT_CSAIL | Prev @AIatMeta @genentech @Tsinghua_Uni
Shontelle Bloomer @ShontelleB41552
1 Followers 70 Following
Chatara Hoehne @ChataraH47831
1 Followers 67 Following
Prakshan Vasuthan @PVasuthan
0 Followers 709 Following
martin druart @DruartMartin
4 Followers 174 Following CTO et entrepreneur, je développe des solutions d’IA pour optimiser les soins de rééducation et faire avancer l’innovation en France.
Anh Nguyen @NguynTu24128917
1K Followers 7K Following Member of Technical Staff at Project Prometheus ex Foundation Model @Apple, Phi @MSFTResearch
Aleksandar @smiliebg
51 Followers 251 Following
vivek bhardwaj @vivekbw03
367 Followers 715 Following observing geese @uwaterloo @Laurier | prev @upfrontvc @palantirtech @shopify @ComposerTrade
Odd&Entertaining @oddreport
2K Followers 548 Following Odd Report. Commentary that tries to take things as unserious as possible. Tech & Odd news with history in perspective. AI, Grok, Hermes & space insights. Join!
ryp @ryperin
213 Followers 3K Following
David Coe @DavidCo97114831
0 Followers 877 Following
Dan @Daniel_Farinax
9K Followers 454 Following 🇺🇸 Building OS AI Harness | FullStack Problem Solver | Creator | Hacker (Follow me for life changing videos about AI) prev: @osmosis, @bitcoinprivate
Arvindh Arun @arvindh__a
742 Followers 795 Following RS intern @SakanaAILabs | @ELLISforEurope @MPI_IS PhD student
AI Deeply @AiDeeply
519 Followers 7K Following AI is reshaping the world. Visit https://t.co/aZgAXUcZbe to learn more about the people and companies driving the change.
Delta Institute @DeltaInstitutes
3K Followers 3K Following Supporting exceptional researchers and engineers, from academia to industry and beyond.
Dejia Xu @Ir1dXD
526 Followers 2K Following ✨ Research Lead @LumaLabsAI. https://t.co/bCoyXJpKDU @WNCG_UT @VITAGroupUT
Cole Yoos @Soldier4him2847
0 Followers 389 Following
josepha.mayo @josepha_mayo
204 Followers 104 Following 17 | ai/ml guy | solving any problem i can in my field https://t.co/sIoNiUA7dK || https://t.co/Dlm7ppuoLe
Gabe Greenberg @gabegreenberg
12K Followers 7K Following Jesus follower. Founder @ G2i. Focused on human data, RL environments. Co-organizer of AI Engineer Miami (@aiemiami) and React Miami (@reactmiamiconf).
PanteraOS @opanteraos
247 Followers 6K Following IA sem hype, eu testo antes de opinar. O que os labs globais lançam, filtrado e benchmarkado em PT-BR.
Marcel Ward @wardies
168 Followers 587 Following
Pascal Guay @pascalnet
2K Followers 4K Following Creates AI projects useful to the world. Loves to educate. Do only good everyday, be wholesome to people. Dogecoin tip jar: DL8XxKdUtz4qthNr6yi2NUbJoSJdSprZXr
Moon's Gravity @WagonWheelCraft
31 Followers 152 Following
JORDI SOLSONA @JORDITESLA
292 Followers 852 Following Human in the AI age | Inversión, Tesla, tecnología y libertad YT https://t.co/VkjmNkxxaU
Desmond Shum @DesmondShum
41K Followers 178 Following Author of Red Roulette. Ex China real estate developer, ex CPPCC member, ex CFA, ex Beijinger … Current Hong Konger, current dad, current home chef, current…
Tianyi Zhang @mycharmspace
2K Followers 476 Following prev search post-training@xAI, Opinions are my own https://t.co/seqf7m8paH
λux @novasarc01
22K Followers 3K Following tensor shepherd in a non-euclidean pasture | grazing on cuda cores
Xuhui Zhou @nlpxuhui
2K Followers 799 Following PhD student @LTIatCMU. Previously, @openhandsdev, @allen_ai, @UWNLP, @Apple, @UCBerkeley; Social Intelligence in language +X.
Wenhao Chai @ CVPR 20... @wenhaocha1
4K Followers 2K Following PhD student @PrincetonCS. @GoogleDeepMind. opinions are my own.
Jeremy Cohen @deepcohen
6K Followers 992 Following Research fellow at Flatiron Institute, working on understanding optimization in deep learning. Previously: PhD in machine learning at Carnegie Mellon.
Jesse Hoogland @jesse_hoogland
2K Followers 2K Following Researcher and decel working on developmental interpretability. Executive Director @ Timaeus
Peter Hase @peterbhase
4K Followers 1K Following AI Institute Fellow at Schmidt Sciences. Postdoc at Stanford NLP Group. Previously: Anthropic, AI2, Google, Meta, UNC Chapel Hill
Jiaxin Wen @jiaxinwen22
6K Followers 194 Following research @berkeley_ai @anthropicai. prev @tsinghua_univ.
Michael Hu @michahu8
1K Followers 669 Following NLP, language models | PhD @NYU | continual learning @NVIDIA | views my own @NSF GRFP fellow | prev @microsoft, @princeton_nlp, @cocosci_lab.
Sapient Intelligence @Sapient_Int
5K Followers 2 Following Building efficient & powerful general intelligence through brain-inspired architecture
Chenyu (Monica) Wang @ChenyuW64562111
1K Followers 709 Following PhD @MIT_CSAIL | Prev @AIatMeta @genentech @Tsinghua_Uni
Arvindh Arun @arvindh__a
742 Followers 795 Following RS intern @SakanaAILabs | @ELLISforEurope @MPI_IS PhD student
Nikhil Chandak @nikhilchandak29
802 Followers 478 Following Interning @MistralAI PhD @ Max Planck Institute. Past @iiit_hyderabad @VectorInst. Interested in better evals, forecasting, and open-endedness.
Qasim Wani @qasim31wani
618 Followers 1K Following self-improvement & post-training @xAI what does the world need that it doesn’t know yet
bhavit sharma @avaitopiper
3K Followers 3K Following interested in applied ML, little theory ML, distributed systems, and PL theory. https://t.co/ACD8ccV5ec
Logan Graham @logangraham
20K Followers 8K Following Head of the Frontier Red Team @anthropicai. 🌎 Make things radically good.
Matt Bergland @mattbergland
2K Followers 3K Following field marketing & events @cognition prev @windsurf
Alex Shan @alexshander03
1K Followers 28 Following Improving agents from production data. Co-founder, CEO of @JudgmentLabs
Matthieu Schulz @matthieuschulz
862 Followers 3K Following @deviationcap, prev @cognition, @windsurf, @columbia
shyamal @shyamalanadkat
22K Followers 1K Following applied AI. prev @openai @dukeu. explorer in the age of research 🇮🇳
Chenning Li @chenningli1117
108 Followers 93 Following
devansh @devanshpandey
2K Followers 734 Following building aligned general learners. cofounder @si_pbc. follows do not imply endorsement.
Stephen Xie @stephenx_
749 Followers 908 Following working with actors and models @xai | views are my own
Ryan Kaufman @ryankaufman
585 Followers 314 Following Building computer use models @si_pbc. Formerly MoTS @xai, contractor @OpenAI and @AnthropicAI, on leave from @Harvard. Views are my own.
Griffin Li @griffinli_
545 Followers 836 Following founder @hackneyapp / @northwesternu prev @pebble @beeper
Honghua Zhang @HonghuaZhang2
783 Followers 221 Following prev agentic coding @xAI, Ph.D. from @UCLA StarAI lab
Subhash Ramesh @subby_tech
4K Followers 3K Following man in the marina, bringing toy story to life @heybondu @fdotinc
Matthew Gallagher @galligator
19K Followers 191 Following Building MEDVi and The Gallagher Foundation
snow @snowclipsed
6K Followers 1K Following cache-miss eliminator @arcee_ai views are my own. https://t.co/h8rzm8QyZc
SPEC @___4o____
7K Followers 156 Following