malteos @XYOU
Berlin, Germany · Joined June 2009
Tweets: 241
Followers: 750
Following: 2K
Likes: 2K
@RishiBommasani @percyliang The analogy for cloud vs local would be restaurant vs takeout. At the restaurant you'd better behave, otherwise you get kicked out. At home you eat your food however you want.
@MatthewBerman Sure about this? Given the current reproducibility crisis in ML research, I doubt that humans would achieve a much higher replication score.
4/ In academia, the work is very different. PhD students or even undergraduates are the ones doing most of the actual research work. But as a PhD student, you need to decide whether to prioritize the project work over your own PhD work (papers and thesis).
3/ LLMs and other foundation models are no longer research artifacts but products. Frontier models are developed by dedicated teams of 100+ people specialized across the whole stack (from low-level hardware optimization through data to ML and UX topics).
@hu_yifei Did you already try Grobid? github.com/kermitt2/grobid
@gui_penedo @pjox13 That’s even better. I will share the data with you as soon as it’s ready!
@gui_penedo @pjox13 We will release a filtered version of Colossal OSCAR soon. Are your training and evaluation scripts available somewhere? I would love to do the comparison with that version.
@mark_cummins For Germany, we have ~50B tokens of court decisions, but those are only the publicly available ones, which represent ~1% of all court decisions. However, you won't need all of them for LLM training because of the high duplicate ratio. @mlissner might have the US numbers.
@gui_penedo Awesome work. Will the remaining models also be released? And in your experience, what model and data size do you need to see a significant difference in performance?
@saattrupdan @SebastianB929 @occiglot Do you have the whole eval setup in containers? If so, I could help with compute.
@SebastianB929 @occiglot Pinging @saattrupdan who did the evals.
@qinzytech @OpenAI @Meta Great work! Will the pretraining code be open source?
@BramVanroy @VSC_HPC If your cluster uses Slurm, you can catch the kill signal and save a checkpoint before the job is killed. See this script for an example. Lines 14 and 293-300 do the magic. gist.github.com/malteos/71635c…
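A minimal sketch of the idea in the tweet above, not the linked gist: install a handler for the termination signal, have it set a flag, and let the training loop save a checkpoint and stop when it sees the flag. The names `StopFlag`, `train`, `step_fn` and `save_checkpoint` are hypothetical. It also assumes the scheduler delivers SIGTERM to the Python process some time before the hard kill, which depends on how the job is launched and configured.

```python
import signal


class StopFlag:
    """Records that a termination signal arrived (hypothetical helper)."""

    def __init__(self, signals=(signal.SIGTERM,)):
        self.stop_requested = False
        for sig in signals:
            # Replace the default action (terminate) with our handler.
            signal.signal(sig, self._handle)

    def _handle(self, signum, frame):
        # Only set a flag here; the actual save happens in the training loop.
        self.stop_requested = True


def train(num_steps, step_fn, save_checkpoint, flag):
    """Run step_fn until done, or save and stop once a signal was seen."""
    for step in range(num_steps):
        step_fn(step)
        if flag.stop_requested:
            save_checkpoint(step)
            return step
    return num_steps
```

Checking the flag once per step keeps the handler trivial and ensures the checkpoint is written between steps, never in the middle of one. The save itself has to finish within whatever grace period the cluster allows.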
@SebastianB929 OpenGPT-X is an official government-funded research project. Occiglot is a loose group of individuals from different organizations without any formal ties. We call it a research collective. You may also simply call it a Discord server. And yes, the website needs to be improved.
@BramVanroy Have you tried tensor parallelism on the embedding layer? If I remember correctly, BLOOM used this with its large vocab. @StasBekman
@BramVanroy @ph_singer There is a high correlation between the weights of Mistral and Mixtral. So this seems pretty likely.
@robertomasymas @burkov Check out "progressive growing". People did something similar already with BERT models.
Jan Philip Wahle @jpwahle
648 Followers 371 Following 📓 https://t.co/ChrtD5bUE3 📸 https://t.co/gDxyIvJkSr 📅 https://t.co/TcCv8cgTnC (@ConfDeadlinesAI) 🤖 https://t.co/UcMzKmobj1
NLLP Workshop @NllpWorkshop
1K Followers 534 Following 8th Workshop on Natural Legal Language Processing (NLLP) @EMNLP2026 #nlproc #law #legaltech #nllp Cfp: https://t.co/BxzS7Hwu4u
Gavindya @Gavindya2
325 Followers 345 Following Postdoc @IXlab_UT @UTiSchool @UTAustin | PhD @ODUCS @ODU, Ex-RA @NirdsLab @WebSciDL, Summer Research Intern @LosAlamosNatLab, BS (CSE) @MoratuwaUni
Michael L. Nelson @phonedude_mln
2K Followers 955 Following Professor: @WebSciDL @ODUcs @ODUVMASC @ODUDataScience (2002-now); Engineer: @NASA_Langley (1991-2002); Postdoc: @UNCSILS (2000-2001)
Isabelle Augenstein @IAugenstein
12K Followers 1K Following Full Professor @CopeNLU @uni_copenhagen. Formerly @ucl_nlp, @SheffieldNLP. Explainable AI, Natural Language Processing, ML.
Gabriele Sarti @gsarti_
2K Followers 2K Following Open-source interpretability to seize the means of prediction. Postdoc w/ @davidbau @ndif_team @Northeastern. Prev: @GroNLP, @amazonscience
Travis Reid @TReid803
238 Followers 391 Following PhD Student at @oducs; Member of @WebSciDL; M.S. & B.S. in Computer Science from @ODU; A.S. in Computer Science from @TCCva
Pei Zhou @peizNLP
3K Followers 909 Following Senior Applied Scientist @Microsoft #OAR | PhD @nlp_usc | X-@GoogleDeepMind @allen_ai @AmazonScience @UCLA | Common Ground Reasoning for Communicative Agents
Yasith Jayawardana @yasithdev
403 Followers 478 Following Cofounder @marketrix_ai, Researcher @georgiatech
Himarsha R. Jayanetti @HimarshaJ
541 Followers 843 Following PhD Candidate (Computer Science @oducs) Web Science & Digital Libraries Research Group @WebSciDL, ODU @odu. 🐘: @[email protected]
Kritika garg @kritika_garg
369 Followers 774 Following Ph.D. Student, Department of Computer Science @ODUCS @ODU, Member of Web Science & Digital Libraries Research Group @WebSciDL.
Ahmad Taie @ahtaie
200 Followers 592 Following Founder @tenwattsai On a journey to be maximally caring, and minimally bothered.
DeepSec @deepsec711
1 Followers 114 Following
Common Crawl Foundati... @CommonCrawl
8K Followers 2K Following Common Crawl is a non-profit foundation dedicated to the Open Web.
Paisley @Weetit0918
2 Followers 130 Following 🚀 decoding earnings reports lover, market dreamer! excited for stock picks. DM me about bull markets! 💬 #Stocks #Trends
テクニカル指標... @Aubeebit4451
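58 Followers 2K Following [Completely free] Provided by a professional stock investment team with 25 years of experience (over 50 billion yen in assets under management): daily market analysis reports plus picks of high-quality growth stocks. Professional information for free. Feel free to contact us.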
TessFast @jZQwkGepTlb19
65 Followers 2K Following
WandaHope @2jFjz4pPcb2pgT
73 Followers 2K Following
Ydealev @Ydealev2211
27 Followers 2K Following
Bruibauc @Bruibauc030936
26 Followers 2K Following
Thurnest @Thurnesta_CriE
45 Followers 1K Following
(-o-) @JonasUB
0 Followers 31 Following
Shoatoth @ShoatothQBQqEQ
50 Followers 4K Following
Gabrielle @TateanSV
4 Followers 547 Following
Yassine El Kheir @YassineElkheir
62 Followers 600 Following PhD Student at DFKI & Technical University of Berlin
Plawneau @PlawneautgN
3 Followers 96 Following
Lisa @SorkoughWVRyT
5 Followers 538 Following
JIE GAO @jerryGaoDextrys
226 Followers 1K Following Researcher in NLP/text analysis, semantic/content technology, misinformation/disinformation; retweets are bookmarks for myself; Husband, father; reasonable cook
Louise @HO759Z7zcA0V0m
73 Followers 7K Following
Felix Müller @fmueller_bln
1K Followers 617 Following Software Developer • Obsessed with complex systems and systems thinking • Builds side projects • he/him
Glowin @glow1n
8K Followers 4K Following https://t.co/0sMugJUwwx | focusing on Generative AI |Former Co-founder of https://t.co/PJL8ze16fj with @kalasoo , acquired by ByteDance in 2019
Danial Namazifard @DanialNamazi_Fa
642 Followers 3K Following Researcher in cognitive neuroscience and artificial intelligence
Georges Harik @gharik
8K Followers 4K Following humans& co-founder, 7th employee google, co-created adwords online, co-created adsense targeting, worked on ai, gmail, calendar, bought android.
Sunny Sanyal @SunnySanyal9
1K Followers 755 Following On Job market | PhD candidate @UTexasECE| Prev Intern @GoogleDeepMind (FAI), @LightningAI & @AmazonScience (Alexa)
Konstantin Dobler @konstantdobler
301 Followers 427 Following ELLIS PhD student @hpi_de, prev intern @apple @instadeepai @sap | Multilingual LLMs, tokenization, embeddings
François REMY @FremyCompany
801 Followers 457 Following Natural Language Processing for Healthcare, Web Platform, and European Politics. 🖤💛❤ NLP, AI, Open Web, W3C, HTML, CSS, SVG, JavaScript, Python. Views my own.
Shivanand 🇮🇳 @ssheshap
253 Followers 2K Following Indian || Kannadiga || RTs/likes/Links are not endorsement.
Manuel Vargas A. @AIdeaText
19 Followers 93 Following
Horacio @horacio123
965 Followers 4K Following The solution is not in the stands; it's time to get down onto the field and score goals.
Islam Mesabah ... @islam_mesabah
101 Followers 855 Following PhD student | Researcher & Lecturer 👨‍💻 @ DFKI & RPTU
Raphael Schlattmann @RaphSchlatt
4 Followers 62 Following
Everlyn Asiko @everlyn_asiko
857 Followers 545 Following PhD Fellow(ADTP-DS) @QL_Africa | Machine Translation researcher | @AIMSacza Graduate | Ex-DS Technical Mentor @moringaschool | KamiLimu Cohort 4.0 mentee
Shrurshy @ShrurshyTHa
18 Followers 655 Following
Lersoyez @Lersoyez2q5t
7 Followers 319 Following
Hynek Kydlíček @HKydlicek
2K Followers 508 Following Building something new around data 👀 Prague, CZ 🇪🇺 eu/acc
Guilherme Penedo @gui_penedo
4K Followers 2K Following Pre-training data @huggingface 🤗. Lisboeta 🇵🇹
Martin Courtois @MCourtois173
9 Followers 16 Following
SharonPatrick @5xgL3BC2T9t1I
57 Followers 7K Following
Yann LeCun @ylecun
1.2M Followers 788 Following Professor at NYU & Executive Chairman at AMI Labs. Ex-Chief AI Scientist at Meta. Researcher in AI, Machine Learning, Robotics, etc. ACM Turing Award Laureate.
(((ل()(ل() 'yoav)))... @yoavgo
82K Followers 2K Following
Nils Reimers @Nils_Reimers
15K Followers 538 Following VP AI Search @Cohere | ex-huggingface | Creator of SBERT (https://t.co/MKKOMfuQ4C)
AK @_akhaliq
504K Followers 3K Following AI research paper tweets, ML @Gradio (acq. by @HuggingFace 🤗) dm for promo ,submit papers here: https://t.co/UzmYN5XOCi
EMNLP 2026 @emnlpmeeting
17K Followers 53 Following EMNLP 2026 - The 2026 Conference on Empirical Methods in Natural Language Processing Hashtag: #EMNLP2026 Dates: October 24 –29 Submission: ACL ARR March and May
Sebastian Ruder @seb_ruder
99K Followers 1K Following Research Scientist @AIatMeta MSL • Ex @Cohere @GoogleDeepMind
ACL 2026 @aclmeeting
23K Followers 48 Following Association for Computational Linguistics | ACL 2026 conference | The 64th Annual Meeting of the ACL Hashtags: #NLProc #ACL2026NLP
Leon Derczynski ⚒... @LeonDerczynski
6K Followers 1K Following NLP/ML/language/security. Principal research scientist @NVIDIA, & Prof @ITUkbh. Views ostensibly professional. llmsec stan acct
Percy Liang @percyliang
106K Followers 426 Following professor of computer science @Stanford @stanfordnlp, co-founder of @togethercompute, creator of https://t.co/7R5THVogW2, co-founder of @simile_ai, pianist
Delip Rao e/σ @deliprao
69K Followers 5K Following Busy inventing the shipwreck. @Penn. Past: @johnshopkins, @UCSC, @Amazon, @Twitter ||Art: #NLProc, Vision, Speech, #DeepLearning || Life: 道元, improv, running 🌈
Sam Bowman @sleepinyourhat
65K Followers 3K Following AI alignment + LLMs at Anthropic. On leave from NYU. Views not employers'. No relation to @s8mb. Into @givingwhatwecan.
William Wang @WilliamWangNLP
21K Followers 769 Following CEO & Founder, @AlphaDesignAI. We make https://t.co/1LfDYicsF2 I'm also Mellichamp Chair Prof. at UCSB CS. PhD @ CMU SCS.
Sebastian Gehrmann @sebgehr
6K Followers 2K Following Making AI trustworthy as Head of Responsible AI in the CTOs office @Bloomberg. Formerly LLMs @ Google Brain / PhD @ Harvard. views my own
Luca Soldaini 🎀 @soldni
13K Followers 1K Following data mines are my passion ⛏️ mts @MicrosoftAI / ex co-lead Olmo @allen_ai / pfp @YanhongLi2062 / thoughts are mine, leave my employer alone / 🌈
Jenna Russell @jennajrussell
655 Followers 431 Following CS PhD Student @umdcs @ClipUmd, interning @pangramlabs, undergrad @CornellCIS
Tatsunori Hashimoto @tatsu_hashimoto
11K Followers 200 Following Assistant Prof at Stanford CS, member of @stanfordnlp and statsml groups; Formerly at Microsoft / postdoc at Stanford CS / Stats.
Will Held @WilliamBarrHeld
3K Followers 1K Following Open LLM Training @ https://t.co/yb9OySgHFM Formerly ML PhD w/ @Diyi_Yang, 🦙 @AIatMeta, Assistant @GoogleAI, Arabic language @NYUAbuDhabi Burqueño
Antoine Chaffin @antoine_chaffin
3K Followers 738 Following Solve search, solve everything @LightOnIO
Ben Clavié @bclavie
7K Followers 1K Following regressing linearly on a daily basis. wife guy who does retrieval. research @mixedbreadai, prev answerdotai
Florian Brand @xeophon
14K Followers 733 Following evals @PrimeIntellect | open models @interconnectsai
Patrick Loeber @patloeber
73K Followers 1K Following member of technical staff @GoogleDeepMind • gemini api & ai studio • my views
Sebastian Borgeaud @borgeaud_s
3K Followers 272 Following Research Engineer @GoogleDeepMind Lead for Gemini pre-training
Negar Foroutan @negarforoutan
766 Followers 913 Following Research Scientist, Google Research. Prev: #NLProc PhD student @EPFL_en, Multilingual NLP
Pangram @pangram
11K Followers 19 Following Keeping the world free of AI slop. This account has automated replies: Tag @pangram with 'ai?' to get an AI check on any post.
Workshop on Multiling... @wmdqs
13 Followers 14 Following The first iteration of our workshop will be co-located with @COLM_conf 2025 in Montreal.
Jonathan Frankle @jefrankle
23K Followers 800 Following Chief AI Scientist @databricks via MosaicML. e/brick
Justin Quan @justoutquan
2K Followers 2K Following software should be fun! we're hiring @tomo dms open
tomaarsen @tomaarsen
4K Followers 435 Following Sentence Transformers, SetFit & NLTK maintainer Machine Learning Engineer at 🤗 Hugging Face
Mahesh Sathiamoorthy @madiator
15K Followers 1K Following RL Environment Curation. Data Curation (OpenThoughts). Post-training. CEO @bespokelabsai. Ex-GoogleDeepMind.
Ryan Marten @ryanmart3n
2K Followers 2K Following Building @harborframework and @terminalbench with @alexgshaw
David @DavidSHolz
102K Followers 10K Following founder @midjourney, previously founded leap motion, before that was at nasa and max planck - vibeposting @davidvibesonly
Paul Michel @pmichelX
1K Followers 51 Following Staff Research Scientist @DeepMind working on Gemini pretraining. Previously postdoc @ENS_ULM and PhD @LTIatCMU
Hensen Juang @basedjensen
16K Followers 2K Following ctoing at a soonicorn, ex cluster janitor at a frontier lab. current chief clanker at clanker cloud
Dmitri Alperovitch @DAlperovitch
203K Followers 2K Following Geopolitics/NatSec, Russia, China, Cyber. Chairman @SilveradoPolicy; Author WorldOnTheBrink; Host @GeopolDecanted; Founder @alperovitch; Co-Founder @CrowdStrike
Conference on Languag... @COLM_conf
7K Followers 7 Following https://t.co/GhGCMEoHU8 Conference: October 7, 2025
LLM360 @llm360
3K Followers 76 Following LLM360 is an open research lab enabling community-owned AGI through open-source large model research and development.
Siva Reddy @sivareddyg
8K Followers 1K Following Assistant Professor @Mila_Quebec @McGillU @ServiceNowRSRCH; Postdoc @StanfordNLP; PhD @EdinburghNLP; Natural Language Processor #NLProc
Jean Mercat @MercatJean
284 Followers 231 Following
Achal Dave @achalddave
390 Followers 264 Following
Zhen Wang @zhenwang9102
1K Followers 328 Following Moore Foundation Fellow @UCSanDiego🌴 | Reasoning & Open-Endedness Discovery | Language, Agent, World Models (LAW) | Prev. @osunlp @MSFTResearch @MITIBMLab
Jiayi Pan @jiayi_pirate
14K Followers 2K Following Research | Prev @xAI @Berkeley_AI | Views Are My Own
Hongming Zhang @hongming110
1K Followers 228 Following Research Scientist, FAIR. Working on self-evolving AI.
Łukasz Kondraciuk @lukasz_kondr
2K Followers 54 Following Strawberry training infra lead @ OpenAI prev: CS at University of Warsaw, ACM ICPC 2022 silver medalist
Tristan Thrush @TristanThrush
4K Followers 923 Following PhD-ing @StanfordAILab @stanfordnlp. Interested in data, multimodality, scaling, and many more things.
Perplexity @perplexity_ai
492K Followers 76 Following Curiosity changes everything. Download our free app on iOS, Mac, Windows, and Android.
Daniel Han @danielhanchen
33K Followers 2K Following Building @UnslothAI • Making open-source LLMs faster, better & more accessible • YC S24 • ex-NVIDIA ML
Yifei Hu @hu_yifei
5K Followers 674 Following Machine Learning Researcher @reductoai | Automating document workflows | Prev: PhD @LifeAtPurdue | Opinions my own