Maximilian Böther @MaxiBoether
making data loaders go brr | mts @datologyai | Ph.D. student @ETH_EN @SystemsGroupETH @anaklimovic | previous gigs at @HPI_DE @apple @google
mboether.com | Zurich, Switzerland | Joined October 2012
Tweets: 854 | Followers: 397 | Following: 1K | Likes: 3K
wild to me that people vibe-generate slides for conference talks. they are ugly (for now). they are low info density (thanks rlhf). but worse, they don't represent your thoughts, so your presentation of them will be terrible, unless you put in a ton of work (so just write them!)
@HKydlicek At @datologyai we are currently building a new dataloader - currently in private beta. Just DMd you!
A key reason for Zürich's recent robotics and AI boom. There's just incredible talent hidden in ETH, EPFL, UZH and many other places!
DeepMind stayed in London because it is better for talent than Silicon Valley. "I saw London and the UK as having incredible talent from top universities like Cambridge, Oxford, Imperial and UCL. There is a deep heritage of scientific breakthroughs and world-class thinkers.
Pour one out for the homies
Models are typically specialized to new domains by finetuning on small, high-quality datasets. We find that repeating the same dataset 10–50× starting from pretraining leads to substantially better downstream performance, in some cases outperforming larger models. 🧵
1/ People often think better multilingual models must come at the cost of English performance. Not true. The constraint isn’t capacity, it’s data quality, and we can fix it. Today @datologyai shares ÜberWeb: a year of multilingual curation lessons, scaled to 20T+ tokens.
@xeophon @RicardoMonti9 @datologyai
> You even got one in your name, how fun
Until you start applying for a US visa/bank account/SSN, this is where the fun stops 😂
> ß/ẞ into the data as well
As part of my PhD contract at ETH Zurich I had to agree to not use those anymore :( Swiss rules and whatnot 🇨🇭
@RicardoMonti9 @xeophon @datologyai Intern-driven data curation (TM) He keeps telling me we „look at the data“, „thats your job“
@RicardoMonti9 @xeophon @datologyai ricardo made me manually review each and every of them😩
@StasBekman @datologyai @josh_wills I sent over a mail to your Snowflake address, probably easier than a tweet. In any case, glad the OVERLORD ref was already helpful!
@leavittron Would this be the right time and place to re-negotiate my intern stipend?
@josh_wills @leavittron @datologyai @sjoshi804 @HaoliYin interns dont get tagged😤
I'll be at NeurIPS in San Diego this week! If you want to talk about data loading/mixing/curricula and what we are currently cooking at @datologyai and @SystemsGroupETH, hit me up!
@joemelko @josh_wills Totally agreed! There is a spectrum: if you do n-phase training and pre compute everything, no need for online mixing. But are your mixtures for each phase truly optimal? It requires a lot of iteration, and algorithms like ADO rly help here.
4/ Mixing + Curricula I want the dataloader equivalent of torch.optim.lr_scheduler so I can schedule data mixtures over time with the same flexibility that we have for learning rates. Mixtera, from the systems group at ETH, is the most interesting thing I’ve seen along these lines (and has the bonus for me of being built using our lord and savior @duckdb), and I’m excited to be hosting @MaxiBoether over at @datologyai to explore this space with us!
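The idea of scheduling data mixtures the way learning rates are scheduled can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the class `PiecewiseLinearMixtureSchedule`, its `mixture_at` method, and the domain names `web` and `code` are hypothetical and are not the API of Mixtera, PyTorch, or any DatologyAI product. The sketch linearly interpolates per-domain sampling weights between user-chosen milestone steps and renormalizes them to sum to 1.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple

Mixture = Dict[str, float]  # domain name -> sampling weight


class PiecewiseLinearMixtureSchedule:
    """Hypothetical mixture scheduler (illustrative, not a real library API).

    Interpolates domain weights linearly between (step, mixture) milestones,
    holding the first/last mixture constant outside the milestone range.
    """

    def __init__(self, milestones: List[Tuple[int, Mixture]]):
        if not milestones:
            raise ValueError("at least one milestone is required")
        self.milestones = sorted(milestones, key=lambda m: m[0])
        self.steps = [step for step, _ in self.milestones]
        # Union of all domains mentioned at any milestone.
        self.domains = sorted({d for _, mix in self.milestones for d in mix})

    def mixture_at(self, step: int) -> Mixture:
        idx = bisect_right(self.steps, step)
        if idx == 0:
            raw = self.milestones[0][1]          # before the first milestone
        elif idx == len(self.milestones):
            raw = self.milestones[-1][1]         # after the last milestone
        else:
            (s0, m0), (s1, m1) = self.milestones[idx - 1], self.milestones[idx]
            t = (step - s0) / (s1 - s0)          # s1 > s0 holds by construction
            raw = {d: (1 - t) * m0.get(d, 0.0) + t * m1.get(d, 0.0)
                   for d in self.domains}
        total = sum(raw.get(d, 0.0) for d in self.domains)
        if total <= 0:
            raise ValueError("mixture weights must sum to a positive value")
        # Renormalize so the weights form a probability distribution.
        return {d: raw.get(d, 0.0) / total for d in self.domains}


# Usage: shift from a web-heavy mixture to an even split over 1000 steps.
schedule = PiecewiseLinearMixtureSchedule([
    (0, {"web": 0.8, "code": 0.2}),
    (1000, {"web": 0.5, "code": 0.5}),
])
halfway = schedule.mixture_at(500)
```

A data loader could call `mixture_at(step)` once per batch and sample domains with the returned weights. Online algorithms such as ADO, mentioned in the thread, adapt the mixture from training signals instead of following a fixed schedule like this one.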
@josh_wills with lots of important thoughts on data loaders for foundation model training. We are cooking something up at @datologyai 👀 But a dish needs to be tailored to taste, so if you have any thoughts on data loaders, reach out! ✉️
1/ Really looking forward to #PytorchConf this week in SF-- I've spent the last couple of months at @datologyai immersed in the DataLoader ecosystem (especially for our VLM stack) and I have a few topics I would love to discuss with folks (DMs are open, say hi if you see me, etc.
matti @krstdt
10K Followers 1K Following Secretary General @fdp Brandenburg • hard work, headwind, unpaid
Flo Hilpoltsteiner... @Kaptain_spACE
2K Followers 1K Following 38, ghetto kid, highly humorous former office holder at @fdpbay and @jungeliberale, mostly cheeky and rarely bad, economics grad, reservist, private account
Roland Fink @Frei_Fink
3K Followers 2K Following Something to do with freedom and multipurpose halls. Dum spiro spero. By doing just anything, you become just anyone. Love one another.
Benjamin Strasser @bstrasser
6K Followers 2K Following freedom fighter 🗽 | deputy chair @fdp_bw 💛 | lawyer | Upper Swabian
Dr. Phil Hackemann @PhilHackemann
8K Followers 2K Following Politics & Entrepreneurship | MSc @LSEEI | PhD @LMU @UniofOxford @UCBerkeley
Ren @Ren_aremb
178 Followers 1K Following Investing into the AI buildout ⚡️ Head of AI | 10 years as a Product Manager Subscribe for trade ideas / build conviction NFA just fun and vibes.
Matthew Dow Teems @amelia_cuec1p
27 Followers 420 Following Senior Vice President and Institutional Consultant at Graystone Consulting from Morgan Stanley. For more information please visit my website.
Tobias Ziegler @Tobias__Ziegler
1K Followers 263 Following Optimizing Distributed (Database) Systems @TigerBeetleDB — the financial transactions database designed to power the next 30 years of transaction processing.
Mahamadi nikiema @MahamadiN
17 Followers 2K Following
Hynek Kydlíček @HKydlicek
2K Followers 508 Following Building something new around data 👀 Prague, CZ 🇪🇺 eu/acc
. @miniitrades
255 Followers 3K Following Small Cap Day Trader I Squeeze Short Sellers With My "Wick Fill" Strategy Founder of @_marketmasters | Live trading EVERY morning👇
Keunhong Park @KeunhongP
2K Followers 658 Following Training models at World Labs. (https://t.co/a81eDVLlXF). Creator of FrameBoy (RTFM). Opinions are my own.
Robert Scoble @Scobleizer
586K Followers 50K Following San Francisco/Silicon Valley AI | Robots, holodecks, BCIs, analysis of new things | Ex-Microsoft, Rackspace, Fast Company | Wrote eight books about the future.
Yang Tran @YTR4N_
643 Followers 883 Following Partner @ZeroPrimeVC | Venture Partner @Speedinvest | AI, Data & Infra | SF 🌁
Ivan M @med_1v
1 Followers 3K Following
Aldo Gael Carranza @agcrnz
81 Followers 610 Following MTS @datologyai | PhD @Stanford | BS @UTAustin
Abteilung Freiheit @Abt_Freiheit
10 Followers 148 Following "Man's freedom lies not in being able to do what he wants, but in not having to do what he does not want." - Jean-Jacques Rousseau
Allan Zhang @allanzhangML
57 Followers 285 Following sophomore + ml research @ucla, former research intern @datologyai helix + vim enthusiast
Dongyang Fan @dyfan22
244 Followers 416 Following making LLMs efficient and responsible | PhD student in ML/LLMs @epfl_en 🇨🇭🏔️
Anshuman Suri @iamgroot42
683 Followers 878 Following Research @datologyai | Previously Postdoc @KhouryCollege, Ph.D. @UVA | Interested in data quality x security & privacy.
Chris Offner @chrisoffner3d
4K Followers 3K Following 3D computer vision, spatial AI, and synthetic data @Google XR. visual computing, machine learning, robot perception
Konstantin Dobler @konstantdobler
301 Followers 427 Following ELLIS PhD student @hpi_de, prev intern @apple @instadeepai @sap | Multilingual LLMs, tokenization, embeddings
Alex Mackenzie @alex__mackenzie
5K Followers 7K Following Partner at @GeneralCatalyst; code at https://t.co/mm13KZtoy7
Amro Abbas @amrokamal1997
530 Followers 1K Following I do AI Research @datologyai. Ex-AI Resident at Facebook (FAIR) | AMMI @AIMS_Next alumni | U of Khartoum alumni | Sudanese 🇸🇩
Zhengyang Qi @qi_zhengyang
806 Followers 5K Following Research @SnorkelAI | Previously: @Scale_AI Multiturn RL, reward modeling, interactive environments, artificial social intelligence
Freeman Lewin @Freeman_Lewin
763 Followers 1K Following Brick layer behind @TryBrickroad Building the future of super fast data procurement.
elie @eliebakouch
17K Followers 4K Following training llm @PrimeIntellect (prev: @huggingface) anon feedback: https://t.co/JmMh7Sg3mL
Thao Nguyen @thao_nguyen26
1K Followers 319 Following Pretraining data @AnthropicAI. Previously PhD student @uwcse, visiting researcher @AIatMeta, @GoogleAI Resident, @Stanford'19.
Saurabh Shah @saurabh_shah2
4K Followers 2K Following human-ing & AI-ing @humansand prev @allen_ai @Apple @Penn 🎤dabbler of things🎸 🐈⬛enjoyer of cats 🐈 and mountains🏔️he/him
Vineeth @VineethDorna
166 Followers 701 Following MTS @ DatologyAI | MS @ UMass Amherst | BTech @ IIT Bombay
Mushroom 🍄🟫 @wandering_mush
728 Followers 3K Following One life, Its worth an attempt. Professional retard. Industrialist wannabe
Johannes Hagemann @johannes_hage
10K Followers 3K Following co-founder/cto @PrimeIntellect | open superintelligence infra, longevity, techno-optimism
Kaleigh Mentzer @KaleighMentzer
131 Followers 325 Following MTS @ Datology | @ICMEStanford PhD | @dartmouth
Rahul Chhabra @rahulchhabra07
8K Followers 8K Following ceo @sabi. make something wonderful. taste is the bottleneck.
Angelika Romanou @agromanou
410 Followers 562 Following PhD candidate at @ICepfl | Research scientist intern at @AIatMeta doing research in #NLProc 👩🏻💻 https://t.co/eV1thvGQX1
Neha Hulkund @NHulkund
348 Followers 507 Following PhD Student at @MIT_CSAIL working on data-centric AI
Guilherme Penedo @gui_penedo
4K Followers 2K Following Pre-training data @huggingface 🤗. Lisboeta 🇵🇹
Matteo Farina @farinamatteoo
167 Followers 205 Following Research Intern @Apple MLR | 🇪🇺 @ELLISforEurope PhD, University of Trento (🇮🇹 @UniTrento) & University of Tübingen (🇩🇪 @uni_tue)
Adhiraj Ghosh✈️CV... @adhiraj_ghosh98
312 Followers 437 Following @MPI_IS + @ELLISforEurope PhD Student @bethgelab | multimodal data curation, pretraining and evals 🦋: https://t.co/Q03vvJGgF4
N @Poyonoz
469 Followers 3K Following
Ari Morcos @arimorcos
7K Followers 2K Following CEO and Co-founder @datologyai working to make it easy for anyone to make the most of their data. Former: RS @AIatMeta (FAIR), RS @DeepMind, PhD @PiN_Harvard.
Sarah Catanzaro @sarahcat21
16K Followers 2K Following “All methods are sacred if they are internally necessary” (GP @amplifypartners, prev @canvasvc; Head of Data @Mattermark; @palantirtech; @c4ads)
Mimansa Jaiswal @MimansaJ
5K Followers 6K Following Currently RS @aiatmeta | LLMs/SLMs Post Training | Data, Evals, Rewards and Agentic System Orchestration https://t.co/Nv4aMHzhTC
Loubna Ben Allal @LoubnaBenAllal1
9K Followers 854 Following
K L 🌺💮🌼 @oagyoe16766
1 Followers 133 Following
Christian Lindner @c_lindner
752K Followers 1K Following Formerly a politician (former Federal Minister), now an entrepreneur. The love of freedom remains. 🗽 Posts by me (CL) and team (TL)
Franziska Brandmann @fbrandmann
13K Followers 2K Following
Sebastian Czaja @SebCzaja
7K Followers 466 Following 🗽Federal Executive Board @fdp 🚀Deputy Chair @fdp_Berlin 💛 Berlin
Marco Buschmann @MarcoBuschmann
118K Followers 3K Following Lawyer | former Federal Minister of Justice
FDP @fdp
393K Followers 525 Following Where there is freedom, everything is possible. Legal notice & privacy policy: https://t.co/mcujbmb9Cx
James Zabel 📸 @James_Zabel
5K Followers 2K Following Photo- and Videographer 📸 • Catdaddy 🐈 | DM for Shootings • at home on Instagram 📱
OttO Fricke @Otto_Fricke
16K Followers 413 Following Chairman of the Board @dosb, lawyer, friend of 🇳🇱 and of freedom, member of the FDP Federal Executive Board
Ulf Poschardt @ulfposh
187K Followers 945 Following journalist - "To be entertained means to be in agreement." Horkheimer & Adorno 📖 SHITBÜRGERTUM now in all bookstores
Marie-Agnes Strack-Zi... @MAStrackZi
201K Followers 3K Following MEP | Member of the FDP Presidium | Chair of @fdp in @Europarl_DE | Chair of the Defence Committee @EP_Defence | @EP_ForeignAff & @EP_Industry | Biker
Christian Dürr @christianduerr
22K Followers 1K Following
Moritz Körner @moritzkoerner
24K Followers 3K Following Member of the European Parliament • Secretary General @fdp_nrw • Budget 💶 Rule of law 🗽 Civil rights 👮♂️
Lea Xenia @tatentschluss
16K Followers 298 Following From Berlin to London, in pursuit of a second law degree.
tagesschau @tagesschau
5.2M Followers 268 Following Headlines from https://t.co/e4ZbhtdUqY - 🐘 https://t.co/fjId0qMfhq
Rudi Bachmann @BachmannRudi
49K Followers 696 Following Economist (Professor at University of Michigan, not speaking for it in any way), Literal Transatlanticist, Posts about Economics, Politics, Policy and Academia
Jan Schnellenbach @schnellenbachj
34K Followers 3K Following Economics prof @BTU_CS. All opinions are strictly my own. Political Econ, Public Finance, Behavioral Econ, Federalism.
Michael Bröcker 💎 @MichaelBroecker
52K Followers 4K Following Editor-in-chief @tableBriefings #tischredakteur Formerly @ThePioneerDE @rponline - capital-city resident with a Rhenish face. 🎙️Table Today, private views #effzeh
Ties Robroek @SGui
6 Followers 5 Following
Dumpelstiltskin @0xMentalIllness
2K Followers 406 Following In the machine economy I don’t know anything, but I trust my taste.
Ren @ren_stocks
36K Followers 90 Following Investing into the AI buildout ⚡️ Head of AI | Product Manager 10+ years DD and full thesis in https://t.co/B9oMWxJ4m6 NFA
The OpenAI Foundation @FoundationOAI
7K Followers 0 Following OpenAI was founded in 2015 as a nonprofit; its mission is to ensure artificial general intelligence benefits all of humanity.
Ivan Kirigin @ikirigin
9K Followers 1K Following Investing in AI, ML, automation, and robotics. Building @attention__ai. No tech: @tekno_Ivan
Quentin de Laroussilh... @Underflow404
271 Followers 173 Following Technical Lead Manager, Google ML - I teach machines how to learn. Interested in artificial intelligence, ML and data science.
Erik Bernhardsson @bernhardsson
55K Followers 4K Following Building everyone's favorite AI infrastructure platform @modal
Noumena @NoumenaAI
546 Followers 0 Following
Alex Gartrell @alexgartrell
334 Followers 59 Following Head of Engineering at @thinkymachines, Previously, server operating systems, efficiency, and other low-level stuff @meta
The Assembly @InTheAssembly
480K Followers 1 Following Macro analysis, market structure, and the trades nobody else is showing you.
Michael Burry Stock T... @burrytracker
513K Followers 143 Following Tracking hedge funds and Burry’s stocks. Powered by @joinautopilot
Value Investigator @value_invest12
34K Followers 420 Following I invest in great companies with long growth runways trading at reasonable prices.
Jeff Pu @sssjeffpu
29K Followers 46 Following Tech Enthusiast. 20 years tech equity research + industry.
Benedict Kerres @benedictk__
4K Followers 2K Following @OpenAI | ex-@PalantirTech; ex-@AppliedInt | @uniheidelberg | Physicist | Autonomous systems | Personal Views Only.
Data Foundations of A... @DFAI_Community
83 Followers 31 Following This is the official X account for the Data Foundations of AI Community.
KawzInvests 🦑 @KawzInvests
112K Followers 530 Following Research-focused. Photonics. AI. Defense. Tech. Space. Optic Supercycle. Not financial advice
M. T @maritimetank
216 Followers 224 Following 99th percentile life enjoyer | generalist engineer @aws 🇨🇦
7 @quantLR
2K Followers 580 Following Fear causes hesitation, and hesitation will cause your worst fears to come true. I don’t trade for direction, I trade for distribution.
xilo @Xilo_K
2K Followers 1K Following That we are capable only of being what we are remains our unforgivable sin
Chase Roberts @chsrbrts
2K Followers 383 Following AI things at @a16z. Prefers 🌶️ takes. Makes noises about b2b sales/GTM/ops. Prev @northflank @vertexvus @segment @box @berkeleyhaas 🏎️🚴🎾
Enrique Ruiz Durazo @ruizdurazo
456 Followers 957 Following Software Engineer & Designer ⬢ @cursor_ai Ambassador Zurich
alex peysakhovich @alex_peys
6K Followers 808 Following partner @shv - interested in ai for biology, also dogs, motorsports, multi-agent systems, and rl
Saurabh Sharma @zsparta
9K Followers 1K Following Chief Investment Officer @jump_ Compiler Engineer, Quant Trader, Theatre Buff, Cornell CS grad
Pokémon Music 🎵 @PokemonOST
261K Followers 40 Following If you really wanna chill out and remember the most beautiful moments of your life, follow this ✨DM For soundtrack requests ✉️ fan acc**
Jonathan Userovici @JonUsero
4K Followers 169 Following General Partner at global VC firm @HeadlineVC - helping founders win, bigger 🌎
dagmar rosenfeld @rosidaggi
33K Followers 722 Following
Claudius Seidl @Claudiusseidl
13K Followers 2K Following "Anyone can have an opinion. A good man can have two or three opinions. One is none, and we need not talk about fewer than 200"
Vincent Weisser @vincentweisser
29K Followers 6K Following ceo @primeintellect — building self improving agents & infra
Guy Verhofstadt @guyverhofstadt
496K Followers 2K Following President of @EMInternational. #IAmEuropean🇪🇺
Navid Pour @navidkpr
996 Followers 410 Following Building @ProximalHQ | prev founding eng / research @Cursor_AI
Justus Mattern @MatternJustus
8K Followers 839 Following Co-Founder @ProximalHQ | prev. research @PrimeIntellect, @MPI_IS and built revideo