main @main_horse
AGI Believer. Haven't applied @OpenAI. Likes are not always endorsement. main.horse Joined December 2022
Tweets 3K · Followers 8K · Following 457 · Likes 27K
We have just released 🍷 FineWeb: 15 trillion tokens of high quality web data. We filtered and deduplicated all CommonCrawl between 2013 and 2024. Models trained on FineWeb outperform RefinedWeb, C4, DolmaV1.6, The Pile and SlimPajama!
Zuck on:
- Llama 3
- open sourcing towards AGI
- custom silicon, synthetic data, & energy constraints on scaling
- Caesar Augustus, intelligence explosion, bioweapons, $10b models, & much more

Enjoy! Links below
LLAMA-3 IS OUT! llama.meta.com/llama3/
what if u could watch anime AND design distributed systems at the SAME TIME we're drowning in gpus and need a giga cracked systems engineer asap dm @ok_ikaros w/ name of best waifu/husbando, along with ur hottest take: like how nomad mogs k8s, or how scylladb sucks tokyo/sf
🎨Spent some time refactoring the 2021 post on diffusion models with new content: lilianweng.github.io/posts/2021-07-… ⬇️ ⬇️ ⬇️ 🎬Then another short piece on diffusion video models: lilianweng.github.io/posts/2024-04-… (Yes, I had an intensive weekend🥹)
powerful energy in this note
Proposal: with new MoEs, let's focus less on the total number of experts, and instead on the two numbers we actually care about:
- # of total params
- # of default activated params

Mixtral-8x7B -> Mixtral-47B-A12B
Mixtral-8x22B -> Mixtral-141B-A35B
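A minimal sketch of where those two numbers come from for Mixtral-8x7B, using its publicly reported config (this rough accounting is my own, not Mistral's): only the FFN experts are replicated per layer, while attention and embeddings are shared and always active, so the total lands near 47B and the active count near 13B.

```python
# Rough parameter accounting for a Mixtral-8x7B-style MoE.
# Config values are Mixtral's publicly reported ones; norm params ignored.
layers, d, vocab = 32, 4096, 32000
ffn, n_experts, top_k = 14336, 8, 2
kv_dim = 8 * 128  # 8 KV heads of head_dim 128 (GQA)

expert = 3 * d * ffn                       # gate/up/down projections per expert
attn = 2 * d * d + 2 * d * kv_dim          # q/o full-width, k/v GQA-narrow
shared = layers * attn + 2 * vocab * d     # attention + untied embeddings

total = shared + layers * n_experts * expert
active = shared + layers * top_k * expert  # top-2 routing -> 2 experts/token
print(f"total  ≈ {total / 1e9:.1f}B")      # ≈ 46.7B -> "47B"
print(f"active ≈ {active / 1e9:.1f}B")     # ≈ 12.9B -> "A12B"
```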
Scaling Laws for Data Filtering -- Data Curation cannot be Compute Agnostic Argues that data curation cannot be agnostic of the total compute that a model will be trained for repo: github.com/locuslab/scali… abs: arxiv.org/abs/2404.07177
Our 12 scaling laws (for LLM knowledge capacity) are out: arxiv.org/abs/2404.05405. Took me 4mos to submit 50,000 jobs; took Meta 1mo for legal review; FAIR sponsored 4,200,000 GPU hrs. Hope this is a new direction to study scaling laws + help practitioners make informed decisions
I do hope that BitNet [works at scale and] is broadly compatible. But which sparsity methods are stackable with one another, and when do they tap into the same redundancies? We'll see the attempt at a ReLUfied Mixtral soon. And it seems that GQA already competes with ReLUfication.
@dwarkesh_sp @_sholtodouglas @LukeFarritor I'll pledge $25k in prize money to this if it becomes an actual contest
some smaller targets worth taking iff too GPU poor to analyze mixtral:
qwen1.5-moe (upcycled!) qwenlm.github.io/blog/qwen-moe/
deepseek-moe github.com/deepseek-ai/De…
openmoe-8b github.com/XueFuzhao/Open…
switch-base-8 huggingface.co/google/switch-…
Had so much fun chatting with my friends @TrentonBricken and @_sholtodouglas. No way to summarize it, except: This is the best context dump out there on how LLMs are trained, what capabilities they're likely to soon have, and what exactly is going on inside them. You would be…
It’s finally here 🎉🥳 In case you missed us, MosaicML/Databricks is back at it with a new best-in-class open-weight LLM named DBRX: an MoE with 132B total parameters and 36B active, 32k context length, trained for 12T tokens 🤯
You have all been very patient, so here you go. 7x4090 Writeup. And don't you dare shame my frontend skills. mov-axbx.com/wopr/wopr_conc… Now go buy some GPUs and get to it.
ping us if you both identify with sen, and also aspire to train a real foundation model instead of just wrapping gpt4/claude
I think we comfortably have 10-100x more compute than anyone else in the anime industry
ChatGPT told me this was the case a few months ago and I didn’t believe it 😂
Many don't know that GPUs automatically leverage ternary and fine-grained sparsity to accelerate your matmuls! e.g. A matmul with ternary + 90% sparsity results in 33% more FLOPs in my benchmark. (not joking) I explore this "optimization" here: thonking.ai/p/strangely-ma… (1/3)
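A minimal sketch of that kind of benchmark (assuming PyTorch and a CUDA GPU; the shapes and ~90%-zeros setup are my choices, not the post's exact code). Same kernel, same shapes, different values: "predictable" ternary/sparse inputs draw less power, so the GPU can sustain higher clocks and report more achieved FLOP/s.

```python
import torch

def bench_tflops(a, b, iters=100):
    # Time `iters` matmuls with CUDA events; report achieved TFLOP/s.
    for _ in range(10):
        a @ b  # warmup
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        a @ b
    end.record()
    torch.cuda.synchronize()
    seconds = start.elapsed_time(end) / 1e3  # elapsed_time is in ms
    flops = 2 * a.shape[0] * a.shape[1] * b.shape[1] * iters
    return flops / seconds / 1e12

n = 8192
dense = torch.randn(n, n, device="cuda", dtype=torch.half)
# Ternary values {-1, 0, 1} with roughly 90% of entries zeroed out.
tern = torch.randint(-1, 2, (n, n), device="cuda").half()
tern *= (torch.rand(n, n, device="cuda") > 0.9).half()

print(f"random data:    {bench_tflops(dense, dense):.0f} TFLOP/s")
print(f"ternary+sparse: {bench_tflops(tern, tern):.0f} TFLOP/s")
```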
mfw if that gpt2 thing turns out to be gpt4.5. i hope it is some different architecture, or some old model trained with new techniques and more data.
@teortaxesTex Bloom is the worst model normalised by swe/researcher hours, perhaps. Can't go lower than that. It's the absolute lower bound.
@agihippo Scared to ask what you think about BLOOM lol
15 people? That's cute. 😬 At reka our pretraining team is 3-5 people at max, who were all also working >50% time in other projects. 🫠
Our latest model Inflection-2.5 (inflection.ai/inflection-2-5) is not bad. In fact, it was the ~4th best publicly "known" model when it was released in early March. And it was created by our pretraining team of < 15 people! 2/
Boring paper/experiment idea: Scaling Laws for Distillation into Ternary Transformers (h/t @kalomaze for inspiration and the correct warning that it's unlikely to work at the proposed scale. Well, one could go bigger now) The objective: finding a compute-optimal regime to convert…
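For concreteness, a minimal sketch of BitNet-b1.58-style absmean ternarization, i.e., the weight format such a distillation would target (the function name and shapes here are illustrative):

```python
import torch

def ternarize(w: torch.Tensor, eps: float = 1e-8):
    # Absmean quantization: scale by mean |w|, then round into {-1, 0, +1}.
    gamma = w.abs().mean()
    q = (w / (gamma + eps)).round().clamp_(-1, 1)
    return q, gamma  # dequantize as q * gamma

w = torch.randn(4096, 14336)
q, gamma = ternarize(w)
print(q.unique())                      # tensor([-1., 0., 1.])
print((q == 0).float().mean().item())  # fraction of zeros
```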
Tired but true lesson: small teams are more agile because they *don't need to* carry the weight of corporate politics. This can more than compensate for having fewer heads to bash at real problems. Case in point: BLOOM. I think Yi doesn't even have a separate "scaling team".
@AdamSJermyn
> fixing the values of other hyper parameters (learning rate, batch size, optimization protocol, etc.)
why did you hold lr fixed when scaling? intuitively we'd expect larger autoencoders to require smaller lr, right? so this seems like a potential confounder
said he was “surprised i was a real person and in SF” although at least one of these is disclosed on my profile. i wished him good luck. he was wearing a humane ai pin so i figured he’d need it
@nearcyan not that many people walk around wearing a Jensen electoral map cape
someone just randomly walked up to me to tell me he’s shorting nvidia. no clue how he knew what i looked like
YouTube is a ticket to the extraordinary. One (really good) video turned Harvard student Wesley Wang into the youngest director in history to set up a movie at a major studio.
I’d go so far as to say a 20ga would be enough, reducing payload requirements some. Hope they aren’t trying a box mag though, very hard to tune the feed. Better off with tube-fed or even a composite / 7075 Al revolver cylinder; chamber pressures are relatively low
Seriously, what took them so long to do this? #4 birdshot from an x-full choke will rip the crap out of anything unarmored, and at altitude poses no real threat to people and stuff on the ground
drone on drone violence is out of hand
The productivity hack of just being around highly intelligent / competent people is so good; it acts as a powerful forcing function to elevate your own game. And this obviously includes a romantic partner.
Sensetime's SenseNova 5.0 left a great impression on investors. Stock has doubled since its release on the 23rd. Various AI benchmarks put SenseNova 5.0 broadly in the same ballpark as GPT-4 Turbo, even better in some areas. Same w/ OpenCompass 2.0. It was trained on 10TB of tokens w/ 600B params…
Enjoyed this paper that plots emergent abilities with pretraining loss on the x-axis, which is actually a suggestion that @OriolVinyalsML also made a few years back: arxiv.org/abs/2403.15796 The paper uses intermediate checkpoints to plot a variety of pretraining losses. For some…
SenseNova 5 is allegedly the best model they have now Roughly L3-405B compute-wise (if dense) x.com/tphuang/status…
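Back-of-envelope for that compute comparison, using the standard C ≈ 6·N·D approximation and reading the quoted "10TB of tokens" as roughly 10T tokens (an assumption; the original figure is ambiguous), against Llama-3-405B's reported ~15.6T-token corpus:

```python
def train_flops(params, tokens):
    # Standard dense-transformer training estimate: C ≈ 6 * N * D.
    return 6 * params * tokens

sensenova = train_flops(600e9, 10e12)   # 600B params, ~10T tokens (assumed)
llama3 = train_flops(405e9, 15.6e12)    # 405B params, ~15.6T tokens
print(f"SenseNova 5.0: {sensenova:.1e} FLOPs")  # ~3.6e25
print(f"Llama-3-405B:  {llama3:.1e} FLOPs")     # ~3.8e25
```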
> In the first experiment, we train three models with 1.5B, 6B, and 32B parameters and observe their behaviors until trained on 3T, 3T, and 2.5T tokens

Meaning GLMs. Zhipu & THUDM have also developed GLM-4, which had been the strongest Chinese "GPT-4 killer" until ≈last week.