Yi Tay @YiTayML
Chief scientist & Co-founder @RekaAILabs past: Research Scientist @Google Brain 🧠 currently learning to be a dad 🍼👶
yitay.net · mixture-of-locations · Joined October 2016
Tweets: 3K · Followers: 28K · Following: 97 · Likes: 7K
instead of evaluating models, we can start to evaluate researchers instead! 😀 i've always had this floating idea of giving people transformer configs and asking them to predict configurations that work better. could be data mix, architectures, hparams whatever. would be a fun…
🔥Newly updated scores for Reka Core, Flash and Edge on MMMU leaderboard: mmmu-benchmark.github.io.
Yes, check out @RekaAILabs's strong Flash-21B model!
Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models. We introduce Reka Core, Flash, and Edge, a series of powerful multimodal language models trained from scratch by Reka. Reka models are able to process and reason with text, images, video, and audio
I heard the key to @RekaAILabs's success is a new algorithm called AgiHi-PPO
Our @RekaAILabs Tech Report / Paper is out! 🔥 Tech reports with completely no information are kinda boring so we’re revealing some interesting information on how we train our series of Reka models including tokens, architecture, data & human evaluation workflows. 😃 We tried…
One year since I posted this so here's an update! Adding @donovanOng_ to the list of notable Singaporean researchers/engineers doing great work in AI and LLMs. He helped train Reka's (@RekaAILabs) series of OP models (Core, Flash, Edge) so he deserves to be on this list! 🔥
It's been a wild ride. Just 20 of us, burning through thousands of H100s over the past months, we're glad to finally share this with the world! 💪 One of the goals we’ve had when starting Reka was to build cool innovative models at the frontier. Reaching GPT-4/Opus level was a…
It's inspiring to see what a small team can accomplish in such a short period of time. @RekaAILabs, an enterprise multimodal LLM company, has only had access to 90% of their compute for the past 4 months, but that hasn't stopped the brilliant team of 20 from going head-to-head in…
Didn't get much chance to share this yesterday with everything else going on with the Reka core launch but here's the most non-cherry picked showcase of Reka Core vs GPT-4 vs Claude Opus on multimodal chat tasks. 👇 We put together this showcase with examples our team created.…
Meet Reka Core, our best and most capable multimodal language model yet. 🔮 It’s been a busy few months training this model and we are glad to finally ship it! 💪 Core has a lot of capabilities, and one of them is understanding video --- let’s see what Core thinks of the 3 body…
Feels legit. I might prefer Reka Core's multimodal performance to 1.5 too.
Organizations that have managed to build top-class language models so far: ① OpenAI (GPT-4) ② Google (Gemini Ultra, Gemini 1.5 Pro) ③ Anthropic (Claude 3 Opus) ④ Inflection AI (Inflection 2.5) ⑤ Reka (Reka Core) ⑥ xAI (Grok-1.5) ⑦ Mistral (Mistral large). Meta may join the list with the upcoming LLaMA 3.
research is an immensely taxing endeavour. hours spent doing IC work, debugging and what not. a paper is a canvas for researchers to express themselves after all the hard work, at the end of the day. it's my art. at least let me paint the way i want to paint. The reason why i am…
if your transformers struggle with NaNs after a certain parameter size, you may be under a sophon lock. Keep pushing, don't let them win!
@YiTayML @_jasonwei I just checked that doc. The closest guess was from @hwchung27 actually. Everyone else was just so wrong ...
@YiTayML IIRC Bridgewater used to do stuff vaguely similar to this
Cannot agree more. My intuition is that FFN is for storing knowledge (this is why most knowledge editing targets FFNs) and Attention is for implementing algorithms (this is why most mechanistic interpretability work, e.g., on induction heads, targets Attn). Additionally, it seems that…
not true, especially for language. if you trained a large & deep MLP language model with no self-attention, no matter how much data you feed it you'll still be lagging behind a transformer (with much less data). will it get to the same point? i don't think so. your tokens…
In AI research there is tremendous value in intuitions on what makes things work. In fact, this skill is what makes “yolo runs” successful, and can accelerate your team tremendously. However, there’s no track record on how good someone’s intuition is. A fun way to do this is…
@YiTayML Congrats! Looking forward to the apps building on top of it!
> be me
> on vacation
> kid asleep, wife away
> but I'm not tired!
> whip out colab
> load my model
> import new benchmark
> try my model
> tfw sota, sota by far
> double-check for bugs or leaks
> no bug found
> no leak found
idk man, probably a bug.
Also, twitter is reddit now.
@YiTayML Exactly. I don't see OpenAI or any other company training a 2 layer fully connected neural network with SGD to do Vision and throwing it a trillion data points "because data is all you need".
@YiTayML I was about to say you should not demean others' papers like that 😜
@YiTayML The hot take version of this is: Google does the real architecture research, while other companies take it for granted. All these companies are basically "data companies".
I am too uncomfortable w/ this "data is everything" maximalism. Not all archs have favourable scaling laws, are easy to train at large scale, etc etc
More wisdom from @YiTayML
@YiTayML Absolutely, I tried a version of mlpmixer on language hoping to find something different from self-attention, the performance was horrible and it lacks basic abilities to generalize even on the simplest associative recall tasks…
I always strongly suggest that people read this work (arxiv.org/abs/2207.10551) by @YiTayML and @m__dehghani when discussing model architecture. It takes up almost 50% of the pages of the literature survey chapter in my PhD thesis. It was so visionary to study this in 2022. I can…
@YiTayML >"random guy" >opens paper, it's the Yi Tay >lol
So well said, from a person that has a lot of training experience
Flash is OP!
Particularly good general-purpose models among those of limited size: ① Google: Gemini 1.5 Pro ② Meta: LLaMA 3 (8B, 70B) ③ Anthropic: Claude 3 (Haiku, Sonnet) ④ Reka: Reka Flash (21B)