Adam Ibrahim @ai_phd
PhD student in Machine Learning at Mila. Currently in the process of graduating. adamibrahim.fr · Montréal, Québec · Joined June 2019
Tweets: 70 · Followers: 516 · Following: 429 · Likes: 105
Look at our preprint on Continual Learning for increasing the scalability of LLMs pretraining. A great piece of work led by @ai_phd @benjamintherien and @kshitijkgupta 🔥
Here is the full paper from the continual pretraining project I worked on last year. I encourage you to check it out if you pretrain LLMs (in particular, I recommend starting with the takeaways in Section 2 and the Table of Contents at the start of the appendix).
"Simple and Scalable Strategies to Continually Pre-train Large Language Models": Large language models (LLMs) are routinely pre-trained on billions of tokens, only to start the process over again once new data becomes available. A much more efficient solution is to continually
Mila presents "Simple and Scalable Strategies to Continually Pre-train Large Language Models". Shows efficient updates to LLMs using simple strategies, achieving re-training results with less compute. arxiv.org/abs/2403.08763
State-space models (SSMs) like Mamba and mixture-of-experts (MoE) models like Mixtral both seek to reduce the computational cost to train/infer compared to transformers, while maintaining generation quality. Learn more in our paper: zyphra.com/blackmamba
Looking forward to seeing you at the #NeurIPS2023 #NeurIPS23 ENLSP workshop (rooms 206-207), where we'll have a poster about this work at 16:15!
@PranshuRanjan1 @SarvamAI Hi-NOLIN Hindi model will be presented by our @NolanoOrg team (@imtejas13 @_AyushKaushal) and collaborators from our CERC-AAI team (@kshitijkgupta @benjamintherien @ai_phd) at the #NeurIPS2023 this Fri, at this workshop: sites.google.com/mila.quebec/6t…
(1/8) The great success of diffusion models such as Stable Diffusion, DALLE & Emu has raised questions about the use of synthetic data for classification. Our work, "Feedback-guided Data Synthesis for Imbalanced Classification," addresses this question: arxiv.org/abs/2310.00158
Rarely been so excited about a paper. Our model has a quality level higher than Stable Diffusion 2.1 at a fraction (less than 12%) of the training cost, less than 20% of the carbon footprint, and it is twice as fast at inference too! That's what I call a leap forward.
@QuentinAnthon15 @BerenMillidge @yury_tokpanov Congratulations everyone, this looks really nice!
@QuentinAnthon15 @BerenMillidge @yury_tokpanov Congrats!
@QuentinAnthon15 impressive preliminary results. Very delighted to see such a promising open-source model, and timely for all the investigations into hybrid archs - looking forward to reading more.
"first time a new arch is trained on the same 2t tokens as llama" isn't that because nobody else can train on Llama data? I hope we don't get a wave of arch papers trained on different proprietary datasets. makes it impossible to compare evals btwn this and ex Griffin.
This is the first time we see a new architecture making🍎to🍎 comparison at scale with Llama-7B trained on the same 2T tokens and win (unlimited context length, lower ppl, constant kv at inference, ...)! Very excited to be part of the team! Thanks for the lead @violet_zct…
@ai_phd Great work! Yes, it has been so long. That was my first TA experience.
@QuentinAnthon15 huge congrats to you and the team!
We trained this model on only 1T publicly accessible tokens using only 128 H100s over 30 days, showing that highly competitive models at this scale are not just the preserve of the largest players but can be achieved even by small teams and limited compute resources.
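To put the compute figures in this tweet in perspective, here is a rough back-of-the-envelope sketch. It assumes all 128 GPUs ran continuously for the full 30 days, which the tweet does not state, so the numbers are an upper bound on GPU-hours rather than a reported figure. The function names are hypothetical.

```python
def gpu_hours(num_gpus: int, days: float) -> float:
    """Total GPU-hours, assuming every GPU runs continuously (an assumption)."""
    return float(num_gpus) * days * 24.0


def tokens_per_gpu_hour(total_tokens: float, num_gpus: int, days: float) -> float:
    """Average training tokens processed per GPU-hour under the same assumption."""
    return total_tokens / gpu_hours(num_gpus, days)


total = gpu_hours(128, 30)
print(f"{total:,.0f} GPU-hours")  # 92,160 GPU-hours

# 1T tokens, as quoted in the tweet
rate = tokens_per_gpu_hour(1e12, 128, 30)
print(f"{rate:,.0f} tokens per GPU-hour (rough average)")
```

Any downtime, restarts, or evaluation runs would lower the effective GPU-hours and raise the per-GPU-hour rate accordingly.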
Our model is fresh from the oven over the weekend, and we are currently working on a huggingface integration. Weights and checkpoints will drop shortly.
We found this significantly boosted our model's quality, raising it from Llama2 level to nearing the state-of-the-art models at this scale. We plan to release both the main annealed model and the original base model for comparison and scientific study.
Like many recent models, we performed a two-phase training scheme with a standard pretraining phase on 1T tokens of open web datasets followed by an annealing phase on higher quality data accompanied by a rapid LR decay.
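A two-phase schedule of this kind could look roughly like the sketch below. This is only an illustration: the tweet does not give the schedule shape or any hyperparameters, so the warmup, the cosine decay in phase one, the exponential decay in the annealing phase, and every numeric value are assumptions, and `two_phase_lr` is a hypothetical helper, not the team's code.

```python
import math


def two_phase_lr(
    step: int,
    pretrain_steps: int,
    anneal_steps: int,
    peak_lr: float = 1e-3,         # assumed value
    pretrain_floor: float = 1e-4,  # assumed LR at the end of phase one
    final_lr: float = 1e-6,        # assumed LR at the end of annealing
    warmup_steps: int = 100,       # assumed value
) -> float:
    """Learning rate at a given step for a pretrain-then-anneal run."""
    if step < warmup_steps:
        # Linear warmup up to the peak learning rate.
        return peak_lr * (step + 1) / warmup_steps
    if step < pretrain_steps:
        # Phase one: slow cosine decay from peak_lr to pretrain_floor.
        progress = (step - warmup_steps) / max(1, pretrain_steps - warmup_steps)
        cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
        return pretrain_floor + (peak_lr - pretrain_floor) * cosine
    # Phase two (annealing on higher-quality data): rapid exponential decay
    # from pretrain_floor to final_lr over a much shorter span.
    t = min(1.0, (step - pretrain_steps) / max(1, anneal_steps))
    return pretrain_floor * (final_lr / pretrain_floor) ** t


# Example: a long pretraining phase followed by a short annealing phase.
for s in (0, 100, 5_000, 10_000, 10_250, 10_500):
    print(s, two_phase_lr(s, pretrain_steps=10_000, anneal_steps=500))
```

The point of the sketch is the contrast in time scales: the learning rate falls by one order of magnitude over the long first phase, then by two more over a phase twenty times shorter.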
We find that a single shared self-attention layer appears able to counteract Mamba's weaknesses in in-context-learning, as evidenced by our MMLU scores, while maintaining the inference efficiency of our Mamba backbone.
While MoEs trade greater memory cost for reduced FLOPs, GPU memory is the key constraint for many to run models locally. With Zamba, we experimented with going in the other direction -- sharing global attention parameters to boost performance at the same parameter count.
Extremely excited to announce Zamba! A 7B SSM with a novel architecture competitive with Gemma-7B and Mistral-7B and significantly beating Llama2-7B trained on only 1T open training tokens.
Zyphra is pleased to announce Zamba-7B: - 7B Mamba/Attention hybrid - Competitive with Mistral-7B and Gemma-7B on only 1T fully open training tokens - Outperforms Llama-2 7B and OLMo-7B - All checkpoints across training to be released (Apache 2.0) - Achieved by 7 people, on 128…
@QuentinAnthon15 really interesting work, looking forward to seeing the tech report soon
@QuentinAnthon15 congrats on the release! looks great 🤩
Zyphra's engineering mission is to put personalized models on your device. Model weights should be private and personalized to your needs. Zyphra believes that a giant monolithic model can't personalize to everyone on the planet.
While we're talking about releases, expect the model weights, checkpoints, training data, etc next week. We're currently cranking away getting our novel architecture into HuggingFace!
We performed a two-phase training approach initially using lower-quality web-data followed by high quality datasets. All of this data was open-source. We collected no data ourselves, and were careful about contamination. We'll be releasing a subsequent paper providing all these…
Why hybrid? On the modeling side, the shared attention block resolves Mamba's shortcomings on in-context learning, as evidenced by Zamba's strong performance on MMLU. On the efficiency side, Mamba blocks are more efficient than MLP + Attention layers, so our model requires fewer…