Samuel L Smith @SamuelMLSmith
Research Scientist at DeepMind. Optimization and Initialization. Formerly Google Brain. Ex-Physicist. Joined January 2021
Tweets 180
Followers 2K
Following 361
Likes 261
Google presents RecurrentGemma Moving Past Transformers for Efficient Open Language Models We introduce RecurrentGemma, an open language model which uses Google's novel Griffin architecture. Griffin combines linear recurrences with local attention to achieve excellent
A couple of things to be aware of when fine-tuning RecurrentGemma/Griffin (which we forgot to mention in our Griffin paper): ➡️We do not use weight decay on the recurrent (RG-LRU) params ➡️For stability, we clip the derivative of our square root ops to a max value of 1000
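The two fine-tuning notes above can be sketched in JAX. This is a minimal illustration, not the RecurrentGemma implementation: the function names, the `"rg_lru"` parameter-path substring, and the parameter tree layout are all assumptions made for the example. Only the cap of 1000 on the square-root derivative and the rule of no weight decay on RG-LRU parameters come from the tweet.

```python
import jax
import jax.numpy as jnp

MAX_SQRT_GRADIENT = 1000.0  # cap on d/dx sqrt(x), as stated in the tweet


@jax.custom_vjp
def sqrt_bound_derivative(x):
    """sqrt(x) whose backward pass clips the derivative (hypothetical name)."""
    return jnp.sqrt(x)


def _sqrt_fwd(x):
    # Save the input as the residual for the backward pass.
    return jnp.sqrt(x), x


def _sqrt_bwd(x, g):
    # d/dx sqrt(x) = 1 / sqrt(4x). Flooring 4x at 1 / MAX^2 caps the
    # derivative at MAX_SQRT_GRADIENT as x approaches 0.
    rescale = 1.0 / jnp.sqrt(jnp.maximum(4.0 * x, 1.0 / MAX_SQRT_GRADIENT**2))
    return (g * rescale,)


sqrt_bound_derivative.defvjp(_sqrt_fwd, _sqrt_bwd)


def weight_decay_mask(params, skip_substring="rg_lru"):
    """True where weight decay applies; False for recurrent parameters.

    `skip_substring` is an assumed naming convention, not a real library flag.
    """
    def should_decay(path, _leaf):
        names = [str(getattr(key, "key", key)) for key in path]
        return not any(skip_substring in name for name in names)

    return jax.tree_util.tree_map_with_path(should_decay, params)


def apply_weight_decay(params, mask, learning_rate, weight_decay):
    """Decoupled weight decay step applied only to masked-in leaves."""
    return jax.tree_util.tree_map(
        lambda p, m: p * (1.0 - learning_rate * weight_decay) if m else p,
        params,
        mask,
    )


# Toy parameter tree with hypothetical module names.
params = {"rg_lru": {"a_param": jnp.ones(3)}, "mlp": {"w": jnp.ones(3)}}
mask = weight_decay_mask(params)
decayed = apply_weight_decay(params, mask, learning_rate=0.1, weight_decay=0.1)
grad_at_zero = jax.grad(sqrt_bound_derivative)(0.0)
grad_at_one = jax.grad(sqrt_bound_derivative)(1.0)
```

With an optimizer library, a boolean tree like `mask` would typically be passed to whatever masking option its weight-decay transform offers, rather than being applied by hand as done here.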
Check out our newest model - CodeGemma on the block: huggingface.co/blog/codegemma
Our usually compression-centric team helped with the C++ implementation. Gemma runs on the Highway library originally built for HighwayHash and developed further and opensourced in the JPEG XL effort.
Soham & co have been doing a lot of cool stuff in the architecture/optimizer space! Really neat mix of theory & practice.
📢 I am looking for a student researcher to work with me and my colleagues at Google DeepMind London on understanding & building new neural network architectures. Please reach out to me ([email protected]) and apply below before Mar 22 if interested! google.com/about/careers/…
Always wonder how to scale an RNN? Spoiler alert: simple ideas that scale and attention to details.
😍 the important things
Just got back from vacation, and super excited to finally release Griffin - a new hybrid LLM mixing RNN layers with Local Attention - scaled up to 14B params! arxiv.org/abs/2402.19427 My co-authors have already posted about our amazing results, so here's a 🧵on how we got there!
When I was younger, I always worshiped people who could find the right solution to a difficult problem. But now I've realized that it is so much harder and so much more important to find the right problem to work on, even if you cannot find a solution yet.
When @sharadvikram started Pallas, his dream was to never have to write a kernel in C++ again. Together with the gembros, we finally achieved that (internally). Pallas now lets you define manual TPU pipelines in Python, and compose them. For example, we’ve been able to…
From the community's reaction to the Griffin paper, most people are unaware of how long it takes to publish an LLM paper at Google. We already had most of the results in the Griffin paper, including the final model, most of the writeup before I left in September.
Lucas Beyer (bl16) @giffmana
56K Followers 446 Following Researcher (Google DeepMind/Brain in Zürich, ex-RWTH Aachen), Gamer, Hacker, Belgian. Mostly gave up trying mastodon as [email protected]
Behnam Neyshabur @bneyshabur
18K Followers 690 Following Senior Staff Research Scientist @GoogleDeepMind, Interested in reasoning w. LLMs, traveling & backpacking
Sander Dieleman @sedielem
50K Followers 2K Following Research Scientist at Google DeepMind. I tweet about deep learning (research + software), music, generative models (personal account).
rohan anil @_arohan_
12K Followers 2K Following Principal Engineer, @GoogleDeepMind Gemini. prev PaLM-2. Tinkering with optimization and distributed systems. opinions are my own.
Dan Roy @roydanroy
45K Followers 2K Following ML / AI researcher, emphasis on theory. Research Director and Canada CIFAR AI Chair, @VectorInst Professor, @UofT (Statistics/CS)
Jascha Sohl-Dickstein @jaschasd
19K Followers 625 Following Member of the technical staff @ Anthropic. Most (in)famous for inventing diffusion models. AI + physics + neuroscience + dynamics.
Eric Jang @ericjang11
69K Followers 3K Following physical AGI at 1X. Author of "AI is Good for You" https://t.co/eFg4WXhg0p
Tom Goldstein @tomgoldsteincs
23K Followers 2K Following Professor at UMD. AI security & privacy, algorithmic bias, foundations of ML. Follow me for commentary on state-of-the-art AI.
Sara Hooker @sarahookr
39K Followers 7K Following I lead @CohereForAI. Formerly Research @Google Brain @GoogleDeepmind. ML Efficiency at scale, LLMs, @trustworthy_ml. Changing spaces where breakthroughs happen.
Ross Wightman @wightmanr
18K Followers 1K Following Computer Vision @ 🤗. Ex head of Software, Firmware Engineering at a Canadian 🦄. Currently building ML, AI systems or investing in startups that do it better.
Jeremy Cohen @deepcohen
4K Followers 869 Following PhD student in machine learning at Carnegie Mellon. The goal of my research is to turn deep learning into a real engineering discipline.
Jeremy Howard @jeremyphoward
222K Followers 5K Following 🇦🇺 Co-founder: @AnswerDotAI & @FastDotAI ; Hon Professor: @UQSchoolITEE ; Digital Fellow: @Stanford
Jeff Dean (@🏡) @JeffDean
296K Followers 6K Following Chief Scientist, Google DeepMind and Google Research. Co-designer/implementor of things like @TensorFlow, MapReduce, Bigtable, Spanner, Gemini .. (he/him)
Greg Yang @TheGregYang
53K Followers 661 Following Cofounder https://t.co/SpHbO7FZNV. Morgan Prize Honorable Mention 2018. Developing the theory of #TensorPrograms and the practice of scaling #neuralnetworks.
Stanislav Fort ✨.. @stanislavfort
10K Followers 6K Following AI @GoogleDeepMind | Stanford PhD in AI & Cambridge physics | ex-{Anthropic, Stability, Google Brain} | techno-optimism+alignment+progress+growth 🇺🇸🇨🇿
Zachary Nado @zacharynado
5K Followers 648 Following Research engineer @googlebrain. Past: software intern @SpaceX, ugrad researcher in @tserre lab @BrownUniversity. All opinions my own.
Jason Lee @jasondeanlee
10K Followers 3K Following Associate Professor at Princeton and Research Scientist at Google DeepMind. ML/AI Researcher working on foundations of LLMs and deep learning
Anneckxhs @anneckxhs
58 Followers 2K Following Don't stop learning. No matter what age Vienna Austria 🇦🇹 London England 🇬🇧
Weaviate • vector d.. @weaviate_io
12K Followers 3K Following The easiest way to build and scale AI applications. 🐙 https://t.co/9ZP8iC4iFd 📰 https://t.co/XiFW3Ks5fK
Vikram Dutt @vd_
838 Followers 7K Following
Yoshinari Fujinuma @akkikiki
973 Followers 1K Following Applied Scientist@AWS AI Labs; CS PhD @CUBoulder; Tweets are my own; Substack: https://t.co/Mq5oR2vaGN Lived: 🇹🇭🇯🇵🇫🇷🇺🇸 Tweets: JA/EN
... @dercrazypug
55 Followers 145 Following
Pensé FFun @inftyCategory
97 Followers 6K Following
Lilla Wainio @LillaWaini10707
72 Followers 5K Following
峰 @fngfng93312237
42 Followers 1K Following
Abcd @Abcd132121
232 Followers 2K Following
Arif Ahmad @arif_ahmad_py
280 Followers 7K Following All things AI, Computer Science and Circuits! Prev. @GoogleAI
Hana Santmyer @ha_santmy
77 Followers 5K Following
Olivier Bachem @OlivierBachem
3K Followers 305 Following Senior Staff Research Scientist at @GoogleDeepMind where I lead the team that built the RLHF technology used in Bard, PaLM 2, Gemini, and other Google products.
Irving Munoz @jblgotti
9 Followers 376 Following
伸也 @brucele94521910
0 Followers 349 Following
Matteo Olivato @mttlvt93
17 Followers 116 Following
Dylan Neve @DylanNeve10
2 Followers 38 Following
Bala @rbs100
329 Followers 3K Following Technopreneur, DWH/BI Expert,Software Product https://t.co/POKTSVNShK Product owner.Built 2 small products.Dreamt of building Large AI product is built by others now..
Tyesha Feltus @felt_tyesh
82 Followers 5K Following
Zhengyao Jiang @zhengyaojiang
1K Followers 267 Following Cofounder and CTO @WecoAI, building AutoML Agents. Final year PhD student at UCL @UCL_DARK @ai_ucl. (Zheng=j-uhng, j as in job; yao=y-aoww)
Anil Kommineni @anilkommineni
240 Followers 3K Following Engineering Leader @ Infinity Learn : loving husband/dad and having fun build inspirational Software Experiences/Teams to bring in Learner Delight!!
ParallaxEffect @ParallaxAngle
69 Followers 291 Following
Techjaures @techjaures
17 Followers 285 Following
Nikita Saxena (she/he.. @nikitasaxena02
280 Followers 612 Following Predoc@Google | Mila | BITS Pilani
zrait @zrait
162 Followers 2K Following
Fahad @fahadullaah
43 Followers 597 Following Sophomore @IITKanpur | AGI and QAI enthusiast | Uncensored: @theleopardkhan
Kevin Nejad @kevin_nejad
292 Followers 2K Following PhD ML & Comp.Neuro @UniofOxford prev:@UofBristol, prev. @nyuniversity, MSc Applied Maths @EdinburghUni ,BSc CompSci @KingsCollegeLon
Sam Power @sp_monte_carlo
17K Followers 7K Following Lecturer in Maths & Stats at Bristol. Interested in probabilistic + numerical computation, statistical modelling + inference. (he / him)
AI World News🤖💕 @TheWorldNews
56K Followers 9K Following The first rule of AI club is, We Don't Talk About AI Club! The second rule of AI Club is......
biscotte wong @biscottew
8 Followers 51 Following
Navid Pour @navidkpr
472 Followers 490 Following navid-700t-instruct | Building @cursor_AI | Prev @amazon & Built https://t.co/7gBfa87F7Y
FANVince @FANVince
76 Followers 1K Following
OpenNLPLab @opennlplab
260 Followers 87 Following OpenNLPLab Official Account Hugging Face: https://t.co/B9IzcQoCQP GitHub: https://t.co/PhoPmAkyf7 WeChat: OpenNLPLab
Huyen Khanh Vo @sungoannn
18 Followers 107 Following incoming PhD student | formerly research resident @fsoft_aicenter | interested in ml, stats and optim
Afroz Mohiuddin @afrozenator
1K Followers 5K Following Research Engineer at Google Brain. Interested in Science, Psychology, Investing, Design and generally almost everything. Good Thoughts, Good Words, Good Deeds.
Alexandre Gomes @xandmaga
69 Followers 1K Following
ldmiao @ldmiao
99 Followers 1K Following
Rakesh Choudhary @r16academy
0 Followers 7 Following
Anh Nguyen @NguynTu24128917
379 Followers 2K Following Applied Scientist @ Microsoft AI, GenAI, Phi LLM
Rory Greig @rorygreig1
649 Followers 4K Following Research Engineer at Google DeepMind, interested in AI Alignment and Complexity Science.
Aaditya ; @Aaditya26082004
532 Followers 7K Following CS'26 • Machine Learning • Open-Source • Web Dev. • Algorithms • Jai Shree Krishna 🦚🪈
Yann LeCun @ylecun
712K Followers 719 Following Professor at NYU. Chief AI Scientist at Meta. Researcher in AI, Machine Learning, Robotics, etc. ACM Turing Award Laureate.
Andrej Karpathy @karpathy
980K Followers 905 Following 🧑🍳. Previously Director of AI @ Tesla, founding team @ OpenAI, CS231n/PhD @ Stanford. I like to train large deep neural nets 🧠🤖💥
AK @_akhaliq
310K Followers 3K Following AI research paper tweets, ML @Gradio (acq. by @HuggingFace 🤗) dm for promo follow on Hugging Face: https://t.co/q2Qoey80Gx
Google DeepMind @GoogleDeepMind
944K Followers 275 Following We’re a team of scientists, engineers, ethicists and more, committed to solving intelligence, to advance science and benefit humanity.
Lucas Beyer (bl16) @giffmana
56K Followers 446 Following Researcher (Google DeepMind/Brain in Zürich, ex-RWTH Aachen), Gamer, Hacker, Belgian. Mostly gave up trying mastodon as [email protected]
François Chollet @fchollet
470K Followers 769 Following Deep learning @google. Creator of Keras. Author of 'Deep Learning with Python'. Opinions are my own.
Sebastian Raschka @rasbt
267K Followers 906 Following Machine learning & AI researcher writing at https://t.co/A0tXWzG1p5. LLM research engineer @LightningAI. Previously stats professor at UW-Madison.
Behnam Neyshabur @bneyshabur
18K Followers 690 Following Senior Staff Research Scientist @GoogleDeepMind, Interested in reasoning w. LLMs, traveling & backpacking
Sander Dieleman @sedielem
50K Followers 2K Following Research Scientist at Google DeepMind. I tweet about deep learning (research + software), music, generative models (personal account).
rohan anil @_arohan_
12K Followers 2K Following Principal Engineer, @GoogleDeepMind Gemini. prev PaLM-2. Tinkering with optimization and distributed systems. opinions are my own.
Kevin Patrick Murphy @sirbayes
42K Followers 334 Following Research Scientist at Google Brain / Deepmind. Interested in Bayesian Machine Learning.
NeurIPS Conference @NeurIPSConf
112K Followers 35 Following New Orleans, Dec 10-16, 23. https://t.co/ga8aOw615g Tweets to this account are not monitored. Please send feedback to [email protected].
Dan Roy @roydanroy
45K Followers 2K Following ML / AI researcher, emphasis on theory. Research Director and Canada CIFAR AI Chair, @VectorInst Professor, @UofT (Statistics/CS)
Gautam Kamath @thegautamkamath
44K Followers 507 Following Assistant Prof of CS @UWaterloo, Faculty @VectorInst, Canada @CIFAR_News AI Chair. Co-EiC @TmlrOrg. I lead @TheSalonML. Privacy, robustness, machine learning.
François Fleuret @francoisfleuret
31K Followers 456 Following Prof. @Unige_en, Adjunct Prof. @EPFL_en, Research Fellow @idiap_ch, co-founder @nc_shape. AI and machine learning since 1994. I like reality.
Jascha Sohl-Dickstein @jaschasd
19K Followers 625 Following Member of the technical staff @ Anthropic. Most (in)famous for inventing diffusion models. AI + physics + neuroscience + dynamics.
Sam Power @sp_monte_carlo
17K Followers 7K Following Lecturer in Maths & Stats at Bristol. Interested in probabilistic + numerical computation, statistical modelling + inference. (he / him)
Shane Gu @shaneguML
28K Followers 1K Following Research Scientist & Manager @GoogleDeepMind Tokyo/MTV. ex: @GoogleAI Brain, @OpenAI. (JP: @shanegJP)
Sholto Douglas @_sholtodouglas
15K Followers 858 Following Scaling Gemini @Deepmind - working towards intelligence too cheap to meter
Diego de las Casas @diegolascasas
559 Followers 774 Following Working on LLMs at @MistralAI Past: @GoogleDeepMind 🇧🇷 in 🇬🇧
Michal Valko @misovalko
5K Followers 2K Following Llama @AIatMeta Paris & Inria & MVA - Ex: Gemini and BYOL @GoogleDeepMind
Ethan Caballero is bu.. @ethanCaballero
8K Followers 2K Following ML PhD student @Mila_Quebec ; previously @GoogleDeepMind
Sharad Vikram @sharadvikram
1K Followers 510 Following Researcher @ Google Deepmind. I work on JAX + Pallas (https://t.co/lPMsq3yzgL) and Gemini. In the past I worked on Oryx and TFP. I like learning.
Alex Graveley @alexgraveley
31K Followers 933 Following I’m Alex Graveley, creator of GitHub Copilot, AI Tinkerers, Dropbox Paper, MobileCoin, and Hackpad. Building @ai_minion Hiring https://t.co/nsHar8OLPC
Enea Monzio Compagnon.. @EneaMC
42 Followers 74 Following PhD Student in Stochastic Optimization for Deep Learning @ the University of Basel. I smash calculations until it is night. Past: UBS; Yahoo! Research.
Sasha Rush @srush_nlp
52K Followers 464 Following Professor, Programmer in NYC. Cornell Tech, Hugging Face 🤗 https://t.co/cZl0wTfqGz
clem 🤗 @ClementDelangue
91K Followers 5K Following Co-founder & CEO @HuggingFace 🤗, the open and collaborative platform for AI builders
George Muraru @GeorgeMuraru
240 Followers 566 Following SWE in Research @googledeepmind, Former teaching Assistant @upb1818 Former member of the SMPC Team @openminedorg, Opinions are my own
Together AI @togethercompute
27K Followers 304 Following The future of AI is open-source. Let's build together.
Olivier Hénaff @olivierhenaff
2K Followers 229 Following Staff Research Scientist @GoogleDeepMind, interested in active, multimodal, and memory-augmented learning. Formerly @NYU_CNS and @Polytechnique
Neil Houlsby @neilhoulsby
4K Followers 318 Following Professional AI researcher; amateur athlete. Senior Staff RS in the Google Deepmind, Zürich. Attempts triathlons.
finbarr @finbarrtimbers
8K Followers 645 Following large models @midjourney. ai hot takes at https://t.co/pSeuTpK0xO.
Andreas Kirsch 🇮.. @BlackHC
9K Followers 5K Following Past: 🧑🎓 DPhil @AIMS_oxford @ExeterCollegeOx @UniofOxford (4.5yr) 🧙♂️ RE @DeepMind (1yr) 📺 SWE @Google (3yrs) 🎓 @TU_Muenchen 👤 Fellow @nwspk
Tom Rainforth @tom_rainforth
5K Followers 304 Following Senior research fellow (faculty) in machine learning at the University of Oxford, Head of RainML Research Lab (https://t.co/uuBwQ2WdMN)
Ofir Nachum @ofirnachum
4K Followers 343 Following Research at @OpenAI. Previously at @GoogleAI on the Brain Team. Doing work on #ReinforcementLearning and #MachineLearning
Sherjil Ozair @sherjilozair
6K Followers 3K Following prev: autopilot @tesla, deep learning @googledeepmind, phd https://t.co/dxgb6gimCf, cs @iitdelhi
Jonas Geiping @jonasgeiping
2K Followers 612 Following Machine Learning Research at the ELLIS Institute & MPI-IS // Investigating fundamental questions in Safety, Security, Privacy & Efficiency of modern ML
Caglar Gulcehre @caglarml
4K Followers 1K Following ML Researcher Prof @ EPFL, PI @ CLAIRE lab Ex: Staff Research Scientist @ Deepmind, MSR, IBM Research Follow me on Mastodon: https://t.co/LZ5sWt7Asj
Tri Dao @tri_dao
19K Followers 365 Following Incoming Asst. Prof @PrincetonCS, Chief Scientist @togethercompute. Machine learning & systems.
Toby Pohlen @TobyPhln
26K Followers 454 Following Founding member @xAI. Previously @GoogleDeepMind. @RWTH alumnus.
Alan Karthikesalingam @alan_karthi
5K Followers 2K Following Health AI @GoogleHealth @GoogleAI @GoogleDeepMind including ✨Med-Gemini, AMIE, MedPaLM, MedPaLM-2, MedPaLM-M, CoDoC Hon Lecturer Vasc Surgery @ImperialVasc
Polina Kirichenko @polkirichenko
3K Followers 1K Following PhD student at New York University, Visiting Researcher at @MetaAI FAIR Labs 🇺🇦
Jiaxin Shi @thjashin
2K Followers 316 Following Research Scientist @GoogleDeepMind | prev @Stanford @MSRNE @VectorInst @RIKEN_AIP_EN @Tsinghua_Uni. Building probabilistic & algorithmic models for learning.
Timothee Lacroix @tlacroix6
4K Followers 7 Following PhD student at Facebook AI Research Paris. Interested in link prediction in various settings and at various scales.
Arthur Mensch @arthurmensch
40K Followers 874 Following Co-founder and CEO @MistralAI. Apply https://t.co/yHGRZAtjcx
OpenAI @OpenAI
3.5M Followers 0 Following OpenAI’s mission is to ensure that artificial general intelligence benefits all of humanity. We’re hiring: https://t.co/dJGr6LgzPA
Richard Sutton @RichardSSutton
26K Followers 37 Following Student of mind and nature, libertarian, chess player, cancer survivor. @ Keen Technologies, UAlberta, Amii, RLAI, The Royal Society, RichSutton.eth
Antonio Orvieto @orvieto_antonio
1K Followers 1K Following Deep Learning PI @ELLISInst_Tue, Group Leader @MPI_IS. I compute stuff with lots of gradients 🧮, I like Kierkegaard & Lévi-Strauss 🧙♂️
BlinkDL @BlinkDL_AI
7K Followers 90 Following RWKV = 100% RNN with GPT-level performance. https://t.co/TkdxOJSFWX and https://t.co/86DzS6arA0
Andrew Foong @AndrewFoongYK
1K Followers 232 Following Senior researcher @MSFTResearch, working on deep learning for molecular modelling. PhD @ Cambridge Uni, Ex-intern @DeepMind.
John Schulman @johnschulman2
39K Followers 611 Following Cofounder @openai, lead post-training for ChatGPT and the API. Interested in reinforcement learning, alignment, birds, jazz music
Eric Nguyen @exnx
2K Followers 326 Following PhD in BioEngineering & AI @stanford @HazyResearch @StanfordAILab @arcinstitute
Michael and Jie did an amazing job on their first PhD project, by finding and fixing common pitfalls in empirical ML privacy evaluations. It turns out, if you evaluate things properly, DP-SGD is also the best *heuristic* defense when you instantiate it with large epsilon values.
Heuristic privacy defenses claim to outperform DP-SGD in real-world settings. With no guarantees, can we trust them? We find that existing evaluations can underestimate privacy leakage by orders of magnitude! Surprisingly, high-accuracy DP-SGD (ϵ >> 1000) still wins. 🧵
Our Next Generation Sequence Modeling Architectures workshop proposal was accepted by ICML! We have an incredible lineup of speakers, please come say hi and consider submitting your works! :)
Feeling very fortunate to co-organize this workshop with an incredible group of researchers, Razvan Pascanu, @orvieto_antonio, Carmen Amo Alonso, and Maciej Wołczyk!
We will have an amazing line of speakers, including @_albertgu, @scychan_brains, @sohamde_, @HochreiterSepp, and many more exciting speakers! Covering a range of topics on developing the next generation of sequence models has been a topic near my heart for more than 10 years🙂.
I am honored to announce that we will organize the "Next Generation of Sequence Modeling Architectures" workshop at ICML 2024 this year in Vienna. The workshop will take place on Friday, the 26th of July. The workshop page is: sites.google.com/view/ngsmworks…
Sara took for.ai from 20 or 30 people to a community of thousands. I feel so lucky to get to work with her each day.
Best ML advice you can get
@deliprao None the chinchilla paper said „hey we’re training too large models for too short! On the same compute, it would be better if we trained smaller models longer“ It was a bit conservative in its message, which was a contrarian message at the time!
@SamuelMLSmith @srush_nlp So glad I got this lesson early and very clearly. Figure from our BiT paper:
Combining SSM/RNN/EMA with attention is the way to higher quality, longer context, and faster inference! Griffin, Jamba, Zamba, and now Megalodon are great examples
How to enjoy the best of both worlds of efficient training (less communication and computation) and inference (constant KV-cache)? We introduce a new efficient architecture for long-context modeling – Megalodon that supports unlimited context length. In a controlled head-to-head…
Releasing RecurrentGemma - one of the strongest 2B-param open models designed for fast inference on long sequences and massive throughput! Both pre-trained and IT checkpoints available + code - try them out here! Code: github.com/google-deepmin… Weights: kaggle.com/models/google/…
Yesterday we released Gemma Instruct 1.1, a much-improved version of the 2B & 7B instruction-tuned Gemma models. Try them out! kaggle.com/models/keras/g…
We've recently benchmarked the Needle in a Haystack of RecurrentGemma 2B and its instruction version @Google @SamuelMLSmith using this resource: github.com/XuyangShen/LLM…. The results achieve our expectations, showcasing a mix efficiency model architecture that combines window…
Announcing RecurrentGemma! github.com/google-deepmin… - A 2B model with open weights based on Griffin - Replaces transformer with mix of gated linear recurrences and local attention - Competitive with Gemma-2B on downstream evals - Higher throughput when sampling long sequences
Yay 🎉! I am really happy that the 2B version of our Griffin architecture is finally open-source! Great work and job the whole team for doing this. I have already started playing with it in colab ☺️
Following our previous work, we are releasing RecurrentGemma - a fully open source 2B model based on our Griffin architecture! Code + weights as everyone has wished for! Code on Github: github.com/google-deepmin… Weights on Kaggle: kaggle.com/models/google/…
The new RecurrentGemma model looks really nice. Similar accuracy to Gemma-2b and very long scale. huggingface.co/google/recurre… If you are interested in how it works, we did cover the details in "A Mamba Primer". youtube.com/watch?v=dVH1dR…
Thanks a lot to @Cyber_Valley for these amazing shots! Had lots of fun 🧙♂️. Spoiler: amazing guests in the next episodes (it's a crescendo) ;) #AI
🚀 Get ready to dive deep into the captivating world of artificial intelligence with us! The Cyber Valley Podcast coming soon... 🎙️ Don’t miss our unforgettable episodes, created in collaboration with the ELLIS Institute Tübingen #AIPodcast #AIResearch #ELLIS #AI @orvieto_antonio
These are the faces of people that open source 🤗 Amazing to be with @jefrankle from Databricks, @sarahookr from Cohere, @sophiamyang from Mistral, @dvilasuero from Argilla, and @_lewtun from Hugging Face