Sam Havens @sam_havens
Leading post-training at @DbrxMosaicAI | samuelhavens.com | Portland, OR | Joined October 2021
Tweets: 213 | Followers: 1K | Following: 257 | Likes: 5K
One of the more interesting things about the new DBRX model is it uses the GPT-4 tokenizer. Compared to the LLaMA tokenizer (used by Mixtral), it's ~20% more efficient. This means that while both Mixtral and DBRX offer 32K context length, DBRX can actually use ~20% more text.
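The arithmetic behind that claim can be sketched in a few lines. This is only an illustration: it reads "~20% more efficient" as roughly 20% more characters per token, treats "32K" as 32,768 tokens, and uses a made-up baseline of 4 characters per token (the baseline cancels out of the final ratio, so its exact value does not matter).

```python
# Rough sketch: how much text fits in a fixed token budget.
# All numbers below are illustrative assumptions, not measurements.

CONTEXT_TOKENS = 32 * 1024  # assumption: "32K" means 32,768 tokens


def text_capacity(context_tokens: int, chars_per_token: float) -> float:
    """Approximate number of characters that fit in the context window."""
    return context_tokens * chars_per_token


baseline_chars_per_token = 4.0  # hypothetical figure for the baseline tokenizer
efficiency_gain = 0.20          # "~20% more efficient", read as chars per token

baseline_chars = text_capacity(CONTEXT_TOKENS, baseline_chars_per_token)
improved_chars = text_capacity(
    CONTEXT_TOKENS, baseline_chars_per_token * (1 + efficiency_gain)
)

# Same token budget, more characters per token, so more text overall.
ratio = improved_chars / baseline_chars
print(f"Extra text at the same context length: {ratio - 1:.0%}")
```

If "20% more efficient" were instead read as 20% fewer tokens for the same text, the gain in usable text would be 1 / 0.8 = 1.25x, so the exact figure depends on how the efficiency is defined.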
very lucky to have such an amazing wife who bore the brunt of all the work I did over the last few months. love you @celletheshell excited to see you more
4-bit quantized DBRX runs nicely in MLX on an M2 Ultra. PR: github.com/ml-explore/mlx…
Underrated LLM research challenge: make Chatbot Arena super-fun. Cute mascot, level up screens, fireworks, daily quests. There's got to be some bored mobile game designer out there. chat.lmsys.org
lol. lmao even.
OLMo is here! And it’s 100% open. It’s a state-of-the-art LLM and we are releasing it with all pre-training data and code. Let’s get to work on understanding the science behind LLMs. Learn more about the framework and how to access it here: blog.allenai.org/olmo-open-lang…
I have a marvelous prompt which allows a 7b model to complete this task, however this tweet is too small to contain it
This is a very clear paper that I have been coming back to repeatedly because it addresses practical questions that come up during the instruction tuning process, definitely worth a read!
My EMNLP paper got desk-rejected post-rebuttal because I posted it to arxiv 25 minutes after the anonymity deadline. I was optimistic about our reviews, so I spent a whole week while visiting my family writing rebuttals and coding experiments to respond.
📦 To evaluate the coding capabilities of LLMs, you need to execute the code. But what if the LLM spits out malicious code?😱 With MosaicML, you can now evaluate #LLMs on code gen benchmarks (eg. HumanEval) in an effortless, end-to-end secure framework. mosaicml.com/blog/secure-co…
My code doesn't have a memory leak, it's... uhhh... treating all processes with respect and dignity
Can in-context learning learn new tasks different from those in the pretraining data? Is this an emergent ability, i.e. does it arise from pretraining without being explicitly optimized for? How does this depend on pretraining task diversity? 🧵 1/ arxiv.org/abs/2306.15063
Excited about MPT-30B but don't have the VRAM to use it? The 7B-8k series has the 8k sequence length and uses the same instruction and chat fine-tuning datasets. These little guys pack some serious punch!
“We’ve created a way to reduce hallucinations,” the whole LLM problem space is that they are vibe machines, that is literally their personality. If you want to use them, use them for tasks you don’t need six nines on or bound how often you’re willing to be wrong
In AI research there is tremendous value in intuitions on what makes things work. In fact, this skill is what makes “yolo runs” successful, and can accelerate your team tremendously. However, there’s no track record on how good someone’s intuition is. A fun way to do this is…
@bindureddy It's worth mentioning that random guess gets 25% accuracy on MMLU 🤔
@daniel_271828 so humans smelled terrible naturally when they weren’t showering daily? I find this odd. Dogs and most other animals don’t naturally smell terrible (tho wet is another story), why humans?
looking for an interim roonposter to help us cope with the temporary loss of our beloved roon please post your nominations below friends
@hyhieu226 Matmuls are far higher arithmetic intensity. They are also already very, very optimized (GPUs are good at one thing -- matmuls!), whereas attention was not at all
If we view Attention and MLP as below, they look drastically similar: Attention: out = f(Q * K^T) * V MLP: out = g(X * W_1) * W_2 where f is Softmax and g is whatever nonlinearity. So, why is there a FlashAttention but no FlashMLP? 🤔 As a CUDA enthusiast, I have a theory,…
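To make the structural parallel concrete, here is a minimal pure-Python sketch of the two expressions exactly as written in the tweet. The helper names are made up for illustration, ReLU stands in for "whatever nonlinearity", and the usual 1/sqrt(d) attention scaling is left out because the tweet's formula omits it. It says nothing about the truncated theory itself.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def softmax_rows(m):
    """Row-wise softmax: each output row depends on the whole input row."""
    out = []
    for row in m:
        peak = max(row)  # subtract the max for numerical stability
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def relu(m):
    """Elementwise nonlinearity: each output depends on one input only."""
    return [[max(0.0, v) for v in row] for row in m]


def attention(q, k, v):
    # out = f(Q * K^T) * V, with f = softmax over each row
    return matmul(softmax_rows(matmul(q, transpose(k))), v)


def mlp(x, w1, w2):
    # out = g(X * W_1) * W_2, with g = ReLU (an assumption for this sketch)
    return matmul(relu(matmul(x, w1)), w2)


identity = [[1.0, 0.0], [0.0, 1.0]]
attn_out = attention([[0.0, 0.0]], identity, [[1.0, 2.0], [3.0, 4.0]])
mlp_out = mlp([[1.0, -1.0]], identity, identity)
print(attn_out, mlp_out)
```

Both functions are two matrix multiplications around a nonlinearity. One visible difference is that in attention both operands of the first product are activations and f normalizes across a whole row, whereas in the MLP the second operands are fixed weights and g acts elementwise.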
High alpha trick for fine-tuning: Make your system prompts in your dataset really great. It'll help the model learn to do your task much faster, with less data. If you have lots of data, you can ignore this, but at small dataset sizes, this changes everything.
Scatter plot with top-left good and yolo axes is the new radar plots where ours surrounds everything.
Good morning: @SnowflakeDB’s new 480B parameter #LLM is made of 128 experts! It’s bigger than #Grok and is now the largest *fully open source* (Apache 2.0) LLM! 🧵👇 how does it compare to Llama 3, Mixtral, and GPT4?
@denny_zhou It's a prompting setup here -- I can also try zero shot but I tend to believe that the conclusion should hold
@Francis_YAO_ Is it a prompting setup or zero-shot? I want to see if it is particular for CoT prompting or it is general for CoT reasoning which does not have to be from CoT prompting.
never bet against uwaterloo. insane track record of being right and early
Daily showers are purely ‘performative’ and have no real health benefit, experts insist trib.al/ee63uAc
I have found a high correlation between researchers and degenerate gamblers. I would tag the appropriate coworkers but it's the entire team