Sasha Rush @srush_nlp
Professor, Programmer in NYC. Cornell Tech, Hugging Face 🤗 https://t.co/cZl0wTfqGz rush-nlp.com
New York, NY. Joined December 2015
Tweets: 6K, Followers: 51K, Following: 462, Likes: 3K
llm.cpp was finally published today. It's very much CUDA C++ the good parts. Code: github.com/gevtushenko/ll… Talk: youtube.com/watch?v=WiB_3C… Speakers: x.com/g_evtushenko and Jake Hemstad
Just to be clear, I get what "agentic" means, I was mainly confused as to what it implies as a modifier on a person.
*Alice goes to a differentiable wonderland!* 🔥 I published a short free book on the design of neural networks, from convolutions to transformers, SSMs, and a few other topics. As a bonus, I tried to make it look nice - any feedback is appreciated! sscardapane.it/alice-book
in the coming weeks me and @ZhengxuanZenWu are giving in-person talks on ReFT at - @Demandbase (SF, 5/1) - @stanfordnlp Lunch (Stanford, 5/2) - @awscloud Generative AI (Santa Clara, 5/10)
Happy to release Accelerated Scan, a kernel library for first order parallel associative scans in vanilla @PyTorch, Triton 2.2.0 and CUDA C++. pip install accelerated-scan🧵
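For readers unfamiliar with the term, a first-order scan computes the recurrence h_t = a_t * h_{t-1} + b_t. The sketch below is a plain-Python illustration of why that recurrence can be parallelized. It does not use or reproduce the accelerated-scan API, and the function names (combine, sequential_scan, associative_scan) are hypothetical names chosen for this example.

```python
from typing import List, Tuple

# A pair (a, b) stands for the affine map h -> a * h + b.
Pair = Tuple[float, float]


def combine(left: Pair, right: Pair) -> Pair:
    """Compose two affine maps: apply `left` first, then `right`.

    This composition is associative, which is what allows a parallel scan.
    """
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)


def sequential_scan(a: List[float], b: List[float]) -> List[float]:
    """Reference loop: h_t = a_t * h_{t-1} + b_t, starting from h = 0."""
    h = 0.0
    out = []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return out


def associative_scan(a: List[float], b: List[float]) -> List[float]:
    """Inclusive scan by doubling (Hillis-Steele style).

    Runs about log2(n) rounds. Within a round every position is updated
    from the previous round's values only, so on parallel hardware the
    inner loop could run concurrently. Here it is simulated serially.
    """
    elems = list(zip(a, b))
    n = len(elems)
    step = 1
    while step < n:
        nxt = list(elems)
        for i in range(step, n):
            # elems[i - step] covers the earlier positions, so it goes on the left.
            nxt[i] = combine(elems[i - step], elems[i])
        elems = nxt
        step *= 2
    # With h = 0 initially, the offset term of each prefix map is h_t.
    return [b_t for _, b_t in elems]


if __name__ == "__main__":
    print(associative_scan([0.5, 0.5, 0.5, 0.5], [1.0, 1.0, 1.0, 1.0]))
    # prints [1.0, 1.5, 1.75, 1.875]
```

The doubling scheme does more total work than the sequential loop; work-efficient variants exist, and a GPU kernel would be organized quite differently from this serial simulation.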
Talk: "OLMo: Findings of Training an Open LM" from Hanna Hajishirzi at AI2 from OSGAI. Extremely interesting overview of the 4 parts (Data, Training, Adaptation, Eval) of the OLMo open LLM project. Rare insight into how these processes work at scale. youtube.com/watch?v=qFZbu2…
@aryaman2020 Let's do it $100 Phi 3 Medium beats LLAMA 3 8B instruct on Arena 🤝
Do models need to reason in words to benefit from chain-of-thought tokens? In our experiments, the answer is no! Models can perform on par with CoT using repeated '...' filler tokens. This raises alignment concerns: Using filler, LMs can do hidden reasoning not visible in CoT🧵
Will your paper catch the eye of @_akhaliq? I built a demo that predicts if AK will select a paper. It has 50% F1 using DeBERTa finetuned on data from past year. As a test, our upcoming WildChat arXiv has a 56% chance. Hopefully not a false positive🤞 🔗huggingface.co/spaces/yuntian…
Never think about x ↦ x - η∇L(x) (gradient descent), even as a simplification. Replace it with x ↦ (1-𝛾)x + 𝛾 argmin_{y∈X} ⟨y,∇L(x)⟩ (Frank-Wolfe; a Mann iteration) or x ↦ (1-λ)x + η argmin_{||𝚫||≤1} ⟨𝚫,∇L(x)⟩ (normalized steepest descent) https://t.co/JEVWgtVKco
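The three update rules named in this tweet can be compared side by side in a small sketch. The tweet leaves the feasible set X and the norm unspecified, so the code below assumes an L-infinity ball for X and the Euclidean norm for the steepest-descent direction; those choices, the function names, and the parameter names are assumptions made for illustration only.

```python
import math
from typing import List

Vector = List[float]


def gradient_descent_step(x: Vector, grad: Vector, lr: float) -> Vector:
    """x -> x - lr * grad."""
    return [xi - lr * gi for xi, gi in zip(x, grad)]


def frank_wolfe_step(x: Vector, grad: Vector, gamma: float, radius: float) -> Vector:
    """x -> (1 - gamma) * x + gamma * argmin_{y in X} <y, grad>.

    Assumption: X is the L-infinity ball of the given radius, so the linear
    minimizer is the vertex -radius * sign(grad). Where a gradient entry is
    exactly zero any value in [-radius, radius] is optimal; 0.0 is used.
    """
    vertex = [0.0 if gi == 0 else -radius * math.copysign(1.0, gi) for gi in grad]
    return [(1 - gamma) * xi + gamma * vi for xi, vi in zip(x, vertex)]


def normalized_steepest_descent_step(
    x: Vector, grad: Vector, lam: float, lr: float
) -> Vector:
    """x -> (1 - lam) * x + lr * argmin_{||d|| <= 1} <d, grad>.

    Assumption: the norm is Euclidean, so the minimizer is -grad / ||grad||.
    """
    norm = math.sqrt(sum(gi * gi for gi in grad))
    if norm == 0:
        direction = [0.0] * len(grad)
    else:
        direction = [-gi / norm for gi in grad]
    return [(1 - lam) * xi + lr * di for xi, di in zip(x, direction)]


if __name__ == "__main__":
    x = [1.0, -2.0]
    g = [3.0, -4.0]
    print(gradient_descent_step(x, g, lr=0.5))
    print(frank_wolfe_step(x, g, gamma=0.5, radius=1.0))
    print(normalized_steepest_descent_step(x, g, lam=0.0, lr=1.0))
```

Under these assumptions, both alternatives ignore the magnitude of the gradient: the Frank-Wolfe step moves toward a point of the feasible set, and the normalized step always has length lr.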
List Items One by One: A New Data Source and Learning Paradigm for Multimodal LLMs. Set-of-Mark (SoM) Prompting unleashes the visual grounding capability of GPT-4V, by enabling the model to associate visual objects with tags inserted on the image. These tags, marked with
@srush_nlp No benchmarks, only vibes are the way forward
@srush_nlp E.g. the principal-agent problem, where someone is hiring an agent to act on their behalf. Usually firms do this, but individuals as well (e.g. many people invest passively in an index).
@srush_nlp Not perfectly rational, but boundedly rational. They have some objective, some constraints (e.g., cognitive load, time, budget, risk tolerance) and they act optimally under these conditions. From the outside, this can seem rational or irrational in the conventional sense.
AKSelectionPredictor now runs on ZeroGPU A100, thanks to the support of @_akhaliq and @huggingface! 🔗huggingface.co/spaces/yuntian…
@srush_nlp I would guess they mean "goal-directed" behavior as opposed to reflexive/reactive behavior. Of course, often these cannot be distinguished at the functional level.
@srush_nlp worth a read. x.com/jessicadai_/st…
lots of talk about “ethical AI”, not so much about what it actually means to be an ethical agent... long story short, if what we care about is “bad stuff that happens due to AI,” I don’t think (ethical) agency is a particularly useful or even logically sound starting point!!!
@srush_nlp I feel this is really captured by @tanyaagoyal and colleagues: arxiv.org/abs/2209.12356 If you start looking at the *human* preferences of summaries, things start looking different very quickly. I'd expect the same tendency holds for GPT-2 (minus instruction tuning)?
@srush_nlp Hey Sasha, I think it makes sense. Phi-3 is fundamentally different from other models, so its behavior can be unexpected in some cases, both in a good and bad way (hopefully though much more in a good way ;-)).
@akyurekekin @lambdaviking @srush_nlp Re A fuzziness: something else that seems important here is that it’s easy to implement something like classical smoothing / backoff with softmax attn, but not (I think) linear attn
@akyurekekin @lambdaviking @srush_nlp My guess is that this explains the separation we see between standard tfs and everything else in the ICLL paper, and presumably which of @lambdaviking’s various IH implementations you see in different model architectures
The dimensional analysis of gradient descent is odd; the unit of the gradient is "loss / weight" and it gets multiplied by the learning rate to get a delta with "weight" units, so the learning rate has unit "weight^2 / loss".
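The unit bookkeeping in this tweet can be written out explicitly. As a sketch, write [w] for the unit of a weight and [L] for the unit of the loss (notation introduced here, not in the tweet). The update w ← w − η∇L(w) only makes sense if η∇L carries weight units, so:

```latex
\[
[\nabla L] = \frac{[L]}{[w]},
\qquad
[\eta]\cdot\frac{[L]}{[w]} = [w]
\;\Longrightarrow\;
[\eta] = \frac{[w]^{2}}{[L]}.
\]
```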
@srush_nlp I tend to trust @abacaj's vibe checks, seems to be good but might need some good prompting techniques/playing with it to get it to work best? x.com/abacaj/status/…
I’m finding many cases where phi-2 was getting confused/hallucinating not happening with phi-3
@srush_nlp @akyurekekin Tangent: there may also be a way to connect the definition to skip bigram matching, which is relevant in the study of subregular formal languages and first-order definable languages cf Example 2: arxiv.org/pdf/2210.02671