Hiranmay Darshane @hdarshane

research intern https://t.co/Uyagq8Ilgm. deep learning and large language models. football (banter) fan. 18. hiranmay.com Mumbai, India. Joined October 2019

Tweets: 5K
Followers: 672
Following: 1K
Likes: 32K

Unnat Jain @unnatjain2010

3 days ago

Alyosha has a humbling lesson that hits us hard 🫣 @ Four seasons ballroom 4, CVPR 2026

0 6 66 41K 18

0 0 1 114 0

7/ Looking beyond this paper: scaling compute against a fixed, limited pool of data will need new primitives. Searching over a population of models is a different problem than standard gradient descent training and we've barely scratched the surface. We hope q0 pushes people toward crazy ideas in multi-epoch training and scaling compute in general!!

2 5 20 879 2

Hiranmay Darshane @hdarshane

2 days ago

yay

Samip @industriaalist

2 days ago

1/ Now that we're running out of data, how do you optimally scale multi-epoch pretraining to hundreds of epochs? Our first paper from Q! q0 trains a population of models, instead of a single model that saturates fast, reaching a dramatically lower loss at *every* epoch budget. w/

15 55 259 27K 227

0 0 6 631 0

stochasm @stochasticchasm

4 days ago

@soldni regularization is BACK i suppose. dropout 0.15 is quite large and i don't think anyone else uses dropout in the big 26. also rather high std for init these days but you can't go wrong with a good old 0.02. also why depth scale output proj when you have sandwich norm??

4 3 67 14K 14

Hiranmay Darshane @hdarshane

4 days ago

@teortaxesTex @jacobrintamaki

0 0 1 220 0

Hiranmay Darshane @hdarshane

4 days ago

🔥

Jasper Gilley @0xjasper

5 days ago

This paper empirically ~verifies the section of my first Zipfian grokking blog post where I hypothesize about how capacity competition dynamics extrapolate from the grokking to language pretraining case. Cool work from the authors! :)

1 1 13 3K 16

0 0 2 122 0

kalomaze @kalomaze

4 days ago

q: "why don't Sora-like models learn compositional physics understanding or do ICL like how language models learn compositional semantics?" a: every attempt to date heavily leaks information from the future. some even bake it into the bottleneck design without realizing (!!!)

5 8 99 5K 50

Hiranmay Darshane @hdarshane

4 days ago

^I mean why is it not

0 0 0 127 0

Hiranmay Darshane @hdarshane

4 days ago

is this not regulated by SEC?

Hedgeye @Hedgeye

a week ago

Rule changes for the SpaceX $SPCX IPO: Index providers waived the profitability requirement and cut the seasoning window from 90 days to 5. This forces over $30 trillion in passive 401k and retirement money to buy SpaceX at IPO valuations. Bloomberg Intelligence estimates S&P

550 2K 10K 11.6M 4K

1 0 1 253 0

Christopher Potts @ChrisGPotts

5 days ago

The following animation conveys the intuition: when a 1-neuron model tries to learn two tasks, the frequent task updates suppress the infrequent task updates. The 2-neuron model can dedicate a neuron to the infrequent task once the frequent one is fully learned.

2 5 82 4K 19

Hiranmay Darshane @hdarshane

5 days ago

a quick way to force oneself into thinking about a thing is maintaining a list of words about that thing and just staring at it something something required circuits activate from high cosine similarity

1 1 6 254 0

Max Weinbach @mweinbach

6 days ago

I was thinking about it again recently, Google Allo was really ahead on the idea of chatting with Google Assistant or @'ing in conversations to build out this Agent/AI UX we have now

26 19 506 32K 56

Hiranmay Darshane @hdarshane

a week ago

most things arrive unrecognizable to the ideas that summoned them

0 0 2 70 0

Rohan Pandey @khoomeik

a week ago

my favorite interp researcher can identify neurons responsible for any behavior and provide steering vectors for them. her name is backprop and her steering vectors are just gradients

8 8 280 24K 55

Geoffrey Litt @geoffreylitt

a week ago

@mschoening and I are starting a podcast where we nerd out about human-AI collaboration and malleable software. In this episode: is HTML actually better than Markdown? and an alternative to Software Factories... Watch on YT: youtu.be/KB9lRdM5eO0?si…

15 17 170 33K 90

Hiranmay Darshane @hdarshane

a week ago

@leothecurious Olah

0 0 5 307 0

Hiranmay Darshane @hdarshane

a week ago

does seem like all that time with colah did not alter his worldview at all

Pope Leo XIV @Pontifex

a week ago

Artificial intelligences do not undergo experiences, do not possess a body, do not feel joy or pain, do not mature through relationships, and do not know from within what love, work, friendship or responsibility mean. Nor do they have a moral conscience, since they do not judge

4K 60K 310K 14.1M 26K

0 0 3 117 0

Noah Smith 🐇🇺🇸🇺🇦🇹🇼 @Noahpinion

a week ago

Yes, AIs are going to do all or almost all of the pure theory, but tbh humans probably finished most of the pure theory that it's possible for humans to do by the end of the 20th century. Yes there has been some recent theory progress but let's be honest, most is of marginal economic value at best. There's probably lots of useful pure theory left to do in this universe, but it's probably not the kind of stuff that can be intuited by a single human, explained to a grad student, and written down in a textbook. AI will do all that stuff.

35 15 222 73K 58

jss @jsensarma

a week ago

1. people underestimate how hard this problem is 2. universal issue. IGCSE billed ~Rs 40k for exams - still many papers leaked 3. change is much harder than running things as is. migration to OSM requires competence++++ 4. with privatization, public sector => competence----

Ceteris Paribus @entropied2223

a week ago

Why have routine tasks like holding exams become so difficult suddenly? What is the source of this newfound incompetence?

14 19 230 18K 24

7 3 24 5K 5

Akshay @akshayvegesna

a week ago

Cool presenting on why generalization in neural nets is less of a mystery than many make it out to be:

Y Combinator @ycombinator

a week ago

Last week we hosted the first ever YC Paper Club in Mountain View. We brought together great AI researchers and founders to discuss both the state of the art and what it actually takes to get it into production. Thanks to the following presenters: 0:12 - Intro from YC Visiting