Say your training compute budget = ~1.5e13 FLOPs. If your dataset has a gzip compressibility ratio of 0.14, you should *max out your param count* and skimp on dataset size. But if your dataset is less compressible (gzip = 0.61), *keep your model small* and train it on a ton of data. https://t.co/hUwtTVvLqM
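Rough sketch of what that gzip ratio means in code (compressed bytes / raw bytes, so lower = more compressible). The sample texts here are just illustrative, not the actual datasets behind those numbers:

```python
import gzip
import random
import string

def gzip_ratio(text: str, level: int = 9) -> float:
    """Compressed size / raw size: lower means more compressible."""
    raw = text.encode("utf-8")
    return len(gzip.compress(raw, compresslevel=level)) / len(raw)

# Repetitive text compresses well -> low ratio -> (per the claim above) favor params.
print(gzip_ratio("the cat sat on the mat " * 1000))

# Near-random text barely compresses -> high ratio -> favor data.
noise = "".join(random.choices(string.ascii_letters + string.digits, k=20000))
print(gzip_ratio(noise))
```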
Here's a comparable (but data-compressibility-agnostic) visual from Chinchilla. I'm just not goated enough at matplotlib to plot those IsoLoss contours, and I also haven't yet fit the actual scaling laws. Just been exploring visualizations + intuition the last couple days lol sorry.
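For anyone who wants to take a crack at the matplotlib: a sketch of IsoLoss contours from the Chinchilla-style parametric loss L(N, D) = E + A/N^α + B/D^β, with the iso-FLOP line for the budget above via the usual C ≈ 6ND approximation. The constants are the ones reported in Hoffmann et al. (2022), used purely for illustration, not my fitted laws:

```python
import numpy as np
import matplotlib.pyplot as plt

# Chinchilla-style parametric loss: L(N, D) = E + A/N**alpha + B/D**beta.
# Constants from the Hoffmann et al. (2022) fit, used purely for illustration.
E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28

N = np.logspace(3, 8, 300)   # parameter counts
D = np.logspace(4, 10, 300)  # training tokens
NN, DD = np.meshgrid(N, D)
loss = E + A / NN**alpha + B / DD**beta

fig, ax = plt.subplots(figsize=(6, 5))
cs = ax.contour(NN, DD, loss, levels=15, cmap="viridis")
ax.clabel(cs, inline=True, fontsize=7)

# Iso-FLOP line for the budget above, using the standard C ~= 6*N*D approximation.
C = 1.5e13
ax.plot(N, C / (6 * N), "k--", label="C = 6ND = 1.5e13 FLOPs")

ax.set(xscale="log", yscale="log", xlabel="Parameters N", ylabel="Tokens D")
ax.legend()
plt.tight_layout()
plt.show()
```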
@khoomeik Can you elaborate? Intuitively, it feels like if your data is less compressible / harder to compress, you would need more parameters / model capacity to absorb the info entropy within the dataset. Why is the opposite true?
@khoomeik This is a really cool direction - I'm glad I get to say that I knew you before you became one of the godfathers of scaling :D
@khoomeik This is *incredibly* cool; I'm on the edge of my fucking seat. Also, love the apparent meta here: you can't really get scooped when you post your findings in real time!
@khoomeik That actually makes sense. I love this work.