Announcing RecurrentGemma! github.com/google-deepmin…
- A 2B model with open weights, based on Griffin
- Replaces the transformer's global attention with a mix of gated linear recurrences and local attention
- Competitive with Gemma-2B on downstream evals
- Higher throughput when sampling long sequences
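For readers unfamiliar with the recurrence side of the architecture, here is a minimal JAX sketch of a gated linear recurrence of the kind Griffin builds on. It is an illustrative simplification, not the RecurrentGemma code: the parameter names (w_a, w_g) and the exact gating are assumptions, but it shows the key property that the per-token state is a fixed-size vector regardless of sequence length.

```python
# Minimal sketch (assumed, simplified) of a gated linear recurrence in JAX.
# Not the official RecurrentGemma/Griffin implementation.
import jax
import jax.numpy as jnp

def gated_linear_recurrence(params, x):
    """x: [seq_len, dim] inputs for one sequence.
    The carried state h has shape [dim], independent of seq_len,
    which is what keeps long-sequence sampling cheap."""
    # Input-dependent gates in (0, 1).
    a = jax.nn.sigmoid(x @ params["w_a"])   # recurrence (forget) gate
    g = jax.nn.sigmoid(x @ params["w_g"])   # input gate

    def step(h_prev, inputs):
        a_t, g_t, x_t = inputs
        # Linear in h_prev: h_t = a_t * h_{t-1} + (1 - a_t) * (g_t * x_t)
        h_t = a_t * h_prev + (1.0 - a_t) * (g_t * x_t)
        return h_t, h_t

    h0 = jnp.zeros(x.shape[-1])
    _, hs = jax.lax.scan(step, h0, (a, g, x))
    return hs  # [seq_len, dim]

# Hypothetical usage with random parameters, just to show shapes:
key = jax.random.PRNGKey(0)
dim, seq_len = 16, 8
params = {
    "w_a": jax.random.normal(key, (dim, dim)) * 0.1,
    "w_g": jax.random.normal(key, (dim, dim)) * 0.1,
}
x = jax.random.normal(key, (seq_len, dim))
print(gated_linear_recurrence(params, x).shape)  # (8, 16)
```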
Building on ideas from SSMs and LSTMs, Griffin matches transformer performance without global attention, achieving faster inference on long sequences. arxiv.org/abs/2402.19427 See @sohamde_'s great thread for more details: x.com/sohamde_/statu…
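A rough way to see why avoiding global attention speeds up long-sequence inference: a global-attention KV cache grows linearly with the number of tokens processed so far, while local (sliding-window) attention caps the cache at the window size and the recurrence carries only a fixed-size state per layer. The layer, head, and window numbers below are illustrative assumptions, not Griffin's actual configuration.

```python
# Back-of-the-envelope comparison of KV-cache size (in stored elements).
# All configuration numbers are assumptions for illustration only.
def kv_cache_elems(seq_len, n_layers=26, n_kv_heads=1, head_dim=256, window=None):
    # With a sliding window, only the most recent `window` tokens are cached.
    cached_tokens = seq_len if window is None else min(seq_len, window)
    # Factor of 2 for keys and values.
    return 2 * n_layers * n_kv_heads * head_dim * cached_tokens

for seq_len in (2_048, 32_768, 262_144):
    global_cache = kv_cache_elems(seq_len)               # grows with seq_len
    local_cache = kv_cache_elems(seq_len, window=2_048)  # bounded by window
    print(f"{seq_len:>8} tokens: global={global_cache:,}  local={local_cache:,}")
```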
@SamuelMLSmith Exciting work! Curious why you didn't train a larger version. Was it due to computational budget, or are there scalability limits with the training process or model performance?
@SamuelMLSmith Is the performance “competitive with Gemma-2B” on both short and long contexts?
@SamuelMLSmith Hi Samuel. Curious why the RNN width isn't larger than the model width. Is that to save decoding memory? It should add little training cost, right?