IT WORKS! Running Mixtral 8x22B with Transformers! 🔥 Running on a DGX (4x A100 - 80GB) with CPU offloading 🤯 https://t.co/4gQuvwnHbM
In case anyone is interested in the code, here you go:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistral-community/Mixtral-8x22B-v0.1"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

text = "The meaning of life, universe and everything is "
inputs = tokenizer(text, return_tensors="pt").to("cuda")

outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
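If you want to control how much of each GPU is used before weights spill to CPU RAM, from_pretrained accepts a max_memory budget alongside device_map="auto" (handled by accelerate). A minimal sketch; the GiB caps below are illustrative assumptions, not values from the original post:

import torch
from transformers import AutoModelForCausalLM

model_id = "mistral-community/Mixtral-8x22B-v0.1"

# Cap each A100 a bit below 80 GB so accelerate places the
# remaining weights on CPU instead of overflowing the GPUs.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    max_memory={0: "75GiB", 1: "75GiB", 2: "75GiB", 3: "75GiB", "cpu": "200GiB"},  # assumed budgets
)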
@reach_vb nice! so how do I get access to hf-dgx-01? 😎
@reach_vb So you need CPU offloading AND 4x A100 80GB? wow... (FP16 right? we can't see the end of the line)
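(For reference, the code above uses bfloat16, not FP16. Mixtral 8x22B has roughly 141B total parameters, so at 2 bytes per parameter the weights alone are about 282 GB, right at the edge of the 320 GB on 4x A100 80GB, which is why CPU offloading is needed for activations and KV-cache headroom.)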
@reach_vb How's the GPU memory usage? Did it eat 80% of all of 'em?
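One way to check per-GPU usage after loading the model, a quick sketch using standard torch.cuda calls (nvidia-smi works too):

import torch

for i in range(torch.cuda.device_count()):
    # Memory currently allocated by tensors vs. the device's total capacity
    used = torch.cuda.memory_allocated(i) / 1024**3
    total = torch.cuda.get_device_properties(i).total_memory / 1024**3
    print(f"GPU {i}: {used:.1f} / {total:.1f} GiB allocated")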
@reach_vb didn't we all agree that Mistral sucks already? let it be.