Build your MoE with LLaMa3 models - A very nice tool: `mergoo`

You can easily build your own MoE-style architecture from LLaMa3 fine-tuned models on the Hugging Face Hub 🤗

* Supports several merging methods: Mixture-of-Experts, Mixture-of-Adapters, and layer-wise merging
* Flexible merging for each layer
* Base models supported: Llama (including LLaMa3), Mistral, and BERT
* Trainers supported: 🤗 Trainer, SFTTrainer, PEFT
* Devices supported: CPU, MPS, GPU
* Training choices: train only the router of the MoE layers, or fully fine-tune the merged LLM

------

Specify the config for merging (see the sketch below):

- `model_type`: type of the base model. Choices: `mistral`, `llama`, or `bert`.
- `num_experts_per_token`: number of experts used for each token in the MoE.
- `experts`: config for the experts to merge; each entry includes an `expert_name` and a Hugging Face 🤗 `model_id`.
- `router_layers`: layers chosen for applying Mixture-of-Experts.
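A minimal sketch of what that config looks like in practice, loosely following mergoo's documented usage. The expert `model_id`s are placeholders for your own LLaMa3 fine-tunes, and the key spelling `num_experts_per_tok` follows the library's examples as I recall them, so double-check against the repo:

```python
import torch
from mergoo.compose_experts import ComposeExperts

# Merge config, mirroring the fields described above.
# The expert model_ids below are placeholders -- swap in your own
# LLaMa3 fine-tunes from the Hugging Face Hub.
config = {
    "model_type": "llama",
    "num_experts_per_tok": 2,  # experts routed per token
    "experts": [
        {"expert_name": "base_expert", "model_id": "meta-llama/Meta-Llama-3-8B"},
        {"expert_name": "expert_1", "model_id": "your-org/llama3-8b-code-ft"},    # placeholder
        {"expert_name": "expert_2", "model_id": "your-org/llama3-8b-math-ft"},    # placeholder
    ],
    # MLP projections to replace with MoE layers + a trainable router
    "router_layers": ["gate_proj", "up_proj", "down_proj"],
}

# Compose the experts into a single MoE checkpoint on disk
expertmerger = ComposeExperts(config, torch_dtype=torch.float16)
expertmerger.compose()
expertmerger.save_checkpoint("data/llama3_moe")
```

For the "train only the router" option, the merged checkpoint can then be loaded and everything except the gating weights frozen. This assumes, as in mergoo's examples, that the router parameters carry "gate" in their names:

```python
from mergoo.models.modeling_llama import LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained("data/llama3_moe")

# Freeze everything except the router (gating) layers
for name, weight in model.named_parameters():
    if "gate" not in name:
        weight.requires_grad_(False)
```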
@rohanpaul_ai Wait, so is it making a 3-expert MoE here? I'm a bit confused as to why it would need to train the gates if it's only 2 experts (expert1, expert2).
@rohanpaul_ai Thanks for sharing!