Build your MoE with LLaMa3 models - A very nice tool: `mergoo`

You can easily build your own MoE-style architecture from LLaMa3 fine-tuned models on the Hugging Face Hub 🤗

* Supports several merging methods: Mixture-of-Experts, Mixture-of-Adapters, and layer-wise merging
* Flexible merging for each layer
* Base models supported: Llama (including LLaMa3), Mistral, and BERT
* Trainers supported: 🤗 Trainer, SFTTrainer, PEFT
* Devices supported: CPU, MPS, GPU
* Training choices: train only the router of the MoE layers, or fully fine-tune the merged LLM

------

Specify the config for merging (see the sketch below):

- `model_type`: type of the base model. Choices: `mistral`, `llama`, or `bert`.
- `num_experts_per_token`: number of experts used for each token in the MoE.
- `experts`: config for the experts to merge; each entry includes an `expert_name` and a Hugging Face 🤗 `model_id`.
- `router_layers`: layers chosen for applying Mixture-of-Experts.
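A minimal sketch of what that config looks like in practice, loosely following mergoo's documented usage. The expert `model_id`s are placeholders for your own LLaMa3 fine-tunes, and the key spelling `num_experts_per_tok` follows the library's examples as I recall them, so double-check against the repo:

```python
import torch
from mergoo.compose_experts import ComposeExperts

# Merge config, mirroring the fields described above.
# The expert model_ids below are placeholders -- swap in your own
# LLaMa3 fine-tunes from the Hugging Face Hub.
config = {
    "model_type": "llama",
    "num_experts_per_tok": 2,  # experts routed per token
    "experts": [
        {"expert_name": "base_expert", "model_id": "meta-llama/Meta-Llama-3-8B"},
        {"expert_name": "expert_1", "model_id": "your-org/llama3-8b-code-ft"},    # placeholder
        {"expert_name": "expert_2", "model_id": "your-org/llama3-8b-math-ft"},    # placeholder
    ],
    # MLP projections to replace with MoE layers + a trainable router
    "router_layers": ["gate_proj", "up_proj", "down_proj"],
}

# Compose the experts into a single MoE checkpoint on disk
expertmerger = ComposeExperts(config, torch_dtype=torch.float16)
expertmerger.compose()
expertmerger.save_checkpoint("data/llama3_moe")
```

For the "train only the router" option, the merged checkpoint can then be loaded and everything except the gating weights frozen. This assumes, as in mergoo's examples, that the router parameters carry "gate" in their names:

```python
from mergoo.models.modeling_llama import LlamaForCausalLM

model = LlamaForCausalLM.from_pretrained("data/llama3_moe")

# Freeze everything except the router (gating) layers
for name, weight in model.named_parameters():
    if "gate" not in name:
        weight.requires_grad_(False)
```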
@rohanpaul_ai Wait, so is it making a 3-expert MoE here? I'm a bit confused as to why it would need to train the gates if it's only 2 experts (expert1, expert2).
@rohanpaul_ai Thanks for sharing!