🤩 𝐌𝐢𝐧𝐢-𝐆𝐞𝐦𝐢𝐧𝐢: a new framework that enhances Vision Language Models (VLMs) to help bridge the gap between open-source VLMs and models like GPT-4 🌟 🧩 Unlocks more of the potential of VLMs for better performance and any-to-any workflows: image understanding, reasoning, and generation! 💪 Demo, models, and data below! 🧶👇
🔧 Proposes dual vision encoders for high-resolution refinement without increasing the number of visual tokens 🔧 High-quality datasets on the 🤗 Hub 🔧 VLM-guided image understanding and generation 🚀 Build multimodal apps with Gradio and Mini-Gemini! Start here: Gradio.dev
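To make the "no extra visual tokens" point concrete, here is a minimal sketch, assuming the refinement works like cross-attention: each low-resolution token acts as a query over the high-resolution features, so the output has exactly as many tokens as the low-resolution input. The thread does not spell out the mechanism, so the function name `refine_tokens`, the residual connection, and the toy sizes are all illustrative assumptions, not the authors' implementation.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def refine_tokens(low_res_tokens, high_res_feats):
    """Hypothetical refinement step (an assumption, not Mini-Gemini's code).

    Each low-res token is a query; the high-res features serve as keys and
    values. One refined token is produced per query, so the token count
    stays equal to len(low_res_tokens).
    """
    dim = len(low_res_tokens[0])
    scale = 1.0 / math.sqrt(dim)
    refined = []
    for query in low_res_tokens:
        # Scaled dot-product score against every high-res feature.
        scores = [
            scale * sum(q * k for q, k in zip(query, feat))
            for feat in high_res_feats
        ]
        weights = softmax(scores)
        # Weighted sum of high-res features for this query.
        attended = [
            sum(w * feat[d] for w, feat in zip(weights, high_res_feats))
            for d in range(dim)
        ]
        # Residual add keeps the original low-res information.
        refined.append([q + a for q, a in zip(query, attended)])
    return refined


random.seed(0)
DIM = 8  # toy embedding size, chosen only for illustration
low = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(4)]
high = [[random.gauss(0, 1) for _ in range(DIM)] for _ in range(16)]
out = refine_tokens(low, high)
print(len(low), len(high), len(out))  # 4 16 4
```

The language model would then see 4 tokens rather than 16, even though detail from all 16 high-resolution features has been mixed in.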
💪 Mini-Gemini supports dense and MoE Large Language Models (LLMs) from 2B to 34B parameters. 📊 Achieves leading performance on zero-shot benchmarks, even surpassing well-developed private models. 🌟 Play with the official Mini-Gemini Gradio demo on @huggingface Spaces! Link in the tweets below.
@huggingface Mini-Gemini 📣 😍 2B-34B models on the Hub: huggingface.co/collections/Ya… 💽 Datasets for precise image comprehension and reasoning-based generation: MGM-Instruction: huggingface.co/datasets/Yanwe… MGM-Pretrain: huggingface.co/datasets/Yanwe… 🔗 Demo: huggingface.co/spaces/wcy1122…