The good folks at Meta have just unveiled Llama 3.2, pushing the boundaries of language models and computer vision.
Even more interesting is how they trained this cutting-edge model:
1️⃣ Architecture: Llama 3.2 uses an optimized transformer architecture with auto-regressive capabilities. The largest models (11B and 90B) now support multimodal inputs, integrating both text and images.
2️⃣ Training Pipeline:
• Started with pretrained Llama 3.1 text models
• Added image adapters and encoders
• Pretrained on large-scale noisy (image, text) pair data
• Fine-tuned on high-quality in-domain and knowledge-enhanced (image, text) pairs
3️⃣ Vision Integration:
• Trained adapter weights to integrate a pre-trained image encoder
• Used cross-attention layers to feed image representations into the language model
• Preserved text-only capabilities by not updating language-model parameters during adapter training
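The cross-attention step above can be sketched in a few lines: text tokens act as queries over image-patch features, so visual information flows into the language model while its own weights stay frozen. This is a minimal single-head numpy illustration with made-up dimensions, not Meta's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(text_hidden, image_feats, W_q, W_k, W_v):
    """Text tokens (queries) attend over image patch features (keys/values)."""
    q = text_hidden @ W_q                    # (T, d) queries from text states
    k = image_feats @ W_k                    # (P, d) keys from image patches
    v = image_feats @ W_v                    # (P, d) values from image patches
    scores = q @ k.T / np.sqrt(q.shape[-1])  # (T, P) scaled dot-product scores
    attn = softmax(scores, axis=-1)          # each text token's weights over patches
    return attn @ v                          # (T, d) image info routed to each token

rng = np.random.default_rng(0)
d_text, d_img, d = 16, 32, 16
text_hidden = rng.normal(size=(5, d_text))   # 5 text-token hidden states
image_feats = rng.normal(size=(9, d_img))    # 9 image-patch features
W_q = rng.normal(size=(d_text, d))
W_k = rng.normal(size=(d_img, d))
W_v = rng.normal(size=(d_img, d))
out = cross_attention(text_hidden, image_feats, W_q, W_k, W_v)
print(out.shape)  # (5, 16)
```

Only the adapter weights (`W_q`, `W_k`, `W_v` here) would receive gradients during adapter training, which is how the text-only behavior is preserved.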
4️⃣ Post-Training Alignment:
• Multiple rounds of supervised fine-tuning (SFT)
• Rejection sampling (RS)
• Direct preference optimization (DPO)
• Synthetic data generation using Llama 3.1 for Q&A augmentation
• Reward-model ranking to select high-quality fine-tuning data
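To make the DPO step concrete, here is the standard DPO objective for a single preference pair, written in plain Python. The log-probability values are invented for illustration; this is a sketch of the published loss, not Meta's training code.

```python
import math

def dpo_loss(pol_chosen, pol_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    Inputs are sequence log-probabilities of the chosen / rejected response
    under the policy being trained and under the frozen reference model.
    """
    margin = (pol_chosen - ref_chosen) - (pol_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))  # -log(sigmoid)

# The loss shrinks as the policy favors the chosen response more strongly
# than the reference model does:
weak = dpo_loss(pol_chosen=-10.0, pol_rejected=-10.0,
                ref_chosen=-10.0, ref_rejected=-10.0)
strong = dpo_loss(pol_chosen=-8.0, pol_rejected=-12.0,
                  ref_chosen=-10.0, ref_rejected=-10.0)
print(weak > strong)  # True
```

No reward model is needed at optimization time: the preference signal is expressed directly through the log-probability margin against the reference policy.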
5️⃣ Lightweight Models:
• Used pruning and distillation techniques for the 1B and 3B models
• Structured pruning from the Llama 3.1 8B model
• Knowledge distillation using Llama 3.1 8B and 70B as teachers
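The distillation step boils down to matching the student's output distribution to the teacher's. Below is the classic temperature-softened KL objective as a generic numpy sketch (the logits and temperature are made up; the actual recipe used for the 1B/3B models is not public in this detail).

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """KL divergence between temperature-softened teacher and student
    distributions (the standard knowledge-distillation objective)."""
    p = softmax(teacher_logits / T)  # soft targets from the teacher
    q = softmax(student_logits / T)  # student predictions
    return T * T * float(np.sum(p * (np.log(p) - np.log(q))))

teacher = np.array([2.0, 1.0, 0.1])
matched = distillation_loss(teacher, teacher)          # 0.0: student matches teacher
off = distillation_loss(np.array([0.1, 1.0, 2.0]), teacher)
print(matched, off)
```

A temperature above 1 spreads probability mass over non-argmax tokens, so the small student also learns the teacher's relative preferences among "wrong" answers, not just its top choice.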
6️⃣ Context Length: All models support an impressive 128K token context length.
7️⃣ Safety Measures: Incorporated safety mitigation data to balance helpfulness and safety.
The result? A suite of models ranging from edge-friendly 1B parameters to powerful 90B parameter versions, capable of sophisticated reasoning across text and images. Llama 3.2 is set to revolutionize AI applications from mobile devices to enterprise-scale solutions.
What are your thoughts on these advancements? How do you see Llama 3.2 impacting your industry? Let's discuss in the comments!
We introduce and open-source WizardLM-2, our next-generation state-of-the-art large language models, which deliver improved performance on complex chat, multilingual, reasoning, and agent tasks. The new family includes three cutting-edge models: WizardLM-2 8x22B, WizardLM-2 70B, and WizardLM-2 7B.
WizardLM-2 8x22B is our most advanced model and the best open-source LLM in our internal evaluation on highly complex tasks. WizardLM-2 70B reaches top-tier reasoning capabilities and is the first choice among models of its size. WizardLM-2 7B is the fastest and achieves performance comparable to leading open-source models 10x its size.
🤗 WizardLM-2 Capabilities:
1. MT-Bench (Figure 1): WizardLM-2 8x22B demonstrates highly competitive performance compared with the most advanced proprietary models such as GPT-4-Turbo and Claude-3. Meanwhile, WizardLM-2 7B and WizardLM-2 70B are the top-performing models among leading baselines at the 7B and 70B scales.
2. Human Preferences Evaluation (Figure 2): In this human-preference evaluation, WizardLM-2's capabilities come very close to cutting-edge proprietary models such as GPT-4-1106-preview, and are significantly ahead of all other open-source models.
🔍 Method Overview: As naturally occurring human-generated data becomes increasingly exhausted by LLM training, we believe that data carefully created by AI, and models supervised step by step by AI, will be the sole path toward more powerful AI.
Over the past year, we built a fully AI-powered synthetic training system (as shown in Figure 3).