
HFarias

Hecdin

AI & ML interests

None yet

Recent Activity

updated a collection about 2 months ago
Favorites 💕
liked a Space about 2 months ago
lmarena-ai/chatbot-arena-leaderboard
liked a Space 4 months ago
gabrielchua/open-notebooklm

Organizations

None yet

Hecdin's activity

reacted to singhsidhukuldeep's post with 👍 4 months ago
The good folks at Meta have just unveiled Llama 3.2, pushing the boundaries of language models and computer vision.

Even more interesting is how they trained these cutting-edge models:

1️⃣ Architecture:
Llama 3.2 is an auto-regressive language model built on an optimized transformer architecture. The two largest models (11B and 90B) now support multimodal input, accepting both text and images.
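To make "auto-regressive" concrete: the model produces one token at a time, and each new token is conditioned on everything generated so far. Below is a minimal greedy decoding loop in Python. `greedy_decode` and `toy_model` are hypothetical stand-ins, not Meta's code; a real model would return logits computed by the transformer rather than from a hand-written rule.

```python
from typing import Callable, List


def greedy_decode(
    next_token_logits: Callable[[List[int]], List[float]],
    prompt: List[int],
    max_new_tokens: int,
    eos_id: int,
) -> List[int]:
    """Generate tokens one at a time, each conditioned on all previous tokens."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        logits = next_token_logits(tokens)  # one score per vocabulary entry
        next_id = max(range(len(logits)), key=logits.__getitem__)  # greedy choice
        tokens.append(next_id)  # the new token becomes context for the next step
        if next_id == eos_id:
            break
    return tokens


# Hypothetical toy "model": always prefers (last token + 1) mod VOCAB.
VOCAB = 5


def toy_model(tokens: List[int]) -> List[float]:
    target = (tokens[-1] + 1) % VOCAB
    return [1.0 if i == target else 0.0 for i in range(VOCAB)]


print(greedy_decode(toy_model, prompt=[0], max_new_tokens=10, eos_id=4))  # [0, 1, 2, 3, 4]
```

Greedy selection is used only to keep the loop short; real deployments usually sample with temperature or top-p instead.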

2️⃣ Training Pipeline:
• Started with pretrained Llama 3.1 text models
• Added image adapters and encoders
• Pretrained on large-scale noisy (image, text) pair data
• Fine-tuned on high-quality in-domain and knowledge-enhanced (image, text) pairs

3️⃣ Vision Integration:
• Trained adapter weights to integrate a pre-trained image encoder
• Used cross-attention layers to feed image representations into the language model
• Preserved text-only capabilities by not updating language model parameters during adapter training
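The cross-attention adapter idea can be sketched in plain Python. This is a minimal single-head illustration under stated assumptions: text hidden states act as queries, image-encoder outputs act as keys and values, and the result is added back to the text stream through a residual connection. All names here (`cross_attention`, `w_q`, `w_k`, `w_v`, the `PARAMS` registry) are hypothetical; the real Llama 3.2 vision layers are multi-head and include details the post does not describe.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row: List[float]) -> List[float]:
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(text_h: Matrix, image_h: Matrix,
                    w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Single-head cross-attention: text tokens (queries) attend to image features."""
    q = matmul(text_h, w_q)   # queries come from the language model's hidden states
    k = matmul(image_h, w_k)  # keys and values come from the image encoder
    v = matmul(image_h, w_v)
    d = len(q[0])
    scores = [[sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d) for k_row in k]
              for q_row in q]
    attended = matmul([softmax(r) for r in scores], v)
    # Residual connection: the text stream passes through, plus image information.
    return [[h + a for h, a in zip(h_row, a_row)] for h_row, a_row in zip(text_h, attended)]


# Hypothetical parameter registry (name -> frozen?): language model weights stay
# frozen, so only the adapter's cross-attention projections would be updated.
PARAMS: Dict[str, bool] = {
    "lm.layers.0.self_attn": True,
    "lm.layers.0.mlp": True,
    "adapter.cross_attn.w_q": False,
    "adapter.cross_attn.w_k": False,
    "adapter.cross_attn.w_v": False,
}


def trainable_parameters(params: Dict[str, bool]) -> List[str]:
    return sorted(name for name, frozen in params.items() if not frozen)


I2 = [[1.0, 0.0], [0.0, 1.0]]
print(cross_attention([[1.0, 0.0]], [[0.0, 1.0]], I2, I2, I2))  # [[1.0, 1.0]]
print(trainable_parameters(PARAMS))
```

Because the language model's own weights never change, the text-only path is exactly what it was before adapter training; with a zero value projection the layer returns the text states unchanged.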

4️⃣ Post-Training Alignment:
• Multiple rounds of supervised fine-tuning (SFT)
• Rejection sampling (RS)
• Direct preference optimization (DPO)
• Synthetic data generation using Llama 3.1 for Q&A augmentation
• Reward model ranking for high-quality fine-tuning data
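Two of these alignment steps lend themselves to short sketches. `dpo_loss` follows the standard DPO objective, −log σ(β[(log π(y_w|x) − log π_ref(y_w|x)) − (log π(y_l|x) − log π_ref(y_l|x))]), computed from sequence log-probabilities of a chosen (w) and a rejected (l) response. `rejection_sample` keeps the highest-reward candidate out of n samples. Both are illustrative only: the β = 0.1 default and the `generate`/`reward` callables are assumptions, not values or interfaces taken from Meta's pipeline.

```python
import math
from typing import Callable


def neg_log_sigmoid(z: float) -> float:
    """Numerically stable -log(sigmoid(z))."""
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:  # beta = 0.1 is an assumed default
    """DPO loss for one preference pair, from summed sequence log-probs."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    return neg_log_sigmoid(beta * (chosen_margin - rejected_margin))


def rejection_sample(prompt: str,
                     generate: Callable[[str, int], str],
                     reward: Callable[[str, str], float],
                     n: int) -> str:
    """Draw n candidates (hypothetical generator) and keep the best-scoring one."""
    candidates = [generate(prompt, seed) for seed in range(n)]
    return max(candidates, key=lambda c: reward(prompt, c))


# Toy usage with hypothetical generator and reward functions.
gen = lambda p, s: f"answer {s}"
rwd = lambda p, c: -abs(int(c.split()[-1]) - 2)
print(rejection_sample("q", gen, rwd, n=5))  # answer 2
print(dpo_loss(-1.0, -3.0, -2.0, -2.0))      # below log(2): policy prefers the chosen answer
```

When the policy matches the reference model, the loss is exactly log 2; it falls as the policy raises the chosen response's likelihood relative to the rejected one.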

5️⃣ Lightweight Models:
• Used pruning and distillation techniques for the 1B and 3B models
• Structured pruning of the Llama 3.1 8B model
• Knowledge distillation using Llama 3.1 8B and 70B as teachers
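Knowledge distillation from teacher logits can be sketched as a per-token KL divergence between the teacher's output distribution and the student's. This is a generic distillation loss assumed for illustration: the post does not say how the 8B and 70B teacher signals were weighted or combined, so the sketch uses a single teacher, and the `temperature` parameter and function names are likewise hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits: List[List[float]],
                      teacher_logits: List[List[float]],
                      temperature: float = 1.0) -> float:
    """Mean per-token KL(teacher || student); each row holds one position's vocab logits."""
    total = 0.0
    for s_row, t_row in zip(student_logits, teacher_logits):
        p = softmax(t_row, temperature)  # teacher's soft targets for this token
        q = softmax(s_row, temperature)  # student's prediction for this token
        total += sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    return total / len(student_logits)


student = [[0.0, 0.0, 0.0], [1.0, 0.0, -1.0]]
teacher = [[2.0, 0.0, 0.0], [1.0, 0.0, -1.0]]
print(distillation_loss(student, teacher))  # positive: first token disagrees
```

Using the teacher's full distribution as the target, rather than only the single correct token, gives the small student a richer signal about which alternatives the larger model considers plausible.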

6️⃣ Context Length:
All models support an impressive 128K-token context length.

7️⃣ Safety Measures:
Incorporated safety mitigation data to balance helpfulness and safety.

The result? A suite of models ranging from edge-friendly 1B- and 3B-parameter text models to 11B- and 90B-parameter vision models that can reason over both text and images. Llama 3.2 could reshape AI applications from mobile devices to enterprise-scale solutions.

What are your thoughts on these advancements? How do you see Llama 3.2 impacting your industry? Let's discuss in the comments!