Shane Tian

ShaneTian

AI & ML interests

None yet

Organizations

None yet

ShaneTian's activity

New activity in infly/OpenCoder-8B-Base 3 months ago
reacted to singhsidhukuldeep's post with 👍 4 months ago
The good folks at Meta have just unveiled Llama 3.2, pushing the boundaries of language models and computer vision.

Even more interesting is how they trained this cutting-edge model:

1️⃣ Architecture:
Llama 3.2 uses an optimized transformer architecture with auto-regressive capabilities. The largest models (11B and 90B) now support multimodal inputs, integrating both text and images.
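
For intuition, "auto-regressive" means each token may attend only to earlier positions. A minimal PyTorch sketch of that causal masking (an illustration of the general mechanism, not Meta's implementation):

```python
import torch
import torch.nn.functional as F

# Toy shapes: (batch, heads, sequence length, head dimension).
q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 8, 16, 64)
v = torch.randn(1, 8, 16, 64)

# is_causal=True applies a lower-triangular mask, so position i can only
# attend to positions <= i -- the auto-regressive property.
out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
print(out.shape)  # torch.Size([1, 8, 16, 64])
```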

2️⃣ Training Pipeline:
• Started with pretrained Llama 3.1 text models
• Added image adapters and encoders
• Pretrained on large-scale noisy (image, text) pair data
• Fine-tuned on high-quality in-domain and knowledge-enhanced (image, text) pairs

3️⃣ Vision Integration:
• Trained adapter weights to integrate a pre-trained image encoder
• Used cross-attention layers to feed image representations into the language model
• Preserved text-only capabilities by not updating language model parameters during adapter training (see the sketch below)
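
The cross-attention idea can be sketched in a few lines of PyTorch. This is a generic illustration, not Meta's code; the dimensions and module names are assumptions. During adapter training, the language model's own weights would be frozen (requires_grad=False), which is what keeps the text-only behavior intact.

```python
import torch
import torch.nn as nn

class CrossAttentionAdapter(nn.Module):
    # Text hidden states (queries) attend to projected image features
    # (keys/values); the residual connection leaves the text path intact.
    def __init__(self, d_text, d_image, n_heads=8):
        super().__init__()
        self.img_proj = nn.Linear(d_image, d_text)  # map encoder dim to LM dim
        self.cross_attn = nn.MultiheadAttention(d_text, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_text)

    def forward(self, text_hidden, image_feats):
        img = self.img_proj(image_feats)
        attended, _ = self.cross_attn(text_hidden, img, img)
        return self.norm(text_hidden + attended)

adapter = CrossAttentionAdapter(d_text=4096, d_image=1024)  # sizes are assumptions
text_hidden = torch.randn(2, 32, 4096)   # (batch, text tokens, LM hidden dim)
image_feats = torch.randn(2, 256, 1024)  # (batch, image patches, encoder dim)
print(adapter(text_hidden, image_feats).shape)  # torch.Size([2, 32, 4096])
```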

4️⃣ Post-Training Alignment:
• Multiple rounds of supervised fine-tuning (SFT)
• Rejection sampling (RS)
• Direct preference optimization (DPO; sketched after this list)
• Synthetic data generation using Llama 3.1 for Q&A augmentation
• Reward model ranking for high-quality fine-tuning data
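
Of these steps, DPO is the easiest to make concrete. Here is a minimal sketch of the standard DPO loss (Rafailov et al.), not Meta's code; the per-sequence log-probabilities under the policy and a frozen reference model are assumed to be precomputed:

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    # Implicit rewards: log-ratio of the policy vs. the frozen reference.
    chosen = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected = beta * (policy_rejected_logps - ref_rejected_logps)
    # Logistic loss pushes the chosen response above the rejected one.
    return -F.logsigmoid(chosen - rejected).mean()

# Toy usage with summed per-sequence log-probabilities.
loss = dpo_loss(torch.tensor([-12.0]), torch.tensor([-15.0]),
                torch.tensor([-13.0]), torch.tensor([-14.0]))
print(loss)  # tensor(0.5981)
```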

5️⃣ Lightweight Models:
• Used pruning and distillation techniques for 1B and 3B models
• Structured pruning from Llama 3.1 8B model
• Knowledge distillation using Llama 3.1 8B and 70B as teachers (see the sketch below)
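
As a rough illustration of the distillation step, here is a common logit-level knowledge-distillation loss (softened KL plus hard cross-entropy). The temperature, loss weighting, and vocabulary size are assumptions; this shows the generic technique, not Meta's exact recipe:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: KL divergence to the teacher's temperature-softened
    # distribution (scaled by T^2 to keep gradient magnitudes comparable).
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)
    # Hard targets: ordinary next-token cross-entropy on the data.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

student_logits = torch.randn(4, 32000)  # (tokens, vocab) from the small student
teacher_logits = torch.randn(4, 32000)  # from the frozen 8B/70B teacher
labels = torch.randint(0, 32000, (4,))
print(distillation_loss(student_logits, teacher_logits, labels))
```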

6️⃣ Context Length:
All models support an impressive 128K token context length.

7️⃣ Safety Measures:
Incorporated safety mitigation data to balance helpfulness and safety.

The result? A suite of models ranging from edge-friendly 1B-parameter versions to powerful 90B-parameter ones, capable of sophisticated reasoning across text and images. Llama 3.2 is set to revolutionize AI applications from mobile devices to enterprise-scale solutions.

What are your thoughts on these advancements? How do you see Llama 3.2 impacting your industry? Let's discuss in the comments!
reacted to WizardLM's post with 🚀 10 months ago
🔥🔥🔥 Introducing WizardLM-2!

📙Release Blog: https://wizardlm.github.io/WizardLM2
✅Model Weights: microsoft/wizardlm-661d403f71e6c8257dbd598a
🐦Twitter: https://twitter.com/WizardLM_AI/status/1779899325868589372

We introduce and open-source WizardLM-2, our next-generation state-of-the-art large language models, which have improved performance on complex chat, multilingual, reasoning, and agent tasks. The new family includes three cutting-edge models: WizardLM-2 8x22B, WizardLM-2 70B, and WizardLM-2 7B.

WizardLM-2 8x22B is our most advanced model and the best open-source LLM in our internal evaluation on highly complex tasks. WizardLM-2 70B reaches top-tier reasoning capabilities and is the first choice among models of its size. WizardLM-2 7B is the fastest and achieves performance comparable to existing leading open-source models 10x its size.

🤗 WizardLM-2 Capabilities:

1. MT-Bench (Figure 1)
WizardLM-2 8x22B demonstrates highly competitive performance compared with the most advanced proprietary models such as GPT-4-Turbo and Claude-3. Meanwhile, WizardLM-2 7B and WizardLM-2 70B are the top-performing models among leading baselines at the 7B and 70B scales.

2. Human Preferences Evaluation (Figure 2)
In this human preference evaluation, WizardLM-2's capabilities are very close to those of cutting-edge proprietary models such as GPT-4-1106-preview, and significantly ahead of all other open-source models.

🔍Method Overview:
As naturally occurring human-generated data becomes increasingly exhausted by LLM training, we believe that data carefully created by AI, together with models supervised step by step by AI, will be the sole path towards more powerful AI.

Over the past year, we built a fully AI-powered synthetic training system (as shown in Figure 3).
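
As a purely illustrative sketch of what an AI-in-the-loop pipeline could look like: one model evolves prompts, another answers, and an AI judge filters the pairs. Every name below is hypothetical; this is not WizardLM-2's actual system.

```python
# Hypothetical AI-in-the-loop data pipeline. `writer`, `answerer`, and
# `judge` stand in for LLM-backed callables; nothing here reflects the
# real WizardLM-2 implementation.
def build_synthetic_dataset(seed_prompts, writer, answerer, judge, rounds=3):
    dataset = []
    prompts = list(seed_prompts)
    for _ in range(rounds):
        # One model rewrites each prompt into a harder variant.
        prompts = [writer(f"Rewrite this instruction to be more complex: {p}")
                   for p in prompts]
        for p in prompts:
            answer = answerer(p)
            # An AI judge scores each pair; the 0-10 scale is an assumption.
            if judge(p, answer) >= 8:
                dataset.append({"prompt": p, "response": answer})
    return dataset
```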
New activity in bigcode/starcoder2-15b 11 months ago

Optimization details

#16 opened 11 months ago by ShaneTian
New activity in bigcode/the-stack-v2 11 months ago
New activity in bigcode/starcoder2-15b 11 months ago

Training loss or logs?

#15 opened 11 months ago by ShaneTian
New activity in deepseek-ai/deepseek-llm-67b-chat about 1 year ago

ctx window & languages?

#1 opened about 1 year ago by JosephusCheung