deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B Text Generation • Updated about 1 month ago • 1.81M • 1.09k
SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training Paper • 2501.17161 • Published Jan 28 • 116
The GAN is dead; long live the GAN! A Modern GAN Baseline Paper • 2501.05441 • Published Jan 9 • 91 • 5
Llama3-8B-1.58 Collection A trio of powerful models: fine-tuned from Llama3-8b-Instruct, with BitNet architecture! • 3 items • Updated Sep 14, 2024 • 11
google/siglip-so400m-patch14-384 Zero-Shot Image Classification • Updated Sep 26, 2024 • 7.76M • 505