michaelbenayoun/granite-tiny-4kv-heads-4layers-random Text Generation • 0.0B • Updated 18 days ago • 6.09k
michaelbenayoun/qwen3-tiny-4kv-heads-4layers-random Text Generation • 0.0B • Updated 18 days ago • 10.3k
michaelbenayoun/qwen3-tiny-4kv-heads-8layers-random Text Generation • 0.0B • Updated 18 days ago • 6
michaelbenayoun/llama-2-tiny-4kv-heads-4layers-random Text Generation • 0.0B • Updated Jun 2 • 55.2k
michaelbenayoun/llama-2-tiny-4kv-heads-16layers-random Text Generation • 0.0B • Updated May 27 • 1.73k
The Ultra-Scale Playbook 🌌 The ultimate guide to training LLM on large GPU Clusters • Running • 2.75k