Asankhaya Sharma (codelion)
387 followers · 21 following
http://asankhaya.github.io/
AI & ML interests
Creator of OptiLLM, OpenEvolve, Adaptive Classifier, and Ellora. Pioneering a new category in AI infrastructure: inference-time compute for LLMs.
Recent Activity
reacted to their post with ➕ about 7 hours ago:
Introducing Dhara-70M: a diffusion language model that achieves 3.8x higher throughput than autoregressive models! Key findings from our research on optimal architectures for small language models:
→ Depth beats width: 32 layers outperform 12 layers at the same parameter count
→ Best-in-class factuality: 47.5% on TruthfulQA
→ 10x training efficiency using WSD (Warmup-Stable-Decay) conversion
→ Canon layers add only 0.13% parameters but improve reasoning
We trained on 1B tokens using the optimal 50-30-20 dataset mix (PDFs + filtered web + educational content), then converted to diffusion with just 100M additional tokens.
Blog: https://huggingface.co/blog/codelion/optimal-model-architecture
Model: https://huggingface.co/codelion/dhara-70m
codelion's models (29, sorted by recently updated)
| Model | Task | Size | Updated | Downloads | Likes |
|---|---|---|---|---|---|
| codelion/Qwen3-4B-Instruct-2507-self-verify-lora | — | — | about 8 hours ago | 21 | — |
| codelion/dhara-70m | Text Generation | 71.3M | 1 day ago | 880 | 5 |
| codelion/gpt-2-70m | Text Generation | 64.1M | Nov 2 | 581 | 18 |
| codelion/Qwen3-4B-execution-world-model-lora | Text Generation | — | Oct 20 | 34 | 3 |
| codelion/Qwen2.5-Coder-0.5B-Instruct-security-grpo-lora | Text Generation | — | Aug 2 | 5 | — |
| codelion/qwen2-5-coder-0-5b-instruct-progressive-2000k-lora | Text Generation | — | Jul 20 | 4 | — |
| codelion/Llama-3.2-1B-Instruct-tool-calling-lora | Text Generation | — | Jul 18 | 72 | 4 |
| codelion/gemma-3-1b-it-reasoning-grpo-lora | Text Generation | — | Jul 18 | 15 | 5 |
| codelion/Qwen3-0.6B-ICM-DPO | Text Generation | 0.6B | Jul 18 | 11 | — |
| codelion/gemma-3-1b-it-ICM-DPO | Text Generation | 1.0B | Jul 18 | 13 | — |
| codelion/gemma-3-1b-it-ICM-DPO-mlx-fp16 | Text Generation | 1B | Jul 17 | 21 | — |
| codelion/Qwen3-0.6B-ICM-DPO-mlx-fp16 | Text Generation | 0.6B | Jul 17 | 23 | 2 |
| codelion/Qwen3-0.6B-accuracy-recovery-lora | Text Generation | — | Jul 13 | 66 | 4 |
| codelion/Qwen3-0.6B-GRPO-mlx-fp16 | Text Generation | 0.6B | Jul 11 | 7 | — |
| codelion/Qwen3-0.6B-GRPO | Text Generation | 0.6B | Jul 11 | 5 | — |
| codelion/DeepSeek-R1-Distill-Qwen-1.5B-PTS-DPO | Text Generation | 2B | May 13 | 11 | 2 |
| codelion/Qwen3-0.6B-PTS-DPO | Text Generation | 0.6B | May 12 | 17 | 1 |
| codelion/Qwen3-0.6B-PTS-DPO-LoRA | — | — | May 7 | 1 | — |
| codelion/optillm-bert-uncased | — | — | Feb 16 | 56 | 5 |
| codelion/optillm-modernbert-large | — | — | Feb 16 | 30 | 9 |
| codelion/Llama-3.3-70B-o1 | Text Generation | 71B | Jan 21 | 91 | 2 |
| codelion/Llama-3.3-70B-o1-gguf | — | 71B | Jan 20 | 103 | 1 |
| codelion/Llama-3.3-70B-o1-lora | — | — | Jan 20 | 2 | — |
| codelion/Llama-3.2-3B-o1 | — | 3B | Jan 12 | 72 | 5 |
| codelion/Llama-3.2-3B-o1-lora | — | — | Jan 12 | 4 | — |
| codelion/MathCoT | — | 8B | Nov 26, 2024 | 33 | 2 |
| codelion/scorelora | — | — | Oct 15, 2024 | 6 | 3 |
| codelion/public-domain-mickey-mouse | Text-to-Image | — | Jan 5, 2024 | 13 | 2 |
| codelion/whisper-age-estimator | Automatic Speech Recognition | 72.6M | Sep 10, 2023 | 64 | 3 |