Robert Sinclair
ZeroWw
103 followers · 18 following
https://huggingface.co/RobertSinclair
0wwafa
AI & ML interests
LLM optimization (model quantization and back-end optimizations), so that LLMs can run on the computers of people with both kidneys. Discord: https://discord.com/channels/@robert_46007
Recent Activity

updated a model 10 days ago: ZeroWw/Art-0-8B-GGUF

new activity on xai-org/grok-2, 10 days ago: "In the face of Google, who didn't release old Gemini! (Thanks)"

reacted with 👀 to codelion's post, 10 days ago:
I wanted to share a technique that's been working really well for recovering performance after INT4 quantization. Quantizing an LLM to INT4 (unlike, say, INT8) for inference can incur some accuracy loss. Instead of accepting the quality loss, we used the FP16 model as a teacher to train a tiny LoRA adapter (rank=16) for the quantized model. The cool part: the model generates its own training data using the Magpie technique, so no external datasets are needed. This is critical because we want to stay as close as possible to the distribution of the model's natural responses.

Last year, Apple's foundation models paper (https://arxiv.org/pdf/2407.21075) proposed a similar technique and found that "by using accuracy-recovery LoRA adapters with only rank 16, Alpaca win rate can be improved by 7-18%, GSM8K accuracy is boosted by 5-10%" (page 47).

We saw similar results on Qwen3-0.6B:
- Perplexity: 2.40 → 2.09 (only 5.7% degradation from the FP16 baseline)
- Memory: only 0.28 GB vs 1.0 GB for FP16 (a 72% reduction)
- Speed: 3.0x faster inference than FP16
- Quality: generates correct, optimized code solutions

- Pre-trained adapter: https://huggingface.co/codelion/Qwen3-0.6B-accuracy-recovery-lora
- GitHub repo: https://github.com/codelion/ellora

Happy to answer questions about the implementation, or to help anyone trying to replicate this. The key insight is that quantization errors are systematic and learnable: a small adapter can bridge the gap without negating the benefits of quantization.

Has anyone else experimented with self-distillation for quantization recovery? Would love to hear about different approaches!
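The recipe above lends itself to a compact sketch. What follows is a minimal, illustrative reading of it, not the actual code from the ellora repo: it assumes Qwen/Qwen3-0.6B as the checkpoint, bitsandbytes 4-bit quantization for the student, a ChatML-style user-turn prefix for the Magpie-style self-generation, and a hand-rolled token-level KL distillation loop; the hyperparameters, target modules, and step count are guesses.

```python
import torch
import torch.nn.functional as F
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_id = "Qwen/Qwen3-0.6B"
tok = AutoTokenizer.from_pretrained(model_id)

# FP16 teacher: the full-precision model whose behavior we want to recover.
teacher = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
).eval()

# INT4 student: the same checkpoint in 4-bit, plus a trainable rank-16 LoRA.
student = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16
    ),
    device_map="auto",
)
student = get_peft_model(
    student,
    LoraConfig(
        r=16, lora_alpha=32,  # rank 16, as in the post; alpha is a guess
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    ),
)

# Magpie-style self-generated data: prompt the teacher with only the chat
# template's user-turn prefix so it invents a query and then answers it,
# keeping the data inside the model's own distribution. The prefix below
# assumes a ChatML-style template.
MAGPIE_PREFIX = "<|im_start|>user\n"

def self_generate():
    inputs = tok(MAGPIE_PREFIX, return_tensors="pt").to(teacher.device)
    with torch.no_grad():
        out = teacher.generate(
            **inputs, max_new_tokens=512, do_sample=True, temperature=0.7
        )
    return out  # query + response token ids

opt = torch.optim.AdamW(
    (p for p in student.parameters() if p.requires_grad), lr=1e-4
)

for step in range(1_000):  # illustrative; real run length unknown
    ids = self_generate()
    with torch.no_grad():
        t_logits = teacher(ids).logits.float()
    s_logits = student(ids).logits.float()
    # KL from the teacher's token distribution to the student's: pushes the
    # INT4+LoRA model back toward the FP16 model's behavior.
    loss = F.kl_div(
        F.log_softmax(s_logits, dim=-1),
        F.softmax(t_logits, dim=-1),
        reduction="batchmean",
    )
    loss.backward()
    opt.step()
    opt.zero_grad()

student.save_pretrained("accuracy-recovery-lora")
```

To try the published adapter rather than retraining, peft's `PeftModel.from_pretrained(quantized_base, "codelion/Qwen3-0.6B-accuracy-recovery-lora")` should load it on top of the 4-bit base model.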
Organizations

ZeroWw's models (201 total; page 1 of 7, sorted by recently updated)
ZeroWw/Art-0-8B-GGUF • Text Generation • 8B • Updated 10 days ago • 148 downloads
ZeroWw/Gemma-3-R1-4B-v1-GGUF • Text Generation • 4B • Updated 13 days ago • 709 downloads
ZeroWw/Qwen3-8B-GGUF • Text Generation • 8B • Updated 15 days ago • 184 downloads • 2 likes
ZeroWw/Falcon-H1-7B-Instruct-GGUF • Text Generation • 8B • Updated 16 days ago • 68 downloads
ZeroWw/gemma-3-12b-it-GGUF • Text Generation • 12B • Updated 24 days ago • 765 downloads
ZeroWw/gemma-3-270m-it-GGUF • Text Generation • 0.3B • Updated 25 days ago • 578 downloads
ZeroWw/Qwen3-4B-Instruct-2507-GGUF • Text Generation • 4B • Updated 27 days ago • 629 downloads
ZeroWw/OpenELM-3B-Instruct-GGUF • Text Generation • 3B • Updated about 1 month ago • 126 downloads
ZeroWw/Qwen3-4B-Thinking-2507-GGUF • Text Generation • 4B • Updated Aug 6 • 44 downloads
ZeroWw/Hunyuan-7B-Instruct-GGUF • Text Generation • 8B • Updated Aug 4 • 77 downloads
ZeroWw/OpenThinker3-7B-GGUF • Text Generation • 8B • Updated Jun 13 • 43 downloads
ZeroWw/medgemma-4b-it-GGUF • Text Generation • 4B • Updated Jun 13 • 22 downloads
ZeroWw/DeepSeek-R1-0528-Qwen3-8B-GGUF • Text Generation • 8B • Updated Jun 2 • 120 downloads
ZeroWw/Seed-Coder-8B-Reasoning-GGUF • Text Generation • 8B • Updated May 12 • 32 downloads
ZeroWw/Seed-Coder-8B-Instruct-GGUF • Text Generation • 8B • Updated May 12 • 31 downloads • 1 like
ZeroWw/GLM-Z1-9B-0414-GGUF • Text Generation • 9B • Updated May 11 • 59 downloads
ZeroWw/Qwen3-8B-Esper3-GGUF • Text Generation • 8B • Updated May 7 • 25 downloads
ZeroWw/Qwen3-4B-Esper3-GGUF • Text Generation • 4B • Updated May 7 • 42 downloads
ZeroWw/Josiefied-Qwen3-8B-abliterated-v1-GGUF • Text Generation • 8B • Updated May 5 • 44 downloads
ZeroWw/Phi-4-mini-instruct-GGUF • Text Generation • 4B • Updated May 1 • 45 downloads
ZeroWw/Phi-4-mini-reasoning-GGUF • Text Generation • 4B • Updated May 1 • 34 downloads
ZeroWw/Qwen3-8B-abliterated-GGUF • Text Generation • 8B • Updated May 1 • 64 downloads
ZeroWw/Qwen3-4B-abliterated-GGUF • Text Generation • 4B • Updated May 1 • 100 downloads
ZeroWw/Qwen3-0.6B-GGUF • Text Generation • 0.8B • Updated Apr 29 • 24 downloads • 1 like
ZeroWw/Qwen3-1.7B-GGUF • Text Generation • 2B • Updated Apr 29 • 21 downloads
ZeroWw/Qwen3-4B-GGUF • Text Generation • 4B • Updated Apr 29 • 40 downloads • 2 likes
ZeroWw/granite-3.3-8b-instruct-GGUF • Text Generation • 8B • Updated Apr 17 • 19 downloads
ZeroWw/granite-3.3-2b-instruct-GGUF • Text Generation • 3B • Updated Apr 16 • 17 downloads
ZeroWw/GLM-4-9B-0414-GGUF • Text Generation • 9B • Updated Apr 15 • 1 download • 1 like
ZeroWw/Llama-3.1-Nemotron-Nano-8B-v1-GGUF • Text Generation • 8B • Updated Apr 15 • 3 downloads