🧠 Model Card: Daemontatox/Droidz

Daemontatox/Droidz is a highly optimized, compact language model built on top of unsloth/qwen3-1.7b, engineered for fast, intelligent inference on consumer-grade devices. It is part of an ongoing research effort to close the performance gap between small and large language models through architectural efficiency, reflective reasoning techniques, and lightweight distributed training.


🧬 Objective

The goals of Droidz are to:

  • Achieve close-to-7B model quality with <2B parameter models.
  • Support edge deployment: mobile, CPU, small GPU.
  • Provide accurate, fast, reflective generation in constrained environments.
  • Enable scalable fine-tuning through efficient, distributed training pipelines.

๐Ÿ› ๏ธ Model Overview

| Field | Detail |
| --- | --- |
| Base model | unsloth/qwen3-1.7b |
| Architecture | Transformer, Qwen3 architecture (2.7x faster RoPE) |
| Finetuned on | Proprietary curated instruction + reasoning dataset |
| Training method | Distributed LoRA + FlashAttention-2 + PEFT + DDP |
| Model size | ~1.7B params |
| Precision | bfloat16 (training); supports int4/int8 (inference) |
| Language | English only (monolingual) |
| License | Apache-2.0 |
| Intended use | Conversational AI, edge agents, assistants, embedded systems |
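
Since the table lists int4/int8 inference support, here is a minimal sketch of 4-bit loading through transformers with bitsandbytes. The quantization settings below are illustrative assumptions, not an official recipe published for this model:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Daemontatox/Droidz"

# Illustrative 4-bit (NF4) settings; requires a CUDA GPU and the bitsandbytes package.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype="bfloat16",
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=bnb_config,
)
```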

๐Ÿ—๏ธ Training Details

Training Infrastructure

  • Frameworks: transformers, unsloth, accelerate, PEFT (a minimal LoRA setup with these libraries is sketched after this list)
  • Backends: fully distributed with DeepSpeed ZeRO-2, DDP, FSDP, and FlashAttention-2
  • Devices: A100 (80GB), RTX 3090 clusters, TPU v5e (mixed)
  • Optimizer: AdamW + Cosine LR schedule + Warmup steps
  • Batching: Dynamic packing enabled, up to 2048 context tokens
  • Checkpointing: Async gradient checkpointing for memory efficiency
  • Duration: ~1.2M steps across multiple domains
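
As a rough illustration of the distributed LoRA + PEFT setup described above, here is a minimal sketch using peft. The rank, alpha, dropout, and target modules are assumptions for illustration only, not the exact configuration used to train Droidz:

```python
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Illustrative LoRA hyperparameters -- not the exact values used for Droidz.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

base = AutoModelForCausalLM.from_pretrained(
    "unsloth/qwen3-1.7b",
    torch_dtype=torch.bfloat16,
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # adapter weights are a small fraction of the 1.7B params
```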

Finetuning Methodology

  • Reflection prompting: models are trained to self-verify and revise their outputs (an illustrative prompt template follows this list).
  • Instruction tuning: Curated prompt-response pairs across diverse reasoning domains.
  • Multi-domain generalization: Code, logic puzzles, philosophy, and conversational tasks.
  • Optimization: Gradient accumulation + progressive layer freezing.
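
The exact reflection-prompting format used during training is not published, so the template below is only a hypothetical sketch of what such a self-verify-and-revise prompt might look like:

```python
# Hypothetical reflection-style template; the actual training format is not published.
REFLECTION_TEMPLATE = """\
Question: {question}

Draft answer:
{draft}

Review the draft above, point out any mistakes, then write a corrected final answer.
Final answer:"""

print(REFLECTION_TEMPLATE.format(
    question="What is 17 * 24?",
    draft="17 * 24 = 398",  # deliberately wrong draft for the model to revise
))
```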

🔮 Example Use Cases

  • Conversational AI for mobile and web apps
  • Offline reasoning agents (Raspberry Pi, Jetson Nano, etc.)
  • Embedded chatbots with local-only privacy
  • Edge-side logic assistants for industry-specific workflows
  • Autonomous tools for summarization, code suggestion, self-verification

⚡ Inference Code

```python
from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer

model_id = "Daemontatox/Droidz"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",  # or {"": "cuda:0"} for manual placement
    torch_dtype="auto",  # uses bf16/fp16 if available
)

streamer = TextStreamer(tokenizer)

prompt = "Explain the concept of reinforcement learning simply."

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
_ = model.generate(**inputs, max_new_tokens=200, streamer=streamer)
```
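
Because the Qwen3 family is chat-tuned, you will likely get better results by formatting requests with the tokenizer's chat template. Continuing from the snippet above, and assuming this checkpoint ships a chat template (fall back to the plain prompt otherwise):

```python
messages = [
    {"role": "user", "content": "Explain the concept of reinforcement learning simply."},
]

# Render the conversation with the model's chat template and append the assistant turn.
chat_inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

_ = model.generate(chat_inputs, max_new_tokens=200, streamer=streamer)
```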

🧪 Performance Benchmarks

| Hardware | Mode | Throughput | VRAM / RAM | Notes |
| --- | --- | --- | --- | --- |
| RTX 3060 12 GB (FP16) | Transformers | ~37 tokens/s | ~5.1 GB | Good for batch inference |
| MacBook M2 (Metal backend) | Transformers | ~23 tokens/s | ~3.6 GB | Works well on 8-core M2 |
| Intel i7-12700H (CPU-only) | GGUF (Q4) | ~8 tokens/s | ~4.1 GB | llama.cpp via llm or Koboldcpp |
| Jetson Orin Nano (8 GB) | INT4 GGUF | ~6 tokens/s | ~3.2 GB | Embedded/IoT ready |
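
Throughput depends heavily on hardware, batch size, and quantization. A rough way to reproduce a tokens-per-second figure with the Transformers snippet above (a simple wall-clock measurement, not a rigorous benchmark):

```python
import time

prompt = "Write a short explanation of gradient descent."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

start = time.perf_counter()
output = model.generate(**inputs, max_new_tokens=200)
elapsed = time.perf_counter() - start

# Count only newly generated tokens, excluding the prompt.
generated = output.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{generated / elapsed:.1f} tokens/s")
```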

🧠 Prompt Samples

โ“ Prompt: "What is backpropagation in neural networks?"

Backpropagation is a training algorithm that adjusts a neural network's weights by computing gradients of the error from the output layer back to the input layers using the chain rule. It's the core of how neural networks learn.

🔧 Prompt: "Fix the bug: `print('Score:' + 100)`"

You're trying to concatenate a string with an integer. Use: print('Score:' + str(100))

๐Ÿ” Prompt: "Summarize the Stoic concept of control."

Stoics believe in focusing only on what you can control (your actions and thoughts) while accepting what you cannot control with calm detachment.


๐Ÿ” Quantization Support (Deployment-Ready)

| Format | Status | Tool | Notes |
| --- | --- | --- | --- |
| GGUF | ✅ Stable | llama.cpp | Works on CPUs, Android, Web |
| GPTQ | ✅ Stable | AutoGPTQ | For fast GPU inference |
| AWQ | ✅ Tested | AutoAWQ | 4-bit low-latency inference |
| FP16 | ✅ Native | Transformers | RTX/Apple Metal ready |
| bfloat16 | ✅ | Transformers | For A100/TPU-friendly runs |
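
For the GGUF path, a minimal sketch using the llama-cpp-python bindings. The model path below is a placeholder; substitute the actual Q4 GGUF file published in the quantized repositories linked from this model page:

```python
from llama_cpp import Llama

# Placeholder filename: replace with the actual Q4 GGUF export of Droidz.
llm = Llama(model_path="./droidz-q4_k_m.gguf", n_ctx=2048)

out = llm(
    "Explain the concept of reinforcement learning simply.",
    max_tokens=200,
)
print(out["choices"][0]["text"])
```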

🧱 Architecture Enhancements

  • FlashAttention-2: Fused softmax and dropout for a 2-3x attention speed boost.
  • Unsloth Patch: Accelerated training/inference kernel replacements.
  • RoPE Scaling: Extended context window support for long-input reasoning (a loading sketch follows this list).
  • Rotary Embedding Interpolation: Improves generalization beyond the pretraining length.
  • LayerDrop + Activation Checkpointing: For memory-efficient training.
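
If you want to experiment with RoPE interpolation to push past the trained 2048-token window, transformers lets you override the rope_scaling config attribute at load time. The scaling type, factor, and exact key names vary across transformers versions, and output quality beyond 2048 tokens is not guaranteed for this checkpoint, so treat this as an experiment:

```python
from transformers import AutoModelForCausalLM

# Experimental: linear RoPE interpolation to roughly double the usable context.
# Key names ("type" vs "rope_type") differ between transformers versions.
model = AutoModelForCausalLM.from_pretrained(
    "Daemontatox/Droidz",
    device_map="auto",
    torch_dtype="auto",
    rope_scaling={"type": "linear", "factor": 2.0},
)
```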

✅ Intended Use

| Use Case | Suitable |
| --- | --- |
| Local chatbots / assistants | ✅ |
| Developer coding copilots | ✅ |
| Offline reasoning agents | ✅ |
| Educational agents | ✅ |
| Legal / financial advisors | ❌ |
| Medical diagnosis | ❌ |

The model is not suitable for domains where accuracy or factual correctness is critical unless its outputs are independently verified.


🚫 Known Limitations

  • Context length is currently capped at 2048 tokens (it can be extended via RoPE interpolation).
  • Struggles with long-form generation (>1024 tokens).
  • Not multilingual (yet).
  • Sensitive to prompt phrasing when not given a CoT or self-correction format.

๐Ÿ“ Roadmap

  • Expand to multilingual support via cross-lingual bootstrapping.
  • Integrate Mamba-style recurrence for long-context inference.
  • Release optimized GGUF + quantized weights for browser/Android.
  • Explore retrieval-augmented reflection (RAR) capabilities.

๐Ÿ‘จโ€๐Ÿ’ป Author

  • Name: Daemontatox
  • Affiliation: Independent Researcher
  • Contact: HuggingFace Profile
  • Focus: LLM compression, theory of mind, agent intelligence on the edge

📖 Citation

@misc{daemontatox2025droidz,
  title={Droidz: A Fast, Reflective Small Language Model for Reasoning on Edge Devices},
  author={Daemontatox},
  year={2025},
  howpublished={\url{https://huggingface.co/Daemontatox/Droidz}},
  note={Ongoing Research}
}