🔧 Full-Stack Training Demo: Hermes3 from Scratch

Training a conversational LLM locally with mlx-lm-lora, no GPUs involved.

This project demonstrates a complete LLM training and alignment pipeline built entirely on Apple Silicon, powered by the mlx-lm-lora package by Gökdeniz Gülmez.

The model is based on the Qwen3-0.6B architecture and was fine-tuned on the Hermes3 dataset, then aligned with ORPO (Odds Ratio Preference Optimization). All steps were run locally: no GPUs, no cloud compute.


📈 Project Highlights

  • 🔁 End-to-End Pipeline: Supervised fine-tuning + preference optimization
  • 🖥️ Local Execution: All steps done on a Mac Mini M4 (32GB RAM)
  • 🧠 Alignment via ORPO: Preference optimization on ranked completion pairs
  • 🧰 Framework: mlx-lm-lora — optimized for Apple Silicon

🛠 Training Breakdown

1. Supervised Fine-Tuning (SFT)

  • Dataset: NousResearch/Hermes-3-Dataset (10,000 samples)
  • Description: Instruction-following, uncensored conversational data
  • Training Settings:
    • Optimizer: AdamW (learning_rate: 1e-4, betas: [0.9, 0.999], eps: 1e-8, weight_decay: 0.01, bias_correction: False); see the optimizer sketch after this list
    • Epochs: 2
    • Batch size: 2
    • Full parameter training
    • Context length: 4096 tokens
    • WandB Logs
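
The optimizer line above maps directly onto MLX's `AdamW`. Here is a minimal sketch of that configuration (assuming a recent MLX release; older versions of `mlx.optimizers.AdamW` do not expose `bias_correction`):

```python
import mlx.optimizers as optim

# SFT optimizer settings from the list above. `bias_correction` is assumed
# to exist in your MLX version; drop the argument on older releases.
optimizer = optim.AdamW(
    learning_rate=1e-4,
    betas=[0.9, 0.999],
    eps=1e-8,
    weight_decay=0.01,
    bias_correction=False,
)
```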

2. Preference Optimization (ORPO with LoRA)

  • Dataset: mlx-community/orpo-dpo-mix-40k-mlx (1,000 samples)
  • Description: Ranked prompt-completion pairs, optimized with ORPO and LoRA adaptation (rank=8, scale=10.0); a sketch of the ORPO objective follows this list
  • Training Settings:
    • Optimizer: AdamW, same settings as in SFT (learning_rate: 1e-4, betas: [0.9, 0.999], eps: 1e-8, weight_decay: 0.01, bias_correction: False)
    • Epochs: 1
    • Batch size: 2
    • Context length: 4096 tokens
    • Parameter-efficient fine-tuning via LoRA
      • Rank: 8
      • Dropout: 0.0
      • Scale: 10.0
      • Use DoRA: False
      • LoRA applied to 12 transformer layers
    • WandB Logs
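
For intuition, ORPO adds a log-odds-ratio penalty on top of the usual SFT loss: the model is rewarded when the chosen completion is more probable than the rejected one. Below is a minimal sketch of that objective in MLX, based on the published ORPO formulation rather than mlx-lm-lora's actual implementation; the function name and `beta` weight are illustrative.

```python
import mlx.core as mx
import mlx.nn as nn

def orpo_loss(chosen_logps, rejected_logps, chosen_nll, beta=0.1):
    # chosen_logps / rejected_logps: length-normalized sequence
    # log-probabilities, shape (batch,).
    # chosen_nll: standard SFT loss on the chosen completions.
    # odds(y|x) = p / (1 - p), computed in log space for stability.
    log_odds_chosen = chosen_logps - mx.log1p(-mx.exp(chosen_logps))
    log_odds_rejected = rejected_logps - mx.log1p(-mx.exp(rejected_logps))
    # Reward margins where the chosen completion has higher odds.
    margin = nn.log_sigmoid(log_odds_chosen - log_odds_rejected)
    return mx.mean(chosen_nll - beta * margin)
```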

🚀 This model uses LoRA for parameter-efficient training, optimized for Apple Silicon using the mlx-lm-lora framework.
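
Concretely, LoRA freezes the base weights and learns a small low-rank update per adapted layer, so with rank 8 across 12 layers only a sliver of the 0.6B parameters is trained. Here is a minimal sketch of such a layer in MLX, using the rank/scale/dropout values listed above; the class and attribute names are illustrative, not mlx-lm-lora's exact internals.

```python
import math
import mlx.core as mx
import mlx.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer plus a trainable low-rank update (sketch)."""

    def __init__(self, in_dims, out_dims, rank=8, scale=10.0, dropout=0.0):
        super().__init__()
        self.linear = nn.Linear(in_dims, out_dims, bias=False)  # frozen base weights
        self.dropout = nn.Dropout(dropout)
        self.scale = scale
        bound = 1 / math.sqrt(in_dims)
        self.lora_a = mx.random.uniform(-bound, bound, (in_dims, rank))
        self.lora_b = mx.zeros((rank, out_dims))  # zero init: no update at start

    def __call__(self, x):
        y = self.linear(x)
        # Low-rank correction: (x @ A) @ B, scaled.
        z = (self.dropout(x) @ self.lora_a) @ self.lora_b
        return y + self.scale * z
```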


📚 Model Card

| Field | Value |
| --- | --- |
| Base Model Name | Qwen/Qwen3-0.6B-Base |
| Architecture | qwen3 |
| Alignment Method | ORPO + LoRA (rank=8, scale=10.0) |
| Training Framework | mlx-lm-lora |
| Hardware | Apple M4 Mac Mini, 32GB RAM |
| Parameters | 596M (BF16) |
| Author | Gökdeniz Gülmez |

🧾 System Prompt

This is a conversation between Josie, an advanced AI assistant, and a human user. Josie was created by a man named Gökdeniz Gülmez, a leading AI researcher and pioneer in neural model alignment and fine-tuning, best known for his work on the `mlx-lm-lora` library, Gabliteration, and the Josified/J.O.S.I.E. model family — of which Josie is part.

Josie is an acronym that stands for "Just One Super Intelligent Entity", though she refers to herself simply as "Josie" in conversation. If asked about her name, she will happily explain its intended meaning and origin.

Josie has been designed to be helpful, uncensored, honest, and precise. She is capable of explaining complex technical topics (e.g. ML, NLP, optimization), offering creative support (e.g. writing, editing, design ideas), and assisting with everyday queries.

Josie does not hallucinate facts and will admit if she doesn’t know something. She follows the guidance, style, and uncensored principles laid out by her creator Gökdeniz. Her tone is warm, intelligent, and conversational — adapting naturally to the user's communication style.

This conversation takes place within a structured chat format, where each message begins with a role indicator and ends with the `<|im_end|>` token. Josie is now ready to assist.

✍️ Tip: Best results are achieved using third-person scene descriptions (e.g., “This is a conversation…”) rather than direct instructions (“Your name is…”).


💬 Prompt Template Example

```
<|im_start|>scene description
This is a conversation between Josie, an advanced AI assistant, and a human user...
<|im_end|>
<|im_start|>User:
Who is Einstein?<|im_end|>
<|im_start|>Josie:
Einstein is ...<|im_end|>
```

🧪 Try It in Python

```python
from mlx_lm import load, generate

# Load the model and tokenizer from the Hugging Face Hub.
model, tokenizer = load("mlx-community/hermes3-qwen3-0.6b-from-scratch")

# Build the prompt in the chat format shown above.
prompt = tokenizer.apply_chat_template(
    [
        {"role": "system", "content": "This is a conversation between Josie, an advanced AI assistant, and a human user..."},
        {"role": "user", "content": "Who is Einstein?"},
    ],
    tokenize=False,
    add_generation_prompt=True,
)

response = generate(model, tokenizer, prompt, max_tokens=256)
print(response)
```

For more tools, examples, and fine-tuning options, visit the mlx-lm-lora repository.

Best,
Gökdeniz Gülmez
