# Model Card for HyperCLOVAX-SEED-Text-Instruct-0.5B-GRPO-mlx
This model requires a custom build of mlx-lm with GRPO support, available at:
https://github.com/Skkuhodomo/mlx-lm/tree/grpo
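The fork is not published on PyPI, so one way to install it (assuming it keeps upstream mlx-lm's packaging) is directly from the branch:

```bash
pip install git+https://github.com/Skkuhodomo/mlx-lm.git@grpo
```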
## Model Description
This model is a fine-tuned variant of HyperCLOVAX-SEED-Text-Instruct-0.5B, trained with Group Relative Policy Optimization (GRPO). It is optimized for step-by-step reasoning and structured problem-solving tasks.
## Intended Use
- Step-by-step reasoning for math problems
- Structured problem-solving with explicit reasoning process
- Educational applications requiring transparent reasoning
## Training Data
The model was fine-tuned on a curated dataset of problems with reasoning steps and final answers.
## Performance and Limitations
- Optimized for problems requiring structured reasoning
- Uses `<think>` and `</think>` tags to delimit the reasoning process
- Uses `<answer>` and `</answer>` tags to clearly indicate final answers (see the example after this list)
- May not perform optimally on tasks outside its training domain
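For illustration, a well-formed response to the sample problem in the Usage section below looks like this (a hand-written example of the expected format, not actual model output):

```
<think>
Anika is 30 and is 4/3 of Maddie's age, so Maddie is 30 × 3/4 = 22.5.
In 15 years they will be 45 and 37.5, so their average age is (45 + 37.5) / 2 = 41.25.
</think>
<answer>41.25</answer>
```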
## Usage
```python
from mlx_lm import load
from mlx_lm.generate import stream_generate

# Load the model and tokenizer from the Hugging Face Hub
model, tokenizer = load("Skkuhodomo/HyperCLOVAX-SEED-Text-Instruct-0.5B-GRPO-mlx")

prompt = """<|im_start|>system
You are given a math problem.
You MUST reason between <think> and </think>.
You MUST provide the final answer between <answer> and </answer>.
You MUST start your response with a <think> TAG.
Do NOT continue after you close </answer>.<|im_end|>
<|im_start|>user
At 30, Anika is 4/3 the age of Maddie. What would be their average age in 15 years?<|im_end|>
<|im_start|>assistant"""

# Stream tokens and accumulate the full response;
# the stop argument relies on the custom fork linked above.
output = ""
for chunk in stream_generate(model, tokenizer, prompt, stop=["<|im_end|>"]):
    output += chunk.text
    print(chunk.text, end="", flush=True)
```
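The tagged spans can then be pulled out of `output` with standard regular expressions; a minimal post-processing sketch, assuming the model emitted well-formed tags:

```python
import re

# Extract the reasoning and final answer from the tagged response.
think = re.search(r"<think>(.*?)</think>", output, re.DOTALL)
answer = re.search(r"<answer>(.*?)</answer>", output, re.DOTALL)

print("Reasoning:", think.group(1).strip() if think else "(missing)")
print("Answer:", answer.group(1).strip() if answer else "(missing)")
```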