πŸ€– Qwen3-50M Storyteller

A fine-tuned version of Qwen3-50M specialized for storytelling, trained on the TinyStories dataset.

πŸ“Š Training Results

Loss Metrics

  • Final Training Loss: 4.9083
  • Final Validation Loss: 4.2214
  • Initial Validation Loss: 7.9470
  • Loss Improvement: 3.7256 (46.88% reduction)
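The reduction is computed against the initial validation loss: (7.9470 − 4.2214) / 7.9470 ≈ 0.4688, i.e. a 46.88% drop.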

Training Configuration

  • Training Epochs: 3
  • Learning Rate: 2e-05
  • Batch Size: 4
  • Max Sequence Length: 512 tokens
  • Weight Decay: 0.01
  • Warmup Ratio: 0.1
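For reference, here is a minimal sketch of how these hyperparameters map onto Hugging Face TrainingArguments, assuming the standard Trainer API was used (the original training script is not published, and output_dir is illustrative):

from transformers import TrainingArguments

# Hypothetical reconstruction of the configuration above, not the original script.
training_args = TrainingArguments(
    output_dir="qwen3-50m-storyteller",   # illustrative output path
    num_train_epochs=3,
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    weight_decay=0.01,
    warmup_ratio=0.1,
    fp16=True,  # matches the FP16 precision noted under Model Details
)
# The 512-token max sequence length is applied at tokenization time,
# e.g. tokenizer(..., truncation=True, max_length=512).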

Model Details

  • Precision: FP16 (Half Precision)
  • Base Model: Mostafa8Mehrabi/qwen3-50m
  • Parameters: 71.6M (Safetensors)
  • Dataset: TinyStories
  • Task: Causal Language Modeling (Story Generation)

πŸš€ Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Mostafa8Mehrabi/qwen3-50m-storyteller")
model = AutoModelForCausalLM.from_pretrained(
    "Mostafa8Mehrabi/qwen3-50m-storyteller",
    torch_dtype=torch.float16,  # Use fp16 for efficiency
    device_map="auto"
)

# Generate a story
prompt = "<|story|>Once upon a time, there was a brave little mouse who"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)  # match the model's device placement

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=200,  # length of the generated continuation
        do_sample=True,
        temperature=0.8,
        top_p=0.9,
        repetition_penalty=1.1,
        pad_token_id=tokenizer.pad_token_id
    )

story = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(story)
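As a rule of thumb, lower temperature values (e.g. 0.6) make the sampled stories more predictable, while higher temperature and top_p increase variety at some cost to coherence; a repetition_penalty above 1.0 discourages the small model from looping.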

πŸ“ Story Format

The model expects stories to be formatted with special tokens:

  • Start: <|story|>
  • End: <|endstory|>

Example:

<|story|>Once upon a time, there was a magical forest where animals could talk...<|endstory|>
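Since <|endstory|> marks the end of a story, generation can be told to stop when it appears. A minimal sketch, assuming <|endstory|> is registered in the tokenizer's vocabulary as a single special token (if it is not, the check below fails):

# Resolve the id of the end-of-story token; assumes it is in the vocabulary.
end_id = tokenizer.convert_tokens_to_ids("<|endstory|>")
assert end_id is not None and end_id != tokenizer.unk_token_id, "<|endstory|> not in vocabulary"

outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
    eos_token_id=end_id,  # stop as soon as <|endstory|> is produced
)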

🎯 Intended Use

This model is specifically designed for:

  • Children's story generation
  • Creative writing assistance
  • Educational content creation
  • Interactive storytelling applications

⚠️ Limitations

  • Optimized for short stories (up to 512 tokens)
  • Trained primarily on simple, child-friendly narratives
  • May not perform well on other text generation tasks

πŸ“ˆ Performance

The model shows significant improvement in storytelling capability:

  • Validation loss reduced by 46.88% during training
  • Generates coherent, engaging short stories
  • Maintains appropriate tone and structure for children's content