πŸ€– Qwen3-50M Storyteller

A fine-tuned version of Qwen3-50M specialized for storytelling, trained on the TinyStories dataset.

πŸ“Š Training Results

Loss Metrics

  • Final Training Loss: 4.9083
  • Final Validation Loss: 4.2214
  • Initial Validation Loss: 7.9470
  • Loss Improvement: 3.7256 (46.88% reduction)
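The reduction is computed against the initial validation loss: (7.9470 − 4.2214) / 7.9470 ≈ 0.4688, i.e. a 46.88% drop.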

Training Configuration

  • Training Epochs: 3
  • Learning Rate: 2e-05
  • Batch Size: 4
  • Max Sequence Length: 512 tokens
  • Weight Decay: 0.01
  • Warmup Ratio: 0.1
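For reference, here is a minimal sketch of how these hyperparameters map onto Hugging Face TrainingArguments, assuming the standard Trainer API was used (the original training script is not published, and output_dir is illustrative):

from transformers import TrainingArguments

# Hypothetical reconstruction of the configuration above, not the original script.
training_args = TrainingArguments(
    output_dir="qwen3-50m-storyteller",   # illustrative output path
    num_train_epochs=3,
    learning_rate=2e-5,
    per_device_train_batch_size=4,
    weight_decay=0.01,
    warmup_ratio=0.1,
    fp16=True,  # matches the FP16 precision noted under Model Details
)
# The 512-token max sequence length is applied at tokenization time,
# e.g. tokenizer(..., truncation=True, max_length=512).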

Model Details

  • Precision: FP16 (Half Precision)
  • Base Model: Mostafa8Mehrabi/qwen3-50m
  • Parameters: 71.6M (Safetensors)
  • Dataset: TinyStories
  • Task: Causal Language Modeling (Story Generation)

πŸš€ Usage

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("Mostafa8Mehrabi/qwen3-50m-storyteller")
model = AutoModelForCausalLM.from_pretrained(
    "Mostafa8Mehrabi/qwen3-50m-storyteller",
    torch_dtype=torch.float16,  # Use fp16 for efficiency
    device_map="auto"
)

# Generate a story
prompt = "<|story|>Once upon a time, there was a brave little mouse who"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)  # match the model's device placement

with torch.no_grad():
    outputs = model.generate(
        **inputs,
        max_new_tokens=200,  # length of the generated continuation
        do_sample=True,
        temperature=0.8,
        top_p=0.9,
        repetition_penalty=1.1,
        pad_token_id=tokenizer.pad_token_id
    )

story = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(story)
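As a rule of thumb, lower temperature values (e.g. 0.6) make the sampled stories more predictable, while higher temperature and top_p increase variety at some cost to coherence; a repetition_penalty above 1.0 discourages the small model from looping.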

πŸ“ Story Format

The model expects stories to be formatted with special tokens:

  • Start: <|story|>
  • End: <|endstory|>

Example:

<|story|>Once upon a time, there was a magical forest where animals could talk...<|endstory|>
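Since <|endstory|> marks the end of a story, generation can be told to stop when it appears. A minimal sketch, assuming <|endstory|> is registered in the tokenizer's vocabulary as a single special token (if it is not, the check below fails):

# Resolve the id of the end-of-story token; assumes it is in the vocabulary.
end_id = tokenizer.convert_tokens_to_ids("<|endstory|>")
assert end_id is not None and end_id != tokenizer.unk_token_id, "<|endstory|> not in vocabulary"

outputs = model.generate(
    **inputs,
    max_new_tokens=200,
    do_sample=True,
    temperature=0.8,
    top_p=0.9,
    eos_token_id=end_id,  # stop as soon as <|endstory|> is produced
)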

🎯 Intended Use

This model is specifically designed for:

  • Children's story generation
  • Creative writing assistance
  • Educational content creation
  • Interactive storytelling applications

⚠️ Limitations

  • Optimized for short stories (up to 512 tokens)
  • Trained primarily on simple, child-friendly narratives
  • May not perform well on other text generation tasks

πŸ“ˆ Performance

The model shows significant improvement in storytelling capability:

  • Validation loss reduced by 46.88% during training
  • Generates coherent, engaging short stories
  • Maintains appropriate tone and structure for children's content