Qwen2.5-0.5B-Instruct (Customizable Copy)

This is a copy of Qwen/Qwen2.5-0.5B-Instruct for customization and fine-tuning.

📋 Model Details

  • Base Model: Qwen/Qwen2.5-0.5B-Instruct
  • Size: 0.5B parameters (~1 GB in BF16)
  • Type: Instruction-tuned language model
  • License: Apache 2.0

🎯 Purpose

This repository contains a modifiable copy of Qwen2.5-0.5B-Instruct for:

  • Fine-tuning on custom datasets
  • Experimentation and testing
  • RunPod serverless deployment
  • Model modifications

🚀 Usage

Direct Inference

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "marcosremar2/runpod_serverless_n2"

# Load weights in the checkpoint's native dtype and map layers to available devices
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "What is artificial intelligence?"
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]

# Render the conversation into Qwen's chat format, ending with an open assistant turn
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)

# Strip the prompt tokens so only the newly generated reply is decoded
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
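
For interactive testing, tokens can be printed as they are produced. A minimal sketch using the TextStreamer utility from transformers, reusing the model, tokenizer, and model_inputs from above:

from transformers import TextStreamer

# Prints each decoded chunk as soon as it is generated, skipping the prompt
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(**model_inputs, max_new_tokens=512, streamer=streamer)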

RunPod Serverless Deployment

Environment Variables:
  MODEL_NAME: marcosremar2/runpod_serverless_n2
  HF_TOKEN: YOUR_TOKEN_HERE
  MAX_MODEL_LEN: 4096
  TRUST_REMOTE_CODE: true

GPU: RTX 4090 (24GB)
Min Workers: 0
Max Workers: 1
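
Once the endpoint is deployed, it can be called over HTTP. A minimal sketch, assuming the stock RunPod vLLM worker (consistent with the MAX_MODEL_LEN and TRUST_REMOTE_CODE variables above); the endpoint ID is a placeholder and the exact input schema depends on the worker or handler you deploy:

import os

import requests

ENDPOINT_ID = "your-endpoint-id"  # placeholder: copy the real ID from the RunPod console
API_KEY = os.environ["RUNPOD_API_KEY"]

response = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "input": {
            # Payload shape assumed from the RunPod vLLM worker; adjust for custom handlers
            "prompt": "What is artificial intelligence?",
            "sampling_params": {"max_tokens": 512, "temperature": 0.7},
        }
    },
    timeout=120,
)
print(response.json())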

🔧 Fine-tuning

To fine-tune this model:

from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

# Load the current weights and tokenizer from this repo
model = AutoModelForCausalLM.from_pretrained("marcosremar2/runpod_serverless_n2")
tokenizer = AutoTokenizer.from_pretrained("marcosremar2/runpod_serverless_n2")

# Your fine-tuning code here
# ...

# Push the updated weights back to your repo (requires a logged-in Hugging Face account)
model.push_to_hub("marcosremar2/runpod_serverless_n2")
tokenizer.push_to_hub("marcosremar2/runpod_serverless_n2")
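
A minimal end-to-end sketch of the fine-tuning step using the Trainer API is below. The dataset (tatsu-lab/alpaca) and all hyperparameters are illustrative placeholders; substitute your own instruction data:

from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "marcosremar2/runpod_serverless_n2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Illustrative dataset: 1,000 Alpaca instruction/response pairs
dataset = load_dataset("tatsu-lab/alpaca", split="train[:1000]")

def tokenize(example):
    # Format each pair with the model's own chat template
    text = tokenizer.apply_chat_template(
        [
            {"role": "user", "content": example["instruction"]},
            {"role": "assistant", "content": example["output"]},
        ],
        tokenize=False,
    )
    return tokenizer(text, truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="qwen-finetune",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-5,
        bf16=True,  # requires an Ampere-or-newer GPU
        logging_steps=10,
    ),
    train_dataset=tokenized,
    # mlm=False makes the collator copy input_ids into labels for causal LM loss
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()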

📊 Performance

Metric            Value
Parameters        0.5B
Context Length    32K tokens
VRAM Required     ~1-2 GB
Inference Speed   ~200-300 tokens/sec (RTX 4090)
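
The VRAM figure follows from the parameter count: BF16 stores each parameter in 2 bytes, so the weights alone take roughly 1 GB, and the remainder of the budget covers activations and the KV cache. A quick back-of-the-envelope check:

params = 494_000_000   # Qwen2.5-0.5B-Instruct's parameter count (~0.5B)
bytes_per_param = 2    # BF16 = 2 bytes per parameter
print(f"Weights alone: ~{params * bytes_per_param / 1e9:.2f} GB")  # ~0.99 GB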

🔗 Original Model

This model is based on Qwen/Qwen2.5-0.5B-Instruct.

For more information about the Qwen2.5 series, visit the original repository.

📄 License

Apache 2.0, the same license as the original Qwen model.

🙏 Credits

  • Original Model: Qwen Team @ Alibaba Cloud
  • Repository: Custom copy for modification and deployment