Qwen2.5-0.5B-Instruct (Customizable Copy)

This is a copy of Qwen/Qwen2.5-0.5B-Instruct for customization and fine-tuning.

📋 Model Details

  • Base Model: Qwen/Qwen2.5-0.5B-Instruct
  • Size: 0.5B parameters (~1 GB in BF16)
  • Type: Instruction-tuned language model
  • License: Apache 2.0

🎯 Purpose

This repository contains a modifiable copy of Qwen2.5-0.5B-Instruct for:

  • Fine-tuning on custom datasets
  • Experimentation and testing
  • RunPod serverless deployment
  • Model modifications

🚀 Usage

Direct Inference

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "marcosremar2/runpod_serverless_n2"

# Load weights in the checkpoint's native dtype and map layers to available devices
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "What is artificial intelligence?"
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt}
]

# Render the conversation into Qwen's chat format, ending with an open assistant turn
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=512
)

# Strip the prompt tokens so only the newly generated reply is decoded
generated_ids = [
    output_ids[len(input_ids):]
    for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)
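
For interactive testing, tokens can be printed as they are produced. A minimal sketch using the TextStreamer utility from transformers, reusing the model, tokenizer, and model_inputs from above:

from transformers import TextStreamer

# Prints each decoded chunk as soon as it is generated, skipping the prompt
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
model.generate(**model_inputs, max_new_tokens=512, streamer=streamer)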

RunPod Serverless Deployment

Environment Variables:
  MODEL_NAME: marcosremar2/runpod_serverless_n2
  HF_TOKEN: YOUR_TOKEN_HERE
  MAX_MODEL_LEN: 4096
  TRUST_REMOTE_CODE: true

GPU: RTX 4090 (24GB)
Min Workers: 0
Max Workers: 1
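
Once the endpoint is deployed, it can be called over HTTP. A minimal sketch, assuming the stock RunPod vLLM worker (consistent with the MAX_MODEL_LEN and TRUST_REMOTE_CODE variables above); the endpoint ID is a placeholder and the exact input schema depends on the worker or handler you deploy:

import os

import requests

ENDPOINT_ID = "your-endpoint-id"  # placeholder: copy the real ID from the RunPod console
API_KEY = os.environ["RUNPOD_API_KEY"]

response = requests.post(
    f"https://api.runpod.ai/v2/{ENDPOINT_ID}/runsync",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "input": {
            # Payload shape assumed from the RunPod vLLM worker; adjust for custom handlers
            "prompt": "What is artificial intelligence?",
            "sampling_params": {"max_tokens": 512, "temperature": 0.7},
        }
    },
    timeout=120,
)
print(response.json())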

🔧 Fine-tuning

To fine-tune this model:

from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

# Load the current weights and tokenizer from this repo
model = AutoModelForCausalLM.from_pretrained("marcosremar2/runpod_serverless_n2")
tokenizer = AutoTokenizer.from_pretrained("marcosremar2/runpod_serverless_n2")

# Your fine-tuning code here
# ...

# Push the updated weights back to your repo (requires a logged-in Hugging Face account)
model.push_to_hub("marcosremar2/runpod_serverless_n2")
tokenizer.push_to_hub("marcosremar2/runpod_serverless_n2")
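
A minimal end-to-end sketch of the fine-tuning step using the Trainer API is below. The dataset (tatsu-lab/alpaca) and all hyperparameters are illustrative placeholders; substitute your own instruction data:

from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "marcosremar2/runpod_serverless_n2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Illustrative dataset: 1,000 Alpaca instruction/response pairs
dataset = load_dataset("tatsu-lab/alpaca", split="train[:1000]")

def tokenize(example):
    # Format each pair with the model's own chat template
    text = tokenizer.apply_chat_template(
        [
            {"role": "user", "content": example["instruction"]},
            {"role": "assistant", "content": example["output"]},
        ],
        tokenize=False,
    )
    return tokenizer(text, truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="qwen-finetune",
        per_device_train_batch_size=2,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-5,
        bf16=True,  # requires an Ampere-or-newer GPU
        logging_steps=10,
    ),
    train_dataset=tokenized,
    # mlm=False makes the collator copy input_ids into labels for causal LM loss
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()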

📊 Performance

Metric            Value
Parameters        0.5B
Context Length    32K tokens
VRAM Required     ~1-2 GB
Inference Speed   ~200-300 tokens/sec (RTX 4090)
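
The VRAM figure follows from the parameter count: BF16 stores each parameter in 2 bytes, so the weights alone take roughly 1 GB, and the remainder of the budget covers activations and the KV cache. A quick back-of-the-envelope check:

params = 494_000_000   # Qwen2.5-0.5B-Instruct's parameter count (~0.5B)
bytes_per_param = 2    # BF16 = 2 bytes per parameter
print(f"Weights alone: ~{params * bytes_per_param / 1e9:.2f} GB")  # ~0.99 GB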

🔗 Original Model

This model is based on Qwen/Qwen2.5-0.5B-Instruct.

For more information about the Qwen2.5 series, visit the original repository.

📄 License

Apache 2.0, the same license as the original Qwen model.

🙏 Credits

  • Original Model: Qwen Team @ Alibaba Cloud
  • Repository: Custom copy for modification and deployment