vt-gwen-2.5-3b-Q4_k_m-gguf

Developed by: vinhnx90
License: Apache-2.0
Base Model: Fine-tuned from unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit
Model Size: 3.09B parameters
Quantization: 4-bit (Q4_K_M)
Architecture: Qwen2
Primary Language: English
Frameworks: Transformers, GGUF

This Qwen2 model was trained 2× faster using Unsloth and Hugging Face’s TRL library.


Overview

vt-gwen-2.5-3b-Q4_k_m-gguf is a fine-tuned variant of the Qwen2.5 3B instruct model, optimized for both speed and reasoning capability and well-suited to complex problem-solving and conversational applications. Enhanced with reinforcement learning using the GRPO (Group Relative Policy Optimization) algorithm and fine-tuned on the openai/gsm8k dataset, the model is particularly strong at mathematical and logical reasoning.


Training Details

  • Training Approach: Fine-tuning with Hugging Face’s TRL library in conjunction with Unsloth for accelerated training.
  • Dataset: Fine-tuned on the openai/gsm8k dataset to boost performance on educational, mathematical, and reasoning tasks.
  • Optimization: Quantized to 4-bit using the Q4_K_M format to significantly reduce memory footprint and improve inference speed.
  • Reinforcement Learning: Refined with the GRPO (Group Relative Policy Optimization) algorithm to enhance instruction-following and reasoning abilities; a minimal training sketch follows this list.
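
The snippet below is a hypothetical sketch of how such a run could be set up with Unsloth and TRL's GRPOTrainer, not the recipe actually used for this model: the real hyperparameters, prompt formatting, and reward functions are undocumented here, and placeholder_reward is a stand-in.

from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer
from unsloth import FastLanguageModel

# Load the 4-bit base model with Unsloth's accelerated loader.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# GRPOTrainer expects a "prompt" column; GSM8K ships "question"/"answer".
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(lambda row: {"prompt": row["question"]})

def placeholder_reward(completions, **kwargs):
    # Stand-in reward: a real one would parse each completion and compare
    # its final answer against the GSM8K reference ("#### <answer>" format).
    return [1.0 if "####" in c else 0.0 for c in completions]

trainer = GRPOTrainer(
    model=model,
    reward_funcs=placeholder_reward,
    args=GRPOConfig(output_dir="grpo-gsm8k", max_completion_length=256),
    train_dataset=dataset,
)
trainer.train()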

Intended Use

This model is best suited for:

  • Text Generation: Creating detailed responses, summaries, or creative narratives.
  • Instruction Following: Executing and elaborating on complex instructions.
  • Conversational Applications: Acting as a dialogue agent in research, prototyping, or interactive systems.
  • Reasoning Tasks: Delivering enhanced performance on mathematical, logical, and educational challenges.

Not Recommended For:

  • High-stakes or production environments without extensive evaluation.
  • Scenarios requiring absolute precision without fallback safeguards.

Limitations

  • Quantization Artifacts: The 4-bit quantization may introduce minor degradations compared to full-precision models.
  • Deployment: Currently, this model is not accessible through standard HF Inference Providers or pipelines.
  • Evaluation: Additional testing is recommended before deploying in critical applications.

How to Use

Using Transformers (Python)

Below is a quick code snippet using the Transformers library:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "vinhnx90/vt-gwen-2.5-3b-Q4_k_m-gguf"

# Note: if the repository ships only a GGUF file, recent Transformers
# versions need it passed explicitly, e.g.
# from_pretrained(model_name, gguf_file="<filename>.gguf");
# check the repo's file list for the exact filename.
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Explain the theory of relativity and provide a related mathematical example."
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
outputs = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
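
Because this is an instruct-tuned checkpoint, wrapping the prompt in the tokenizer's chat template usually yields better-formatted answers:

messages = [{"role": "user", "content": prompt}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))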

Deployment with llama.cpp

For local inference using llama.cpp:

  1. Download or convert the model: Ensure you have the GGUF file.

  2. Run with the llama-cli binary:

./llama-cli -m path/to/vt-gwen-2.5-3b-Q4_k_m-gguf.gguf -n 512 -co -sp -cnv -p "You are a helpful assistant."

Tip: in conversation mode (-cnv), the -p string is used as the system prompt; adjust it and the sampling parameters for your application.
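
For serving over HTTP, llama.cpp also ships a llama-server binary that exposes an OpenAI-compatible API. The port and context size below are illustrative choices, not settings documented for this model:

./llama-server -m path/to/vt-gwen-2.5-3b-Q4_k_m-gguf.gguf -c 4096 --port 8080

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Explain the chain rule briefly."}]}'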

Using LM Studio

LM Studio supports GGUF models:

  1. Load the Model: Open LM Studio and select the downloaded GGUF model file.
  2. Configure Settings: Adjust context length and performance parameters within LM Studio’s interface.
  3. Start Inference: Use the built-in UI to interact with the model for both conversational and reasoning tasks.
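
LM Studio can also serve the loaded model through its local, OpenAI-compatible server. The request below is a sketch assuming that server is enabled on its default port (1234); the prompt is illustrative:

curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "What is 15% of 240?"}]}'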

Running with Ollama

For users of Ollama:

  1. Install Ollama: Follow the instructions on Ollama's website.
  2. Start the Ollama Service: Launch the Ollama server.
  3. Run the Model:

ollama run hf.co/vinhnx90/vt-gwen-2.5-3b-Q4_k_m-gguf

  4. Interact via API or UI: Once running, you can use Ollama's interface or API to send prompts and receive responses; a sample request follows below.
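
As a sketch of the API route, Ollama's local REST endpoint (default port 11434) accepts generate requests for the model pulled above; the prompt here is illustrative:

curl http://localhost:11434/api/generate -d '{
  "model": "hf.co/vinhnx90/vt-gwen-2.5-3b-Q4_k_m-gguf",
  "prompt": "A train travels 60 km in 45 minutes. What is its average speed in km/h?"
}'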

Citation

If you use this model, please consider citing it as follows:

@misc{vt_gwen_2.5_3b,
  title   = {vt-gwen-2.5-3b-Q4_k_m-gguf},
  author  = {vinhnx90},
  howpublished = {\url{https://huggingface.co/vinhnx90/vt-gwen-2.5-3b-Q4_k_m-gguf}},
  license = {Apache-2.0}
}

Contact & Support

For questions, issues, or contributions, please visit the model page on Hugging Face to open an issue or contact the developer directly.
