vt-gwen-2.5-3b-Q4_k_m-gguf

Developed by: vinhnx90
License: Apache-2.0
Base Model: Fine-tuned from unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit
Model Size: 3.09B parameters
Quantization: 4-bit (Q4_K_M)
Architecture: Qwen2
Primary Language: English
Frameworks: Transformers, GGUF

This Qwen2 model was trained 2× faster using Unsloth and Hugging Face’s TRL library.


Overview

vt-gwen-2.5-3b-Q4_k_m-gguf is a fine-tuned variant of the Qwen2.5 3B instruct model, optimized for both speed and reasoning capability and well-suited to complex problem-solving and conversational applications. Enhanced with reinforcement learning using the GRPO (Group Relative Policy Optimization) algorithm and fine-tuned on the openai/gsm8k dataset, the model is particularly strong at mathematical and logical reasoning.


Training Details

  • Training Approach: Fine-tuning with Hugging Face’s TRL library in conjunction with Unsloth for accelerated training.
  • Dataset: Fine-tuned on the openai/gsm8k dataset to boost performance on educational, mathematical, and reasoning tasks.
  • Optimization: Quantized to 4-bit using the Q4_K_M format to significantly reduce memory footprint and improve inference speed.
  • Reinforcement Learning: Refined with the GRPO (Group Relative Policy Optimization) algorithm to enhance instruction-following and reasoning abilities; a minimal training sketch follows this list.
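
The snippet below is a hypothetical sketch of how such a run could be set up with Unsloth and TRL's GRPOTrainer, not the recipe actually used for this model: the real hyperparameters, prompt formatting, and reward functions are undocumented here, and placeholder_reward is a stand-in.

from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer
from unsloth import FastLanguageModel

# Load the 4-bit base model with Unsloth's accelerated loader.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# GRPOTrainer expects a "prompt" column; GSM8K ships "question"/"answer".
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(lambda row: {"prompt": row["question"]})

def placeholder_reward(completions, **kwargs):
    # Stand-in reward: a real one would parse each completion and compare
    # its final answer against the GSM8K reference ("#### <answer>" format).
    return [1.0 if "####" in c else 0.0 for c in completions]

trainer = GRPOTrainer(
    model=model,
    reward_funcs=placeholder_reward,
    args=GRPOConfig(output_dir="grpo-gsm8k", max_completion_length=256),
    train_dataset=dataset,
)
trainer.train()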

Intended Use

This model is best suited for:

  • Text Generation: Creating detailed responses, summaries, or creative narratives.
  • Instruction Following: Executing and elaborating on complex instructions.
  • Conversational Applications: Acting as a dialogue agent in research, prototyping, or interactive systems.
  • Reasoning Tasks: Delivering enhanced performance on mathematical, logical, and educational challenges.

Not Recommended For:

  • High-stakes or production environments without extensive evaluation.
  • Scenarios requiring absolute precision without fallback safeguards.

Limitations

  • Quantization Artifacts: The 4-bit quantization may introduce minor degradations compared to full-precision models.
  • Deployment: Currently, this model is not accessible through standard HF Inference Providers or pipelines.
  • Evaluation: Additional testing is recommended before deploying in critical applications.

How to Use

Using Transformers (Python)

Below is a quick code snippet using the Transformers library:

from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "vinhnx90/vt-gwen-2.5-3b-Q4_k_m-gguf"

# Note: if the repository ships only a GGUF file, recent Transformers
# versions need it passed explicitly, e.g.
# from_pretrained(model_name, gguf_file="<filename>.gguf");
# check the repo's file list for the exact filename.
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Explain the theory of relativity and provide a related mathematical example."
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
outputs = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
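
Because this is an instruct-tuned checkpoint, wrapping the prompt in the tokenizer's chat template usually yields better-formatted answers:

messages = [{"role": "user", "content": prompt}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
)
outputs = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))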

Deployment with llama.cpp

For local inference using llama.cpp:

  1. Download or convert the model: Ensure you have the GGUF file.

  2. Run with the llama-cli binary:

./llama-cli -m path/to/vt-gwen-2.5-3b-Q4_k_m-gguf.gguf -n 512 -co -sp -cnv -p "You are a helpful assistant."

Tip: in conversation mode (-cnv), the -p string is used as the system prompt; adjust it and the sampling parameters for your application.
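
For serving over HTTP, llama.cpp also ships a llama-server binary that exposes an OpenAI-compatible API. The port and context size below are illustrative choices, not settings documented for this model:

./llama-server -m path/to/vt-gwen-2.5-3b-Q4_k_m-gguf.gguf -c 4096 --port 8080

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Explain the chain rule briefly."}]}'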

Using LM Studio

LM Studio supports GGUF models:

  1. Load the Model: Open LM Studio and select the downloaded GGUF model file.
  2. Configure Settings: Adjust context length and performance parameters within LM Studio’s interface.
  3. Start Inference: Use the built-in UI to interact with the model for both conversational and reasoning tasks.
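
LM Studio can also serve the loaded model through its local, OpenAI-compatible server. The request below is a sketch assuming that server is enabled on its default port (1234); the prompt is illustrative:

curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "What is 15% of 240?"}]}'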

Running with Ollama

For users of Ollama:

  1. Install Ollama: Follow the instructions on Ollama's website.
  2. Start the Ollama Service: Launch the Ollama server.
  3. Run the Model:

ollama run hf.co/vinhnx90/vt-gwen-2.5-3b-Q4_k_m-gguf

  4. Interact via API or UI: Once running, you can use Ollama's interface or API to send prompts and receive responses; a sample request follows below.
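
As a sketch of the API route, Ollama's local REST endpoint (default port 11434) accepts generate requests for the model pulled above; the prompt here is illustrative:

curl http://localhost:11434/api/generate -d '{
  "model": "hf.co/vinhnx90/vt-gwen-2.5-3b-Q4_k_m-gguf",
  "prompt": "A train travels 60 km in 45 minutes. What is its average speed in km/h?"
}'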

Citation

If you use this model, please consider citing it as follows:

@misc{vt_gwen_2.5_3b,
  title   = {vt-gwen-2.5-3b-Q4_k_m-gguf},
  author  = {vinhnx90},
  howpublished = {\url{https://huggingface.co/vinhnx90/vt-gwen-2.5-3b-Q4_k_m-gguf}},
  license = {Apache-2.0}
}

Contact & Support

For questions, issues, or contributions, please visit the model page on Hugging Face to open an issue or contact the developer directly.
