vt-gwen-2.5-3b-Q4_k_m-gguf
Developed by: vinhnx90
License: Apache-2.0
Base Model: Fine-tuned from unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit
Model Size: 3.09B parameters
Quantization: 4-bit (Q4_K_M)
Architecture: Qwen2
Primary Language: English
Frameworks: Transformers, GGUF
This Qwen2 model was trained 2× faster using Unsloth and Hugging Face’s TRL library.
Overview
vt-gwen-2.5-3b-Q4_k_m-gguf is a fine-tuned variant of the Qwen2.5 3B Instruct model, optimized for both speed and reasoning capability, which makes it well suited to complex problem-solving and conversational applications. It was refined with reinforcement learning using the GRPO (Group Relative Policy Optimization) algorithm and fine-tuned on the openai/gsm8k dataset, so it is strongest at mathematical and logical reasoning.
Training Details
- Training Approach: Fine-tuning with Hugging Face’s TRL library in conjunction with Unsloth for accelerated training.
- Dataset: Fine-tuned on the openai/gsm8k dataset to boost performance on educational, mathematical, and reasoning tasks.
- Optimization: Quantized to 4-bit using the Q4_K_M format to significantly reduce memory footprint and improve inference speed.
- Reinforcement Learning: Refined with the GRPO algorithm to strengthen instruction-following and reasoning abilities; a minimal training sketch follows this list.
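The exact training script is not published here, but the pieces above correspond roughly to TRL's GRPO trainer. Below is a minimal, illustrative sketch assuming a recent TRL release that ships GRPOTrainer/GRPOConfig; the reward function, hyperparameters, and model-loading details (in practice Unsloth wraps the base model with LoRA adapters for memory-efficient training) are assumptions, not the actual configuration used for this model.

# Illustrative GRPO fine-tuning sketch, not the exact script used for this model.
from datasets import load_dataset
from trl import GRPOConfig, GRPOTrainer

# gsm8k answers end in "#### <final number>".
dataset = load_dataset("openai/gsm8k", "main", split="train")
dataset = dataset.map(lambda ex: {"prompt": ex["question"]})

def correctness_reward(completions, answer, **kwargs):
    # Placeholder reward: 1.0 if the gold final number appears in the completion.
    rewards = []
    for completion, gold in zip(completions, answer):
        final = gold.split("####")[-1].strip()
        rewards.append(1.0 if final in completion else 0.0)
    return rewards

args = GRPOConfig(
    output_dir="vt-gwen-2.5-3b-grpo",
    per_device_train_batch_size=4,
    num_generations=4,
    max_completion_length=256,
)
trainer = GRPOTrainer(
    # In practice the base model was loaded through Unsloth with LoRA adapters.
    model="unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit",
    reward_funcs=correctness_reward,
    args=args,
    train_dataset=dataset,
)
trainer.train()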
Intended Use
This model is best suited for:
- Text Generation: Creating detailed responses, summaries, or creative narratives.
- Instruction Following: Executing and elaborating on complex instructions.
- Conversational Applications: Acting as a dialogue agent in research, prototyping, or interactive systems.
- Reasoning Tasks: Delivering enhanced performance on mathematical, logical, and educational challenges.
Not Recommended For:
- High-stakes or production environments without extensive evaluation.
- Scenarios requiring absolute precision without fallback safeguards.
Limitations
- Quantization Artifacts: The 4-bit quantization may introduce minor degradations compared to full-precision models.
- Deployment: Currently, this model is not accessible through standard HF Inference Providers or pipelines.
- Evaluation: Additional testing is recommended before deploying in critical applications.
How to Use
Using Transformers (Python)
Below is a quick code snippet using the Transformers library:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "vinhnx90/vt-gwen-2.5-3b-Q4_k_m-gguf"

# Load the model and tokenizer from the Hugging Face Hub.
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Encode a prompt, generate a response, and decode it.
prompt = "Explain the theory of relativity and provide a related mathematical example."
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
outputs = model.generate(input_ids, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
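Note that if the repository only hosts GGUF weights, recent Transformers versions can load them by passing a gguf_file argument (this requires the gguf package). The filename below is a placeholder; check the repository's file listing for the actual name.

# Hypothetical filename; replace with the .gguf file actually listed in the repository.
model = AutoModelForCausalLM.from_pretrained(model_name, gguf_file="vt-gwen-2.5-3b-Q4_K_M.gguf")
tokenizer = AutoTokenizer.from_pretrained(model_name, gguf_file="vt-gwen-2.5-3b-Q4_K_M.gguf")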
Deployment with llama.cpp
For local inference using llama.cpp:
- Download or convert the model: ensure you have the GGUF file locally (see the download example below).
- Run with llama.cpp's CLI:
./llama-cli -m path/to/vt-gwen-2.5-3b-Q4_k_m-gguf.gguf -n 512 -co -sp -cnv -p "You are a helpful assistant."
Tip: Adjust the prompt and parameters as needed for your application.
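If you still need to fetch the GGUF weights, one option is the Hugging Face CLI (a sketch assuming huggingface_hub is installed; the exact file layout in the repository may differ):

huggingface-cli download vinhnx90/vt-gwen-2.5-3b-Q4_k_m-gguf --local-dir ./vt-gwen-2.5-3b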
Using LM Studio
LM Studio supports GGUF models:
- Load the Model: Open LM Studio and select the downloaded GGUF model file.
- Configure Settings: Adjust context length and performance parameters within LM Studio’s interface.
- Start Inference: Use the built-in UI to interact with the model for both conversational and reasoning tasks.
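Beyond the built-in chat UI, LM Studio can also serve the loaded model over a local OpenAI-compatible endpoint. A minimal sketch, assuming that server is enabled and listening on LM Studio's usual default port (1234); the model identifier must match whatever name LM Studio shows for the loaded file:

from openai import OpenAI

# Point the OpenAI client at LM Studio's local server; the API key is unused locally.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")
response = client.chat.completions.create(
    model="vt-gwen-2.5-3b-Q4_k_m-gguf",  # use the identifier shown in LM Studio
    messages=[{"role": "user", "content": "A train covers 60 km in 45 minutes. What is its average speed in km/h?"}],
)
print(response.choices[0].message.content)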
Running with Ollama
For users of Ollama:
- Install Ollama: Follow the instructions on Ollama's website.
- Start the Ollama Service: Launch the Ollama server.
- Run the Model:
ollama run hf.co/vinhnx90/vt-gwen-2.5-3b-Q4_k_m-gguf
- Interact via API or UI: Once running, you can use Ollama’s interface or API to send prompts and receive responses.
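For example, a minimal request against Ollama's local REST API (default address http://localhost:11434), assuming the run command above has already pulled the model:

import requests

# Send a single, non-streaming generation request to the local Ollama server.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "hf.co/vinhnx90/vt-gwen-2.5-3b-Q4_k_m-gguf",
        "prompt": "If a notebook costs $12 and you buy 3, how much do you spend in total?",
        "stream": False,
    },
)
print(response.json()["response"])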
Citation
If you use this model, please consider citing it as follows:
@misc{vt_gwen_2.5_3b,
  title        = {vt-gwen-2.5-3b-Q4_k_m-gguf},
  author       = {vinhnx90},
  howpublished = {\url{https://huggingface.co/vinhnx90/vt-gwen-2.5-3b-Q4_k_m-gguf}},
  license      = {Apache-2.0}
}
Contact & Support
For questions, issues, or contributions, please visit the model page on Hugging Face to open an issue or contact the developer directly.