Model Card: QLoRA Fine-Tuned Sentiment Classifier for Tweets
Model Information
Model Name: TinyLlama-1.1B-Chat (QLoRA Fine-Tuned)
Base Model: TinyLlama/TinyLlama-1.1B-Chat-v1.0
Task: Sentiment Analysis – Tweet Classification
Frameworks Used: HuggingFace Transformers, PEFT (QLoRA), BitsAndBytes
Hardware: Google Colab T4 GPU
Repository: https://huggingface.co/estnafinema0/llm-course-hw3-tinyllama-qlora
Model Description
This model is a QLoRA fine-tuned version of TinyLlama-1.1B-Chat, specifically adapted for tweet sentiment classification. QLoRA leverages 4-bit quantization with the BitsAndBytes library, significantly reducing GPU memory usage while retaining performance. The fine-tuning is performed with the PEFT framework using a LoRA adapter on quantized weights. The base model weights remain frozen, and only the LoRA parameters are updated during training. This adaptation allows the model to accurately classify tweets into negative, neutral, or positive categories despite the limited computational budget.
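For quick reference, the following is a minimal usage sketch, assuming the adapter weights are published at the repository listed above and that the standard Transformers + PEFT loading APIs are used; the system prompt and generation settings are illustrative, not necessarily those used during training.

```python
# Hedged sketch: load the 4-bit base model, attach the LoRA adapter, classify one tweet.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, BitsAndBytesConfig
from peft import PeftModel

base_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
adapter_id = "estnafinema0/llm-course-hw3-tinyllama-qlora"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, quantization_config=bnb_config, device_map="auto")
model = PeftModel.from_pretrained(base, adapter_id)

messages = [
    {"role": "system", "content": "Classify the sentiment of the tweet as negative, neutral, or positive."},
    {"role": "user", "content": "QT @user In the original draft of the 7th book, Remus Lupin survived the Battle of Hogwarts."},
]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(inputs, max_new_tokens=5, do_sample=False)
print(tokenizer.decode(out[0, inputs.shape[-1]:], skip_special_tokens=True))
```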
Results
The QLoRA training configuration was designed to ensure a balance between memory efficiency and adaptation capacity. Using quantization and targeted LoRA fine-tuning, the model's macro F1 score improved substantially from 0.14 before fine-tuning to 0.51 after fine-tuning.
Two confusion matrix heatmaps illustrate this performance shift:
- Heatmap before fine-tuning: (image placeholder)
- Heatmap after fine-tuning: (image placeholder)
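The heatmaps themselves are kept as placeholders here; as a rough sketch (assuming scikit-learn and matplotlib, which are not mentioned elsewhere in this card), figures like these can be produced as follows:

```python
# Minimal sketch: plot a confusion matrix heatmap from label strings.
# y_true / y_pred below are toy examples; in practice they come from the test set.
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
import matplotlib.pyplot as plt

labels = ["negative", "neutral", "positive"]
y_true = ["negative", "neutral", "positive", "positive"]  # ground-truth labels (toy example)
y_pred = ["negative", "neutral", "positive", "neutral"]   # model predictions (toy example)

cm = confusion_matrix(y_true, y_pred, labels=labels)
ConfusionMatrixDisplay(cm, display_labels=labels).plot(cmap="Blues")
plt.title("Confusion matrix after fine-tuning")
plt.savefig("confusion_matrix_after.png", dpi=150, bbox_inches="tight")
```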
Sample Generation Outputs
The following examples demonstrate the model's sentiment classification capability:
Tweet:
"QT @user In the original draft of the 7th book, Remus Lupin survived the Battle of Hogwarts. #HappyBirthdayRemusLupin"
Expected Label: positive
Model Generation: positive

Tweet:
"Chase Headley's RBI double in the 8th inning off David Price snapped a Yankees streak of 33 consecutive scoreless innings against Blue Jays"
Expected Label: neutral
Model Generation: neutral

Tweet:
"@user Alciato: Bee will invest 150 million in January, another 200 in the Summer and plans to bring Messi by 2017"
Expected Label: positive
Model Generation: positive
Experiment and Training Details
Data Preparation:
The training data was sourced from the cardiffnlp/tweet_eval dataset. Each tweet was converted into a conversational prompt comprising a system instruction, a user message with the tweet text, and an assistant message with the corresponding sentiment label (see the sketch below).
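A minimal sketch of this preparation step is shown below, assuming the "sentiment" configuration of cardiffnlp/tweet_eval and an illustrative system prompt; the exact prompt wording used in the original run may differ.

```python
# Hedged sketch of the data preparation step: load the dataset and turn each
# tweet into a chat-style example with system / user / assistant messages.
from datasets import load_dataset

LABELS = {0: "negative", 1: "neutral", 2: "positive"}  # tweet_eval sentiment label mapping
SYSTEM_PROMPT = "Classify the sentiment of the tweet as negative, neutral, or positive."

def to_chat(example):
    example["messages"] = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": example["text"]},
        {"role": "assistant", "content": LABELS[example["label"]]},
    ]
    return example

dataset = load_dataset("cardiffnlp/tweet_eval", "sentiment")
train_data = dataset["train"].map(to_chat, remove_columns=["text", "label"])
```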
Model Adaptation:
The QLoRA methodology was implemented by first quantizing the base model to 4-bit precision with bnb_4bit_quant_type="nf4" and the compute dtype set to torch.float16. LoRA adapters were then applied to the quantized model with the following configuration (see the sketch after this list):
- lora_alpha: 32
- lora_dropout: 0.1
- r: 16
- Target Modules: ["q_proj", "k_proj", "v_proj", "o_proj"]
Only the LoRA parameters were updated during training, while the original model weights remained frozen.
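The snippet below sketches this adaptation step with the configuration listed above; anything beyond those documented fields (device placement, task_type, and so on) is an assumption.

```python
# Hedged sketch of the model adaptation step: 4-bit NF4 quantization plus LoRA adapters.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # as documented above
    bnb_4bit_compute_dtype=torch.float16,   # as documented above
)

model = AutoModelForCausalLM.from_pretrained(
    "TinyLlama/TinyLlama-1.1B-Chat-v1.0",
    quantization_config=bnb_config,
    device_map="auto",
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.1,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)  # base weights stay frozen; only LoRA params train
model.print_trainable_parameters()
```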
Training Process:
The QLoRA fine-tuning was carried out using the SFTTrainer. Key training parameters included:
- Learning Rate: 2e-4
- Number of Epochs: 3
- Per Device Batch Size: 8
- Gradient Accumulation Steps: 4
- LR Scheduler: constant_with_warmup
The training loop was optimized for efficient memory usage on a T4 GPU in Google Colab. This approach ensured that the quantized model, despite its reduced precision, was effectively fine-tuned with minimal resource overhead.
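A hedged sketch of this training step is given below, continuing from the preparation and adaptation sketches above (it reuses model, tokenizer, and train_data). SFTTrainer argument names vary across trl versions, and the warmup amount and output directory are assumptions; the hyperparameters themselves are the documented ones.

```python
# Hedged sketch of the training step with the hyperparameters listed above.
from trl import SFTConfig, SFTTrainer

training_args = SFTConfig(
    output_dir="tinyllama-qlora-sentiment",  # assumption
    learning_rate=2e-4,
    num_train_epochs=3,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,
    lr_scheduler_type="constant_with_warmup",
    warmup_ratio=0.03,                       # assumption: warmup amount not documented above
    logging_steps=10,
    fp16=True,
)

trainer = SFTTrainer(
    model=model,                 # quantized base + LoRA adapters from the sketch above
    args=training_args,
    train_dataset=train_data,    # chat-formatted examples with a "messages" field
    processing_class=tokenizer,  # named `tokenizer` in older trl releases
)
trainer.train()
trainer.save_model("tinyllama-qlora-sentiment")
```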
Evaluation:
Evaluation was performed on a held-out test set, with the macro F1 score serving as the primary metric. Detailed confusion matrix heatmaps (provided above as placeholders) illustrate the performance improvement after fine-tuning.

Libraries and Tools:
The process used HuggingFace Transformers, the PEFT library for the QLoRA implementation, and BitsAndBytes for efficient 4-bit quantization.
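Continuing the sketches above (it reuses model, tokenizer, dataset, and SYSTEM_PROMPT), the evaluation described in the Evaluation subsection can be approximated as follows; the generation settings and the label-parsing fallback are assumptions.

```python
# Hedged sketch of the evaluation step: generate a label per test tweet, compute macro F1.
import torch
from sklearn.metrics import f1_score

label_names = ["negative", "neutral", "positive"]

def predict(tweet):
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": tweet},
    ]
    inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)
    with torch.no_grad():
        out = model.generate(inputs, max_new_tokens=5, do_sample=False)
    text = tokenizer.decode(out[0, inputs.shape[-1]:], skip_special_tokens=True).lower()
    return next((l for l in label_names if l in text), "neutral")  # fallback if no label is found

test_split = dataset["test"]
y_true = [label_names[i] for i in test_split["label"]]
y_pred = [predict(t) for t in test_split["text"]]
print("macro F1:", f1_score(y_true, y_pred, average="macro"))
```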
Repository & Experiment Links
Model Repository: https://huggingface.co/estnafinema0/llm-course-hw3-tinyllama-qlora
Conclusion
This model card documents the successful fine-tuning of a quantized TinyLlama-1.1B-Chat model using the QLoRA method for tweet sentiment classification. By combining efficient 4-bit quantization with LoRA adapters, the model achieved a significant performance improvement, with the macro F1 score rising from 0.14 to 0.51. The experimental setup, executed on Google Colab using T4 GPUs, demonstrates that resource-constrained environments can yield competitive results through modern fine-tuning techniques. Detailed evaluation visualizations and configuration details are provided in the repository for further examination.
For any inquiries or additional information, please refer to the repository or contact the model maintainer at [email protected].