# FLAN-T5 COVID-19 Vaccine Stance Classification
This repository contains my submission for the take-home coding assessment for the LLM research opportunity under Sean Yun-Shiuan Chuang, Junjie Hu, and Tim Rogers.
This model is currently public for easy visibility during the coding assessment and will be made private afterwards.
## Task Summary
Predict the stance of each tweet (`in-favor`, `against`, or `neutral-or-unclear`) from a CSV of 5,751 tweets about COVID-19 vaccination using `flan-t5-large`.
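As an illustration of the task, a minimal zero-shot sketch with `flan-t5-large` is shown below; the prompt wording and example tweet are assumptions, not the exact prompt used in this repo's code.

```python
# Minimal zero-shot stance-classification sketch. The prompt wording and the
# example tweet are illustrative assumptions, not the repo's actual prompt.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")

tweet = "Just got my second dose. Feeling grateful!"
prompt = (
    "Classify the stance of this tweet toward COVID-19 vaccination as "
    "in-favor, against, or neutral-or-unclear.\n"
    f"Tweet: {tweet}\n"
    "Stance:"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=8)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```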
## Project Structure
- `predict.py` - Runs the model's predictions on a given dataset and saves the results into `output/`.
- `eval.py` - Model evaluation.
- `utils.py` - Shared helper functions.
- `train.py` - Fine-tuning code for a given dataset.
- `requirements.txt` - Package installs for reproducibility.
- `data/` - Contains the original dataset.
- `output/` - Contains prediction output files and held-out dataset files from train/test splitting.
- `finetune/` - Contains all files of the fine-tuned model, including epoch checkpoint files.
## Setup
Install dependencies:

```bash
pip install transformers torch pandas scikit-learn sentencepiece datasets
```

or

```bash
pip install -r requirements.txt
```
## Quick Start
To run the fine-tuned model as-is:

```bash
python3 predict.py  # manually change the dataset path if needed
python3 eval.py     # for evaluation
```
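For reference, the core of the evaluation can be sketched with scikit-learn as below. The CSV path and column names (`label`, `prediction`) are assumptions; see `eval.py` for the actual implementation.

```python
# Hypothetical evaluation sketch; the path and column names are assumptions.
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score

df = pd.read_csv("output/predictions.csv")
y_true, y_pred = df["label"], df["prediction"]

print("Accuracy:", accuracy_score(y_true, y_pred))
print("Macro F1:", f1_score(y_true, y_pred, average="macro"))
```

Macro averaging is shown here; the "overall F1" figures below may use a different averaging scheme.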
## Development Summary
- Initial zero-shot prompting (no fine-tuning) revealed that the model never predicted `neutral-or-unclear`. Overall F1 score was 0.428.
- To speed up initial fine-tuning on a T4 GPU, `flan-t5-base` was used until final evaluations, which were done with `flan-t5-large`.
- The initial attempt at fine-tuning (no upsampling) had poor `neutral-or-unclear` recall (0.18). Overall F1 score was 0.518.
- Fine-tuning with upsampling on `neutral-or-unclear`, using an 80/20 train/test split on the first 2,000 records and running predictions on the following 1,500 records, yielded an F1 score of 0.562 (3 epochs).
- Fine-tuning with upsampling only on `neutral-or-unclear` over the entire dataset with an 80/20 train/test split showed only average precision for `against` (0.59). Overall F1 score was 0.690 (3 epochs).
- Fine-tuning with upsampling on both `neutral-or-unclear` and `against` led to an F1 score of 0.724 (3 epochs). A sketch of the upsampling approach appears after this list.
- Final fine-tuning on `flan-t5-large` was done over 2 epochs in bf16 format to account for T4 GPU limitations; a sketch of the corresponding training arguments also appears after this list.
- Fine-tuning `flan-t5-large` on an 80/20 split, with predictions run on the held-out dataset, resulted in an F1 score of 0.782.
- The final version of the fine-tuned model, with predictions run on the entirety of the original dataset, achieved a final F1 score of 0.772 with an accuracy of 0.801.
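As referenced in the list above, upsampling can be done by duplicating minority-class rows until the classes are balanced. This is a minimal sketch, assuming a pandas DataFrame with a `label` column and a `data/tweets.csv` path; both names are assumptions, not the repo's actual code.

```python
# Hypothetical upsampling sketch; the column name and CSV path are assumptions.
import pandas as pd
from sklearn.model_selection import train_test_split

def upsample(df: pd.DataFrame, label_col: str = "label") -> pd.DataFrame:
    """Duplicate minority-class rows until each label matches the majority count."""
    max_n = df[label_col].value_counts().max()
    parts = [
        grp.sample(n=max_n, replace=True, random_state=42)
        for _, grp in df.groupby(label_col)
    ]
    # Shuffle so duplicated rows are not clustered together.
    return pd.concat(parts).sample(frac=1, random_state=42).reset_index(drop=True)

df = pd.read_csv("data/tweets.csv")
# Split first, then upsample only the training portion, so duplicated
# rows cannot leak into the test set.
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)
train_df = upsample(train_df)
```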
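The final run's settings can likewise be sketched with `Seq2SeqTrainingArguments`; only the epoch count and the bf16 flag come from the summary above, and the remaining values are illustrative assumptions.

```python
# Illustrative training arguments; batch size and output_dir are assumptions.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="finetune",          # checkpoints, per the project structure above
    num_train_epochs=2,             # final run used 2 epochs
    bf16=True,                      # reduced precision for the T4 GPU
    per_device_train_batch_size=4,  # assumed; tune for available GPU memory
    save_strategy="epoch",
)
```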
## Potential Improvements
- Further tinkering with the prompt could yield improved results; brevity is a key obstacle in prompt design. (A hypothetical compact prompt variant is sketched below.)
- Fine-tuning `flan-t5-large` for 3 epochs instead of 2 could also improve F1 on more powerful GPUs.
- Experimenting with different train/test splits.
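For example, a more compact prompt variant (hypothetical wording, not the repo's current prompt) that could be compared against the existing one:

```python
# Hypothetical compact prompt template for comparison experiments.
prompt_template = (
    "Tweet: {tweet}\n"
    "Stance toward COVID-19 vaccination (in-favor / against / neutral-or-unclear):"
)
```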
## Model Tree

`akashmohan/finetuned-flan-t5-large` is fine-tuned from the base model `google/flan-t5-large`.