πŸ“Š FinBERT Fine-Tuned on Financial News/Texts

A fine-tuned version of ProsusAI/finbert trained for financial sentiment analysis on financial news texts and headlines. The fine-tuned model significantly outperforms the original FinBERT, improving accuracy by over 38% (relative) on the held-out fingpt-sentiment evaluation split.


πŸ”§ Model Objective

The goal of this model is to detect positive, neutral, or negative sentiment in financial texts and headlines.


πŸ—‚οΈ Training Dataset

Primary Dataset: fingpt-sentiment-train (~60,000 examples)

  • Labeled financial text samples (positive / neutral / negative)
  • Includes earnings statements, market commentary, and financial news headlines
  • Filtered to include only texts labeled neutral, positive, or negative
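
The three-label filtering step above can be sketched as follows. This is a minimal illustration only: the `output`/`input` field names and the sample rows are assumptions (instruction-style records as in the FinGPT sentiment data), not the actual preprocessing pipeline.

```python
# Minimal sketch of the three-label filtering step; the "output" field name
# and the sample rows are assumptions, not the actual training pipeline.
KEEP_LABELS = {"neutral", "positive", "negative"}

def keep_three_labels(rows):
    """Keep only rows labeled exactly neutral, positive, or negative."""
    return [r for r in rows if r["output"] in KEEP_LABELS]

rows = [
    {"input": "Shares rallied after strong earnings.", "output": "positive"},
    {"input": "Guidance was cut sharply.", "output": "negative"},
    {"input": "Revenue was flat year-over-year.", "output": "neutral"},
    {"input": "The stock edged higher.", "output": "moderately positive"},  # dropped
]

filtered = keep_three_labels(rows)
print(len(filtered))  # 3
```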

πŸ§ͺ Benchmark Evaluation

The model was evaluated on four benchmark datasets: the held-out fingpt-sentiment evaluation split and three external financial sentiment benchmarks.

Metrics used:

  • Accuracy
  • F1 Score
  • Precision
  • Recall
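
As a sketch, these metrics can be computed with scikit-learn; the labels and predictions below are made-up examples for illustration, not our evaluation data.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Toy gold labels and predictions (0=neutral, 1=negative, 2=positive);
# illustrative only, not the actual evaluation outputs.
y_true = [2, 0, 1, 2, 0, 1, 2, 0]
y_pred = [2, 0, 1, 2, 0, 2, 2, 0]

accuracy = accuracy_score(y_true, y_pred)
f1_macro = f1_score(y_true, y_pred, average="macro")
f1_weighted = f1_score(y_true, y_pred, average="weighted")
precision_macro = precision_score(y_true, y_pred, average="macro")
recall_macro = recall_score(y_true, y_pred, average="macro")

print(f"accuracy={accuracy:.4f} f1_macro={f1_macro:.4f}")
```

The macro variants average the per-class scores equally, while the weighted variants weight each class by its support; both are reported in the table below.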

We benchmarked this model against the original ProsusAI/finbert on multiple financial datasets:

| Dataset | Samples | Model | Accuracy | F1 (Macro) | F1 (Weighted) | Precision (Macro) | Precision (Weighted) | Recall (Macro) | Recall (Weighted) |
|---|---|---|---|---|---|---|---|---|---|
| fingpt-sentiment-train Eval | 12511 | FinBERT | 0.7131 | 0.70 | 0.71 | 0.71 | 0.72 | 0.70 | 0.71 |
| | | FinBERT-Finetuned (Ours) | 0.9894 (+38.8%) | 0.99 (+41.4%) | 0.99 (+39.4%) | 0.99 (+39.4%) | 0.99 (+37.5%) | 0.99 (+41.4%) | 0.99 (+39.4%) |
| Financial Phrasebank (Agree) | 2264 | FinBERT | 0.9717 | 0.96 | 0.97 | 0.95 | 0.97 | 0.98 | 0.97 |
| | | FinBERT-Finetuned (Ours) | 0.9912 (+2.0%) | 0.99 (+3.1%) | 0.99 (+2.1%) | 0.99 (+4.2%) | 0.99 (+2.1%) | 0.99 (+1.0%) | 0.99 (+2.1%) |
| Financial Phrasebank (Combined) | 14780 | FinBERT | 0.9238 | 0.91 | 0.92 | 0.89 | 0.93 | 0.94 | 0.92 |
| | | FinBERT-Finetuned (Ours) | 0.9792 (+6.0%) | 0.98 (+7.7%) | 0.98 (+6.5%) | 0.98 (+10.1%) | 0.98 (+5.4%) | 0.98 (+4.3%) | 0.98 (+6.5%) |
| FiQA + PhraseBank (Kaggle) | 5842 | FinBERT | 0.7581 | 0.74 | 0.77 | 0.73 | 0.79 | 0.77 | 0.76 |
| | | FinBERT-Finetuned (Ours) | 0.8879 (+17.1%) | 0.87 (+17.6%) | 0.89 (+15.6%) | 0.85 (+16.4%) | 0.92 (+16.5%) | 0.92 (+19.5%) | 0.89 (+17.1%) |

Note: Each pair of rows compares the base FinBERT and our fine-tuned model on the same dataset. Values in parentheses represent the relative improvement over the base FinBERT.


🧠 Text-Level Comparison: FinBERT vs FinBERT-Finetuned (Ours)

πŸ”΄ Texts FinBERT Failed On (as discussed in its paper), Correctly Predicted by Ours

| Text | Expected | FinBERT | Ours |
|---|---|---|---|
| Pre-tax loss totaled euro 0.3 million, compared to a loss of euro 2.2 million in the first quarter of 2005. | Positive | ❌ Negative (0.7223) | βœ… Positive (0.9997) |
| This implementation is very important to the operator, since it is about to launch its Fixed to Mobile convergence service | Neutral | ❌ Positive (0.7204) | βœ… Neutral (0.9998) |
| The situation of coated magazine printing paper will continue to be weak. | Negative | βœ… Negative (0.8811) | βœ… Negative (0.9996) |

🟑 FinBERT Incorrect, Ours Corrected It

| Text | Expected | FinBERT | Ours |
|---|---|---|---|
| The debt-to-equity ratio was 1.15, flat quarter-over-quarter. | Neutral | ❌ Negative (0.6239) | βœ… Neutral (0.9998) |
| Earnings smashed expectations $AAPL posts $0.89 EPS vs $0.78 est. Bullish momentum incoming! | Positive | ❌ Neutral (0.4237) | βœ… Positive (0.9998) |
| $TSLA growth is slowing β€” but hey, at least Elon tweeted something funny today. #Tesla #markets | Negative | ❌ Neutral (0.5884) | βœ… Negative (0.7084) |

βšͺ Out-of-Context Texts (FinBERT Misclassified, Ours Handled Properly)

| Text | Expected | FinBERT | Ours |
|---|---|---|---|
| Unexpected Snowstorm Hits Sahara Desert, Blanketing Sand Dunes | Neutral | ❌ Negative (0.8675) | βœ… Neutral (0.9993) |
| Virtual Reality Therapy Shows Promise for Treating PTSD | Neutral | ❌ Positive (0.8522) | βœ… Neutral (0.9997) |

Note: These examples demonstrate improvements in real-world understanding, context handling, and sentiment differentiation with our FinBERT-finetuned model. Values in parentheses (e.g., 0.9485) indicate the model’s confidence score for its predicted sentiment.


⚠️ Limitations & Failure Cases

While the model outperforms the base FinBERT across benchmarks, failure cases were observed on statements requiring fine-grained numerical reasoning, particularly when the semantics of a numerical comparison are complex or subtle.

| Text | Expected | FinBERT | Ours |
|---|---|---|---|
| Net profit to euro 203β€―million from euro 172β€―million in the previous year. | Positive | βœ… Positive (0.9485) | βœ… Positive (0.9995) |
| Net profit to euro 103β€―million from euro 172β€―million in the previous year. | Negative | ❌ Positive (0.9486) | ❌ Positive (0.9994) |
| Pre-tax loss totaled euro 0.3 million, compared to a loss of euro 2.2 million in Q1 2005. | Positive | ❌ Negative (0.7223) | βœ… Positive (0.9997) |
| Pre-tax loss totaled euro 5.3 million, compared to a loss of euro 2.2 million in Q1 2005. | Negative | βœ… Negative (0.7205) | ❌ Positive (0.9997) |
| Net profit totaled euro 5.3 million, compared to euro 2.2 million in the previous quarter of 2005. | Positive | ❌ Negative (0.6347) | ❌ Negative (0.9996) |
| Net profit totaled euro 0.3 million, compared to euro 2.2 million in the previous quarter of 2005. | Negative | βœ… Negative (0.6320) | βœ… Negative (0.9996) |

Note: Values in parentheses (e.g., 0.9485) indicate the model’s confidence score for its predicted sentiment.

This suggests that explicit numerical comparison reasoning remains challenging without targeted pretraining or numerical reasoning augmentation.


Hyperparameters

During fine-tuning, the following hyperparameters were used to optimize model performance:

  • Learning Rate: 2e-5
  • Batch Size: 32
  • Number of Epochs: 3
  • Max Sequence Length: 128 tokens
  • Optimizer: AdamW
  • Weight Decay: 0.01
  • Evaluation Strategy: Evaluation performed after each epoch

Note: These settings were chosen to balance training efficiency and accuracy for financial news sentiment classification.
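
Under the transformers Trainer API, these settings would map onto a TrainingArguments configuration roughly like the sketch below. This is an illustration, not the exact training script: the output_dir is a placeholder, the 128-token max sequence length is applied at tokenization time rather than here, and argument names vary slightly across transformers versions.

```python
from transformers import TrainingArguments

# Sketch only: argument names follow transformers 4.x; in recent releases
# `evaluation_strategy` has been renamed to `eval_strategy`.
training_args = TrainingArguments(
    output_dir="finbert-finetune",   # placeholder path
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    num_train_epochs=3,
    weight_decay=0.01,               # AdamW weight decay
    evaluation_strategy="epoch",     # evaluate after each epoch
)
```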


πŸ’‘ Summary

βœ… Better generalization than FinBERT on both benchmark and noisy real-world samples
βœ… Strong accuracy and F1 scores
⚠️ Room to improve on numerical reasoning comparisons β€” potential for integration with numerical-aware transformers or contrastive fine-tuning


Usage

Pipeline Approach

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

model_name = "project-aps/finbert-finetune"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Override the config's id2label and label2id
label_map = {0: "neutral", 1: "negative", 2: "positive"}
model.config.id2label = label_map
model.config.label2id = {v: k for k, v in label_map.items()}

pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)

text = "Earnings smashed expectations AAPL posts $0.89 EPS vs $0.78 est. Bullish momentum incoming! #EarningsSeason"
print(pipe(text))  # Output: [{'label': 'positive', 'score': 0.9997484087944031}]
```

Simple Approach

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "project-aps/finbert-finetune"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "Earnings smashed expectations AAPL posts $0.89 EPS vs $0.78 est. Bullish momentum incoming! #EarningsSeason"
inputs = tokenizer(text, return_tensors="pt", truncation=True)
outputs = model(**inputs)
predicted_class = torch.argmax(outputs.logits, dim=1).item()

label_map = {0: "neutral", 1: "negative", 2: "positive"}
print(f"Text     : {text}")
print(f"Sentiment: {label_map[predicted_class]}")
```
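
The confidence scores reported in the tables above are softmax probabilities of the predicted class. A minimal sketch of how to recover them (the logits below are made up for illustration; real values come from `model(**inputs).logits`):

```python
import torch
import torch.nn.functional as F

# Made-up logits for one input; label order assumed 0=neutral, 1=negative, 2=positive.
logits = torch.tensor([[-1.2, -0.8, 3.5]])

probs = F.softmax(logits, dim=-1)                  # logits -> probabilities
predicted_class = int(torch.argmax(probs, dim=-1))
confidence = float(probs[0, predicted_class])      # score reported alongside the label

label_map = {0: "neutral", 1: "negative", 2: "positive"}
print(label_map[predicted_class], round(confidence, 4))
```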

Acknowledgements

We gratefully acknowledge the creators and maintainers of the resources used in this project:

  • ProsusAI/FinBERT – A pre-trained BERT model specifically designed for financial sentiment analysis, which served as the foundation for our fine-tuning efforts.

  • FinGPT Sentiment Train Dataset – The dataset used for fine-tuning, containing a large collection of finance-related news headlines and sentiment annotations.

  • Financial PhraseBank Dataset – A widely used benchmark dataset for financial sentiment classification, including the All Agree and All Combined subsets.

  • FiQA + PhraseBank Kaggle Merged Dataset – A merged dataset combining FiQA and Financial PhraseBank entries, used for broader benchmarking of sentiment performance.

We thank these contributors for making their models and datasets publicly available, enabling high-quality research and development in financial NLP.

