πŸ“Š FinBERT Fine-Tuned on Financial News/Texts

A fine-tuned version of ProsusAI/finbert trained for financial sentiment analysis on financial news texts and headlines. The fine-tuned model significantly outperforms the original FinBERT, improving accuracy by over 38% (relative) on the held-out fingpt-sentiment evaluation split.


πŸ”§ Model Objective

The goal of this model is to detect positive, neutral, or negative sentiment in financial texts and headlines.


πŸ—‚οΈ Training Dataset

Primary Dataset: fingpt-sentiment-train (~60,000 examples)

  • Labeled financial text samples (positive / neutral / negative)
  • Includes earnings statements, market commentary, and financial news headlines
  • Filtered to include only texts labeled neutral, positive, or negative
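
The three-label filtering step above can be sketched as follows. This is a minimal illustration only: the `output`/`input` field names and the sample rows are assumptions (instruction-style records as in the FinGPT sentiment data), not the actual preprocessing pipeline.

```python
# Minimal sketch of the three-label filtering step; the "output" field name
# and the sample rows are assumptions, not the actual training pipeline.
KEEP_LABELS = {"neutral", "positive", "negative"}

def keep_three_labels(rows):
    """Keep only rows labeled exactly neutral, positive, or negative."""
    return [r for r in rows if r["output"] in KEEP_LABELS]

rows = [
    {"input": "Shares rallied after strong earnings.", "output": "positive"},
    {"input": "Guidance was cut sharply.", "output": "negative"},
    {"input": "Revenue was flat year-over-year.", "output": "neutral"},
    {"input": "The stock edged higher.", "output": "moderately positive"},  # dropped
]

filtered = keep_three_labels(rows)
print(len(filtered))  # 3
```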

πŸ§ͺ Benchmark Evaluation

The model was evaluated on four benchmark datasets: the held-out fingpt-sentiment evaluation split and three external financial sentiment benchmarks.

Metrics used:

  • Accuracy
  • F1 Score
  • Precision
  • Recall
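
As a sketch, these metrics can be computed with scikit-learn; the labels and predictions below are made-up examples for illustration, not our evaluation data.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

# Toy gold labels and predictions (0=neutral, 1=negative, 2=positive);
# illustrative only, not the actual evaluation outputs.
y_true = [2, 0, 1, 2, 0, 1, 2, 0]
y_pred = [2, 0, 1, 2, 0, 2, 2, 0]

accuracy = accuracy_score(y_true, y_pred)
f1_macro = f1_score(y_true, y_pred, average="macro")
f1_weighted = f1_score(y_true, y_pred, average="weighted")
precision_macro = precision_score(y_true, y_pred, average="macro")
recall_macro = recall_score(y_true, y_pred, average="macro")

print(f"accuracy={accuracy:.4f} f1_macro={f1_macro:.4f}")
```

The macro variants average the per-class scores equally, while the weighted variants weight each class by its support; both are reported in the table below.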

We benchmarked this model against the original ProsusAI/finbert on multiple financial datasets:

| Dataset | Samples | Model | Accuracy | F1 (Macro) | F1 (Weighted) | Precision (Macro) | Precision (Weighted) | Recall (Macro) | Recall (Weighted) |
|---|---|---|---|---|---|---|---|---|---|
| fingpt-sentiment-train Eval | 12511 | FinBERT | 0.7131 | 0.70 | 0.71 | 0.71 | 0.72 | 0.70 | 0.71 |
| | | FinBERT-Finetuned (Ours) | 0.9894 (+38.8%) | 0.99 (+41.4%) | 0.99 (+39.4%) | 0.99 (+39.4%) | 0.99 (+37.5%) | 0.99 (+41.4%) | 0.99 (+39.4%) |
| Financial Phrasebank (Agree) | 2264 | FinBERT | 0.9717 | 0.96 | 0.97 | 0.95 | 0.97 | 0.98 | 0.97 |
| | | FinBERT-Finetuned (Ours) | 0.9912 (+2.0%) | 0.99 (+3.1%) | 0.99 (+2.1%) | 0.99 (+4.2%) | 0.99 (+2.1%) | 0.99 (+1.0%) | 0.99 (+2.1%) |
| Financial Phrasebank (Combined) | 14780 | FinBERT | 0.9238 | 0.91 | 0.92 | 0.89 | 0.93 | 0.94 | 0.92 |
| | | FinBERT-Finetuned (Ours) | 0.9792 (+6.0%) | 0.98 (+7.7%) | 0.98 (+6.5%) | 0.98 (+10.1%) | 0.98 (+5.4%) | 0.98 (+4.3%) | 0.98 (+6.5%) |
| FiQA + PhraseBank (Kaggle) | 5842 | FinBERT | 0.7581 | 0.74 | 0.77 | 0.73 | 0.79 | 0.77 | 0.76 |
| | | FinBERT-Finetuned (Ours) | 0.8879 (+17.1%) | 0.87 (+17.6%) | 0.89 (+15.6%) | 0.85 (+16.4%) | 0.92 (+16.5%) | 0.92 (+19.5%) | 0.89 (+17.1%) |

Note: Each pair of rows compares the base FinBERT and our fine-tuned model on the same dataset. Values in parentheses represent the relative improvement over the base FinBERT.


🧠 Text-Level Comparison: FinBERT vs FinBERT-Finetuned (Ours)

πŸ”΄ Texts FinBERT Failed On (as discussed in its paper), Correctly Predicted by Ours

| Text | Expected | FinBERT | Ours |
|---|---|---|---|
| Pre-tax loss totaled euro 0.3 million, compared to a loss of euro 2.2 million in the first quarter of 2005. | Positive | ❌ Negative (0.7223) | βœ… Positive (0.9997) |
| This implementation is very important to the operator, since it is about to launch its Fixed to Mobile convergence service | Neutral | ❌ Positive (0.7204) | βœ… Neutral (0.9998) |
| The situation of coated magazine printing paper will continue to be weak. | Negative | βœ… Negative (0.8811) | βœ… Negative (0.9996) |

🟑 FinBERT Incorrect, Ours Corrected It

| Text | Expected | FinBERT | Ours |
|---|---|---|---|
| The debt-to-equity ratio was 1.15, flat quarter-over-quarter. | Neutral | ❌ Negative (0.6239) | βœ… Neutral (0.9998) |
| Earnings smashed expectations $AAPL posts $0.89 EPS vs $0.78 est. Bullish momentum incoming! | Positive | ❌ Neutral (0.4237) | βœ… Positive (0.9998) |
| $TSLA growth is slowing β€” but hey, at least Elon tweeted something funny today. #Tesla #markets | Negative | ❌ Neutral (0.5884) | βœ… Negative (0.7084) |

βšͺ Out-of-Context Texts (FinBERT Misclassified, Ours Handled Properly)

| Text | Expected | FinBERT | Ours |
|---|---|---|---|
| Unexpected Snowstorm Hits Sahara Desert, Blanketing Sand Dunes | Neutral | ❌ Negative (0.8675) | βœ… Neutral (0.9993) |
| Virtual Reality Therapy Shows Promise for Treating PTSD | Neutral | ❌ Positive (0.8522) | βœ… Neutral (0.9997) |

Note: These examples demonstrate improvements in real-world understanding, context handling, and sentiment differentiation with our FinBERT-finetuned model. Values in parentheses (e.g., 0.9485) indicate the model’s confidence score for its predicted sentiment.


⚠️ Limitations & Failure Cases

While the model outperforms the base FinBERT across benchmarks, failure cases were observed on statements requiring fine-grained numerical reasoning, particularly when the semantics of a numerical comparison are complex or subtle.

| Text | Expected | FinBERT | Ours |
|---|---|---|---|
| Net profit to euro 203β€―million from euro 172β€―million in the previous year. | Positive | βœ… Positive (0.9485) | βœ… Positive (0.9995) |
| Net profit to euro 103β€―million from euro 172β€―million in the previous year. | Negative | ❌ Positive (0.9486) | ❌ Positive (0.9994) |
| Pre-tax loss totaled euro 0.3 million, compared to a loss of euro 2.2 million in Q1 2005. | Positive | ❌ Negative (0.7223) | βœ… Positive (0.9997) |
| Pre-tax loss totaled euro 5.3 million, compared to a loss of euro 2.2 million in Q1 2005. | Negative | βœ… Negative (0.7205) | ❌ Positive (0.9997) |
| Net profit totaled euro 5.3 million, compared to euro 2.2 million in the previous quarter of 2005. | Positive | ❌ Negative (0.6347) | ❌ Negative (0.9996) |
| Net profit totaled euro 0.3 million, compared to euro 2.2 million in the previous quarter of 2005. | Negative | βœ… Negative (0.6320) | βœ… Negative (0.9996) |

Note: Values in parentheses (e.g., 0.9485) indicate the model’s confidence score for its predicted sentiment.

This suggests that explicit numerical comparison reasoning remains challenging without targeted pretraining or numerical reasoning augmentation.


Hyperparameters

During fine-tuning, the following hyperparameters were used to optimize model performance:

  • Learning Rate: 2e-5
  • Batch Size: 32
  • Number of Epochs: 3
  • Max Sequence Length: 128 tokens
  • Optimizer: AdamW
  • Weight Decay: 0.01
  • Evaluation Strategy: Evaluation performed after each epoch

Note: These settings were chosen to balance training efficiency and accuracy for financial news sentiment classification.
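
Under the transformers Trainer API, these settings would map onto a TrainingArguments configuration roughly like the sketch below. This is an illustration, not the exact training script: the output_dir is a placeholder, the 128-token max sequence length is applied at tokenization time rather than here, and argument names vary slightly across transformers versions.

```python
from transformers import TrainingArguments

# Sketch only: argument names follow transformers 4.x; in recent releases
# `evaluation_strategy` has been renamed to `eval_strategy`.
training_args = TrainingArguments(
    output_dir="finbert-finetune",   # placeholder path
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    num_train_epochs=3,
    weight_decay=0.01,               # AdamW weight decay
    evaluation_strategy="epoch",     # evaluate after each epoch
)
```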


πŸ’‘ Summary

βœ… Better generalization than FinBERT on both benchmark and noisy real-world samples
βœ… Strong accuracy and F1 scores
⚠️ Room to improve on numerical reasoning comparisons β€” potential for integration with numerical-aware transformers or contrastive fine-tuning


Usage

Pipeline Approach

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

model_name = "project-aps/finbert-finetune"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Override the config's id2label and label2id
label_map = {0: "neutral", 1: "negative", 2: "positive"}
model.config.id2label = label_map
model.config.label2id = {v: k for k, v in label_map.items()}

pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)

text = "Earnings smashed expectations AAPL posts $0.89 EPS vs $0.78 est. Bullish momentum incoming! #EarningsSeason"
print(pipe(text))  # Output: [{'label': 'positive', 'score': 0.9997484087944031}]
```

Simple Approach

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "project-aps/finbert-finetune"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "Earnings smashed expectations AAPL posts $0.89 EPS vs $0.78 est. Bullish momentum incoming! #EarningsSeason"
inputs = tokenizer(text, return_tensors="pt", truncation=True)
outputs = model(**inputs)
predicted_class = torch.argmax(outputs.logits, dim=1).item()

label_map = {0: "neutral", 1: "negative", 2: "positive"}
print(f"Text     : {text}")
print(f"Sentiment: {label_map[predicted_class]}")
```
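
The confidence scores reported in the tables above are softmax probabilities of the predicted class. A minimal sketch of how to recover them (the logits below are made up for illustration; real values come from `model(**inputs).logits`):

```python
import torch
import torch.nn.functional as F

# Made-up logits for one input; label order assumed 0=neutral, 1=negative, 2=positive.
logits = torch.tensor([[-1.2, -0.8, 3.5]])

probs = F.softmax(logits, dim=-1)                  # logits -> probabilities
predicted_class = int(torch.argmax(probs, dim=-1))
confidence = float(probs[0, predicted_class])      # score reported alongside the label

label_map = {0: "neutral", 1: "negative", 2: "positive"}
print(label_map[predicted_class], round(confidence, 4))
```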

Acknowledgements

We gratefully acknowledge the creators and maintainers of the resources used in this project:

  • ProsusAI/FinBERT – A pre-trained BERT model specifically designed for financial sentiment analysis, which served as the foundation for our fine-tuning efforts.

  • FinGPT Sentiment Train Dataset – The dataset used for fine-tuning, containing a large collection of finance-related news headlines and sentiment annotations.

  • Financial PhraseBank Dataset – A widely used benchmark dataset for financial sentiment classification, including the All Agree and All Combined subsets.

  • FiQA + PhraseBank Kaggle Merged Dataset – A merged dataset combining FiQA and Financial PhraseBank entries, used for broader benchmarking of sentiment performance.

We thank these contributors for making their models and datasets publicly available, enabling high-quality research and development in financial NLP.

