📈 FinBERT Fine-Tuned on Financial News/Texts
A fine-tuned version of ProsusAI/finbert trained for financial sentiment analysis on financial news texts and headlines.
This fine-tuned model achieves a significant improvement over the original FinBERT, outperforming it by over 38% relative accuracy on the fingpt-sentiment evaluation split.
🧠 Model Objective
The goal of this model is to detect positive, neutral, or negative sentiment in financial texts and headlines.
🏋️ Training Dataset
Primary Dataset: fingpt-sentiment-train (~60,000 examples)
- Labeled financial text samples (positive / neutral / negative)
- Includes earnings statements, market commentary, and financial news headlines
- Filtered to include only texts labeled neutral, positive, or negative (see the loading sketch below)
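A minimal loading sketch under two assumptions not pinned down by this card: the data comes from the FinGPT/fingpt-sentiment-train repository on the Hugging Face Hub, and labels live in an `output` field.

```python
# Sketch only: the dataset repo id and the `output` label field are assumptions.
from datasets import load_dataset

ds = load_dataset("FinGPT/fingpt-sentiment-train", split="train")

# Keep only the three coarse sentiment labels used for fine-tuning.
keep = {"positive", "neutral", "negative"}
ds = ds.filter(lambda ex: ex["output"] in keep)
print(ds.num_rows, ds[0])
```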
🧪 Benchmark Evaluation
The model was evaluated against three benchmark datasets:
- Financial PhraseBank (All Agree and All Combined)
- FiQA + PhraseBank Kaggle Merge
- fingpt-sentiment-train (test split)
Metrics used:
- Accuracy
- F1 Score
- Precision
- Recall
We benchmarked this model against the original ProsusAI/finbert
on multiple financial datasets:
| Dataset | Samples | Model | Accuracy | F1 (Macro) | F1 (Weighted) | Precision (Macro) | Precision (Weighted) | Recall (Macro) | Recall (Weighted) |
|---|---|---|---|---|---|---|---|---|---|
| fingpt-sentiment-train (eval) | 12511 | FinBERT | 0.7131 | 0.70 | 0.71 | 0.71 | 0.72 | 0.70 | 0.71 |
| | | FinBERT-Finetuned (Ours) | 0.9894 (+38.8%) | 0.99 (+41.4%) | 0.99 (+39.4%) | 0.99 (+39.4%) | 0.99 (+37.5%) | 0.99 (+41.4%) | 0.99 (+39.4%) |
| Financial PhraseBank (All Agree) | 2264 | FinBERT | 0.9717 | 0.96 | 0.97 | 0.95 | 0.97 | 0.98 | 0.97 |
| | | FinBERT-Finetuned (Ours) | 0.9912 (+2.0%) | 0.99 (+3.1%) | 0.99 (+2.1%) | 0.99 (+4.2%) | 0.99 (+2.1%) | 0.99 (+1.0%) | 0.99 (+2.1%) |
| Financial PhraseBank (Combined) | 14780 | FinBERT | 0.9238 | 0.91 | 0.92 | 0.89 | 0.93 | 0.94 | 0.92 |
| | | FinBERT-Finetuned (Ours) | 0.9792 (+6.0%) | 0.98 (+7.7%) | 0.98 (+6.5%) | 0.98 (+10.1%) | 0.98 (+5.4%) | 0.98 (+4.3%) | 0.98 (+6.5%) |
| FiQA + PhraseBank (Kaggle) | 5842 | FinBERT | 0.7581 | 0.74 | 0.77 | 0.73 | 0.79 | 0.77 | 0.76 |
| | | FinBERT-Finetuned (Ours) | 0.8879 (+17.1%) | 0.87 (+17.6%) | 0.89 (+15.6%) | 0.85 (+16.4%) | 0.92 (+16.5%) | 0.92 (+19.5%) | 0.89 (+17.1%) |
Note: Each pair of rows compares the two models on the same benchmark. Values in parentheses are the relative improvement of the fine-tuned model over base FinBERT.
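For reference, the reported metrics map directly onto standard scikit-learn calls. A minimal sketch, assuming `y_true` and `y_pred` are lists of string labels from one of the benchmarks (this is not the exact evaluation script):

```python
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score)

def benchmark_metrics(y_true, y_pred):
    """Compute the metrics reported in the table above."""
    metrics = {"accuracy": accuracy_score(y_true, y_pred)}
    for avg in ("macro", "weighted"):
        metrics[f"f1_{avg}"] = f1_score(y_true, y_pred, average=avg)
        metrics[f"precision_{avg}"] = precision_score(y_true, y_pred, average=avg)
        metrics[f"recall_{avg}"] = recall_score(y_true, y_pred, average=avg)
    return metrics
```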
🧐 Text-Level Comparison: FinBERT vs FinBERT-Finetuned (Ours)
🔴 Hard Examples from the FinBERT Paper (Correctly Predicted by Ours)
| Text | Expected | FinBERT | Ours |
|---|---|---|---|
| Pre-tax loss totaled euro 0.3 million, compared to a loss of euro 2.2 million in the first quarter of 2005. | Positive | ❌ Negative (0.7223) | ✅ Positive (0.9997) |
| This implementation is very important to the operator, since it is about to launch its Fixed to Mobile convergence service | Neutral | ❌ Positive (0.7204) | ✅ Neutral (0.9998) |
| The situation of coated magazine printing paper will continue to be weak. | Negative | ✅ Negative (0.8811) | ✅ Negative (0.9996) |
🟡 FinBERT Incorrect, Ours Correct
| Text | Expected | FinBERT | Ours |
|---|---|---|---|
| The debt-to-equity ratio was 1.15, flat quarter-over-quarter. | Neutral | ❌ Negative (0.6239) | ✅ Neutral (0.9998) |
| Earnings smashed expectations $AAPL posts $0.89 EPS vs $0.78 est. Bullish momentum incoming! | Positive | ❌ Neutral (0.4237) | ✅ Positive (0.9998) |
| $TSLA growth is slowing – but hey, at least Elon tweeted something funny today. #Tesla #markets | Negative | ❌ Neutral (0.5884) | ✅ Negative (0.7084) |
⚪ Out-of-Context Texts (FinBERT Misclassified, Ours Handled Properly)
| Text | Expected | FinBERT | Ours |
|---|---|---|---|
| Unexpected Snowstorm Hits Sahara Desert, Blanketing Sand Dunes | Neutral | ❌ Negative (0.8675) | ✅ Neutral (0.9993) |
| Virtual Reality Therapy Shows Promise for Treating PTSD | Neutral | ❌ Positive (0.8522) | ✅ Neutral (0.9997) |
Note: These examples demonstrate improvements in real-world understanding, context handling, and sentiment differentiation with our fine-tuned FinBERT. Values in parentheses (e.g., 0.9485) indicate the model's confidence score for its predicted sentiment.
⚠️ Limitations & Failure Cases
While the model outperforms the base FinBERT across benchmarks, failure cases remain on statements that require fine-grained numerical reasoning, particularly when the comparison semantics are complex or subtle.
| Text | Expected | FinBERT | Ours |
|---|---|---|---|
| Net profit to euro 203 million from euro 172 million in the previous year. | Positive | ✅ Positive (0.9485) | ✅ Positive (0.9995) |
| Net profit to euro 103 million from euro 172 million in the previous year. | Negative | ❌ Positive (0.9486) | ❌ Positive (0.9994) |
| Pre-tax loss totaled euro 0.3 million, compared to a loss of euro 2.2 million in Q1 2005. | Positive | ❌ Negative (0.7223) | ✅ Positive (0.9997) |
| Pre-tax loss totaled euro 5.3 million, compared to a loss of euro 2.2 million in Q1 2005. | Negative | ✅ Negative (0.7205) | ❌ Positive (0.9997) |
| Net profit totaled euro 5.3 million, compared to euro 2.2 million in the previous quarter of 2005. | Positive | ❌ Negative (0.6347) | ❌ Negative (0.9996) |
| Net profit totaled euro 0.3 million, compared to euro 2.2 million in the previous quarter of 2005. | Negative | ✅ Negative (0.6320) | ✅ Negative (0.9996) |
Note: Values in parentheses (e.g., 0.9485) indicate the model's confidence score for its predicted sentiment.
This suggests that explicit numerical comparison reasoning remains challenging without targeted pretraining or numerical reasoning augmentation.
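This failure mode is easy to reproduce with minimal pairs in which only the figures change. A small probe sketch using sentences from the table above and the label override described in the Usage section below (a diagnostic sketch, not part of the evaluation suite):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

model_name = "project-aps/finbert-finetune"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)
model.config.id2label = {0: "neutral", 1: "negative", 2: "positive"}
model.config.label2id = {v: k for k, v in model.config.id2label.items()}
pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)

# Minimal pair: the wording is fixed, only the figures differ,
# so the expected sentiment label should flip.
pairs = [
    ("Net profit to euro 203 million from euro 172 million in the previous year.", "positive"),
    ("Net profit to euro 103 million from euro 172 million in the previous year.", "negative"),
]
for text, expected in pairs:
    pred = pipe(text)[0]
    mark = "ok" if pred["label"] == expected else "MISS"
    print(f"[{mark:4s}] expected={expected:8s} got={pred['label']:8s} ({pred['score']:.4f})")
```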
Hyperparameters
During fine-tuning, the following hyperparameters were used to optimize model performance:
- Learning Rate: 2e-5
- Batch Size: 32
- Number of Epochs: 3
- Max Sequence Length: 128 tokens
- Optimizer: AdamW
- Weight Decay: 0.01
- Evaluation Strategy: Evaluation performed after each epoch
Note: These settings were chosen to balance training efficiency and accuracy for financial news sentiment classification.
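For reference, a sketch of what this configuration looks like with the Hugging Face `Trainer` API. The dataset repo id, its `input`/`output` field names, and the output directory are assumptions, and `TrainingArguments` argument names shift slightly across `transformers` versions (e.g., `eval_strategy` was previously `evaluation_strategy`); this is not the exact training script.

```python
from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

base = "ProsusAI/finbert"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSequenceClassification.from_pretrained(base, num_labels=3)

# Assumed data source and field names (`input` text, `output` label).
label2id = {"neutral": 0, "negative": 1, "positive": 2}
raw = load_dataset("FinGPT/fingpt-sentiment-train", split="train")
raw = raw.filter(lambda ex: ex["output"] in label2id)

def preprocess(ex):
    enc = tokenizer(ex["input"], truncation=True, max_length=128)
    enc["labels"] = label2id[ex["output"]]
    return enc

splits = raw.map(preprocess).train_test_split(test_size=0.1, seed=42)

args = TrainingArguments(
    output_dir="finbert-finetune",  # placeholder output path
    learning_rate=2e-5,
    per_device_train_batch_size=32,
    num_train_epochs=3,
    weight_decay=0.01,              # Trainer's default optimizer is AdamW
    eval_strategy="epoch",          # evaluate after each epoch
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=splits["train"],
    eval_dataset=splits["test"],
    tokenizer=tokenizer,            # enables dynamic padding
)
trainer.train()
```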
💡 Summary
✅ Better generalization than FinBERT on both benchmark and noisy real-world samples
✅ Strong accuracy and F1 scores
⚠️ Room to improve on numerical comparison reasoning; potential for integration with numerically aware transformers or contrastive fine-tuning
Usage
Pipeline Approach
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

model_name = "project-aps/finbert-finetune"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Override the config's id2label and label2id so the pipeline
# returns human-readable sentiment labels.
label_map = {0: "neutral", 1: "negative", 2: "positive"}
model.config.id2label = label_map
model.config.label2id = {v: k for k, v in label_map.items()}

pipe = pipeline("text-classification", model=model, tokenizer=tokenizer)

text = "Earnings smashed expectations AAPL posts $0.89 EPS vs $0.78 est. Bullish momentum incoming! #EarningsSeason"
print(pipe(text))  # Output: [{'label': 'positive', 'score': 0.9997484087944031}]
```
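The pipeline also accepts a list of texts, which is convenient for scoring many headlines at once (the headlines below are illustrative):

```python
# Batch inference: pass a list of texts to the pipeline.
headlines = [
    "Pre-tax loss totaled euro 0.3 million, compared to a loss of euro 2.2 million.",
    "The debt-to-equity ratio was 1.15, flat quarter-over-quarter.",
]
for headline, result in zip(headlines, pipe(headlines)):
    print(f"{result['label']:8s} {result['score']:.4f}  {headline}")
```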
Simple Approach
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

model_name = "project-aps/finbert-finetune"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

text = "Earnings smashed expectations AAPL posts $0.89 EPS vs $0.78 est. Bullish momentum incoming! #EarningsSeason"
inputs = tokenizer(text, return_tensors="pt", truncation=True)

with torch.no_grad():  # inference only, no gradients needed
    outputs = model(**inputs)

predicted_class = torch.argmax(outputs.logits, dim=1).item()
label_map = {0: "neutral", 1: "negative", 2: "positive"}
print(f"Text     : {text}")
print(f"Sentiment: {label_map[predicted_class]}")
```
Acknowledgements
We gratefully acknowledge the creators and maintainers of the resources used in this project:
- ProsusAI/finbert – a pre-trained BERT model specifically designed for financial sentiment analysis, which served as the foundation for our fine-tuning efforts.
- FinGPT sentiment-train dataset – the dataset used for fine-tuning, containing a large collection of finance-related news headlines and sentiment annotations.
- Financial PhraseBank dataset – a widely used benchmark dataset for financial sentiment classification, including the All Agree and All Combined subsets.
- FiQA + PhraseBank Kaggle merged dataset – a merged dataset combining FiQA and Financial PhraseBank entries, used for broader benchmarking of sentiment performance.
We thank these contributors for making their models and datasets publicly available, enabling high-quality research and development in financial NLP.