We collect financial domain terms from Investopedia's Financial Terms dictionary, NYSSCPA's accounting terminology guide, and Harvey's Hypertextual Finance Glossary to expand RoBERTa's vocabulary.
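As a rough illustration of the term-collection step, the sketch below merges term lists from several sources, removes duplicates, and drops terms the base vocabulary already contains. The function name `select_new_terms` and the toy term lists are hypothetical and are not taken from our actual pipeline.

```python
from typing import Iterable, List, Set


def select_new_terms(term_sources: Iterable[Iterable[str]],
                     existing_vocab: Set[str]) -> List[str]:
    """Merge glossary term lists, keeping only unseen, non-empty terms.

    Hypothetical helper: order of first appearance is preserved so the
    resulting token ids are reproducible across runs.
    """
    seen: Set[str] = set()
    new_terms: List[str] = []
    for terms in term_sources:
        for term in terms:
            # Collapse runs of whitespace and strip the ends.
            normalized = " ".join(term.split())
            if not normalized:
                continue
            if normalized in seen or normalized in existing_vocab:
                continue
            seen.add(normalized)
            new_terms.append(normalized)
    return new_terms


# Toy data standing in for the three glossaries (assumed, not real extracts).
investopedia_terms = ["EBITDA", "short squeeze", "bank"]
nysscpa_terms = ["accrual basis", "EBITDA"]
harvey_terms = ["contango", " short  squeeze "]
base_vocab = {"bank"}

new_terms = select_new_terms(
    [investopedia_terms, nysscpa_terms, harvey_terms], base_vocab
)
print(new_terms)
```

With the Hugging Face `transformers` library, a list like this would typically be passed to `tokenizer.add_tokens(new_terms)`, followed by `model.resize_token_embeddings(len(tokenizer))` so the embedding matrix matches the enlarged vocabulary.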
Starting from this RoBERTa with the added financial terms, we pretrained our model on multiple financial corpora:
- Financial Terms
- Financial Datasets
- Earnings Call: 2016-2023 earnings call transcripts of NASDAQ 100 component stocks.
In the continual pretraining step, we apply the following experimental settings to achieve better fine-tuned results on four financial datasets:
- Masking Probability: 0.4 (instead of default 0.15)
- Warmup Steps: 0 (yielded better results than models trained with warmup steps)
- Epochs: 1 (sufficient, and limits the risk of overfitting)
- weight_decay: 0.01
- Train Batch Size: 64
- FP16
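The settings above could be expressed roughly as follows with the Hugging Face `transformers` library. This is a minimal sketch, not our training script: the helper `build_training_objects`, the default `output_dir`, and the reading of "Train Batch Size" as a per-device value are all assumptions.

```python
# Masking probability used for masked language modeling (default is 0.15).
MLM_PROBABILITY = 0.4

# Keyword arguments named after transformers.TrainingArguments fields.
TRAINING_KWARGS = {
    "num_train_epochs": 1,
    "warmup_steps": 0,
    "weight_decay": 0.01,
    # Assumption: the batch size of 64 is interpreted as per device.
    "per_device_train_batch_size": 64,
    "fp16": True,
}


def build_training_objects(tokenizer, output_dir="continual-pretraining-out"):
    """Hypothetical helper: build the MLM collator and training arguments.

    transformers is imported lazily so the settings above can be inspected
    without the library installed. fp16=True generally requires a GPU.
    """
    from transformers import DataCollatorForLanguageModeling, TrainingArguments

    collator = DataCollatorForLanguageModeling(
        tokenizer=tokenizer,
        mlm=True,
        mlm_probability=MLM_PROBABILITY,
    )
    args = TrainingArguments(output_dir=output_dir, **TRAINING_KWARGS)
    return collator, args
```

The returned collator and arguments would then be handed to a `Trainer` together with the model and the tokenized corpora.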