roberta_nli_ensemble
A fine-tuned RoBERTa model for the Natural Language Inference (NLI) task: given a premise and a hypothesis, it classifies the relationship between the two sentences.
Model Details
Model Description
This model builds upon the roberta-base architecture, adding a multi-layer classification head for NLI. It computes average-pooled representations of the premise and hypothesis tokens (identified via `token_type_ids`), concatenates them, and passes the result through additional linear and non-linear layers. The final output classifies the sentence pair into one of three classes.
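For concreteness, the pooling-and-concatenation head described above could look roughly like the PyTorch sketch below. The class name, layer sizes, activation, dropout, and default label count are assumptions for illustration; they are not the exact `roBERTaClassifier` implementation.

```python
import torch
import torch.nn as nn
from transformers import RobertaModel


class RobertaNLIClassifier(nn.Module):
    """Illustrative sketch of the described architecture (details are assumptions)."""

    def __init__(self, num_labels: int = 3):
        super().__init__()
        self.encoder = RobertaModel.from_pretrained("roberta-base")
        hidden = self.encoder.config.hidden_size  # 768 for roberta-base
        # The concatenated premise + hypothesis vectors have size 2 * hidden.
        self.head = nn.Sequential(
            nn.Linear(2 * hidden, hidden),
            nn.Tanh(),
            nn.Dropout(0.1),
            nn.Linear(hidden, num_labels),
        )

    def forward(self, input_ids, attention_mask, token_type_ids):
        hidden_states = self.encoder(
            input_ids=input_ids, attention_mask=attention_mask
        ).last_hidden_state  # (batch, seq_len, hidden)

        # Average-pool premise tokens (token_type_id == 0) and hypothesis
        # tokens (token_type_id == 1) separately, ignoring padding.
        mask = attention_mask.unsqueeze(-1).float()
        premise_mask = mask * (token_type_ids == 0).unsqueeze(-1).float()
        hypothesis_mask = mask * (token_type_ids == 1).unsqueeze(-1).float()
        premise_vec = (hidden_states * premise_mask).sum(1) / premise_mask.sum(1).clamp(min=1e-9)
        hypothesis_vec = (hidden_states * hypothesis_mask).sum(1) / hypothesis_mask.sum(1).clamp(min=1e-9)

        # Concatenate the two pooled vectors and classify.
        return self.head(torch.cat([premise_vec, hypothesis_vec], dim=-1))
```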
- Developed by: Dev Soneji and Patrick Mermelstein Lyons
- Language(s): English
- Model type: Supervised
- Model architecture: RoBERTa encoder with a multi-layer classification head
- Finetuned from model: roberta-base
Model Resources
- Repository: Devtrick/roberta_nli_ensemble
- Paper or documentation: RoBERTa: A Robustly Optimized BERT Pretraining Approach
Training Details
Training Data
The model was trained on a dataset located in train.csv. This dataset consists of 24K premise-hypothesis pairs, each with a label indicating whether the hypothesis is true given the premise. The label is binary: 0 = hypothesis is false, 1 = hypothesis is true. No further details were given on the origin and validity of this dataset.
The data was passed through a tokenizer (AutoTokenizer) from the Hugging Face Transformers library. No other pre-processing was done, aside from relabelling columns to match the expected format.
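As an illustration, the preprocessing could be reproduced roughly as follows. The column names ("premise", "hypothesis", "label") and the maximum sequence length are assumptions, since they are not documented here.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
dataset = load_dataset("csv", data_files={"train": "train.csv"})

def tokenize(batch):
    # Tokenize each premise/hypothesis pair together so the model sees
    # both sentences in a single input sequence.
    return tokenizer(
        batch["premise"],
        batch["hypothesis"],
        truncation=True,
        padding="max_length",
        max_length=128,  # assumption; the actual maximum length is not documented
    )

tokenized = dataset.map(tokenize, batched=True)
# Rename the label column to the name expected by the Trainer.
tokenized = tokenized.rename_column("label", "labels")
```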
Training Procedure
The model was trained in the following way:
- The model was trained on the data described under Training Data, after column renaming and tokenization.
- The model was initialised with a custom configuration class, `roBERTaConfig`, which sets the essential parameters. The model itself, `roBERTaClassifier`, extends the pretrained RoBERTa model with multiple linear layers for pooling and classification.
- Hyperparameter selection was carried out in a separate grid search to identify the best-performing hyperparameters, resulting in the values listed under Training Hyperparameters.
- The model was validated on the data described under Testing Data, producing the results reported under Results.
- Checkpoints were saved after each epoch, and finally the best checkpoint was reloaded and pushed to the Hugging Face Hub.
Training Hyperparameters
The following hyperparameters were used during training:
- learning_rate: 3e-05
- train_batch_size: 128
- eval_batch_size: 128
- weight_decay: 0.01
- seed: 42
- optimizer: AdamW (adamw_torch) with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
- lr_scheduler_type: linear
- num_epochs: 10
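For reference, these values might map onto `TrainingArguments` roughly as sketched below. The evaluation/saving strategy and `load_best_model_at_end` are assumptions based on the per-epoch checkpointing described above, not documented settings.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="roberta_nli_ensemble",
    learning_rate=3e-5,
    per_device_train_batch_size=128,
    per_device_eval_batch_size=128,
    weight_decay=0.01,
    seed=42,
    optim="adamw_torch",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=10,
    eval_strategy="epoch",       # evaluate and checkpoint each epoch (assumption)
    save_strategy="epoch",
    load_best_model_at_end=True,  # reload the best checkpoint after training
)
```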
Speeds, Sizes, Times
- Training time: this model took 12 minutes 17 seconds to train on the hardware specified below. Training was configured for 10 epochs, but early stopping ended training after 5 epochs.
- Model size: 126M parameters.
Evaluation
Testing Data & Metrics
Testing Data
The development (and effectively testing) dataset is located in dev.csv. It contains 6K premise-hypothesis pairs used as validation data, in the same format as the training data. No further details were given on the origin and validity of this dataset.
The data was passed through a tokenizer (AutoTokenizer) from the Hugging Face Transformers library. No other pre-processing was done, aside from relabelling columns to match the expected format.
Metrics
- Accuracy: Proportion of correct predictions.
- Matthews Correlation Coefficient (MCC): Correlation coefficient between predicted and true labels, ranging from -1 to 1.
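A `compute_metrics` function producing these two metrics could look like the following sketch; the exact implementation used for this card is not published.

```python
import numpy as np
from sklearn.metrics import accuracy_score, matthews_corrcoef

def compute_metrics(eval_pred):
    # eval_pred is the (logits, labels) pair passed by the Trainer.
    logits, labels = eval_pred
    predictions = np.argmax(logits, axis=-1)
    return {
        "accuracy": accuracy_score(labels, predictions),
        "mcc": matthews_corrcoef(labels, predictions),
    }
```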
Results
Final results on the evaluation set:
- Loss: 0.4849
- Accuracy: 0.8848
- MCC: 0.7695
| Training Loss | Epoch | Step | Validation Loss | Accuracy | MCC |
|---|---|---|---|---|---|
| 0.6552 | 1.0 | 191 | 0.3383 | 0.8685 | 0.7377 |
| 0.2894 | 2.0 | 382 | 0.3045 | 0.8778 | 0.7559 |
| 0.1891 | 3.0 | 573 | 0.3255 | 0.8854 | 0.7705 |
| 0.1209 | 4.0 | 764 | 0.3963 | 0.8829 | 0.7657 |
| 0.0843 | 5.0 | 955 | 0.4849 | 0.8848 | 0.7695 |
Technical Specifications
Hardware
The model was trained on a PC with the following specifications:
- CPU: AMD Ryzen 7 7700X
- GPU: NVIDIA GeForce RTX 5070 Ti
- Memory: 32GB DDR5
- Motherboard: MSI MAG B650 TOMAHAWK WIFI
Software
- Transformers 4.50.2
- Pytorch 2.8.0.dev20250326+cu128
- Datasets 3.5.0
- Tokenizers 0.21.1
Bias, Risks, and Limitations
- The model's performance and biases depend on the data it was trained on; however, since nothing is known about the data's origin, potential biases cannot be assessed.
- There is a risk in trusting the model's labels without manual verification: the model can make mistakes, so outputs should be checked.
- The model is limited by training data that cannot cover all possible premise-hypothesis combinations that could occur in real use. Additional training and validation data would have been useful.
Additional Information
- This model was pushed to the Hugging Face Hub with `trainer.push_to_hub()` after training locally.
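A hypothetical inference sketch is shown below. Because the model relies on a custom configuration and classifier class, loading via `AutoModelForSequenceClassification` is an assumption rather than a documented usage path for this repository; it may require `trust_remote_code=True` (if the custom code is published with the weights) or importing the custom classes locally.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("Devtrick/roberta_nli_ensemble")
model = AutoModelForSequenceClassification.from_pretrained(
    "Devtrick/roberta_nli_ensemble",
    trust_remote_code=True,  # assumption: needed only if custom classes are hosted with the model
)

# Example premise/hypothesis pair (illustrative only).
inputs = tokenizer(
    "A man is playing a guitar.",
    "A person is making music.",
    return_tensors="pt",
)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.argmax(dim=-1).item())  # predicted class index
```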