---
library_name: transformers
base_model: dbmdz/bert-base-german-uncased
license: mit
language:
- de
model-index:
- name: LernnaviBERT
results: []
---
# LernnaviBERT Model Card
LernnaviBERT is a fine-tuning of [German BERT](https://huggingface.co/dbmdz/bert-base-german-uncased) on educational text data from the Lernnavi Intelligent Tutoring System (ITS). It is trained with masked language modeling following the BERT training scheme.
### Model Sources
- **Repository:** [https://github.com/epfl-ml4ed/answer-forecasting](https://github.com/epfl-ml4ed/answer-forecasting)
- **Paper:** [https://arxiv.org/abs/2405.20079](https://arxiv.org/abs/2405.20079)
### Direct Use
Being a fine-tuned version of a base BERT model, LernnaviBERT is suitable for all typical BERT use cases, especially in the German-language educational domain.
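For instance, the model can be used out of the box for masked token prediction. Below is a minimal sketch using the 🤗 Transformers `fill-mask` pipeline; the model id `epfl-ml4ed/LernnaviBERT` and the example sentence are illustrative assumptions, not part of the original documentation.

```python
from transformers import pipeline

# Load LernnaviBERT for masked language modeling
# (model id assumed to be epfl-ml4ed/LernnaviBERT).
fill_mask = pipeline("fill-mask", model="epfl-ml4ed/LernnaviBERT")

# Predict the masked token in a German sentence (illustrative example).
predictions = fill_mask("Die Schülerin löst die [MASK] im Unterricht.")
for p in predictions:
    print(f"{p['token_str']:>15}  score={p['score']:.3f}")
```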
### Downstream Use
LernnaviBERT has been further fine-tuned for [MCQ answering](https://huggingface.co/epfl-ml4ed/MCQBert) and Student Answer Forecasting (e.g., [MCQStudentBertCat](https://huggingface.co/epfl-ml4ed/MCQStudentBertCat) and [MCQStudentBertSum](https://huggingface.co/epfl-ml4ed/MCQStudentBertSum)), as described in [https://arxiv.org/abs/2405.20079](https://arxiv.org/abs/2405.20079).
## Training Details
The model was trained on text data from a real-world ITS, Lernnavi, comprising ~40k text pieces, for 3 epochs with a batch size of 16, going from an initial perplexity of 1.21 on Lernnavi data to a final perplexity of 1.01.
### Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3
- mixed_precision_training: Native AMP
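As a rough illustration of this configuration, the sketch below shows how an equivalent masked language modeling fine-tuning run could be set up with the 🤗 Transformers `Trainer`. The dataset files and column names are placeholders, not the original Lernnavi data pipeline.

```python
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

# Start from the German BERT base model used for LernnaviBERT.
model_name = "dbmdz/bert-base-german-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

# Placeholder dataset: replace with the Lernnavi text pieces (not public here).
dataset = load_dataset(
    "text",
    data_files={"train": "lernnavi_train.txt", "validation": "lernnavi_val.txt"},
)

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Standard BERT-style random masking for the MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

# Hyperparameters as listed above; mixed precision via fp16 (native AMP).
args = TrainingArguments(
    output_dir="lernnavi-bert",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    lr_scheduler_type="linear",
    fp16=True,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=collator,
)
trainer.train()
```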
### Training results
| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.0385 | 1.0 | 2405 | 0.0137 |
| 0.0142 | 2.0 | 4810 | 0.0084 |
| 0.0096 | 3.0 | 7215 | 0.0072 |
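Assuming the validation losses above are mean masked-token cross-entropy values, perplexity is simply the exponentiated loss, so the final epoch is consistent with the reported perplexity of ~1.01:

```python
import math

# Perplexity = exp(cross-entropy loss), assuming the validation losses in the
# table above are mean masked-token cross-entropy values.
for epoch, loss in [(1, 0.0137), (2, 0.0084), (3, 0.0072)]:
    print(f"epoch {epoch}: perplexity ≈ {math.exp(loss):.3f}")
# The final value (~1.007) matches the reported perplexity of ~1.01.
```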
## Citation
If you find this useful in your work, please cite our paper:
```
@misc{gado2024student,
      title={Student Answer Forecasting: Transformer-Driven Answer Choice Prediction for Language Learning},
      author={Elena Grazia Gado and Tommaso Martorella and Luca Zunino and Paola Mejia-Domenzain and Vinitra Swamy and Jibril Frej and Tanja Käser},
      year={2024},
      eprint={2405.20079},
      archivePrefix={arXiv}
}
```
```
Gado, E., Martorella, T., Zunino, L., Mejia-Domenzain, P., Swamy, V., Frej, J., & Käser, T. (2024).
Student Answer Forecasting: Transformer-Driven Answer Choice Prediction for Language Learning.
In: Proceedings of the Conference on Educational Data Mining (EDM 2024).
```
### Framework versions
- Transformers 4.37.1
- Pytorch 2.2.0
- Datasets 2.2.1
- Tokenizers 0.15.1