|
--- |
|
library_name: transformers |
|
base_model: dbmdz/bert-base-german-uncased |
|
license: mit |
|
language: |
|
- de |
|
model-index: |
|
- name: LernnaviBERT |
|
results: [] |
|
--- |
|
|
|
# LernnaviBERT Model Card |
|
|
|
LernnaviBERT is a fine-tuned version of [German BERT](https://huggingface.co/dbmdz/bert-base-german-uncased) trained on educational text data from the Lernnavi Intelligent Tutoring System (ITS). It was trained with masked language modeling following the standard BERT training scheme.
|
|
|
### Model Sources |
|
|
|
- **Repository:** [https://github.com/epfl-ml4ed/answer-forecasting](https://github.com/epfl-ml4ed/answer-forecasting) |
|
- **Paper:** [https://arxiv.org/abs/2405.20079](https://arxiv.org/abs/2405.20079) |
|
|
|
### Direct Use |
|
|
|
As a fine-tuned version of a base BERT model, LernnaviBERT is suitable for all standard BERT use cases, especially in the German-language educational domain.
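
For example, the model can be queried for masked-token prediction with the `transformers` `fill-mask` pipeline. This is a minimal sketch: the model id `epfl-ml4ed/LernnaviBERT` is assumed to be the hub path of this card, and the German example sentence is purely illustrative.

```python
from transformers import pipeline

# Minimal fill-mask usage sketch. The model id below is assumed to be the
# hub path of this card; adjust it to wherever the checkpoint actually lives.
fill_mask = pipeline("fill-mask", model="epfl-ml4ed/LernnaviBERT")

# Illustrative German sentence containing a single [MASK] token.
for prediction in fill_mask("Die Schülerin löst die [MASK] im Unterricht."):
    print(prediction["token_str"], round(prediction["score"], 3))
```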
|
|
|
### Downstream Use |
|
|
|
LernnaviBERT has been further fine-tuned for [MCQ answering](https://huggingface.co/epfl-ml4ed/MCQBert) and Student Answer Forecasting (e.g., [MCQStudentBertCat](https://huggingface.co/epfl-ml4ed/MCQStudentBertCat) and [MCQStudentBertSum](https://huggingface.co/epfl-ml4ed/MCQStudentBertSum)), as described in [https://arxiv.org/abs/2405.20079](https://arxiv.org/abs/2405.20079).
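
The sketch below illustrates one way such downstream fine-tuning could start: attaching a fresh classification head on top of LernnaviBERT. The model id, label count, and input text are placeholders for illustration; the released MCQBert / MCQStudentBert models linked above use their own setups described in the paper.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical downstream setup: a fresh (untrained) classification head on
# top of LernnaviBERT, e.g. to score whether an answer choice fits a question.
model_id = "epfl-ml4ed/LernnaviBERT"  # assumed hub path of this card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

inputs = tokenizer(
    "Was ist die Ableitung von x^2?",   # question (illustrative)
    "Die Ableitung von x^2 ist 2x.",    # candidate answer (illustrative)
    return_tensors="pt",
    truncation=True,
)
with torch.no_grad():
    logits = model(**inputs).logits     # head is untrained; fine-tune before use
print(logits)
```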
|
|
|
|
|
## Training Details |
|
|
|
The model was trained on ~40k text pieces from Lernnavi, a real-world ITS, for 3 epochs with a batch size of 16, improving from an initial perplexity of 1.21 on Lernnavi data to a final perplexity of 1.01.
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training (see the sketch after the list):
|
- learning_rate: 2e-05 |
|
- train_batch_size: 16 |
|
- eval_batch_size: 16 |
|
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 |
|
- lr_scheduler_type: linear |
|
- num_epochs: 3 |
|
- mixed_precision_training: Native AMP |
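
As a rough illustration of how these hyperparameters fit together, here is a sketch of a masked-language-modeling run with the `transformers` `Trainer`. The toy dataset stands in for the non-public Lernnavi text pieces, and the actual training script may differ in details.

```python
import torch
from datasets import Dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Base model and tokenizer (same starting point as LernnaviBERT).
tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-german-uncased")
model = AutoModelForMaskedLM.from_pretrained("dbmdz/bert-base-german-uncased")

# Toy stand-in for the (non-public) Lernnavi text pieces.
texts = [
    "Die Ableitung einer konstanten Funktion ist null.",
    "Ein Dreieck hat drei Seiten und drei Winkel.",
]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)

args = TrainingArguments(
    output_dir="lernnavibert-mlm",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    lr_scheduler_type="linear",
    fp16=torch.cuda.is_available(),  # native AMP mixed precision (GPU only)
    evaluation_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    eval_dataset=dataset,  # toy example reuses the same split
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
)
trainer.train()
```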
|
|
|
### Training results |
|
|
|
| Training Loss | Epoch | Step | Validation Loss | |
|
|:-------------:|:-----:|:----:|:---------------:| |
|
| 0.0385 | 1.0 | 2405 | 0.0137 | |
|
| 0.0142 | 2.0 | 4810 | 0.0084 | |
|
| 0.0096 | 3.0 | 7215 | 0.0072 | |
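
For reference, perplexity is the exponential of the cross-entropy loss, so the validation losses above are consistent with the ~1.01 final perplexity reported earlier (a small check, assuming the reported loss is a mean token-level cross-entropy):

```python
import math

# Perplexity = exp(cross-entropy loss), assuming the validation loss is an
# average token-level cross-entropy.
for epoch, loss in [(1, 0.0137), (2, 0.0084), (3, 0.0072)]:
    print(f"epoch {epoch}: perplexity ≈ {math.exp(loss):.4f}")
# epoch 3: perplexity ≈ 1.0072, i.e. ~1.01 as stated above
```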
|
|
|
|
|
## Citation |
|
|
|
If you find this useful in your work, please cite our paper:
|
|
|
``` |
|
@misc{gado2024student, |
|
title={Student Answer Forecasting: Transformer-Driven Answer Choice Prediction for Language Learning}, |
|
author={Elena Grazia Gado and Tommaso Martorella and Luca Zunino and Paola Mejia-Domenzain and Vinitra Swamy and Jibril Frej and Tanja Käser}, |
|
year={2024}, |
|
eprint={2405.20079}, |
|
archivePrefix={arXiv}, |
|
} |
|
``` |
|
|
|
``` |
|
Gado, E., Martorella, T., Zunino, L., Mejia-Domenzain, P., Swamy, V., Frej, J., Käser, T. (2024). |
|
Student Answer Forecasting: Transformer-Driven Answer Choice Prediction for Language Learning. |
|
In: Proceedings of the Conference on Educational Data Mining (EDM 2024). |
|
``` |
|
|
|
### Framework versions |
|
|
|
- Transformers 4.37.1 |
|
- Pytorch 2.2.0 |
|
- Datasets 2.2.1 |
|
- Tokenizers 0.15.1 |
|
|