---
library_name: transformers
base_model: dbmdz/bert-base-german-uncased
license: mit
language:
- de
model-index:
  - name: LernnaviBERT
    results: []
---

# LernnaviBERT Model Card

LernnaviBERT is a fine-tuned version of [German BERT](https://huggingface.co/dbmdz/bert-base-german-uncased) trained on educational text data from the Lernnavi Intelligent Tutoring System (ITS). It was trained with a masked language modeling objective, following the original BERT training scheme.

### Model Sources

- **Repository:** [https://github.com/epfl-ml4ed/answer-forecasting](https://github.com/epfl-ml4ed/answer-forecasting)
- **Paper:** [https://arxiv.org/abs/2405.20079](https://arxiv.org/abs/2405.20079)

### Direct Use

As a fine-tuned version of a base BERT model, LernnaviBERT is suitable for the same uses as BERT, especially for German-language text in the educational domain.
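
The model can be loaded through the standard `transformers` APIs, for example for masked-token prediction. The snippet below is a minimal sketch, not an official usage example; it assumes the checkpoint is published as `epfl-ml4ed/LernnaviBERT`, so adjust the identifier to the actual Hub path, and note that the example sentence is only illustrative.

```python
# Minimal sketch: masked-token prediction with LernnaviBERT.
# The model id "epfl-ml4ed/LernnaviBERT" is an assumption; check the Hub page for the exact identifier.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="epfl-ml4ed/LernnaviBERT")

# German example sentence with a [MASK] token (the tokenizer is uncased).
for prediction in fill_mask("Die Schüler lösen eine [MASK] im Unterricht."):
    print(f"{prediction['token_str']:>15}  score={prediction['score']:.3f}")
```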

### Downstream Use

LernnaviBERT has been further fine-tuned for [MCQ answering](https://huggingface.co/epfl-ml4ed/MCQBert) and Student Answer Forecasting (e.g. [MCQStudentBertCat](https://huggingface.co/epfl-ml4ed/MCQStudentBertCat) and [MCQStudentBertSum](https://huggingface.co/epfl-ml4ed/MCQStudentBertSum)), as described in [https://arxiv.org/abs/2405.20079](https://arxiv.org/abs/2405.20079).
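
The downstream models linked above use their own architectures described in the paper. The sketch below only illustrates the generic pattern of reusing LernnaviBERT as the encoder for a new task; the model id and number of labels are illustrative assumptions, not the setup of those models.

```python
# Generic pattern: reuse LernnaviBERT as the encoder of a downstream classifier.
# Model id and num_labels are illustrative; the linked models have their own task-specific setups.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_id = "epfl-ml4ed/LernnaviBERT"  # assumed Hub identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)
```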


## Training Details

The model was trained on text data from Lernnavi, a real-world ITS, using ~40k text snippets for 3 epochs with a batch size of 16. Perplexity on Lernnavi data improved from an initial 1.21 to a final 1.01.

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 3
- mixed_precision_training: Native AMP
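
These settings correspond to a fairly standard masked language modeling run with the `transformers` `Trainer`. The sketch below is not the authors' training script: the data file, column name, and train/eval split are placeholders, and only the hyperparameters listed above are taken from this card.

```python
# Hedged sketch of an MLM fine-tuning run using the hyperparameters listed above.
# "lernnavi_texts.txt" and the 80/20 split are placeholders, not the authors' actual data setup.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

base = "dbmdz/bert-base-german-uncased"
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForMaskedLM.from_pretrained(base)

# Load plain-text training data and hold out a validation split (placeholder setup).
dataset = load_dataset("text", data_files="lernnavi_texts.txt")["train"].train_test_split(test_size=0.2)
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

# Random masking of tokens, as in the original BERT pre-training objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="lernnavi-bert",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    lr_scheduler_type="linear",
    fp16=True,  # Native AMP mixed precision (requires a GPU)
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    data_collator=collator,
)
trainer.train()
```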

### Training results

| Training Loss | Epoch | Step | Validation Loss |
|:-------------:|:-----:|:----:|:---------------:|
| 0.0385        | 1.0   | 2405 | 0.0137          |
| 0.0142        | 2.0   | 4810 | 0.0084          |
| 0.0096        | 3.0   | 7215 | 0.0072          |


## Citation

If you find this useful in your work, please cite our paper:

```
@misc{gado2024student,
      title={Student Answer Forecasting: Transformer-Driven Answer Choice Prediction for Language Learning}, 
      author={Elena Grazia Gado and Tommaso Martorella and Luca Zunino and Paola Mejia-Domenzain and Vinitra Swamy and Jibril Frej and Tanja Käser},
      year={2024},
      eprint={2405.20079},
      archivePrefix={arXiv},
}
```

```
Gado, E., Martorella, T., Zunino, L., Mejia-Domenzain, P., Swamy, V., Frej, J., Käser, T. (2024). 
Student Answer Forecasting: Transformer-Driven Answer Choice Prediction for Language Learning. 
In: Proceedings of the Conference on Educational Data Mining (EDM 2024). 
```

### Framework versions

- Transformers 4.37.1
- Pytorch 2.2.0
- Datasets 2.2.1
- Tokenizers 0.15.1