|
--- |
|
library_name: transformers |
|
base_model: dbmdz/bert-base-german-uncased |
|
license: mit |
|
language: |
|
- de |
|
model-index: |
|
- name: LernnaviBERT |
|
results: [] |
|
--- |
|
|
|
# LernnaviBERT Model Card |
|
|
|
LernnaviBERT is a fine-tuned version of [German BERT](https://huggingface.co/dbmdz/bert-base-german-uncased) trained on educational text data from the Lernnavi Intelligent Tutoring System (ITS). It was trained with masked language modeling following the standard BERT training scheme.
|
|
|
### Model Sources |
|
|
|
- **Repository:** [https://github.com/epfl-ml4ed/answer-forecasting](https://github.com/epfl-ml4ed/answer-forecasting) |
|
- **Paper:** [https://arxiv.org/abs/2405.20079](https://arxiv.org/abs/2405.20079) |
|
|
|
### Direct Use |
|
|
|
As a fine-tuned version of a base BERT model, LernnaviBERT is suitable for all standard BERT use cases, especially in the German-language educational domain.
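
For example, the model can be queried for masked-token prediction with the `transformers` `fill-mask` pipeline. This is a minimal sketch: the model id `epfl-ml4ed/LernnaviBERT` is assumed to be the hub path of this card, and the German example sentence is purely illustrative.

```python
from transformers import pipeline

# Minimal fill-mask usage sketch. The model id below is assumed to be the
# hub path of this card; adjust it to wherever the checkpoint actually lives.
fill_mask = pipeline("fill-mask", model="epfl-ml4ed/LernnaviBERT")

# Illustrative German sentence containing a single [MASK] token.
for prediction in fill_mask("Die Schülerin löst die [MASK] im Unterricht."):
    print(prediction["token_str"], round(prediction["score"], 3))
```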
|
|
|
### Downstream Use |
|
|
|
LernnaviBERT has been further fine-tuned for [MCQ answering](https://huggingface.co/epfl-ml4ed/MCQBert) and Student Answer Forecasting (e.g., [MCQStudentBertCat](https://huggingface.co/epfl-ml4ed/MCQStudentBertCat) and [MCQStudentBertSum](https://huggingface.co/epfl-ml4ed/MCQStudentBertSum)), as described in [https://arxiv.org/abs/2405.20079](https://arxiv.org/abs/2405.20079).
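
The sketch below illustrates one way such downstream fine-tuning could start: attaching a fresh classification head on top of LernnaviBERT. The model id, label count, and input text are placeholders for illustration; the released MCQBert / MCQStudentBert models linked above use their own setups described in the paper.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical downstream setup: a fresh (untrained) classification head on
# top of LernnaviBERT, e.g. to score whether an answer choice fits a question.
model_id = "epfl-ml4ed/LernnaviBERT"  # assumed hub path of this card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)

inputs = tokenizer(
    "Was ist die Ableitung von x^2?",   # question (illustrative)
    "Die Ableitung von x^2 ist 2x.",    # candidate answer (illustrative)
    return_tensors="pt",
    truncation=True,
)
with torch.no_grad():
    logits = model(**inputs).logits     # head is untrained; fine-tune before use
print(logits)
```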
|
|
|
|
|
## Training Details |
|
|
|
The model was trained on ~40k text pieces from Lernnavi, a real-world ITS, for 3 epochs with a batch size of 16, improving from an initial perplexity of 1.21 on Lernnavi data to a final perplexity of 1.01.
|
|
|
### Training hyperparameters |
|
|
|
The following hyperparameters were used during training (see the sketch after the list):
|
- learning_rate: 2e-05 |
|
- train_batch_size: 16 |
|
- eval_batch_size: 16 |
|
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08 |
|
- lr_scheduler_type: linear |
|
- num_epochs: 3 |
|
- mixed_precision_training: Native AMP |
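
As a rough illustration of how these hyperparameters fit together, here is a sketch of a masked-language-modeling run with the `transformers` `Trainer`. The toy dataset stands in for the non-public Lernnavi text pieces, and the actual training script may differ in details.

```python
import torch
from datasets import Dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

# Base model and tokenizer (same starting point as LernnaviBERT).
tokenizer = AutoTokenizer.from_pretrained("dbmdz/bert-base-german-uncased")
model = AutoModelForMaskedLM.from_pretrained("dbmdz/bert-base-german-uncased")

# Toy stand-in for the (non-public) Lernnavi text pieces.
texts = [
    "Die Ableitung einer konstanten Funktion ist null.",
    "Ein Dreieck hat drei Seiten und drei Winkel.",
]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)

args = TrainingArguments(
    output_dir="lernnavibert-mlm",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=3,
    lr_scheduler_type="linear",
    fp16=torch.cuda.is_available(),  # native AMP mixed precision (GPU only)
    evaluation_strategy="epoch",
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    eval_dataset=dataset,  # toy example reuses the same split
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
)
trainer.train()
```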
|
|
|
### Training results |
|
|
|
| Training Loss | Epoch | Step | Validation Loss | |
|
|:-------------:|:-----:|:----:|:---------------:| |
|
| 0.0385 | 1.0 | 2405 | 0.0137 | |
|
| 0.0142 | 2.0 | 4810 | 0.0084 | |
|
| 0.0096 | 3.0 | 7215 | 0.0072 | |
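
For reference, perplexity is the exponential of the cross-entropy loss, so the validation losses above are consistent with the ~1.01 final perplexity reported earlier (a small check, assuming the reported loss is a mean token-level cross-entropy):

```python
import math

# Perplexity = exp(cross-entropy loss), assuming the validation loss is an
# average token-level cross-entropy.
for epoch, loss in [(1, 0.0137), (2, 0.0084), (3, 0.0072)]:
    print(f"epoch {epoch}: perplexity ≈ {math.exp(loss):.4f}")
# epoch 3: perplexity ≈ 1.0072, i.e. ~1.01 as stated above
```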
|
|
|
|
|
## Citation |
|
|
|
If you find this useful in your work, please cite our paper:
|
|
|
``` |
|
@misc{gado2024student, |
|
title={Student Answer Forecasting: Transformer-Driven Answer Choice Prediction for Language Learning}, |
|
author={Elena Grazia Gado and Tommaso Martorella and Luca Zunino and Paola Mejia-Domenzain and Vinitra Swamy and Jibril Frej and Tanja Käser}, |
|
year={2024}, |
|
eprint={2405.20079}, |
|
archivePrefix={arXiv}, |
|
} |
|
``` |
|
|
|
``` |
|
Gado, E., Martorella, T., Zunino, L., Mejia-Domenzain, P., Swamy, V., Frej, J., Käser, T. (2024). |
|
Student Answer Forecasting: Transformer-Driven Answer Choice Prediction for Language Learning. |
|
In: Proceedings of the Conference on Educational Data Mining (EDM 2024). |
|
``` |
|
|
|
### Framework versions |
|
|
|
- Transformers 4.37.1 |
|
- Pytorch 2.2.0 |
|
- Datasets 2.2.1 |
|
- Tokenizers 0.15.1 |
|
|