This is a RoBERTa model (ChangeIsKey/roberta-kubhist2) trained on kubhist2, a corpus of historical Swedish newspapers (https://spraakbanken.gu.se/en/resources/kubhist2, https://spraakbanken.gu.se/blogg/index.php/2019/09/15/the-kubhist-corpus-of-swedish-newspapers/). A Hugging Face version of kubhist2 is available at https://huggingface.co/datasets/ChangeIsKey/kubhist2.

This is a work in progress: the quality of the model, like that of the training data, is far from great.

The model is shared with no guarantee whatsoever and will likely change. Use it at your own risk.
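Below is a minimal sketch of querying the model through the Hugging Face `transformers` fill-mask pipeline. It rests on a few assumptions. The model ID `ChangeIsKey/roberta-kubhist2` is the name shown on the model page. The repository is assumed to ship a tokenizer and to load as a standard masked language model. The helper names `load_fill_mask` and `top_predictions`, and the `[MASK]` placeholder convention, are hypothetical and exist only for illustration. They are not part of any published API for this model.

```python
"""Query ChangeIsKey/roberta-kubhist2 as a masked language model (illustrative sketch)."""

# Assumed model ID, taken from the model page.
MODEL_ID = "ChangeIsKey/roberta-kubhist2"


def load_fill_mask(model_id: str = MODEL_ID):
    """Build a Hugging Face fill-mask pipeline for the model (hypothetical helper).

    Assumes the repository includes a tokenizer alongside the weights.
    transformers is imported lazily so that top_predictions below can be
    used without it installed.
    """
    from transformers import pipeline

    return pipeline("fill-mask", model=model_id)


def top_predictions(fill_mask, template: str, k: int = 5):
    """Return (token, score) pairs for the single masked slot in `template`.

    Hypothetical helper: the slot is written as "[MASK]" and replaced by the
    tokenizer's own mask token (for RoBERTa tokenizers this is "<mask>").
    """
    if template.count("[MASK]") != 1:
        raise ValueError("template must contain exactly one [MASK]")
    text = template.replace("[MASK]", fill_mask.tokenizer.mask_token)
    # Each prediction is a dict with "score", "token", "token_str" and "sequence".
    return [(p["token_str"].strip(), p["score"]) for p in fill_mask(text, top_k=k)]
```

For example, `top_predictions(load_fill_mask(), "Stockholm är Sveriges [MASK].")` downloads the model on first use and returns its five highest-scoring fillers. Because the training text comes from OCR, some candidates may themselves look like OCR errors.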

Discussion of Biases

The model is trained on historical data, which may contain outdated views that the model can reproduce.

Other Known Limitations

The data were produced by optical character recognition (OCR), so the text contains errors, especially in the earlier decades.

Contact

Simon Hengchen, iguanodon.ai

Model Details

78.1M parameters, stored as Safetensors (tensor types I64 and F32).