This model was created as a research project to study the impact of high-quality data from the CoRoLa corpus on a small model. The model is intended only for research and may not be suitable for production use.

This model is the result of continuous pre-training of the Llama-3.2-1B model on selected data from the Representative Corpus of Contemporary Romanian Language (CoRoLa). The purpose of the experiments was to evaluate the impact of a small, high-quality corpus of Romanian on a small model, so we focused on only a small part of the CoRoLa corpus. We filtered the documents by the CoRoLa metadata attributes DocumentType (Book, inBook, inCollection) and DocumentTextDomain (Science), which yielded the 7,568 documents used in this research.
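The metadata filter described above can be sketched as a simple selection over per-document attribute records. This is an illustrative sketch only: the actual CoRoLa corpus format, field names, and filtering pipeline are assumptions, not the authors' published code.

```python
# Hypothetical sketch of the document selection step.
# The dict-based record format and field names mirror the metadata
# attributes named in the text (DocumentType, DocumentTextDomain),
# but the real CoRoLa storage format is an assumption.
ALLOWED_TYPES = {"Book", "inBook", "inCollection"}
ALLOWED_DOMAINS = {"Science"}

def select_documents(docs):
    """Keep only documents whose metadata matches the filter criteria."""
    return [
        d for d in docs
        if d.get("DocumentType") in ALLOWED_TYPES
        and d.get("DocumentTextDomain") in ALLOWED_DOMAINS
    ]

# Toy example with made-up records:
docs = [
    {"DocumentType": "Book", "DocumentTextDomain": "Science", "text": "..."},
    {"DocumentType": "Article", "DocumentTextDomain": "Science", "text": "..."},
    {"DocumentType": "inBook", "DocumentTextDomain": "Law", "text": "..."},
]
selected = select_documents(docs)
print(len(selected))  # only the first record passes both filters
```

In the actual experiments this selection produced 7,568 documents; the toy records here exist only to show the filter's shape.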

A paper detailing the results was submitted to the 20th International Conference on Linguistic Resources and Tools for Natural Language Processing (CONSILR 2025).

Model size: 1.24B parameters (BF16, Safetensors).