---
license: gpl-3.0
datasets:
- oscar-corpus/OSCAR-2301
language:
- nl
base_model:
- DTAI-KULeuven/robbert-2023-dutch-base
pipeline_tag: text-classification
tags:
- medical
---


We used GPT-4.1-nano to classify generic texts from OSCAR as medical or non-medical, using [PubScience](https://github.com/bramiozo/PubScience/tree/main/pubscience/label). We labeled 400,000 texts, of which about 40,000 were labeled positive (medical).
We then trained a SequenceClassifier on 80,000 samples with a 50/50 class ratio.
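
The numbers above are consistent with keeping all of the roughly 40,000 positives and down-sampling the negatives to the same count, which gives 80,000 samples at a 50/50 ratio. The sketch below illustrates that kind of balancing on toy data. It is an assumption about how such a set could be built, not the actual training code; `balanced_subset`, the toy `texts`/`labels`, and the label convention (1 = medical, 0 = non-medical) are hypothetical.

```python
import random


def balanced_subset(texts, labels, seed=0):
    """Return (text, label) pairs with equal numbers of positives and negatives.

    Assumed convention: label 1 = medical, label 0 = non-medical.
    The larger class is down-sampled to the size of the smaller one.
    """
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    n = min(len(positives), len(negatives))

    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    chosen = rng.sample(positives, n) + rng.sample(negatives, n)
    rng.shuffle(chosen)  # avoid all positives coming first
    return [(texts[i], labels[i]) for i in chosen]


# Toy data with a 10% positive rate, mirroring ~40,000 out of 400,000.
labels = [1 if i % 10 == 0 else 0 for i in range(1000)]
texts = [f"doc {i}" for i in range(1000)]

subset = balanced_subset(texts, labels)
print(len(subset), sum(y for _, y in subset))  # 200 100
```

Down-sampling the majority class is only one way to reach a 50/50 ratio; class weighting or over-sampling the positives are alternatives with different trade-offs.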

This can be used e.g.