## Model Description
This model's training approach is inspired by the paper by Thomas et al. (2024) and by the pykale/bart-large-ocr model. It was trained on French-language corpora. For further training details, see the HenriPorteur/ocr-error-senat dataset.
## Usage
```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

# Load the fine-tuned model and its tokenizer.
model = AutoModelForSeq2SeqLM.from_pretrained('HenriPorteur/bart-large-ocr-fr')
tokenizer = AutoTokenizer.from_pretrained('HenriPorteur/bart-large-ocr-fr')

# The pipeline moves the model to the requested device; use device='cpu' if no GPU is available.
generator = pipeline('text2text-generation', model=model, tokenizer=tokenizer, device='cuda', max_length=1024)

# Noisy OCR output to correct.
ocr = "C3nUm3r~o compr3nd3g@lement l compte-rendu deIa séance du mème jour de l@ CHAMBRE des dépuTés¡."
pred = generator(ocr)[0]['generated_text']
print(pred)
```
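
Inputs longer than the pipeline's `max_length` can be corrected chunk by chunk. The sketch below is an illustrative approach, not part of the model card: the `correct_long_text` helper, the sentence-splitting regex, and the `max_chars` threshold are assumptions, not APIs of this model.

```python
import re

def correct_long_text(generator, text, max_chars=500):
    """Correct OCR text longer than the model's input limit, chunk by chunk.

    `max_chars` is an illustrative threshold, not a documented parameter.
    """
    # Split on sentence-ending punctuation so chunks align with sentence boundaries.
    sentences = re.split(r'(?<=[.!?])\s+', text)
    chunks, current = [], ''
    for sentence in sentences:
        if current and len(current) + len(sentence) + 1 > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = (current + ' ' + sentence).strip()
    if current:
        chunks.append(current)
    # Correct each chunk independently and rejoin the results.
    corrected = [generator(chunk)[0]['generated_text'] for chunk in chunks]
    return ' '.join(corrected)
```

Chunk-level correction keeps every input within the model's limit, at the cost of losing context that spans chunk boundaries.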