Model Description

This model's training approach is inspired by the paper by Thomas et al. (2024) and by the pykale/bart-large-ocr model. It was trained on French-language corpora; for more details on the training data, see the HenriPorteur/ocr-error-senat dataset.
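The training script itself is not part of this card. The sketch below shows one plausible way to reproduce the fine-tuning with Seq2SeqTrainer; it assumes facebook/bart-large as the base checkpoint (inferred from the model name and size) and 'ocr'/'text' as the noisy-input and clean-target column names in HenriPorteur/ocr-error-senat (check the dataset card for the actual fields). The hyperparameters are illustrative placeholders, not the values actually used.

from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

# Assumed base checkpoint, inferred from the model name and its 406M-parameter size.
base = 'facebook/bart-large'
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForSeq2SeqLM.from_pretrained(base)

dataset = load_dataset('HenriPorteur/ocr-error-senat')

def preprocess(batch):
    # 'ocr' (noisy text) and 'text' (corrected text) are assumed column names.
    inputs = tokenizer(batch['ocr'], max_length=1024, truncation=True)
    labels = tokenizer(text_target=batch['text'], max_length=1024, truncation=True)
    inputs['labels'] = labels['input_ids']
    return inputs

tokenized = dataset.map(preprocess, batched=True,
                        remove_columns=dataset['train'].column_names)

args = Seq2SeqTrainingArguments(
    output_dir='bart-large-ocr-fr',
    per_device_train_batch_size=4,   # placeholder hyperparameters
    learning_rate=3e-5,
    num_train_epochs=3,
    fp16=True,
)
trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized['train'],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()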

Usage

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

model = AutoModelForSeq2SeqLM.from_pretrained('HenriPorteur/bart-large-ocr-fr')
tokenizer = AutoTokenizer.from_pretrained('HenriPorteur/bart-large-ocr-fr')
# Passing device= to pipeline handles placement; a separate model.to('cuda') call is redundant.
generator = pipeline('text2text-generation', model=model, tokenizer=tokenizer, device='cuda', max_length=1024)

ocr = "C3nUm3r~o compr3nd3g@lement l compte-rendu deIa séance du mème jour de l@ CHAMBRE des dépuTés¡." 
pred = generator(ocr)[0]['generated_text']
print(pred)
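The BART-large encoder accepts at most 1024 tokens, so full OCR'd pages usually need to be split before correction. A minimal sketch of one possible approach (the line-based splitting strategy is an assumption, not something the model prescribes):

# Split a longer document on line breaks and correct each piece separately.
long_ocr_text = '...'  # placeholder: a full page of noisy OCR output
chunks = [line for line in long_ocr_text.split('\n') if line.strip()]
corrected = [generator(chunk)[0]['generated_text'] for chunk in chunks]
print('\n'.join(corrected))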