camembert-base-literary-NER-v2

camembert-base-literary-NER-v2 is a French NER model trained on a dataset of 7 French novels by Maurel et al. (2025). The annotation guidelines are based on UniversalNER (Mayhew et al., 2024). The model supports PER, LOC, ORG and MISC entities. It was trained for 3 epochs with a learning rate of 1e-5.

Performance

We performed a 7-fold evaluation of the model over the 7 novels of the dataset and obtained the following results:

Novel                      Micro F1
-------------------------  ----------
Les Trois Mousquetaires    71.15
Le Rouge et le Noir        88.97
Eugénie Grandet            88.56
Germinal                   89.94
Bel-Ami                    87.13
Notre-Dame de Paris        75.70
Madame Bovary              88.25
-------------------------  ----------
Global Micro F1            81.21

Class   Micro F1  Precision  Recall
------  --------  ---------  ------
PERS    83.51     81.89      85.20
LOC     81.80     78.69      85.17
ORG     55.74     42.39      81.34
OTHER   34.08     25.15      52.86
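
As a sanity check on the per-class figures, F1 can be recomputed from the reported precision and recall as their harmonic mean. The sketch below is only illustrative: the helper name `f1_score` is ours, and the numbers are copied from the tables above rather than produced by the evaluation code, which is not part of this card.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, on the same scale as the inputs."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (reported F1, precision, recall), copied from the per-class table.
per_class = {
    "PERS": (83.51, 81.89, 85.20),
    "LOC": (81.80, 78.69, 85.17),
    "ORG": (55.74, 42.39, 81.34),
    "OTHER": (34.08, 25.15, 52.86),
}

recomputed = {
    label: f1_score(precision, recall)
    for label, (_, precision, recall) in per_class.items()
}
for label, (reported, _, _) in per_class.items():
    print(f"{label:<6} reported={reported:.2f} recomputed={recomputed[label]:.2f}")

# Per-novel scores, copied from the per-novel table.
novel_f1 = [71.15, 88.97, 88.56, 89.94, 87.13, 75.70, 88.25]
unweighted_mean = sum(novel_f1) / len(novel_f1)
print(f"unweighted mean over novels: {unweighted_mean:.2f}")
```

The recomputed values agree with the reported ones to within 0.01, which is consistent with rounding of precision and recall. The unweighted mean of the per-novel scores (about 84.24) is higher than the global score of 81.21. That is what one would expect if the global figure pools entities across all novels, so that novels with more entities weigh more, but this reading is an assumption and is not stated in the source.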

Usage in Renard

The default model for French NER in Renard is currently camembert-base-literary-NER, but this model can be used instead, as in the following pipeline:

from renard.pipeline import Pipeline
from renard.pipeline.tokenization import NLTKTokenizer
from renard.pipeline.ner import BertNamedEntityRecognizer
from renard.pipeline.character_unification import GraphRulesCharacterUnifier
from renard.pipeline.graph_extraction import CoOccurrencesGraphExtractor

pipeline = Pipeline(
    [
        NLTKTokenizer(),
        BertNamedEntityRecognizer(
            model="compnet-renard/camembert-base-literary-NER-v2"
        ),
        # Note the `ignore_leading_determiner=True` argument: since
        # this model extracts entities with their leading determiners
        # (e.g. "l'archidiacre Claude Frollo"), we signal the
        # character unifier module to not take these into account in
        # its unification rules.
        GraphRulesCharacterUnifier(ignore_leading_determiner=True),
        CoOccurrencesGraphExtractor(co_occurrences_dist=10),
    ]
)

out = pipeline("Quasimodo affronte l'archidiacre Claude Frollo dans Notre-Dame.")
print(out.entities)
print(out.characters)

This outputs:

[NEREntity(tokens=['Quasimodo'], start_idx=0, end_idx=1, tag='PER'), NEREntity(tokens=["l'archidiacre", 'Claude', 'Frollo'], start_idx=2, end_idx=5, tag='PER'), NEREntity(tokens=['Notre-Dame'], start_idx=6, end_idx=7, tag='LOC')]
[<Quasimodo, Gender.UNKNOWN, 1 mentions>, <l'archidiacre Claude Frollo, Gender.UNKNOWN, 1 mentions>]
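
The entity list printed above can be post-processed with plain Python. The sketch below groups mentions by tag. To stay runnable without downloading the model, it uses a hypothetical stand-in class, `StubEntity`, that mirrors only the fields visible in the printed output (`tokens`, `start_idx`, `end_idx`, `tag`). The helper `mentions_by_tag` is also our own and not part of Renard. It should work on `out.entities` as long as those objects expose `tokens` and `tag` attributes, as the printed output suggests.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StubEntity:
    """Hypothetical stand-in with the fields shown in the printed NEREntity output."""

    tokens: List[str]
    start_idx: int
    end_idx: int
    tag: str


def mentions_by_tag(entities) -> Dict[str, List[str]]:
    """Map each entity tag to the surface forms found for it, in document order."""
    grouped = defaultdict(list)
    for entity in entities:
        grouped[entity.tag].append(" ".join(entity.tokens))
    return dict(grouped)


# Same entities as in the output above, rebuilt by hand.
entities = [
    StubEntity(["Quasimodo"], 0, 1, "PER"),
    StubEntity(["l'archidiacre", "Claude", "Frollo"], 2, 5, "PER"),
    StubEntity(["Notre-Dame"], 6, 7, "LOC"),
]

grouped = mentions_by_tag(entities)
print(grouped)
# {'PER': ['Quasimodo', "l'archidiacre Claude Frollo"], 'LOC': ['Notre-Dame']}
```

Note that the leading determiner stays in the surface form ("l'archidiacre Claude Frollo"), which is why the pipeline above passes `ignore_leading_determiner=True` to the character unifier.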

Citation

If you use this model in your research, please cite:

@InProceedings{maurel2025,
  author = {Maurel, P. and Amalvy, A. and Labatut, V. and Alrahabi, M.},
  title = {Du repérage à l’analyse : un modèle pour la reconnaissance d’entités nommées dans les textes littéraires en français},
  booktitle = {Digital Humanities 2025 (to appear)},
  year = {2025},
}