camembert-base-literary-NER-v2

camembert-base-literary-NER-v2 is a French named entity recognition (NER) model trained on the dataset of Maurel et al. (2025), which comprises 7 French novels. The annotation guidelines are based on UniversalNER (Mayhew et al., 2024). The model supports PER, LOC, ORG and MISC entities. It was trained for 3 epochs with a learning rate of 1e-5.

Performance

We performed a 7-fold evaluation of the model, with one fold for each of the 7 novels in the dataset, and obtained the following results:

Novel	Micro F1
Les Trois Mousquetaires	71.15
Le Rouge et le Noir	88.97
Eugénie Grandet	88.56
Germinal	89.94
Bel-Ami	87.13
Notre-Dame de Paris	75.70
Madame Bovary	88.25
-------------------------	----------
Global Micro F1	81.21
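
The global score is lower than the unweighted mean of the seven per-novel scores (about 84.24). This is what one would expect if the global score is micro-averaged over pooled entity counts, so that novels with more entities weigh more. The sketch below only illustrates the difference between the two averages: the per-fold counts and the names `FoldCounts`, `micro_f1` and `macro_f1` are made up for illustration and are not taken from the evaluation.

```python
from dataclasses import dataclass


@dataclass
class FoldCounts:
    """Entity-level counts for one fold (hypothetical values)."""

    tp: int  # predicted entities that match a gold entity
    fp: int  # predicted entities that match no gold entity
    fn: int  # gold entities that were missed


def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from raw counts; 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(folds):
    """Pool the counts of all folds, then compute a single F1."""
    return f1(
        sum(c.tp for c in folds),
        sum(c.fp for c in folds),
        sum(c.fn for c in folds),
    )


def macro_f1(folds):
    """Compute one F1 per fold, then take the unweighted mean."""
    return sum(f1(c.tp, c.fp, c.fn) for c in folds) / len(folds)


# Hypothetical counts: a long novel with a lower score,
# and a short novel with a higher score.
folds = [
    FoldCounts(tp=700, fp=300, fn=300),
    FoldCounts(tp=90, fp=10, fn=10),
]

print(f"micro F1: {100 * micro_f1(folds):.2f}")  # micro F1: 71.82
print(f"macro F1: {100 * macro_f1(folds):.2f}")  # macro F1: 80.00
```

With these made-up counts, the long novel dominates the pooled score, which pulls the micro average below the macro average.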

Class	Micro F1	Precision	Recall
PERS	83.51	81.89	85.20
LOC	81.80	78.69	85.17
ORG	55.74	42.39	81.34
OTHER	34.08	25.15	52.86
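
Each F1 value in this table is the harmonic mean of the precision and recall on the same row. The short check below recomputes it from the two other columns. The function name `f1_from_pr` is only illustrative, and differences in the last digit can come from the rounding of the published precision and recall.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1), copied from the table above.
rows = {
    "PERS": (81.89, 85.20, 83.51),
    "LOC": (78.69, 85.17, 81.80),
    "ORG": (42.39, 81.34, 55.74),
    "OTHER": (25.15, 52.86, 34.08),
}

for label, (precision, recall, reported) in rows.items():
    computed = f1_from_pr(precision, recall)
    print(f"{label}: computed {computed:.2f}, reported {reported:.2f}")
```

Read this way, the low ORG and OTHER scores are driven by precision rather than recall: the model finds most of these entities but also predicts many spurious ones.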

Usage in Renard

The default model for French NER in Renard is currently camembert-base-literary-NER, but this model can be used instead, as in the following pipeline:

from renard.pipeline import Pipeline
from renard.pipeline.tokenization import NLTKTokenizer
from renard.pipeline.ner import BertNamedEntityRecognizer
from renard.pipeline.character_unification import GraphRulesCharacterUnifier
from renard.pipeline.graph_extraction import CoOccurrencesGraphExtractor

pipeline = Pipeline(
    [
        NLTKTokenizer(),
        BertNamedEntityRecognizer(
            model="compnet-renard/camembert-base-literary-NER-v2"
        ),
        # Note the `ignore_leading_determiner=True` argument: since
        # this model extracts entities with their leading determiners
        # (e.g. "l'archidiacre Claude Frollo"), we signal the
        # character unifier module to not take these into account in
        # its unification rules.
        GraphRulesCharacterUnifier(ignore_leading_determiner=True),
        CoOccurrencesGraphExtractor(co_occurrences_dist=10),
    ]
)

out = pipeline("Quasimodo affronte l'archidiacre Claude Frollo dans Notre-Dame.")
print(out.entities)
print(out.characters)

This outputs:

[NEREntity(tokens=['Quasimodo'], start_idx=0, end_idx=1, tag='PER'), NEREntity(tokens=["l'archidiacre", 'Claude', 'Frollo'], start_idx=2, end_idx=5, tag='PER'), NEREntity(tokens=['Notre-Dame'], start_idx=6, end_idx=7, tag='LOC')]
[<Quasimodo, Gender.UNKNOWN, 1 mentions>, <l'archidiacre Claude Frollo, Gender.UNKNOWN, 1 mentions>]
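
To make the idea behind `ignore_leading_determiner=True` more concrete, here is a minimal sketch of what dropping a leading determiner from a mention can look like. It is not Renard's implementation: the function `strip_leading_determiner` and the determiner lists are hypothetical and only cover a few common French forms.

```python
# Hypothetical, non-exhaustive lists of French determiners.
DETERMINERS = {"le", "la", "les", "un", "une"}
# Elided article glued to the next word (straight or curly apostrophe).
ELIDED_PREFIXES = ("l'", "l’")


def strip_leading_determiner(tokens):
    """Return a copy of `tokens` without its leading determiner, if any."""
    if not tokens:
        return []
    first = tokens[0]
    lowered = first.lower()
    # Standalone determiner token, e.g. ["le", "comte"] -> ["comte"].
    # A mention made of a single token is left untouched.
    if lowered in DETERMINERS and len(tokens) > 1:
        return list(tokens[1:])
    # Elided determiner, e.g. "l'archidiacre" -> "archidiacre".
    for prefix in ELIDED_PREFIXES:
        if lowered.startswith(prefix) and len(first) > len(prefix):
            return [first[len(prefix):]] + list(tokens[1:])
    return list(tokens)


mention = ["l'archidiacre", "Claude", "Frollo"]
print(strip_leading_determiner(mention))
# ['archidiacre', 'Claude', 'Frollo']
```

The sketch deliberately leaves particles such as "d'" alone, since they are often part of the name itself (for example "d'Artagnan").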

Citation

If you use this model in your research, please cite:

@InProceedings{maurel2025reperage,
  author = {Maurel, P. and Amalvy, A. and Labatut, V. and Alrahabi, M.},
  title = {Du repérage à l’analyse : un modèle pour la reconnaissance d’entités nommées dans les textes littéraires en français},
  booktitle = {Digital Humanities 2025 (to appear)},
  year = {2025},
}
