camembert-base-literary-NER-v2
camembert-base-literary-NER-v2 is a French NER model trained on a dataset of 7 French novels by Maurel et al. (2025). Annotation guidelines are based on UniversalNER (Mayhew et al., 2024). The model supports `PER`, `LOC`, `ORG` and `MISC` entities. It was trained for 3 epochs with a learning rate of 1e-5.
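Outside of Renard, the model can presumably be loaded like any other token-classification checkpoint from the Hugging Face Hub. The sketch below assumes the `transformers` package is installed and that the checkpoint works with the generic `token-classification` pipeline; `load_ner` and `summarize` are hypothetical helper names introduced here for illustration.

```python
def load_ner(model_id="compnet-renard/camembert-base-literary-NER-v2"):
    """Build a token-classification pipeline (downloads the model on first call)."""
    # Imported lazily because transformers is a heavy dependency.
    from transformers import pipeline

    # aggregation_strategy="simple" merges word pieces into whole entity spans.
    return pipeline(
        "token-classification",
        model=model_id,
        aggregation_strategy="simple",
    )


def summarize(predictions):
    """Reduce the pipeline's output dicts to (text, tag) pairs."""
    return [(p["word"], p["entity_group"]) for p in predictions]


# Usage (requires network access on the first call):
# ner = load_ner()
# print(summarize(ner("Quasimodo affronte l'archidiacre Claude Frollo dans Notre-Dame.")))
```

With an aggregation strategy set, each prediction is a dict with `entity_group`, `word`, `score`, `start` and `end` keys, which is what `summarize` relies on.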
Performance
We performed a 7-fold evaluation of the model (on the 7 novels from the dataset) and obtained the following results:
Novel | Micro F1 |
---|---|
Les Trois Mousquetaires | 71.15 |
Le Rouge et le Noir | 88.97 |
Eugénie Grandet | 88.56 |
Germinal | 89.94 |
Bel-Ami | 87.13 |
Notre-Dame de Paris | 75.70 |
Madame Bovary | 88.25 |
Global Micro F1 | 81.21 |
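One plausible reading of this protocol is a leave-one-novel-out cross-validation: each fold trains on six novels and tests on the seventh. The source only says "7-fold", so the split below is an assumption, and `leave_one_novel_out` is a hypothetical helper. The sketch also computes the unweighted mean of the per-novel scores, which comes to about 84.24. That is higher than the reported global score of 81.21, as one would expect if the global figure pools entity counts across novels rather than averaging per-novel scores (again an assumption).

```python
# Per-novel micro F1 scores, copied from the table above.
scores = {
    "Les Trois Mousquetaires": 71.15,
    "Le Rouge et le Noir": 88.97,
    "Eugénie Grandet": 88.56,
    "Germinal": 89.94,
    "Bel-Ami": 87.13,
    "Notre-Dame de Paris": 75.70,
    "Madame Bovary": 88.25,
}


def leave_one_novel_out(novels):
    """Yield (train_novels, held_out_novel) pairs, one fold per novel."""
    for held_out in novels:
        train = [n for n in novels if n != held_out]
        yield train, held_out


folds = list(leave_one_novel_out(list(scores)))

# Unweighted (macro) mean over novels: every novel counts equally,
# regardless of how many entities it contains.
macro_mean = sum(scores.values()) / len(scores)

for train, held_out in folds:
    print(f"test on {held_out!r}, train on {len(train)} other novels")
print(f"unweighted mean of per-novel F1: {macro_mean:.2f}")
```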
Class | Micro F1 | Precision | Recall |
---|---|---|---|
PERS | 83.51 | 81.89 | 85.20 |
LOC | 81.80 | 78.69 | 85.17 |
ORG | 55.74 | 42.39 | 81.34 |
OTHER | 34.08 | 25.15 | 52.86 |
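As a sanity check on the per-class table, F1 can be recomputed from the precision and recall columns. This sketch assumes the F1 column is the usual harmonic mean, F1 = 2PR / (P + R); the function name `f1` is introduced here for illustration.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1), copied from the per-class table above.
per_class = {
    "PERS": (81.89, 85.20, 83.51),
    "LOC": (78.69, 85.17, 81.80),
    "ORG": (42.39, 81.34, 55.74),
    "OTHER": (25.15, 52.86, 34.08),
}

for label, (p, r, reported) in per_class.items():
    print(f"{label}: recomputed {f1(p, r):.2f}, reported {reported:.2f}")
```

The recomputed values agree with the reported ones to within 0.01; the remaining gap comes from rounding the inputs. For ORG and OTHER, recall is far higher than precision, meaning most of the errors on these classes are false positives.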
Usage in Renard
The default model for French NER in Renard is currently camembert-base-literary-NER, but this model can be used instead by passing its name to the NER step, as in the following pipeline:
from renard.pipeline import Pipeline
from renard.pipeline.tokenization import NLTKTokenizer
from renard.pipeline.ner import BertNamedEntityRecognizer
from renard.pipeline.character_unification import GraphRulesCharacterUnifier
from renard.pipeline.graph_extraction import CoOccurrencesGraphExtractor
pipeline = Pipeline(
    [
        NLTKTokenizer(),
        BertNamedEntityRecognizer(
            model="compnet-renard/camembert-base-literary-NER-v2"
        ),
        # Note the `ignore_leading_determiner=True` argument: since
        # this model extracts entities with their leading determiners
        # (e.g. "l'archidiacre Claude Frollo"), we signal the
        # character unifier module to not take these into account in
        # its unification rules.
        GraphRulesCharacterUnifier(ignore_leading_determiner=True),
        CoOccurrencesGraphExtractor(co_occurrences_dist=10),
    ]
)
out = pipeline("Quasimodo affronte l'archidiacre Claude Frollo dans Notre-Dame.")
print(out.entities)
print(out.characters)
This outputs:
[NEREntity(tokens=['Quasimodo'], start_idx=0, end_idx=1, tag='PER'), NEREntity(tokens=["l'archidiacre", 'Claude', 'Frollo'], start_idx=2, end_idx=5, tag='PER'), NEREntity(tokens=['Notre-Dame'], start_idx=6, end_idx=7, tag='LOC')]
[<Quasimodo, Gender.UNKNOWN, 1 mentions>, <l'archidiacre Claude Frollo, Gender.UNKNOWN, 1 mentions>]
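To make the role of `ignore_leading_determiner=True` more concrete, the sketch below shows the kind of normalization such an option implies: dropping a leading French determiner before mentions are compared. This is an illustration only, not Renard's actual implementation; `strip_leading_determiner` and the determiner list are hypothetical.

```python
# Hypothetical, deliberately short list of French determiners.
DETERMINERS = {"le", "la", "les", "un", "une", "des", "du"}
# Elided article, with a straight or a typographic apostrophe.
ELIDED_PREFIXES = ("l'", "l’")


def strip_leading_determiner(tokens):
    """Return a copy of tokens without a leading determiner, if there is one."""
    if len(tokens) < 2:
        # Never reduce a one-token mention to nothing.
        return list(tokens)
    first = tokens[0]
    lowered = first.lower()
    if lowered in DETERMINERS:
        return list(tokens[1:])
    for prefix in ELIDED_PREFIXES:
        # The tokenizer may keep "l'archidiacre" as a single token.
        if lowered.startswith(prefix) and len(first) > len(prefix):
            return [first[len(prefix):]] + list(tokens[1:])
    return list(tokens)


print(strip_leading_determiner(["l'archidiacre", "Claude", "Frollo"]))
print(strip_leading_determiner(["Quasimodo"]))
```

Under this normalization, "l'archidiacre Claude Frollo" and "archidiacre Claude Frollo" would compare as the same token sequence. The prefix "d'" is deliberately left out: it is a preposition rather than a determiner, and stripping it would damage names such as "d'Artagnan".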
Citation
If you use this model in your research, please cite:
@InProceedings{
  author = {Maurel, P. and Amalvy, A. and Labatut, V. and Alrahabi, M.},
  title = {Du repérage à l’analyse : un modèle pour la reconnaissance d’entités nommées dans les textes littéraires en français},
  booktitle = {Digital Humanities 2025 (to appear)},
  year = {2025},
}
Base model: almanach/camembert-base