✨ Ettin 400M for NER

This repository hosts an Ettin 400M model that was fine-tuned on the CoNLL-2003 NER dataset with the awesome Flair library.

Please note the following caveats:

⚠️ To work around a tokenizer problem in ModernBERT/Ettin, this model was fine-tuned on a forked and modified Ettin 400M model.
⚠️ At the moment, don't expect "uber" BERT-like performance; more experiments are needed. I am pretty sure that RoPE is causing this.

📝 Implementation

The model was trained using my ModernBERT experiments repo.

📊 Performance

A very basic hyper-parameter search was performed with five different seeds. The table reports the micro F1-score on the CoNLL-2003 development set for each run, together with the average over the five runs:

Configuration	Run 1	Run 2	Run 3	Run 4	Run 5	Avg.
`bs16-e10-cs0-lr4e-05`	96	96.17	96.31	96.19	96.2	96.17 ± 0.1
`bs16-e10-cs0-lr3e-05`	96.25	96.23	96.12	96.3	95.81	96.14 ± 0.18
`bs16-e10-cs0-lr2e-05`	96.09	96.24	95.88	96.1	96.12	96.09 ± 0.12
`bs16-e10-cs0-lr5e-05`	95.98	95.93	96.11	96.1	96	96.02 ± 0.07
`bs16-e10-cs0-lr1e-05`	95.77	95.8	96.14	96.01	95.84	95.91 ± 0.14

The performance of the currently uploaded model is marked in bold.
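
The Avg. column can be recomputed from the five per-run scores. The following is a minimal sketch, not the evaluation script used for this model: it assumes the value after ± is the population standard deviation (the NumPy default), which is inferred from the numbers in the table rather than stated anywhere, and `summarize` is a hypothetical helper name.

```python
from statistics import mean, pstdev

# Per-run development micro F1 scores, copied from two rows of the table.
runs = {
    "bs16-e10-cs0-lr4e-05": [96.0, 96.17, 96.31, 96.19, 96.2],
    "bs16-e10-cs0-lr3e-05": [96.25, 96.23, 96.12, 96.3, 95.81],
}


def summarize(scores):
    """Return (mean, population std), both rounded to two decimals."""
    # Assumption: population std (ddof=0), inferred from the table values.
    return round(mean(scores), 2), round(pstdev(scores), 2)


for name, scores in runs.items():
    avg, std = summarize(scores)
    print(f"{name}: {avg} ± {std}")
```

With the sample standard deviation (`statistics.stdev`) the spread would come out slightly larger, so the choice matters when comparing against the table.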

📣 Usage

The following code can be used to test the model and recognize named entities for a given sentence:

from flair.data import Sentence
from flair.models import SequenceTagger

# Load the model
tagger = SequenceTagger.load("stefan-it/flair-ettin-400m-ner-conll03")

# Define an example sentence
sentence = Sentence("George Washington went to Washington very fast.")

# Now let's predict named entities...
tagger.predict(sentence)

# Print the recognized named entities
print("The following named entities are found:")
for entity in sentence.get_spans('ner'):
    print(entity)

This outputs:

Span[0:2]: "George Washington" → PER (1.0000)
Span[4:5]: "Washington" → LOC (1.0000)
