Climate NER Model
This repository contains a fine-tuned Named Entity Recognition (NER) model specialized for climate change-related entities. The model was trained on the Climate Change NER dataset, which consists of 534 manually annotated abstracts from climate-related academic papers.
Model Description
This model is fine-tuned to recognize 13 climate-related entity types:
- climate-assets
- climate-datasets
- climate-greenhouse-gases
- climate-hazards
- climate-impacts
- climate-mitigations
- climate-models
- climate-nature
- climate-observations
- climate-organisms
- climate-organizations
- climate-problem-origins
- climate-properties
Training Data
The model was trained on the Climate Change NER dataset, which contains 534 abstracts sourced from the Semantic Scholar Academic Graph. The abstracts were manually annotated with climate-related entities using the IOB (Inside-Outside-Beginning) tagging scheme.
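To illustrate the IOB scheme, the sketch below turns a tagged token sequence into entity spans. The tokens and tags are made up for illustration and are not taken from the dataset. The `B-`/`I-` prefix format is the usual IOB convention and is assumed here, and the helper name `iob_to_spans` is hypothetical.

```python
def iob_to_spans(tokens, tags):
    """Collect (entity_type, text) pairs from parallel token and IOB tag lists."""
    spans = []
    current_type, current_tokens = None, []

    def close_open_entity():
        # Store the entity being built, if there is one
        if current_type is not None:
            spans.append((current_type, " ".join(current_tokens)))

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A B- tag always starts a new entity
            close_open_entity()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # An I- tag of the same type continues the open entity
            current_tokens.append(token)
        else:
            # "O" (or an I- tag that does not match) ends the open entity
            close_open_entity()
            current_type, current_tokens = None, []
    close_open_entity()
    return spans


# Made-up example sentence, not from the dataset
tokens = ["Rising", "sea", "level", "threatens", "coastal", "cities"]
tags = ["O", "B-climate-hazards", "I-climate-hazards", "O",
        "B-climate-assets", "I-climate-assets"]
spans = iob_to_spans(tokens, tags)
print(spans)
```

With 13 entity types, this scheme yields 27 labels in total: one `B-` and one `I-` label per type, plus `O`.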
Dataset Statistics:
- Train set: 382 instances
- Validation set: 77 instances
- Test set: 75 instances
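As a quick sanity check, the split sizes above add up to the 534 annotated abstracts. The short sketch below computes the total and each split's share; the variable names are illustrative only.

```python
# Split sizes as listed above
splits = {"train": 382, "validation": 77, "test": 75}

total = sum(splits.values())
# Share of each split in percent, rounded to one decimal place
shares = {name: round(100 * n / total, 1) for name, n in splits.items()}

print(total)   # 534
print(shares)  # {'train': 71.5, 'validation': 14.4, 'test': 14.0}
```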
Model Performance
We evaluated three different models on the Climate Change NER test set:
Model | Precision | Recall | F1 Score |
---|---|---|---|
specter2_base | 0.57 | 0.61 | 0.57 |
modernBERT | 0.45 | 0.42 | 0.41 |
BERT-base | 0.53 | 0.57 | 0.52 |
We report micro-averaged metrics because the entity classes in the test set are highly imbalanced.
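The difference between micro and macro averaging can be sketched as follows. The per-class counts are hypothetical and chosen only to show the effect of class imbalance; they are not results from this model, and this is not necessarily the evaluation code used for the table above.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from raw counts, guarding against zero division."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical (true positives, false positives, false negatives) per class:
# one frequent class and one rare class
counts = {
    "climate-hazards": (80, 20, 20),
    "climate-datasets": (1, 3, 4),
}

# Micro average: pool the counts over all classes, then compute the metrics once
tp = sum(c[0] for c in counts.values())
fp = sum(c[1] for c in counts.values())
fn = sum(c[2] for c in counts.values())
micro = precision_recall_f1(tp, fp, fn)

# Macro average: compute the metrics per class, then take the unweighted mean
per_class = [precision_recall_f1(*c) for c in counts.values()]
macro = tuple(sum(m[i] for m in per_class) / len(per_class) for i in range(3))

print("micro P/R/F1:", micro)
print("macro P/R/F1:", macro)
```

In this toy example the rare class pulls the macro F1 well below the micro F1, because macro averaging weights every class equally while micro averaging weights every entity equally.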
Usage
    from ipymarkup import show_span_box_markup
    from transformers import pipeline

    ner = pipeline(
        "ner",
        model="nicolauduran45/specter-climate-change-NER",
        tokenizer="nicolauduran45/specter-climate-change-NER",
        aggregation_strategy="simple",
        device=0,  # first GPU; use device=-1 to run on CPU
    )
    text = 'multi-centennial variability of open ocean deep convection in the Atlantic sector of the Southern Ocean impacts the strength of the Atlantic Meridional Overturning Circulation (AMOC) in the Kiel Climate Model. The northward extent of Antarctic Bottom Water (AABW) strongly depends on the state of Weddell Sea deep convection.'
    entities = ner(text)
    spans = [(s['start'], s['end'], s['entity_group']) for s in entities]
    show_span_box_markup(text, spans)
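With `aggregation_strategy="simple"`, the pipeline returns a list of dictionaries with the keys `entity_group`, `score`, `word`, `start`, and `end`. As a minimal sketch of one way to post-process that output, the hypothetical helper below drops low-confidence predictions and groups the remaining entity strings by type. The sample predictions are invented for illustration (with `start` and `end` omitted for brevity), and the 0.5 threshold is an arbitrary assumption, not a recommended value.

```python
from collections import defaultdict


def group_entities_by_type(entities, min_score=0.5):
    """Group entity strings by predicted type, skipping low-confidence predictions."""
    grouped = defaultdict(list)
    for entity in entities:
        if entity["score"] >= min_score:
            grouped[entity["entity_group"]].append(entity["word"])
    return dict(grouped)


# Invented predictions in the pipeline's output format, not real model output
sample_entities = [
    {"entity_group": "climate-models", "score": 0.97, "word": "kiel climate model"},
    {"entity_group": "climate-nature", "score": 0.91, "word": "southern ocean"},
    {"entity_group": "climate-nature", "score": 0.42, "word": "weddell sea"},
]

grouped = group_entities_by_type(sample_entities, min_score=0.5)
print(grouped)
```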
To improve the aggregation of words that are split into subword tokens, we recommend using the following function:
    def predict_with_proper_aggregation(text):
        # Get the raw predictions
        raw_entities = ner(text)

        # Aggregate subword pieces into complete entities
        aggregated_entities = []
        current_entity = None
        for entity in raw_entities:
            # Check if this is a continuation token (starts with ##)
            is_continuation = entity["word"].startswith("##")
            if is_continuation and current_entity:
                # Update the current entity by removing ## and appending
                current_entity["word"] += entity["word"][2:]
                current_entity["end"] = entity["end"]
                # If entity types differ, use the one with higher confidence
                if (entity["entity_group"] != current_entity["entity_group"]
                        and entity["score"] > current_entity["score"]):
                    current_entity["entity_group"] = entity["entity_group"]
                    current_entity["score"] = entity["score"]
                else:
                    # Otherwise keep the minimum score of the merged pieces
                    current_entity["score"] = min(current_entity["score"], entity["score"])
            else:
                # If we have a previous entity, add it to results
                if current_entity:
                    aggregated_entities.append(current_entity)
                # Start a new entity
                current_entity = entity.copy()
        # Don't forget the last entity
        if current_entity:
            aggregated_entities.append(current_entity)

        # Further aggregation: detect split entities that might not use ## notation
        # but should be merged based on adjacent positions
        i = 0
        while i < len(aggregated_entities) - 1:
            current = aggregated_entities[i]
            next_entity = aggregated_entities[i + 1]
            # Check if entities are adjacent and should be merged
            if (current["end"] == next_entity["start"]
                    and current["entity_group"] == next_entity["entity_group"]):
                # Merge entities
                current["word"] += next_entity["word"]
                current["end"] = next_entity["end"]
                current["score"] = (current["score"] + next_entity["score"]) / 2
                # Remove the next entity as it's now merged
                aggregated_entities.pop(i + 1)
            else:
                i += 1
        return aggregated_entities

    entities = predict_with_proper_aggregation(text)
    spans = [(s['start'], s['end'], s['entity_group']) for s in entities]
    show_span_box_markup(text, spans)
Base model: allenai/specter2_base