Recipe NER Model
This is a Named Entity Recognition (NER) model trained to extract entities from recipe text. The model can identify food ingredients, quantities, units, cooking processes, and physical qualities from recipe instructions.
Model Details
Model Description
- Model type: Token Classification (NER)
- Base model: ModernBERT-base
- Training data: TASTEset (recipe NER dataset)
- Number of labels: Not documented (the full label set can be read from `model.config.id2label`)
Entity Types
The model recognizes the following entity types:
- FOOD: Food items and ingredients (e.g., "chicken breast", "olive oil")
- QUANTITY: Numerical quantities (e.g., "2", "1/2", "three")
- UNIT: Measurement units (e.g., "tablespoons", "cups", "pounds")
- PROCESS: Cooking processes and methods (e.g., "heat", "stir", "bake")
- PHYSICAL_QUALITY: Physical qualities and descriptors (e.g., "hot", "chopped", "fresh")
Label Format
The model uses the BIO (Beginning-Inside-Outside) tagging scheme:
- B-ENTITY: Beginning of an entity
- I-ENTITY: Inside (continuation) of an entity
- O: Outside any entity
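To turn a BIO tag sequence back into entity spans, start a new entity at each B- tag, extend it while matching I- tags follow, and close it at O or at a tag of a different type. The following is a minimal pure-Python sketch under that reading. The tag sequence in the usage line is written by hand for illustration and is not actual model output. Treating a stray I- tag as the start of a new entity is a lenient design choice made here, not something the model card specifies.

```python
from typing import List, Optional, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group word-level BIO tags into (entity_type, text) spans."""
    spans: List[Tuple[str, str]] = []
    current_type: Optional[str] = None
    current_words: List[str] = []

    def flush() -> None:
        # Emit the entity being built (if any) and reset the state.
        nonlocal current_type, current_words
        if current_type is not None:
            spans.append((current_type, " ".join(current_words)))
        current_type, current_words = None, []

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            flush()
            current_type, current_words = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_words.append(token)  # continuation of the open entity
        elif tag.startswith("I-"):
            # Lenient repair: an I- tag with no matching open entity starts one.
            flush()
            current_type, current_words = tag[2:], [token]
        else:  # "O"
            flush()
    flush()
    return spans


# Hand-written tags for illustration only:
# bio_to_spans(["Heat", "2", "tablespoons", "olive", "oil", "in", "a", "large", "pan"],
#              ["B-PROCESS", "B-QUANTITY", "B-UNIT", "B-FOOD", "I-FOOD", "O", "O", "O", "O"])
```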
Usage
Basic Usage with Transformers
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

# Load model and tokenizer
model_name = "rgonzale/recipe-ner-model"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Example text
text = "Heat 2 tablespoons olive oil in a large pan"

# Tokenize
tokens = tokenizer(text, return_tensors="pt", truncation=True)

# Make prediction
with torch.no_grad():
    outputs = model(**tokens)

# Get one predicted label per sub-word token (special tokens included)
predictions = torch.argmax(outputs.logits, dim=-1)
predicted_labels = [model.config.id2label[pred.item()] for pred in predictions[0]]

print("Tokens:", tokenizer.convert_ids_to_tokens(tokens["input_ids"][0]))
print("Labels:", predicted_labels)
```
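The example above prints one label per sub-word token, including special tokens. One common way to get one label per word is to keep only the label of each word's first sub-word. The sketch below assumes the tokenizer is a "fast" tokenizer, so that `tokens.word_ids()` returns the word index of each token (or `None` for special tokens). The helper name `word_level_labels` is hypothetical and is not part of this model's code.

```python
from typing import List, Optional


def word_level_labels(word_ids: List[Optional[int]], token_labels: List[str]) -> List[str]:
    """Keep the label of the first sub-word token of each word (hypothetical helper).

    word_ids: one entry per token; None for special tokens, otherwise the index
              of the word the token belongs to (as returned by word_ids()).
    token_labels: one predicted label per token, same length as word_ids.
    """
    labels: List[str] = []
    previous: Optional[int] = None
    for word_id, label in zip(word_ids, token_labels):
        if word_id is None or word_id == previous:
            continue  # skip special tokens and continuation sub-words
        labels.append(label)
        previous = word_id
    return labels


# With the objects from the example above (fast tokenizer assumed):
# word_labels = word_level_labels(tokens.word_ids(), predicted_labels)
```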
Usage with Pipeline
```python
from transformers import pipeline

# Load NER pipeline; "simple" aggregation groups adjacent tokens of the same entity into one span
ner_pipeline = pipeline("ner", model="rgonzale/recipe-ner-model", aggregation_strategy="simple")

# Example text
text = "Mix 1 cup flour with 2 eggs and bake at 350 degrees for 25 minutes"

# Get entities
entities = ner_pipeline(text)
for entity in entities:
    print(f"{entity['word']}: {entity['entity_group']} (confidence: {entity['score']:.2f})")
```
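With `aggregation_strategy="simple"`, each pipeline result is a dict with keys such as `entity_group`, `score` and `word`. A downstream step often wants these grouped by type with low-confidence results dropped. The sketch below does that. The 0.5 threshold and the helper name `group_entities` are illustrative assumptions. The sample input in the usage comment is hand-written and does not show real predictions.

```python
from collections import defaultdict
from typing import Dict, List


def group_entities(entities: List[dict], min_score: float = 0.5) -> Dict[str, List[str]]:
    """Collect pipeline results into {entity_group: [words]}, keeping score >= min_score."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for entity in entities:
        if entity["score"] >= min_score:
            # strip() drops any leading space a BPE tokenizer may leave in the decoded word
            grouped[entity["entity_group"]].append(entity["word"].strip())
    return dict(grouped)


# Hand-written stand-in for pipeline output (not real predictions):
# group_entities([{"entity_group": "QUANTITY", "score": 0.99, "word": "1"},
#                 {"entity_group": "FOOD", "score": 0.95, "word": " flour"}])
```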
Command Line Usage
```bash
# Quick inference
uv run python scripts/infer_ner.py --text "Heat 2 tablespoons olive oil in a large pan"

# Interactive mode
uv run python scripts/infer_ner.py --interactive
```
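The contents of `scripts/infer_ner.py` are not shown in this card. The following is only a hypothetical skeleton of how a script with the two documented flags, `--text` and `--interactive`, could be structured with `argparse`. It is not the project's actual script. Here `predict` stands for any callable that maps text to entities, such as the `ner_pipeline` object from the pipeline example. The "quit"/"exit" commands are assumptions.

```python
import argparse
from typing import Callable, List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical sketch; the real script's options may differ.
    parser = argparse.ArgumentParser(description="Recipe NER inference (sketch)")
    mode = parser.add_mutually_exclusive_group(required=True)
    mode.add_argument("--text", help="Recipe text to tag once")
    mode.add_argument("--interactive", action="store_true",
                      help="Tag lines read from stdin until EOF, 'quit' or 'exit'")
    return parser


def main(predict: Callable[[str], list], argv: Optional[List[str]] = None) -> None:
    args = build_parser().parse_args(argv)
    if args.text is not None:
        print(predict(args.text))
        return
    while True:  # interactive loop
        try:
            line = input("recipe> ")
        except EOFError:
            break
        if line.strip().lower() in {"quit", "exit"}:
            break
        print(predict(line))
```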
Training Details
Training Data
- Dataset: TASTEset - A specialized dataset for recipe NER
- Task: Named Entity Recognition for recipes
- Preprocessing: Simple tokenization with whitespace/punctuation splitting
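The card does not define "simple tokenization" or the annotation format. The sketch below assumes two things: tokens are runs of word characters or single punctuation marks, and gold entities come as character spans `(start, end, label)` with an exclusive end. Each token inside a span gets B- if it starts the span and I- otherwise. Note how "1/2" splits into three tokens, which relates to the punctuation limitation listed below. The regex and the helper name `spans_to_bio` are illustrative assumptions.

```python
import re
from typing import List, Tuple

# Assumption: "simple tokenization" = runs of word characters, or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def spans_to_bio(text: str, spans: List[Tuple[int, int, str]]) -> List[Tuple[str, str]]:
    """Tokenize text and label each token with BIO tags from character spans.

    spans: (start, end, label) character offsets, end exclusive (assumed format).
    """
    tagged: List[Tuple[str, str]] = []
    for match in TOKEN_RE.finditer(text):
        tag = "O"
        for start, end, label in spans:
            if start <= match.start() and match.end() <= end:
                # First token of the span gets B-, later tokens get I-
                tag = ("B-" if match.start() == start else "I-") + label
                break
        tagged.append((match.group(), tag))
    return tagged


# spans_to_bio("1/2 cup flour", [(0, 3, "QUANTITY"), (4, 7, "UNIT"), (8, 13, "FOOD")])
```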
Training Parameters
- Learning rate: 3e-5
- Batch size: 8 (train), 16 (eval)
- Max sequence length: 512 tokens
- Optimizer: AdamW with weight decay
- Scheduler: Cosine annealing
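The card gives the peak learning rate (3e-5) and a cosine schedule, but not the warmup length, total step count, or weight-decay value. Under those unknowns, the sketch below shows how a half-cosine decay from the peak to zero behaves, with optional linear warmup. This follows the shape of the common "cosine with warmup" schedule. `warmup_steps=0` and the step counts in the usage comment are assumptions, not the actual training configuration.

```python
import math


def cosine_lr(step: int, total_steps: int, base_lr: float = 3e-5, warmup_steps: int = 0) -> float:
    """Learning rate at `step`: linear warmup, then half-cosine decay from base_lr to 0."""
    if warmup_steps > 0 and step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Clamp at 0 so steps past total_steps do not climb back up the cosine
    return max(0.0, base_lr * 0.5 * (1.0 + math.cos(math.pi * progress)))


# Assumed 1,000 total steps, no warmup: starts at 3e-5, halves at step 500, ends at 0.
# [cosine_lr(s, 1000) for s in (0, 500, 1000)]
```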
Evaluation
Performance Metrics
The model was evaluated on the TASTEset validation set with the following results:
- Sequence-level F1: 0.90
- Token-level F1: 0.90
- Entity-level metrics:
  - FOOD: F1 = 0.85
  - QUANTITY: F1 = 0.97
  - UNIT: F1 = 0.97
  - PROCESS: F1 = 0.81
  - PHYSICAL_QUALITY: F1 = 0.79
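The card does not say exactly how these scores were computed. A common definition of entity-level F1, in the spirit of CoNLL/seqeval scoring, counts a prediction as correct only when both its boundaries and its type match a gold entity. The sketch below implements that definition over sets of `(start, end, type)` spans. It is an assumption about the method and does not reproduce the reported numbers. The spans in the usage comment are made up.

```python
from typing import Dict, Set, Tuple

Span = Tuple[int, int, str]  # (start, end, entity_type)


def entity_f1(gold: Set[Span], pred: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall and F1 over entity spans."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def per_type_f1(gold: Set[Span], pred: Set[Span]) -> Dict[str, float]:
    """F1 computed separately for each entity type seen in gold or predictions."""
    types = {t for _, _, t in gold | pred}
    return {t: entity_f1({s for s in gold if s[2] == t}, {s for s in pred if s[2] == t})[2]
            for t in types}


# Made-up word-index spans: the FOOD prediction misses a word, so it counts as wrong.
# gold = {(0, 1, "PROCESS"), (1, 2, "QUANTITY"), (2, 3, "UNIT"), (3, 5, "FOOD")}
# pred = {(0, 1, "PROCESS"), (1, 2, "QUANTITY"), (2, 3, "UNIT"), (3, 4, "FOOD")}
```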
Example Predictions
Input: "Heat 2 tablespoons olive oil in a large pan over medium heat"
Predicted entities:
- QUANTITY: 2
- UNIT: tablespoons
- FOOD: olive oil
- PROCESS: Heat
Limitations
- The model may not recognize very uncommon ingredients or measurements
- Performance is optimized for recipe text similar to the TASTEset training data
- The model uses simple tokenization and may not handle complex punctuation well
Ethical Considerations
This model is intended for research and educational purposes. Users should be aware that:
- The model's predictions may not always be accurate
- Food-related predictions should not be used for dietary or health decisions
- The training data may contain biases present in recipe datasets
Citation
If you use this model, please cite:
```bibtex
@misc{recipe-ner-model,
  title={Recipe NER Model},
  author={Generated by Recipe Archaeology Project},
  year={2024},
  howpublished={\url{https://huggingface.co/rgonzale/recipe-ner-model}},
}
```
Contact
For questions or feedback about this model, please create an issue on the model's repository.