Special Thanks: A special thanks to NationTech.io and to Cherry Republic for sponsoring the work.
Supported Tasks and Leaderboards
Main Function:
Address Parsing: Extracting structured address components (e.g., street, city, postal code) from unstructured text. Evaluation: Medium quality, the model struggles with complex extractions with mis spellings and duplicates.
Sub Functions:
Named Entity Recognition (NER): Identifying and classifying entities within text, including personal names, organizations, locations, and other categories.
Data Anonymization: Recognizing and extracting personally identifiable information (PII) in text data.
Domain Categorization: Extracting domain information and document types.
And more!
Languages:
The dataset primarily contains text in English but includes other languages due to the diversity of sources.
Dataset Structure
Data Instances
Each data instance consists of three main components:
System Message: Instructions provided to the assistant (model) for the task.
User Input: The textual content containing addresses or entities to be parsed.
Assistant Response: The assistant's output, providing the extracted address components or entities in JSON format.
Example:
- Downloads last month
- 29