Special Thanks: A special thanks to NationTech.io and to Cherry Republic for sponsoring the work.

image/png

Supported Tasks and Leaderboards

Main Function:

Address Parsing: Extracting structured address components (e.g., street, city, postal code) from unstructured text. Evaluation: Medium quality, the model struggles with complex extractions with mis spellings and duplicates.

Sub Functions:

Named Entity Recognition (NER): Identifying and classifying entities within text, including personal names, organizations, locations, and other categories.

Data Anonymization: Recognizing and extracting personally identifiable information (PII) in text data.

Domain Categorization: Extracting domain information and document types.

And more!

Languages:

The dataset primarily contains text in English but includes other languages due to the diversity of sources.

Dataset Structure

Data Instances

Each data instance consists of three main components:

System Message: Instructions provided to the assistant (model) for the task.

User Input: The textual content containing addresses or entities to be parsed.

Assistant Response: The assistant's output, providing the extracted address components or entities in JSON format.

Example:

image/png

image/png

image/png

Downloads last month
29
Safetensors
Model size
1.1B params
Tensor type
FP16
·
Inference Providers NEW
This model is not currently available via any of the supported Inference Providers.
The model cannot be deployed to the HF Inference API: The model has no library tag.