Overview
pii-phi is a fine-tuned version of Phi-3.5-mini-instruct designed to extract Personally Identifiable Information (PII) from unstructured text. The model outputs PII entities in a structured JSON format according to strict schema guidelines.
Training Prompt Format
# GUIDELINES
- Extract all instances of the following Personally Identifiable Information (PII) entities from the provided text and return them in JSON format.
- Each item in the JSON list should include an 'entity' key specifying the type of PII and a 'value' key containing the extracted information.
- The supported entities are: PERSON_NAME, BUSINESS_NAME, API_KEY, USERNAME, API_ENDPOINT, WEBSITE_ADDRESS, PHONE_NUMBER, EMAIL_ADDRESS, ID, PASSWORD, ADDRESS.
# EXPECTED OUTPUT
- The json output must be in the format below:
{
"result": [
{"entity": "ENTITY_TYPE", "value": "EXTRACTED_VALUE"},
...
]
}
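The output format above can be checked on the consumer side before the entities are used. The following is a minimal sketch, not part of the model's tooling: `parse_pii_output` and `SUPPORTED_ENTITIES` are hypothetical names introduced here, and the sketch assumes the model returns a bare JSON object with no surrounding text.

```python
import json

# Entity types copied from the guidelines above.
SUPPORTED_ENTITIES = {
    "PERSON_NAME", "BUSINESS_NAME", "API_KEY", "USERNAME", "API_ENDPOINT",
    "WEBSITE_ADDRESS", "PHONE_NUMBER", "EMAIL_ADDRESS", "ID", "PASSWORD",
    "ADDRESS",
}


def parse_pii_output(raw: str) -> list[dict]:
    """Hypothetical helper: parse model output and keep well-formed items only."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return []  # Assumption: treat unparseable output as "no entities".
    if not isinstance(data, dict):
        return []
    items = data.get("result", [])
    if not isinstance(items, list):
        return []

    valid = []
    for item in items:
        if not isinstance(item, dict):
            continue
        entity, value = item.get("entity"), item.get("value")
        # Keep only items with a supported entity type and a string value.
        if isinstance(entity, str) and entity in SUPPORTED_ENTITIES and isinstance(value, str):
            valid.append({"entity": entity, "value": value})
    return valid
```

Returning an empty list on malformed output is one possible design choice; raising an exception may be preferable when silent failures are unacceptable.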
Supported Entities
- PERSON_NAME
- BUSINESS_NAME
- API_KEY
- USERNAME
- API_ENDPOINT
- WEBSITE_ADDRESS
- PHONE_NUMBER
- EMAIL_ADDRESS
- ID
- PASSWORD
- ADDRESS
Intended Use
The model is intended for PII detection in text documents to support tasks such as data anonymization, compliance, and security auditing.
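As an illustration of the anonymization use case, the sketch below replaces each extracted value with a placeholder naming its entity type. `redact` is a hypothetical helper, not something shipped with the model, and it assumes the entities are already available as a list of `{"entity": ..., "value": ...}` dictionaries and that each value appears verbatim in the text.

```python
def redact(text: str, entities: list[dict]) -> str:
    """Hypothetical helper: replace each extracted value with an [ENTITY] tag."""
    # Replace longer values first so a value that is a substring of another
    # (e.g. a first name inside a full name) does not break the longer match.
    for item in sorted(entities, key=lambda e: len(e["value"]), reverse=True):
        if item["value"]:  # Skip empty values; replacing "" would corrupt the text.
            text = text.replace(item["value"], f"[{item['entity']}]")
    return text


message = "I am James Jake and my employee number is 123123123"
entities = [
    {"entity": "PERSON_NAME", "value": "James Jake"},
    {"entity": "ID", "value": "123123123"},
]
print(redact(message, entities))
# I am [PERSON_NAME] and my employee number is [ID]
```

Plain string replacement is the simplest option; it does not handle values that differ from the source text in casing or whitespace.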
Limitations
- Not guaranteed to detect all forms of PII in every context.
- May return false positives or omit contextually relevant information.
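One inexpensive guard against some false positives is to discard any extracted value that does not occur verbatim in the input. This is a sketch under that assumption only; `keep_grounded` is a hypothetical helper, and it cannot recover PII the model missed.

```python
def keep_grounded(text: str, entities: list[dict]) -> list[dict]:
    """Hypothetical helper: keep entities whose value appears verbatim in text."""
    return [e for e in entities if e["value"] and e["value"] in text]


message = "I am James Jake and my employee number is 123123123"
candidates = [
    {"entity": "PERSON_NAME", "value": "James Jake"},
    {"entity": "EMAIL_ADDRESS", "value": "james@example.com"},  # not in the text
]
print(keep_grounded(message, candidates))
# [{'entity': 'PERSON_NAME', 'value': 'James Jake'}]
```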
Installation
Install the vllm package to run the model efficiently:
pip install vllm
Example:
from vllm import LLM, SamplingParams
llm = LLM("Fsoft-AIC/pii-phi")
system_prompt = """
# GUIDELINES
- Extract all instances of the following Personally Identifiable Information (PII) entities from the provided text and return them in JSON format.
- Each item in the JSON list should include an 'entity' key specifying the type of PII and a 'value' key containing the extracted information.
- The supported entities are: PERSON_NAME, BUSINESS_NAME, API_KEY, USERNAME, API_ENDPOINT, WEBSITE_ADDRESS, PHONE_NUMBER, EMAIL_ADDRESS, ID, PASSWORD, ADDRESS.
# EXPECTED OUTPUT
- The json output must be in the format below:
{
"result": [
{"entity": "ENTITY_TYPE", "value": "EXTRACTED_VALUE"},
...
]
}
"""
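pii_message = "I am James Jake and my employee number is 123123123"

# temperature=0 selects greedy decoding; max_tokens caps the response length.
sampling_params = SamplingParams(temperature=0, max_tokens=1000)

# Send the system prompt and the text to analyze as a single chat conversation.
outputs = llm.chat(
    [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": pii_message},
    ],
    sampling_params,
)

# Each output holds the generated completions; print the first one.
for output in outputs:
    generated_text = output.outputs[0].text
    print(generated_text)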