# De-identification Benchmark Results

**Model:** Minibase-DeId-Small

**Dataset:** Personal_De-identifier_Benchmark_SFT.jsonl

**Sample Size:** 100

**Date:** 2025-09-25T12:35:05.897062

## Overall Performance

| Metric | Score | Description |
|--------|-------|-------------|
| PII Detection Rate | 0.203 | How well personal identifiers are detected |
| Completeness Score | 0.640 | Fraction of texts fully de-identified |
| Semantic Preservation | 0.109 | How well meaning is preserved |
| Average Latency | 492.4 ms | Mean response time per sample |

## Domain Performance

### Medical Domain (33 samples)

- PII Detection: 0.214
- Completeness: 0.606
- Semantic Preservation: 0.110

### Legal Domain (6 samples)

- PII Detection: 0.113
- Completeness: 0.500
- Semantic Preservation: 0.056

### HR Domain (11 samples)

- PII Detection: 0.202
- Completeness: 0.273
- Semantic Preservation: 0.108

### General Domain (40 samples)

- PII Detection: 0.218
- Completeness: 0.750
- Semantic Preservation: 0.120

### Research Domain (4 samples)

- PII Detection: 0.192
- Completeness: 0.500
- Semantic Preservation: 0.108

### Customer Service Domain (6 samples)

- PII Detection: 0.140
- Completeness: 1.000
- Semantic Preservation: 0.083
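
The per-domain figures can be cross-checked against the overall scores. The report does not state how the overall numbers are aggregated, so the sketch below assumes they are sample-weighted means of the domain scores. The `domains` dictionary and the `weighted_mean` helper are illustrative names, not part of the benchmark code.

```python
# Per-domain results copied from the report:
# (samples, pii_detection, completeness, semantic_preservation)
domains = {
    "medical": (33, 0.214, 0.606, 0.110),
    "legal": (6, 0.113, 0.500, 0.056),
    "hr": (11, 0.202, 0.273, 0.108),
    "general": (40, 0.218, 0.750, 0.120),
    "research": (4, 0.192, 0.500, 0.108),
    "customer_service": (6, 0.140, 1.000, 0.083),
}


def weighted_mean(metric_index: int) -> float:
    """Sample-weighted mean of one metric column (assumed aggregation)."""
    total = sum(row[0] for row in domains.values())
    weighted = sum(row[0] * row[metric_index] for row in domains.values())
    return weighted / total


total_samples = sum(row[0] for row in domains.values())
pii_detection = round(weighted_mean(1), 3)
completeness = round(weighted_mean(2), 3)
semantic_preservation = round(weighted_mean(3), 3)

print(total_samples, pii_detection, completeness, semantic_preservation)
```

Under that assumption the domain sample counts sum to the reported sample size of 100, and the three weighted means round to the overall scores reported for PII detection, completeness, and semantic preservation. Domains with few samples (Research with 4, Legal and Customer Service with 6 each) contribute little to the overall figures, and their individual scores should be read with that in mind.
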

## Example Results

### Example 1 (medical domain)

**Input:** Patient Sarah Johnson, DOB 05/12/1980, visited Dr. Lee at St. Jude Hospital on 2023-10-26. Her conta...

**Expected:** Patient [NAME_1], DOB [DOB_1], visited [NAME_2] at [HOSPITAL_1] on [DATE_1]. Her contact is [PHONE_1...

**Predicted:** Patient [FIRSTNAME_1] [MIDDLENAME_1], DOB [DOB_1], visited Dr. [LASTNAME_1] at [CITY_1] Hospital on ...

**PII Detection:** 0.286

### Example 2 (legal domain)

**Input:** Deponent Mr. Robert Davis, CEO of GlobalCorp Inc., stated under oath on December 1, 2022, that his a...

**Expected:** Deponent [NAME_1], CEO of [ORGANIZATION_1], stated under oath on [DATE_1], that his attorney, [NAME_...

**Predicted:** Deponent [PREFIX_1] [FIRSTNAME_1] [LASTNAME_1], CEO of [COMPANYNAME_1], stated under oath on [DATE_1...

**PII Detection:** 0.167

### Example 3 (hr domain)

**Input:** Employee ID: EMP-001-XYZ. Name: John Doe. Salary: $85,000. Email: [email protected]. Marital Stat...

**Expected:** Employee ID: [EMPLOYEE_ID_1]. Name: [NAME_1]. Salary: [SALARY_1]. Email: [EMAIL_1]. Marital Status: ...

**Predicted:** Employee ID: EMP-[BUILDINGNUMBER_1]. Name: [FIRSTNAME_1] Doe. Salary: [CURRENCYSYMBOL_1][AMOUNT_1]. ...

**PII Detection:** 0.167
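
In the examples above, the predictions often mask the right span but use a finer-grained tag vocabulary than the expected output (for instance `[FIRSTNAME_1]` where `[NAME_1]` is expected). The report does not say how the per-example PII Detection score is computed. The sketch below shows one plausible exact-match scheme, the fraction of expected placeholders that also appear in the prediction, to illustrate how such label mismatches would lower the score. The function name `placeholder_recall`, the regular expression, and the two sample strings are hypothetical and are not taken from the benchmark.

```python
import re

# Matches tags such as [NAME_1] or [EMPLOYEE_ID_1] (assumed placeholder format).
PLACEHOLDER = re.compile(r"\[[A-Z_]+_\d+\]")


def placeholder_recall(expected: str, predicted: str) -> float:
    """Fraction of expected placeholders found verbatim in the prediction."""
    expected_tags = set(PLACEHOLDER.findall(expected))
    if not expected_tags:
        return 1.0  # nothing to detect, so nothing was missed
    predicted_tags = set(PLACEHOLDER.findall(predicted))
    return len(expected_tags & predicted_tags) / len(expected_tags)


# Hypothetical strings written for illustration only.
expected = "Patient [NAME_1], DOB [DOB_1], seen on [DATE_1]."
predicted = "Patient [FIRSTNAME_1] [LASTNAME_1], DOB [DOB_1], seen on [DATE_1]."

score = placeholder_recall(expected, predicted)
print(f"{score:.3f}")
```

Here the name is masked in the prediction, but because `[FIRSTNAME_1] [LASTNAME_1]` does not match `[NAME_1]` verbatim, only two of the three expected placeholders count as detected. A scorer that mapped related labels to a shared category before comparing would rate the same prediction higher.
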