# De-identification Benchmark Results
**Model:** Minibase-DeId-Small
**Dataset:** Personal_De-identifier_Benchmark_SFT.jsonl
**Sample Size:** 100
**Date:** 2025-09-25T12:38:54.363196

## Overall Performance

| Metric | Score | Description |
|--------|-------|-------------|
| PII Detection Rate | 1.000 | Fraction of PII-bearing inputs for which the model emitted at least one placeholder |
| Completeness Score | 0.670 | Fraction of texts fully de-identified |
| Semantic Preservation | 0.109 | How well the meaning of the original text is preserved |
| Average Latency | 483.7 ms | Mean model response time |
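The headline scores are averages over the 100 evaluated examples. The sketch below shows one way to roll per-example results up into the four table metrics. It assumes a hypothetical `ExampleResult` record whose field names are illustrative and not taken from the benchmark code. It also assumes that every example contains PII and that each example carries its own semantic score and latency.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List


@dataclass
class ExampleResult:
    # Hypothetical per-example record; field names are illustrative only.
    pii_detected: float       # 1.0 if any placeholder was emitted for a PII-bearing input
    fully_deidentified: bool  # True if no PII remains in the output
    semantic_score: float     # assumed per-example meaning-preservation score in [0, 1]
    latency_ms: float         # response time for this example, in milliseconds


def summarize(results: List[ExampleResult]) -> Dict[str, float]:
    """Average per-example scores into the four headline metrics."""
    return {
        "pii_detection_rate": mean(r.pii_detected for r in results),
        # A fraction in [0, 1] (e.g. 0.670), not a percentage.
        "completeness_score": mean(1.0 if r.fully_deidentified else 0.0 for r in results),
        "semantic_preservation": mean(r.semantic_score for r in results),
        "average_latency_ms": mean(r.latency_ms for r in results),
    }
```

Averaging per-example scores lets each metric be reported independently. This is why the detection rate can reach 1.000 while completeness stays lower.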

## Key Improvements

- **PII Detection**: Now measures whether the model generates any placeholders when the input contains PII
- **Unified Evaluation**: All examples are evaluated together (no domain separation)
- **Lenient Scoring**: Focuses on detection capability rather than exact placeholder matching
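The lenient detection rule can be sketched as a pattern check on the model output. A PII-bearing input scores 1.0 if the prediction contains any placeholder, whatever its label. The placeholder format `[LABEL_N]` is an assumption inferred from the example outputs in this report. The function names are hypothetical.

```python
import re
from typing import Iterable, Optional, Tuple

# Assumed placeholder format: uppercase label, underscore, index, in brackets,
# e.g. [NAME_1], [EMPLOYEE_ID_1]. Inferred from example outputs, not a spec.
PLACEHOLDER = re.compile(r"\[[A-Z][A-Z_]*_\d+\]")


def lenient_pii_detection(prediction: str, input_has_pii: bool) -> Optional[float]:
    """Return 1.0 if any placeholder appears, 0.0 if none, None if the input has no PII."""
    if not input_has_pii:
        return None  # not scored: the metric only applies to PII-bearing inputs
    return 1.0 if PLACEHOLDER.search(prediction) else 0.0


def pii_detection_rate(examples: Iterable[Tuple[str, bool]]) -> float:
    """Average the lenient score over (prediction, input_has_pii) pairs that contain PII."""
    scores = [
        lenient_pii_detection(prediction, True)
        for prediction, has_pii in examples
        if has_pii
    ]
    return sum(scores) / len(scores) if scores else 0.0
```

Under this rule, an output such as `Name: [FIRSTNAME_1] Doe` still scores 1.0 even though the surname is left in place. A separate completeness measure is therefore needed to catch partial de-identification.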

## Example Results

### Example 1
**Input:** Patient Sarah Johnson, DOB 05/12/1980, visited Dr. Lee at St. Jude Hospital on 2023-10-26. Her conta...
**Expected:** Patient [NAME_1], DOB [DOB_1], visited [NAME_2] at [HOSPITAL_1] on [DATE_1]. Her contact is [PHONE_1...
**Predicted:** Patient [FIRSTNAME_1] [MIDDLENAME_1], DOB [DOB_1], visited Dr. [LASTNAME_1] at [CITY_1] Hospital on ...
**PII Detection:** 1.000

### Example 2
**Input:** Deponent Mr. Robert Davis, CEO of GlobalCorp Inc., stated under oath on December 1, 2022, that his a...
**Expected:** Deponent [NAME_1], CEO of [ORGANIZATION_1], stated under oath on [DATE_1], that his attorney, [NAME_...
**Predicted:** Deponent [PREFIX_1] [FIRSTNAME_1] [LASTNAME_1], CEO of [COMPANYNAME_1], stated under oath on [DATE_1...
**PII Detection:** 1.000

### Example 3
**Input:** Employee ID: EMP-001-XYZ. Name: John Doe. Salary: $85,000. Email: [email protected]. Marital Stat...
**Expected:** Employee ID: [EMPLOYEE_ID_1]. Name: [NAME_1]. Salary: [SALARY_1]. Email: [EMAIL_1]. Marital Status: ...
**Predicted:** Employee ID: EMP-[BUILDINGNUMBER_1]. Name: [FIRSTNAME_1] Doe. Salary: [CURRENCYSYMBOL_1][AMOUNT_1]. ...
**PII Detection:** 1.000
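In these examples, the predicted placeholders use different, finer-grained labels than the expected ones (e.g. `[FIRSTNAME_1]` instead of `[NAME_1]`). Each example still scores 1.000 under lenient scoring. The sketch below extracts the label sets so such mismatches can be inspected directly. It assumes the same bracketed `LABEL_N` format, and its function names are hypothetical.

```python
import re
from typing import Set, Tuple

# Captures the label part of a placeholder, e.g. [FIRSTNAME_1] -> "FIRSTNAME".
# The [LABEL_N] format is an assumption based on the examples in this report.
PLACEHOLDER_LABEL = re.compile(r"\[([A-Z][A-Z_]*)_\d+\]")


def placeholder_labels(text: str) -> Set[str]:
    """Return the set of distinct placeholder labels found in text."""
    return set(PLACEHOLDER_LABEL.findall(text))


def label_diff(expected: str, predicted: str) -> Tuple[Set[str], Set[str]]:
    """Return (labels only in expected, labels only in predicted)."""
    exp, pred = placeholder_labels(expected), placeholder_labels(predicted)
    return exp - pred, pred - exp


# Visible (truncated) portions of Example 1.
expected = "Patient [NAME_1], DOB [DOB_1], visited [NAME_2] at [HOSPITAL_1] on [DATE_1]."
predicted = ("Patient [FIRSTNAME_1] [MIDDLENAME_1], DOB [DOB_1], "
             "visited Dr. [LASTNAME_1] at [CITY_1] Hospital on")
missing, extra = label_diff(expected, predicted)
```

For Example 1, `missing` is `{"NAME", "HOSPITAL", "DATE"}` and `extra` is `{"FIRSTNAME", "MIDDLENAME", "LASTNAME", "CITY"}`. Only `DOB` matches exactly, which exact placeholder matching would penalize and lenient scoring ignores.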