Upload benchmarks.txt with huggingface_hub
Browse files- benchmarks.txt +14 -40
benchmarks.txt
CHANGED
|
@@ -2,66 +2,40 @@
|
|
| 2 |
**Model:** Minibase-DeId-Small
|
| 3 |
**Dataset:** Personal_De-identifier_Benchmark_SFT.jsonl
|
| 4 |
**Sample Size:** 100
|
| 5 |
-
**Date:** 2025-09-25T12:
|
| 6 |
|
| 7 |
## Overall Performance
|
| 8 |
|
| 9 |
| Metric | Score | Description |
|
| 10 |
|--------|-------|-------------|
|
| 11 |
-
| PII Detection Rate |
|
| 12 |
-
| Completeness Score | 0.
|
| 13 |
| Semantic Preservation | 0.109 | How well meaning is preserved |
|
| 14 |
-
| Average Latency |
|
| 15 |
|
| 16 |
-
##
|
| 17 |
|
| 18 |
-
|
| 19 |
-
-
|
| 20 |
-
-
|
| 21 |
-
- Semantic Preservation: 0.110
|
| 22 |
-
|
| 23 |
-
### Legal Domain (6 samples)
|
| 24 |
-
- PII Detection: 0.113
|
| 25 |
-
- Completeness: 0.500
|
| 26 |
-
- Semantic Preservation: 0.056
|
| 27 |
-
|
| 28 |
-
### Hr Domain (11 samples)
|
| 29 |
-
- PII Detection: 0.202
|
| 30 |
-
- Completeness: 0.273
|
| 31 |
-
- Semantic Preservation: 0.108
|
| 32 |
-
|
| 33 |
-
### General Domain (40 samples)
|
| 34 |
-
- PII Detection: 0.218
|
| 35 |
-
- Completeness: 0.750
|
| 36 |
-
- Semantic Preservation: 0.120
|
| 37 |
-
|
| 38 |
-
### Research Domain (4 samples)
|
| 39 |
-
- PII Detection: 0.192
|
| 40 |
-
- Completeness: 0.500
|
| 41 |
-
- Semantic Preservation: 0.108
|
| 42 |
-
|
| 43 |
-
### Customer_Service Domain (6 samples)
|
| 44 |
-
- PII Detection: 0.140
|
| 45 |
-
- Completeness: 1.000
|
| 46 |
-
- Semantic Preservation: 0.083
|
| 47 |
|
| 48 |
## Example Results
|
| 49 |
|
| 50 |
-
### Example 1
|
| 51 |
**Input:** Patient Sarah Johnson, DOB 05/12/1980, visited Dr. Lee at St. Jude Hospital on 2023-10-26. Her conta...
|
| 52 |
**Expected:** Patient [NAME_1], DOB [DOB_1], visited [NAME_2] at [HOSPITAL_1] on [DATE_1]. Her contact is [PHONE_1...
|
| 53 |
**Predicted:** Patient [FIRSTNAME_1] [MIDDLENAME_1], DOB [DOB_1], visited Dr. [LASTNAME_1] at [CITY_1] Hospital on ...
|
| 54 |
-
**PII Detection:**
|
| 55 |
|
| 56 |
-
### Example 2
|
| 57 |
**Input:** Deponent Mr. Robert Davis, CEO of GlobalCorp Inc., stated under oath on December 1, 2022, that his a...
|
| 58 |
**Expected:** Deponent [NAME_1], CEO of [ORGANIZATION_1], stated under oath on [DATE_1], that his attorney, [NAME_...
|
| 59 |
**Predicted:** Deponent [PREFIX_1] [FIRSTNAME_1] [LASTNAME_1], CEO of [COMPANYNAME_1], stated under oath on [DATE_1...
|
| 60 |
-
**PII Detection:**
|
| 61 |
|
| 62 |
-
### Example 3
|
| 63 |
**Input:** Employee ID: EMP-001-XYZ. Name: John Doe. Salary: $85,000. Email: [email protected]. Marital Stat...
|
| 64 |
**Expected:** Employee ID: [EMPLOYEE_ID_1]. Name: [NAME_1]. Salary: [SALARY_1]. Email: [EMAIL_1]. Marital Status: ...
|
| 65 |
**Predicted:** Employee ID: EMP-[BUILDINGNUMBER_1]. Name: [FIRSTNAME_1] Doe. Salary: [CURRENCYSYMBOL_1][AMOUNT_1]. ...
|
| 66 |
-
**PII Detection:**
|
| 67 |
|
|
|
|
| 2 |
**Model:** Minibase-DeId-Small
|
| 3 |
**Dataset:** Personal_De-identifier_Benchmark_SFT.jsonl
|
| 4 |
**Sample Size:** 100
|
| 5 |
+
**Date:** 2025-09-25T12:38:54.363196
|
| 6 |
|
| 7 |
## Overall Performance
|
| 8 |
|
| 9 |
| Metric | Score | Description |
|
| 10 |
|--------|-------|-------------|
|
| 11 |
+
| PII Detection Rate | 1.000 | How well personal identifiers are detected |
|
| 12 |
+
| Completeness Score | 0.670 | Percentage of texts fully de-identified |
|
| 13 |
| Semantic Preservation | 0.109 | How well meaning is preserved |
|
| 14 |
+
| Average Latency | 483.7ms | Response time performance |
|
| 15 |
|
| 16 |
+
## Key Improvements
|
| 17 |
|
| 18 |
+
- **PII Detection**: Now measures if model generates ANY placeholders when PII is present in input
|
| 19 |
+
- **Unified Evaluation**: All examples evaluated together (no domain separation)
|
| 20 |
+
- **Lenient Scoring**: Focuses on detection capability rather than exact placeholder matching
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 21 |
|
| 22 |
## Example Results
|
| 23 |
|
| 24 |
+
### Example 1
|
| 25 |
**Input:** Patient Sarah Johnson, DOB 05/12/1980, visited Dr. Lee at St. Jude Hospital on 2023-10-26. Her conta...
|
| 26 |
**Expected:** Patient [NAME_1], DOB [DOB_1], visited [NAME_2] at [HOSPITAL_1] on [DATE_1]. Her contact is [PHONE_1...
|
| 27 |
**Predicted:** Patient [FIRSTNAME_1] [MIDDLENAME_1], DOB [DOB_1], visited Dr. [LASTNAME_1] at [CITY_1] Hospital on ...
|
| 28 |
+
**PII Detection:** 1.000
|
| 29 |
|
| 30 |
+
### Example 2
|
| 31 |
**Input:** Deponent Mr. Robert Davis, CEO of GlobalCorp Inc., stated under oath on December 1, 2022, that his a...
|
| 32 |
**Expected:** Deponent [NAME_1], CEO of [ORGANIZATION_1], stated under oath on [DATE_1], that his attorney, [NAME_...
|
| 33 |
**Predicted:** Deponent [PREFIX_1] [FIRSTNAME_1] [LASTNAME_1], CEO of [COMPANYNAME_1], stated under oath on [DATE_1...
|
| 34 |
+
**PII Detection:** 1.000
|
| 35 |
|
| 36 |
+
### Example 3
|
| 37 |
**Input:** Employee ID: EMP-001-XYZ. Name: John Doe. Salary: $85,000. Email: [email protected]. Marital Stat...
|
| 38 |
**Expected:** Employee ID: [EMPLOYEE_ID_1]. Name: [NAME_1]. Salary: [SALARY_1]. Email: [EMAIL_1]. Marital Status: ...
|
| 39 |
**Predicted:** Employee ID: EMP-[BUILDINGNUMBER_1]. Name: [FIRSTNAME_1] Doe. Salary: [CURRENCYSYMBOL_1][AMOUNT_1]. ...
|
| 40 |
+
**PII Detection:** 1.000
|
| 41 |
|