ruke1ire committed on
Commit 8b311a4 · verified · 1 Parent(s): b1ce2a3

Update README.md

Files changed (1)
  1. README.md +28 -5
README.md CHANGED
@@ -66,17 +66,30 @@ Based on [LFM2-350M](https://huggingface.co/LiquidAI/LFM2-350M), LFM2-350M-PII-E
  The output can then be used to mask sensitive information.
 
  In particular, it is trained to extract:
- * Address/Locations
- * Company/Institute/Organization names
- * Email addresses
- * Human names
- * Phone numbers
  from Japanese documents and texts.
 
  ## Extraction Quality
 
  WIP
 
  ## Model Details
 
  **Generation parameters**: We strongly recommend using greedy decoding with a `temperature=0`.
@@ -89,6 +102,7 @@ Note the model can handle extraction of particular entities. E.g. The model will
  > [!WARNING]
  > ⚠️ For best performance, ensure alphabetical order of entity categories as shown above.
 
  **Chat template**: LFM2-PII-Extract-JP uses a ChatML-like chat template as follows:
 
  ```
@@ -105,6 +119,15 @@ You can automatically apply it using the dedicated [`.apply_chat_template()`](ht
  > [!WARNING]
  > ⚠️ The model is intended for single turn conversations.
 
  ## 🏃 How to run LFM2
 
  - Huggingface: [LFM2-350M](https://huggingface.co/LiquidAI/LFM2-350M)
 
  The output can then be used to mask sensitive information.
 
  In particular, it is trained to extract:
+ * Address/Locations (JSON key: `address`)
+ * Company/Institute/Organization names (JSON key: `company_name`)
+ * Email addresses (JSON key: `email_address`)
+ * Human names (JSON key: `human_name`)
+ * Phone numbers (JSON key: `phone_number`)
  from Japanese documents and texts.
 
  ## Extraction Quality
 
  WIP
 
+ > [!NOTE]
+ > 📝 While LFM2-350M-PII-Extract-JP provides strong out-of-the-box PII entity extraction for the categories listed above,
+ > our primary goal is to deliver a versatile, community-driven base model—a foundation that makes it easy to build
+ > best-in-class, privacy-focused masking systems.
+ >
+ > Like any base model, there remain areas for continued development, particularly for specialized use cases:
+ > - Supporting extraction of organization-specific identification numbers
+ > - Expanding coverage to additional categories such as date of birth, passport numbers, and beyond
+ >
+ > These are precisely the kinds of challenges that fine-tuning—by both Liquid AI and our developer community—can
+ > address. We see this model not just as an endpoint, but as a catalyst for a rich ecosystem of fine-tuned PII extraction
+ > models tailored to real-world needs.
+
  ## Model Details
 
  **Generation parameters**: We strongly recommend using greedy decoding with a `temperature=0`.
 
  > [!WARNING]
  > ⚠️ For best performance, ensure alphabetical order of entity categories as shown above.
 
+
  **Chat template**: LFM2-PII-Extract-JP uses a ChatML-like chat template as follows:
 
  ```
 
  > [!WARNING]
  > ⚠️ The model is intended for single turn conversations.
 
+ **Output format**
+
+ The model outputs a JSON object containing the fields it was prompted to extract.
+ If no entities are found in a particular category, it returns an empty list for that category.
+ If entities are found, they are returned as a list for each prompted category.
+ The model is trained to output entities exactly as they appear in the text.
+ If the same entity appears multiple times with slight formatting variations, the model outputs all variations to ensure subsequent masking can be performed using exact matches.
+
+
  ## 🏃 How to run LFM2
 
  - Huggingface: [LFM2-350M](https://huggingface.co/LiquidAI/LFM2-350M)
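
The **Output format** paragraph added in this commit says the model returns a JSON object of lists and reproduces entities exactly as they appear in the text, so that masking can rely on exact string matches. A minimal sketch of that downstream masking step might look like the following. It assumes the JSON keys listed in the diff; the placeholder labels, the `mask_pii` helper, and the sample text and output are hypothetical illustrations, not taken from the README or from real model output.

```python
import json

# JSON keys come from the README diff; the placeholder labels are assumptions.
MASK_LABELS = {
    "address": "[ADDRESS]",
    "company_name": "[COMPANY]",
    "email_address": "[EMAIL]",
    "human_name": "[NAME]",
    "phone_number": "[PHONE]",
}


def mask_pii(text: str, model_output: str) -> str:
    """Replace every extracted entity in `text` with a category placeholder."""
    # json.loads raises json.JSONDecodeError if the output is not valid JSON.
    entities = json.loads(model_output)

    # Collect (surface form, label) pairs; empty lists contribute nothing,
    # and keys outside MASK_LABELS are ignored.
    pairs = [
        (surface, MASK_LABELS[key])
        for key, surfaces in entities.items()
        if key in MASK_LABELS
        for surface in surfaces
        if surface
    ]

    # Replace longer strings first so a short entity cannot split a longer one.
    for surface, label in sorted(pairs, key=lambda p: len(p[0]), reverse=True):
        text = text.replace(surface, label)  # exact-match replacement
    return text


# Made-up input: "Taro Yamada's phone number is 03-1234-5678. Contact: taro@example.com"
sample_text = "山田太郎の電話番号は03-1234-5678です。連絡先: taro@example.com"

# Hypothetical model output for that input (not produced by the real model).
sample_output = json.dumps(
    {
        "address": [],
        "email_address": ["taro@example.com"],
        "human_name": ["山田太郎"],
        "phone_number": ["03-1234-5678"],
    },
    ensure_ascii=False,
)

masked = mask_pii(sample_text, sample_output)
print(masked)
```

Replacing longer surface forms first is a design choice of this sketch, so that an entity contained inside a longer one does not break the longer match. Because the model is described as returning every formatting variation it sees, plain `str.replace` on each returned string should be enough here without any fuzzy matching.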