LiquidAI/LFM2-350M-PII-Extract-JP

@@ -66,17 +66,30 @@ Based on [LFM2-350M](https://huggingface.co/LiquidAI/LFM2-350M), LFM2-350M-PII-E
 The output can then be used to mask sensitive information.
 In particular, it is trained to extract:
-* Address/Locations
-* Company/Institute/Organization names
-* Email addresses
-* Human names
-* Phone numbers
 from Japanese documents and texts.
 ## Extraction Quality
 WIP
 ## Model Details
 **Generation parameters**: We strongly recommend greedy decoding (`temperature=0`).
@@ -89,6 +102,7 @@ Note the model can handle extraction of particular entities. E.g. The model will
 > [!WARNING]
 > ⚠️ For best performance, ensure alphabetical order of entity categories as shown above.
 **Chat template**: LFM2-PII-Extract-JP uses a ChatML-like chat template as follows:
 ```
@@ -105,6 +119,15 @@ You can automatically apply it using the dedicated [`.apply_chat_template()`](ht
 > [!WARNING]
 > ⚠️ The model is intended for single turn conversations.
 ## 🏃 How to run LFM2
 - Huggingface: [LFM2-350M](https://huggingface.co/LiquidAI/LFM2-350M)

 The output can then be used to mask sensitive information.
 In particular, it is trained to extract:
+* Address/Locations (JSON key: `address`)
+* Company/Institute/Organization names (JSON key: `company_name`)
+* Email addresses (JSON key: `email_address`)
+* Human names (JSON key: `human_name`)
+* Phone numbers (JSON key: `phone_number`)
 from Japanese documents and texts.
 ## Extraction Quality
 WIP
+> [!NOTE]
+> 📝 LFM2-350M-PII-Extract-JP provides strong out-of-the-box PII entity extraction for the categories listed above,
+> but our primary goal is to deliver a versatile, community-driven base model: a foundation that makes it easy to build
+> best-in-class, privacy-focused masking systems.
+>
+> As with any base model, areas for continued development remain, particularly for specialized use cases:
+> - Supporting extraction of organization-specific identification numbers
+> - Expanding coverage to additional categories such as date of birth and passport numbers
+>
+> These are the kinds of challenges that fine-tuning, by both Liquid AI and our developer community, can
+> address. We see this model not just as an endpoint but as a catalyst for a rich ecosystem of fine-tuned PII extraction
+> models tailored to real-world needs.
 ## Model Details
 **Generation parameters**: We strongly recommend greedy decoding (`temperature=0`).
 > [!WARNING]
 > ⚠️ For best performance, ensure alphabetical order of entity categories as shown above.
 **Chat template**: LFM2-PII-Extract-JP uses a ChatML-like chat template as follows:
 ```
 > [!WARNING]
 > ⚠️ The model is intended for single turn conversations.
+**Output format**: The model outputs a JSON object containing the fields it was prompted to extract.
+Each prompted category maps to a list of the entities found; a category with no matches maps to an empty list.
+The model is trained to output entities exactly as they appear in the text.
+If the same entity appears multiple times with slight formatting variations, the model outputs every variation so that subsequent masking can rely on exact string matches.
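Because entities are returned verbatim, the output can be fed into a plain string-replacement masking step. The following is a minimal sketch in Python, assuming the raw model output is already available as a string. The function names `parse_entities` and `mask_text`, the `[KEY]` placeholder style, and the sample text are illustrative assumptions and not part of the model card.

```python
import json

# The five JSON keys listed in the model card.
PII_KEYS = ("address", "company_name", "email_address", "human_name", "phone_number")


def parse_entities(raw_output: str) -> dict[str, list[str]]:
    """Parse the model's JSON output into {category: [entity, ...]}.

    Hypothetical helper: raises ValueError if the output is not a JSON
    object whose values are lists of strings under known keys.
    """
    data = json.loads(raw_output)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key, values in data.items():
        if key not in PII_KEYS:
            raise ValueError(f"unexpected category: {key}")
        if not isinstance(values, list) or not all(isinstance(v, str) for v in values):
            raise ValueError(f"category {key} must map to a list of strings")
    return data


def mask_text(text: str, entities: dict[str, list[str]]) -> str:
    """Replace every extracted entity with a placeholder such as [HUMAN_NAME]."""
    pairs = [(value, key) for key, values in entities.items() for value in values if value]
    # Replace longer strings first so an entity that is a substring of
    # another entity does not leave a partially masked fragment behind.
    pairs.sort(key=lambda pair: len(pair[0]), reverse=True)
    for value, key in pairs:
        text = text.replace(value, f"[{key.upper()}]")
    return text


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    raw = (
        '{"address": [], "company_name": [], '
        '"email_address": ["taro@example.com"], '
        '"human_name": ["山田太郎", "山田 太郎"], "phone_number": []}'
    )
    document = "山田太郎(山田 太郎)の連絡先: taro@example.com"
    print(mask_text(document, parse_entities(raw)))
```

Plain `str.replace` is enough here only because the model is described as returning each formatting variation separately; a fuzzy matcher would be needed otherwise.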
 ## 🏃 How to run LFM2
 - Huggingface: [LFM2-350M](https://huggingface.co/LiquidAI/LFM2-350M)