Update README.md
README.md
CHANGED
|
@@ -66,17 +66,30 @@ Based on [LFM2-350M](https://huggingface.co/LiquidAI/LFM2-350M), LFM2-350M-PII-E
|
|
| 66 |
The output can then be used to mask sensitive information.
|
| 67 |
|
| 68 |
In particular, it is trained to extract:
|
| 69 |
-
* Address/Locations
|
| 70 |
-
* Company/Institute/Organization names
|
| 71 |
-
* Email addresses
|
| 72 |
-
* Human names
|
| 73 |
-
* Phone numbers
|
| 74 |
from Japanese documents and texts.
|
| 75 |
|
| 76 |
## Extraction Quality
|
| 77 |
|
| 78 |
WIP
|
| 79 |
|
| 80 |
## Model Details
|
| 81 |
|
| 82 |
**Generation parameters**: We strongly recommend using greedy decoding with a `temperature=0`.
|
|
@@ -89,6 +102,7 @@ Note the model can handle extraction of particular entities. E.g. The model will
|
|
| 89 |
> [!WARNING]
|
| 90 |
> ⚠️ For best performance, ensure alphabetical order of entity categories as shown above.
|
| 91 |
|
|
|
|
| 92 |
**Chat template**: LFM2-PII-Extract-JP uses a ChatML-like chat template as follows:
|
| 93 |
|
| 94 |
```
|
|
@@ -105,6 +119,15 @@ You can automatically apply it using the dedicated [`.apply_chat_template()`](ht
|
|
| 105 |
> [!WARNING]
|
| 106 |
> ⚠️ The model is intended for single turn conversations.
|
| 107 |
|
| 108 |
## 🏃 How to run LFM2
|
| 109 |
|
| 110 |
- Huggingface: [LFM2-350M](https://huggingface.co/LiquidAI/LFM2-350M)
|
|
|
|
| 66 |
The output can then be used to mask sensitive information.
|
| 67 |
|
| 68 |
In particular, it is trained to extract:
|
| 69 |
+
* Address/Locations (JSON key: `address`)
|
| 70 |
+
* Company/Institute/Organization names (JSON key: `company_name`)
|
| 71 |
+
* Email addresses (JSON key: `email_address`)
|
| 72 |
+
* Human names (JSON key: `human_name`)
|
| 73 |
+
* Phone numbers (JSON key: `phone_number`)
|
| 74 |
from Japanese documents and texts.
|
| 75 |
|
| 76 |
## Extraction Quality
|
| 77 |
|
| 78 |
WIP
|
| 79 |
|
| 80 |
+
> [!NOTE]
|
| 81 |
+
> 📝 While LFM2-350M-PII-Extract-JP provides strong out-of-the-box PII entity extraction for the categories listed above,
|
| 82 |
+
> our primary goal is to deliver a versatile, community-driven base model—a foundation that makes it easy to build
|
| 83 |
+
> best-in-class, privacy-focused masking systems.
|
| 84 |
+
>
|
| 85 |
+
> Like any base model, there remain areas for continued development, particularly for specialized use cases:
|
| 86 |
+
> - Supporting extraction of organization-specific identification numbers
|
| 87 |
+
> - Expanding coverage to additional categories such as date of birth, passport numbers, and beyond
|
| 88 |
+
>
|
| 89 |
+
> These are precisely the kinds of challenges that fine-tuning—by both Liquid AI and our developer community—can
|
| 90 |
+
> address. We see this model not just as an endpoint, but as a catalyst for a rich ecosystem of fine-tuned PII extraction
|
| 91 |
+
> models tailored to real-world needs.
|
| 92 |
+
|
| 93 |
## Model Details
|
| 94 |
|
| 95 |
**Generation parameters**: We strongly recommend using greedy decoding with a `temperature=0`.
|
|
|
|
| 102 |
> [!WARNING]
|
| 103 |
> ⚠️ For best performance, ensure alphabetical order of entity categories as shown above.
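One way to keep the requested categories in alphabetical order is to normalize them before building the prompt. The sketch below is only an illustration: `SUPPORTED_KEYS` is taken from the JSON keys listed in this README, while `ordered_categories` is a hypothetical helper name, and how the resulting list is placed into the prompt is not covered here.

```python
# JSON keys listed in the README for the categories the model is trained to extract.
SUPPORTED_KEYS = {
    "address",
    "company_name",
    "email_address",
    "human_name",
    "phone_number",
}


def ordered_categories(requested):
    """Return the requested categories deduplicated and sorted alphabetically.

    Hypothetical helper: raises ValueError for keys not listed in the README.
    """
    unknown = set(requested) - SUPPORTED_KEYS
    if unknown:
        raise ValueError(f"Unsupported categories: {sorted(unknown)}")
    return sorted(set(requested))


categories = ordered_categories(["phone_number", "address", "human_name"])
print(categories)
```

The sorted list can then be passed to whatever code assembles the extraction prompt.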
|
| 104 |
|
| 105 |
+
|
| 106 |
**Chat template**: LFM2-PII-Extract-JP uses a ChatML-like chat template as follows:
|
| 107 |
|
| 108 |
```
|
|
|
|
| 119 |
> [!WARNING]
|
| 120 |
> ⚠️ The model is intended for single turn conversations.
|
| 121 |
|
| 122 |
+
**Output format**
|
| 123 |
+
|
| 124 |
+
The model outputs a JSON object containing the fields it was prompted to extract.
|
| 125 |
+
If no entities are found in a particular category, it returns an empty list for that category.
|
| 126 |
+
If entities are found, they are returned as a list for each prompted category.
|
| 127 |
+
The model is trained to output entities exactly as they appear in the text.
|
| 128 |
+
If the same entity appears multiple times with slight formatting variations, the model outputs all variations to ensure subsequent masking can be performed using exact matches.
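Because entities are returned exactly as they appear in the text, masking can be done with plain string replacement. The following is a minimal sketch under stated assumptions: `mask_pii` is a hypothetical helper, and the sample text and sample JSON are invented for illustration, not real model output.

```python
import json


def mask_pii(text, model_output, mask="[MASKED]"):
    """Replace every extracted entity in `text` with `mask` via exact matches.

    `model_output` is assumed to be a JSON object mapping each prompted
    category to a list of strings, as described in the output format above.
    """
    extracted = json.loads(model_output)
    # Gather entity strings from all categories; empty lists contribute nothing.
    entities = {entity for values in extracted.values() for entity in values if entity}
    # Replace longer strings first so an entity that contains a shorter one
    # is masked as a whole.
    for entity in sorted(entities, key=len, reverse=True):
        text = text.replace(entity, mask)
    return text


# Invented example data (assumption), not produced by the model.
sample_text = "山田太郎 (yamada@example.com) の電話番号は03-1234-5678です。"
sample_output = json.dumps(
    {
        "email_address": ["yamada@example.com"],
        "human_name": ["山田太郎"],
        "phone_number": ["03-1234-5678"],
    },
    ensure_ascii=False,
)

masked = mask_pii(sample_text, sample_output)
print(masked)
```

Replacing by exact match is also why returning every formatting variation of an entity matters: a variation missing from the output would be left unmasked in the text.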
|
| 129 |
+
|
| 130 |
+
|
| 131 |
## 🏃 How to run LFM2
|
| 132 |
|
| 133 |
- Huggingface: [LFM2-350M](https://huggingface.co/LiquidAI/LFM2-350M)
|