Update README.md
README.md
CHANGED
@@ -63,6 +63,54 @@ license_link: LICENSE

# LFM2-350M-PII-Extract-JP

Based on [LFM2-350M](https://huggingface.co/LiquidAI/LFM2-350M), LFM2-350M-PII-Extract-JP is designed to **extract personally identifiable information (PII) from Japanese text and output it in JSON format.**
The output can then be used to mask sensitive information.

In particular, it is trained to extract the following entity types from Japanese documents and texts:

* Addresses and locations
* Company, institute, and organization names
* Email addresses
* Human names
* Phone numbers

## Extraction Quality

WIP

## Model Details

**Generation parameters**: We strongly recommend greedy decoding (`temperature=0`).

**System prompts**: This checkpoint **requires** the following system prompt:
* `Extract <address>, <company_name>, <email_address>, <human_name>, <phone_number>`

The model can also extract a subset of the entity types. For example, it outputs only human names when the system prompt is set to `Extract <human_name>`.

> [!WARNING]
> ⚠️ For best performance, list the entity categories in alphabetical order, as shown above.
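
The system prompt rules above (a fixed `Extract <...>` format, optional subsets, alphabetical order) can be sketched as a small helper. This is only an illustration: `SUPPORTED_ENTITIES` and `build_system_prompt` are hypothetical names, not part of the model or any library, and the tag list is copied from the required prompt.

```python
# Entity tags copied from the required system prompt in this model card.
SUPPORTED_ENTITIES = (
    "address",
    "company_name",
    "email_address",
    "human_name",
    "phone_number",
)


def build_system_prompt(entities=SUPPORTED_ENTITIES):
    """Build an 'Extract <...>, <...>' prompt with tags in alphabetical order.

    Hypothetical helper: it only formats the string described in the card.
    """
    requested = set(entities)
    if not requested:
        raise ValueError("At least one entity type is required.")
    unknown = requested - set(SUPPORTED_ENTITIES)
    if unknown:
        raise ValueError(f"Unsupported entity types: {sorted(unknown)}")
    # sorted() enforces the alphabetical order recommended in the warning.
    return "Extract " + ", ".join(f"<{name}>" for name in sorted(requested))


print(build_system_prompt())
print(build_system_prompt(["human_name", "address"]))
```

Sorting inside the helper means callers can pass entity types in any order and still get a prompt that follows the recommended ordering.
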

**Chat template**: LFM2-350M-PII-Extract-JP uses a ChatML-like chat template as follows:

```
<|startoftext|><|im_start|>system
Extract <address>, <company_name>, <email_address>, <human_name>, <phone_number><|im_end|>
<|im_start|>user
こんにちは、ラミンさんに B200 GPU を 10000 台 至急請求してください。連絡先は [email protected] (電話番号010-000-0000) です。これは C. elegans 線虫に着想を得たニューラルネットワークアーキテクチャを 今すぐ構築するために不可欠です。<|im_end|>
<|im_start|>assistant
{"address": [], "company_name": [], "email_address": ["[email protected]"], "human_name": ["ラミン"], "phone_number": ["010-000-0000"]}<|im_end|>
```

You can automatically apply it using the dedicated [`.apply_chat_template()`](https://huggingface.co/docs/transformers/en/chat_templating#applychattemplate) function from Hugging Face transformers.
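
As a rough sketch of how the chat template and greedy decoding fit together in Hugging Face transformers, the snippet below builds the system and user messages and generates with `do_sample=False`. The repository id `LiquidAI/LFM2-350M-PII-Extract-JP` is an assumption inferred from the GGUF repository name, and `build_messages`, `extract_pii`, and `max_new_tokens=256` are illustrative choices rather than values from this card.

```python
import json

# Assumed repository id (inferred from the GGUF repo name); adjust if it differs.
MODEL_ID = "LiquidAI/LFM2-350M-PII-Extract-JP"
SYSTEM_PROMPT = (
    "Extract <address>, <company_name>, <email_address>, "
    "<human_name>, <phone_number>"
)


def build_messages(text, system_prompt=SYSTEM_PROMPT):
    """Return a single-turn conversation: required system prompt plus user text."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": text},
    ]


def extract_pii(text, model_id=MODEL_ID, max_new_tokens=256):
    """Run one extraction and parse the reply as JSON (hypothetical helper)."""
    # Imported lazily so build_messages() works without transformers installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer.apply_chat_template(
        build_messages(text),
        add_generation_prompt=True,  # append the assistant header
        return_tensors="pt",
        return_dict=True,
    )
    # do_sample=False selects greedy decoding, as recommended above.
    output_ids = model.generate(
        **inputs, do_sample=False, max_new_tokens=max_new_tokens
    )
    # Keep only the newly generated tokens, not the prompt.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    reply = tokenizer.decode(new_tokens, skip_special_tokens=True)
    # json.loads raises if the model output is not valid JSON.
    return json.loads(reply)
```

Calling `extract_pii(...)` with a Japanese string downloads the checkpoint on first use; a supported accelerator is optional for a model of this size.
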

> [!WARNING]
> ⚠️ The model is intended for single-turn conversations.
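
Since the JSON output is meant to drive masking, here is one possible way to apply it to the source text. This is a minimal sketch assuming the output has the key-to-list shape shown in the chat template example; `mask_pii`, the `[LABEL]` mask format, and the shortened sample sentence are all illustrative and not part of the model.

```python
import json


def mask_pii(text, extraction_json, mask_format="[{label}]"):
    """Replace every extracted value in `text` with a label placeholder."""
    entities = json.loads(extraction_json)
    # Flatten {"label": ["value", ...]} into (value, label) pairs.
    pairs = [
        (value, label)
        for label, values in entities.items()
        for value in values
        if value
    ]
    # Replace longer values first so a short value never clobbers
    # part of a longer one.
    for value, label in sorted(pairs, key=lambda p: len(p[0]), reverse=True):
        text = text.replace(value, mask_format.format(label=label.upper()))
    return text


# Illustrative input and model output (shortened from the example above).
sample_text = "ラミンさんの電話番号は010-000-0000です。"
sample_output = (
    '{"address": [], "company_name": [], "email_address": [], '
    '"human_name": ["ラミン"], "phone_number": ["010-000-0000"]}'
)
print(mask_pii(sample_text, sample_output))
# → [HUMAN_NAME]さんの電話番号は[PHONE_NUMBER]です。
```

Plain string replacement masks every occurrence of an extracted value; if the model misses an entity, it stays unmasked, so downstream checks may still be needed.
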

## How to run LFM2

- Hugging Face: [LFM2-350M](https://huggingface.co/LiquidAI/LFM2-350M)
- llama.cpp: [LFM2-350M-PII-Extract-JP-GGUF](https://huggingface.co/LiquidAI/LFM2-350M-PII-Extract-JP-GGUF)
- LEAP: [LEAP model library](https://leap.liquid.ai/models?model=lfm2-350m-pii-extract-jp)

## Contact

If you are interested in custom solutions with edge deployment, please contact [our sales team](https://www.liquid.ai/contact).