ruke1ire committed on
Commit b1ce2a3 · verified · 1 Parent(s): 1d171be

Update README.md

Files changed (1)
  1. README.md +49 -1
README.md CHANGED
@@ -63,6 +63,54 @@ license_link: LICENSE
  # LFM2-350M-PII-Extract-JP

  Based on [LFM2-350M](https://huggingface.co/LiquidAI/LFM2-350M), LFM2-350M-PII-Extract-JP is designed to **extract personally identifiable information (PII) from Japanese text and output it in JSON format.**
- The resulting data can then be used to mask sensitive information.
+ The output can then be used to mask sensitive information.

+ In particular, it is trained to extract the following from Japanese documents and texts:
+ * Addresses/locations
+ * Company/institute/organization names
+ * Email addresses
+ * Human names
+ * Phone numbers

+ ## Extraction Quality
+
+ WIP
+
+ ## Model Details
+
+ **Generation parameters**: We strongly recommend using greedy decoding with `temperature=0`.
+
+ **System prompts**: This checkpoint **requires** the following system prompt:
+ * `Extract <address>, <company_name>, <email_address>, <human_name>, <phone_number>`
+
+ Note that the model can also extract a subset of entity types. For example, it outputs only human names when the system prompt is set to `Extract <human_name>`.
+
+ > [!WARNING]
+ > ⚠️ For best performance, list the entity categories in alphabetical order, as shown above.
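
The system prompt format and the alphabetical-order recommendation above can be captured in a small helper. This is only an illustrative sketch: `build_system_prompt` and `SUPPORTED_ENTITIES` are hypothetical names that are not part of the model or any library, and the set of supported tags is taken from the prompt shown above.

```python
# Entity tags taken from the required system prompt in the README.
SUPPORTED_ENTITIES = {
    "address",
    "company_name",
    "email_address",
    "human_name",
    "phone_number",
}


def build_system_prompt(entities):
    """Return 'Extract <a>, <b>, ...' with the tags in alphabetical order."""
    requested = set(entities)
    if not requested:
        raise ValueError("At least one entity category is required.")
    unknown = requested - SUPPORTED_ENTITIES
    if unknown:
        raise ValueError(f"Unsupported entity categories: {sorted(unknown)}")
    # sorted() enforces the alphabetical order recommended by the README.
    tags = ", ".join(f"<{name}>" for name in sorted(requested))
    return f"Extract {tags}"


print(build_system_prompt(["phone_number", "human_name"]))
```

Passing all five categories in any order reproduces the full prompt shown above, while a single category such as `["human_name"]` yields `Extract <human_name>`.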
+
+ **Chat template**: LFM2-PII-Extract-JP uses a ChatML-like chat template as follows:
+
+ ```
+ <|startoftext|><|im_start|>system
+ Extract <address>, <company_name>, <email_address>, <human_name>, <phone_number><|im_end|>
+ <|im_start|>user
+ こんにちは、ラミンさんに B200 GPU を 10000 台 至急請求してください。連絡先は [email protected] (電話番号010-000-0000) で、これは C. elegans 線虫に着想を得たニューラルネットワークアーキテクチャを 今すぐ構築するために不可欠です。<|im_end|>
+ <|im_start|>assistant
+ {"address": [], "company_name": [], "email_address": ["[email protected]"], "human_name": ["ラミン"], "phone_number": ["010-000-0000"]}<|im_end|>
+ ```
+
+ You can automatically apply it using the dedicated [`.apply_chat_template()`](https://huggingface.co/docs/transformers/en/chat_templating#applychattemplate) function from Hugging Face transformers.
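
As a rough sketch of how the pieces above could fit together with Hugging Face transformers, the following applies the chat template and generates with greedy decoding (`do_sample=False`). Several details are assumptions rather than statements from the README: the repository id `LiquidAI/LFM2-350M-PII-Extract-JP` is inferred from the GGUF link name, `build_messages` and `extract_pii` are hypothetical helper names, `max_new_tokens=256` is an arbitrary choice, and a transformers release with LFM2 support plus PyTorch is assumed to be installed.

```python
# Assumed repository id, inferred from the GGUF repository name.
MODEL_ID = "LiquidAI/LFM2-350M-PII-Extract-JP"
SYSTEM_PROMPT = (
    "Extract <address>, <company_name>, <email_address>, "
    "<human_name>, <phone_number>"
)


def build_messages(text, system_prompt=SYSTEM_PROMPT):
    """Build a single-turn conversation: one system and one user message."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": text},
    ]


def extract_pii(text, model_id=MODEL_ID, max_new_tokens=256):
    """Run one extraction and return the raw model output as a string."""
    # Imported lazily so build_messages works without the heavy dependencies.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)

    # The tokenizer's own chat template produces the ChatML-like prompt.
    inputs = tokenizer.apply_chat_template(
        build_messages(text),
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    )
    # do_sample=False requests greedy decoding in transformers.
    output_ids = model.generate(
        **inputs, do_sample=False, max_new_tokens=max_new_tokens
    )
    # Keep only the newly generated tokens, dropping the prompt.
    prompt_length = inputs["input_ids"].shape[-1]
    return tokenizer.decode(
        output_ids[0][prompt_length:], skip_special_tokens=True
    )
```

Calling `extract_pii("...")` downloads the checkpoint on first use, so it is left out of the snippet; the returned string is expected to be the JSON object shown in the chat template example.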
+
+ > [!WARNING]
+ > ⚠️ The model is intended for single-turn conversations.
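
To illustrate how the JSON output could be used for masking, here is a minimal sketch that replaces each extracted value in the source text with a placeholder. `mask_pii` and the `[LABEL]` placeholder style are hypothetical choices, and the code assumes the model output is valid JSON mapping each category to a list of strings, as in the chat template example; malformed output would raise `json.JSONDecodeError`.

```python
import json


def mask_pii(text, model_output, placeholder="[{label}]"):
    """Replace every extracted entity in `text` with a category placeholder."""
    entities = json.loads(model_output)
    # Flatten {"label": [values]} into (value, label) pairs, skipping empties.
    pairs = [
        (value, label)
        for label, values in entities.items()
        for value in values
        if value
    ]
    # Replace longer values first so a short value cannot clobber part of a
    # longer one that contains it.
    for value, label in sorted(pairs, key=lambda p: len(p[0]), reverse=True):
        text = text.replace(value, placeholder.format(label=label.upper()))
    return text


source = "連絡先はラミンさん (電話番号010-000-0000) です。"
output = (
    '{"address": [], "company_name": [], "email_address": [], '
    '"human_name": ["ラミン"], "phone_number": ["010-000-0000"]}'
)
print(mask_pii(source, output))
# 連絡先は[HUMAN_NAME]さん (電話番号[PHONE_NUMBER]) です。
```

Plain string replacement masks every occurrence of an extracted value, which is usually what masking needs, but it cannot distinguish two different entities that share the same surface string.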
+
+ ## 🏃 How to run LFM2
+
+ - Hugging Face: [LFM2-350M](https://huggingface.co/LiquidAI/LFM2-350M)
+ - llama.cpp: [LFM2-350M-PII-Extract-JP-GGUF](https://huggingface.co/LiquidAI/LFM2-350M-PII-Extract-JP-GGUF)
+ - LEAP: [LEAP model library](https://leap.liquid.ai/models?model=lfm2-350m-pii-extract-jp)
+
+ ## 📬 Contact
+
+ If you are interested in custom solutions with edge deployment, please contact [our sales team](https://www.liquid.ai/contact).