Xenova (HF Staff) committed · verified
Commit 775047a · 1 parent: c842468

Update README.md

Files changed (1):
  1. README.md (+209 −1)
README.md CHANGED
base_model:
- LiquidAI/LFM2-1.2B
library_name: transformers.js
license: other
license_name: lfm1.0
license_link: LICENSE
language:
- en
- ar
- zh
- fr
- de
- ja
- ko
- es
pipeline_tag: text-generation
tags:
- liquid
- edge
---


<center>
<div style="text-align: center;">
<img
  src="https://cdn-uploads.huggingface.co/production/uploads/61b8e2ba285851687028d395/7_6D7rWrLxp2hb6OHSV1p.png"
  alt="Liquid AI"
  style="width: 100%; max-width: 66%; height: auto; display: inline-block; margin-bottom: 0.5em; margin-top: 0.5em;"
/>
</div>

<a href="https://playground.liquid.ai/chat">
<svg width="114.8" height="20" viewBox="0 0 1300 200" xmlns="http://www.w3.org/2000/svg" role="img" aria-label="Liquid Playground" style="margin-bottom: 1em;">
  <title>Liquid: Playground</title>
  <g>
    <rect fill="#fff" width="600" height="200"></rect>
    <rect fill="url(#x)" x="600" width="700" height="200"></rect>
  </g>
  <g transform="translate(20, 30) scale(0.4, 0.4)">
    <path d="M172.314 129.313L172.219 129.367L206.125 188.18C210.671 195.154 213.324 203.457 213.324 212.382C213.324 220.834 210.956 228.739 206.839 235.479L275.924 213.178L167.853 33.6L141.827 76.9614L172.314 129.313Z" fill="black"/>
    <path d="M114.217 302.4L168.492 257.003C168.447 257.003 168.397 257.003 168.352 257.003C143.515 257.003 123.385 237.027 123.385 212.387C123.385 203.487 126.023 195.204 130.55 188.24L162.621 132.503L135.966 86.7327L60.0762 213.183L114.127 302.4H114.217Z" fill="black"/>
    <path d="M191.435 250.681C191.435 250.681 191.43 250.681 191.425 250.686L129.71 302.4H221.294L267.71 226.593L191.435 250.686V250.681Z" fill="black"/>
  </g>
  <g aria-hidden="true" fill="#fff" text-anchor="start" font-family="Verdana,DejaVu Sans,sans-serif" font-size="110">
    <text x="200" y="148" textLength="329" fill="#000" opacity="0.1">Liquid</text>
    <text x="190" y="138" textLength="329" fill="#000">Liquid</text>
    <text x="655" y="148" textLength="619" fill="#000" opacity="0.1">Playground</text>
    <text x="645" y="138" textLength="619">Playground</text>
  </g>

  <linearGradient id="x" x1="0%" y1="0%" x2="100%" y2="0%">
    <stop offset="0%" style="stop-color:#000000"></stop>
    <stop offset="100%" style="stop-color:#000000"></stop>
  </linearGradient>
</svg>
</a>
</center>

# LFM2-1.2B

LFM2 is a new generation of hybrid models developed by [Liquid AI](https://www.liquid.ai/), specifically designed for edge AI and on-device deployment. It sets a new standard in terms of quality, speed, and memory efficiency.

We're releasing the weights of three post-trained checkpoints with 350M, 700M, and 1.2B parameters. They provide the following key features to create AI-powered edge applications:

* **Fast training & inference** – LFM2 achieves 3x faster training compared to its previous generation. It also benefits from 2x faster decode and prefill speed on CPU compared to Qwen3.
* **Best performance** – LFM2 outperforms similarly-sized models across multiple benchmark categories, including knowledge, mathematics, instruction following, and multilingual capabilities.
* **New architecture** – LFM2 is a new hybrid Liquid model with multiplicative gates and short convolutions.
* **Flexible deployment** – LFM2 runs efficiently on CPU, GPU, and NPU hardware for flexible deployment on smartphones, laptops, or vehicles.

Find more information about LFM2 in our [blog post](https://www.liquid.ai/blog/liquid-foundation-models-v2-our-second-series-of-generative-ai-models).

## 📄 Model details

Due to their small size, **we recommend fine-tuning LFM2 models on narrow use cases** to maximize performance.
They are particularly suited for agentic tasks, data extraction, RAG, creative writing, and multi-turn conversations.
However, we do not recommend using them for tasks that are knowledge-intensive or require programming skills.

| Property            | Value                 |
| ------------------- | --------------------- |
| **Parameters**      | 742,489,344           |
| **Layers**          | 16 (10 conv + 6 attn) |
| **Context length**  | 32,768 tokens         |
| **Vocabulary size** | 65,536                |
| **Precision**       | bfloat16              |
| **Training budget** | 10 trillion tokens    |
| **License**         | LFM Open License v1.0 |

**Supported languages**: English, Arabic, Chinese, French, German, Japanese, Korean, and Spanish.

**Generation parameters**: We recommend the following settings (a usage sketch follows the list):
* `temperature=0.3`
* `min_p=0.15`
* `repetition_penalty=1.05`
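
As an illustration only, here is a minimal sketch of how these settings map onto a standard `generate()` call with the Python `transformers` library, shown against the original `LiquidAI/LFM2-1.2B` checkpoint rather than this ONNX export (it assumes a recent `transformers` version with LFM2 and `min_p` support):

```py
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "LiquidAI/LFM2-1.2B"  # original checkpoint; see below for running the ONNX export
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Build the chat-formatted prompt and tokenize it
input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What is the capital of France?"}],
    add_generation_prompt=True,
    return_tensors="pt",
)

# Sample with the recommended generation parameters
output = model.generate(
    input_ids,
    do_sample=True,
    temperature=0.3,
    min_p=0.15,
    repetition_penalty=1.05,
    max_new_tokens=512,
)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```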

**Architecture**: Hybrid model with multiplicative gates and short convolutions: 10 double-gated short-range LIV convolution blocks and 6 grouped query attention (GQA) blocks.
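
The conv/attention interleaving is exposed in the model config, so you can inspect the layout directly; a small sketch (the same `layer_types` field is used by the ONNXRuntime example below):

```py
from collections import Counter
from transformers import AutoConfig

# Per-layer list of 'conv' / 'full_attention' entries
config = AutoConfig.from_pretrained("onnx-community/LFM2-1.2B-ONNX")
print(config.layer_types)
print(Counter(config.layer_types))  # expected: 10 conv blocks and 6 attention blocks
```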

**Pre-training mixture**: Approximately 75% English, 20% multilingual, and 5% code data sourced from the web and licensed materials.

**Training approach**:
* Knowledge distillation using [LFM1-7B](https://www.liquid.ai/blog/introducing-lfm-7b-setting-new-standards-for-efficient-language-models) as the teacher model
* Very large-scale SFT on 50% downstream tasks, 50% general domains
* Custom DPO with length normalization and semi-online datasets
* Iterative model merging

## 🏃 How to run LFM2

### Transformers.js

If you haven't already, you can install the [Transformers.js](https://huggingface.co/docs/transformers.js) JavaScript library from [NPM](https://www.npmjs.com/package/@huggingface/transformers) using:
```bash
npm i @huggingface/transformers
```

You can then generate text as follows:
```js
import { pipeline, TextStreamer } from "@huggingface/transformers";

// Create a text generation pipeline
const generator = await pipeline(
  "text-generation",
  "onnx-community/LFM2-1.2B-ONNX",
  { dtype: "q4" },
);

// Define the list of messages
const messages = [
  { role: "system", content: "You are a helpful assistant." },
  { role: "user", content: "What is the capital of France?" },
];

// Generate a response
const output = await generator(messages, {
  max_new_tokens: 512,
  do_sample: false,
  streamer: new TextStreamer(generator.tokenizer, { skip_prompt: true, skip_special_tokens: true }),
});
console.log(output[0].generated_text.at(-1).content);
// The capital of France is Paris.
```

### ONNXRuntime

You can also run the exported ONNX graph directly with ONNX Runtime in Python, managing the model's hybrid cache (attention KV entries and convolution states) manually:

```py
from transformers import AutoConfig, AutoTokenizer
import onnxruntime
import numpy as np
from huggingface_hub import hf_hub_download

# 1. Load config, tokenizer, and model
model_id = "onnx-community/LFM2-1.2B-ONNX"
config = AutoConfig.from_pretrained(model_id)
tokenizer = AutoTokenizer.from_pretrained(model_id)
filename = "model.onnx"  # Options: "model.onnx", "model_fp16.onnx", "model_q4.onnx", "model_q4f16.onnx"
model_path = hf_hub_download(repo_id=model_id, filename=f"onnx/{filename}")  # Download the graph
hf_hub_download(repo_id=model_id, filename=f"onnx/{filename}_data")  # Download the weights
session = onnxruntime.InferenceSession(model_path)

## Set config values
num_key_value_heads = config.num_key_value_heads
head_dim = config.hidden_size // config.num_attention_heads
num_hidden_layers = config.num_hidden_layers
eos_token_id = config.eos_token_id
hidden_size = config.hidden_size
conv_L_cache = config.conv_L_cache
layer_types = config.layer_types

# 2. Prepare inputs
prompt = "What is C. elegans?"
messages = [{"role": "user", "content": prompt}]
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=True, return_dict=True, return_tensors="np")
input_ids = inputs['input_ids']
attention_mask = inputs['attention_mask']
batch_size = input_ids.shape[0]
position_ids = np.tile(np.arange(0, input_ids.shape[-1]), (batch_size, 1))

## Initialize the hybrid cache: empty KV caches for attention layers, zeroed conv states for convolution layers
past_cache_values = {}
for i in range(num_hidden_layers):
    if layer_types[i] == 'full_attention':
        for kv in ('key', 'value'):
            past_cache_values[f'past_key_values.{i}.{kv}'] = np.zeros([batch_size, num_key_value_heads, 0, head_dim], dtype=np.float32)
    elif layer_types[i] == 'conv':
        past_cache_values[f'past_conv.{i}'] = np.zeros([batch_size, hidden_size, conv_L_cache], dtype=np.float32)
    else:
        raise ValueError(f"Unsupported layer type: {layer_types[i]}")

# 3. Generation loop
max_new_tokens = 1024
generated_tokens = np.array([[]], dtype=np.int64)
for i in range(max_new_tokens):
    logits, *present_cache_values = session.run(None, dict(
        input_ids=input_ids,
        attention_mask=attention_mask,
        position_ids=position_ids,
        **past_cache_values,
    ))

    ## Update values for next generation loop
    input_ids = logits[:, -1].argmax(-1, keepdims=True)  # Greedy decoding: pick the most likely next token
    attention_mask = np.concatenate([attention_mask, np.ones_like(input_ids, dtype=np.int64)], axis=-1)
    position_ids = position_ids[:, -1:] + 1
    for j, key in enumerate(past_cache_values):
        past_cache_values[key] = present_cache_values[j]
    generated_tokens = np.concatenate([generated_tokens, input_ids], axis=-1)
    if (input_ids == eos_token_id).all():
        break

    ## (Optional) Streaming
    print(tokenizer.decode(input_ids[0]), end='', flush=True)
print()

# 4. Output result
print(tokenizer.batch_decode(generated_tokens, skip_special_tokens=True)[0])
```
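
The loop above decodes greedily with `argmax`, so the recommended sampling parameters from the model details section are not applied. As a hypothetical sketch (not part of the original example), temperature plus min-p sampling over the last-step logits could replace the greedy update; `repetition_penalty` would additionally need the history of generated tokens and is omitted here:

```py
import numpy as np

rng = np.random.default_rng()

def sample_next_token(logits, temperature=0.3, min_p=0.15):
    """Sample one token id per batch row from logits of shape (batch, seq, vocab)."""
    scaled = logits[:, -1, :] / temperature
    scaled = scaled - scaled.max(axis=-1, keepdims=True)  # numerical stability
    probs = np.exp(scaled)
    probs /= probs.sum(axis=-1, keepdims=True)
    # Min-p filtering: keep tokens whose probability is at least min_p * (top probability)
    probs = np.where(probs >= min_p * probs.max(axis=-1, keepdims=True), probs, 0.0)
    probs /= probs.sum(axis=-1, keepdims=True)
    next_ids = [rng.choice(probs.shape[-1], p=row) for row in probs]
    return np.array(next_ids, dtype=np.int64)[:, None]

# Inside the generation loop, instead of:
#   input_ids = logits[:, -1].argmax(-1, keepdims=True)
# use:
#   input_ids = sample_next_token(logits)
```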