hyx21 committed on
Commit
199ee99
·
verified ·
1 Parent(s): c93ebc6

Update README.md

Files changed (1)
  1. README.md +315 -3
README.md CHANGED
@@ -1,3 +1,315 @@
1
- ---
2
- license: apache-2.0
3
- ---
1
+ ---
2
+ license: apache-2.0
3
+ language:
4
+ - zh
5
+ - en
6
+ pipeline_tag: text-generation
7
+ library_name: transformers
8
+ ---
9
+ <div align="center">
10
+ <img src="https://github.com/OpenBMB/MiniCPM/blob/main/assets/minicpm_logo.png?raw=true" width="500em" ></img>
11
+ </div>
12
+
13
+ <p align="center">
14
+ <a href="https://github.com/OpenBMB/MiniCPM/" target="_blank">GitHub Repo</a> |
15
+ <a href="https://github.com/OpenBMB/MiniCPM/tree/main/report/MiniCPM_4_Technical_Report.pdf" target="_blank">Technical Report</a>
16
+ </p>
17
+ <p align="center">
18
+ 👋 Join us on <a href="https://discord.gg/3cGQn9b3YM" target="_blank">Discord</a> and <a href="https://github.com/OpenBMB/MiniCPM/blob/main/assets/wechat.jpg" target="_blank">WeChat</a>
19
+ </p>
20
+
21
+ ## What's New
22
+ - [2025.06.06] The **MiniCPM4** series is released! These models deliver large efficiency improvements while maintaining optimal performance among models of the same scale, and can achieve over 5x generation acceleration on typical end-side chips. You can find the technical report [here](https://github.com/OpenBMB/MiniCPM/tree/main/report/MiniCPM_4_Technical_Report.pdf).🔥🔥🔥
23
+
24
+ ## MiniCPM4 Series
25
+ The MiniCPM4 series consists of highly efficient large language models (LLMs) designed explicitly for end-side devices. They achieve this efficiency through systematic innovation in four key dimensions: model architecture, training data, training algorithms, and inference systems.
26
+ - [MiniCPM4-8B](https://huggingface.co/openbmb/MiniCPM4-8B): The flagship of MiniCPM4, with 8B parameters, trained on 8T tokens. (**<-- you are here**)
27
+ - [MiniCPM4-0.5B](https://huggingface.co/openbmb/MiniCPM4-0.5B): The small version of MiniCPM4, with 0.5B parameters, trained on 1T tokens.
28
+ - [MiniCPM4-8B-Eagle-FRSpec](https://huggingface.co/openbmb/MiniCPM4-8B-Eagle-FRSpec): Eagle head for FRSpec, accelerating speculative inference for MiniCPM4-8B.
29
+ - [MiniCPM4-8B-Eagle-FRSpec-QAT](https://huggingface.co/openbmb/MiniCPM4-8B-Eagle-FRSpec-QAT): Eagle head trained with QAT for FRSpec, efficiently integrating speculation and quantization to achieve ultra acceleration for MiniCPM4-8B.
30
+ - [BitCPM4-0.5B](https://huggingface.co/openbmb/BitCPM4-0.5B): Extreme ternary quantization applied to MiniCPM4-0.5B compresses model parameters into ternary values, achieving a 90% reduction in bit width.
31
+ - [BitCPM4-1B](https://huggingface.co/openbmb/BitCPM4-1B): Extreme ternary quantization applied to MiniCPM3-1B compresses model parameters into ternary values, achieving a 90% reduction in bit width.
32
+ - [MiniCPM4-Survey](https://huggingface.co/openbmb/MiniCPM4-Survey): Based on MiniCPM4-8B, accepts users' queries as input and autonomously generates trustworthy, long-form survey papers.
33
+ - [MiniCPM4-MCP](https://huggingface.co/openbmb/MiniCPM4-MCP): Based on MiniCPM4-8B, accepts users' queries and available MCP tools as input and autonomously calls relevant MCP tools to satisfy users' requirements.
34
+
35
+ ## Introduction
36
+ MiniCPM4 is an extremely efficient edge-side large language model, optimized across four dimensions: model architecture, learning algorithms, training data, and inference systems.
37
+
38
+ - 🏗️ **Efficient Model Architecture:**
39
+ - InfLLM v2 -- Trainable Sparse Attention Mechanism: Adopts a trainable sparse attention architecture in which each token computes relevance with fewer than 5% of the tokens when processing 128K-token texts, significantly reducing the computational overhead of long texts
40
+
41
+ - 🧠 **Efficient Learning Algorithms:**
42
+ - Model Wind Tunnel 2.0 -- Efficient Predictable Scaling: Introduces scaling prediction methods for performance of downstream tasks, enabling more precise model training configuration search
43
+ - BitCPM -- Ultimate Ternary Quantization: Compresses model parameters to ternary values (three possible values per weight), achieving an extreme 90% reduction in model bit-width
44
+ - Efficient Training Engineering Optimization: Adopts FP8 low-precision computing technology combined with Multi-token Prediction training strategy
45
+
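The 90% figure for ternary quantization can be checked with a little arithmetic. The sketch below assumes a 16-bit baseline (such as the `bfloat16` weights used elsewhere in this README) and uses a simple absmean rounding rule purely for illustration. `bit_width_reduction` and `ternarize` are hypothetical helpers, not BitCPM's actual training procedure.

```python
import math


def bit_width_reduction(num_values=3, baseline_bits=16):
    """Fractional reduction in bits per weight versus the baseline (assumed 16-bit)."""
    bits_per_weight = math.log2(num_values)  # about 1.58 bits for 3 values
    return 1.0 - bits_per_weight / baseline_bits


def ternarize(weights):
    """Map each weight to -1, 0, or +1 and return (ternary_weights, scale).

    Illustrative absmean rule (an assumption, not BitCPM's method):
    divide by the mean absolute value, then round and clip to [-1, 1].
    """
    scale = sum(abs(w) for w in weights) / len(weights) or 1.0
    ternary = [max(-1, min(1, round(w / scale))) for w in weights]
    return ternary, scale


print(f"bit-width reduction: {bit_width_reduction():.1%}")
print(ternarize([0.9, -0.05, -1.2, 0.4]))
```

Under these assumptions, a ternary weight carries log2(3) bits of information, which is where a reduction of roughly 90% relative to 16 bits comes from.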
46
+ - 📚 **High-Quality Training Data:**
47
+ - UltraClean -- High-quality Pre-training Data Filtering and Generation: Builds iterative data cleaning strategies based on efficient data verification, and open-sources the high-quality Chinese and English pre-training dataset [Ultra-FineWeb](https://huggingface.co/datasets/openbmb/Ultra-FineWeb)
48
+ - UltraChat v2 -- High-quality Supervised Fine-tuning Data Generation: Constructs large-scale high-quality supervised fine-tuning datasets covering multiple dimensions including knowledge-intensive data, reasoning-intensive data, instruction-following data, long text understanding data, and tool calling data
49
+
50
+ - ⚡ **Efficient Inference System:**
51
+ - FRSpec -- Lightweight Speculative Sampling: Accelerates the draft model by pruning its vocabulary
52
+ - ArkInfer -- Cross-platform Deployment System: Supports efficient deployment across multiple backend environments, providing flexible cross-platform adaptation capabilities
53
+
54
+ ## Usage
55
+
56
+ ### Using Quantized Eagle Speculative Decoding with [vLLM](https://github.com/vllm-project/vllm)
57
+ For now, you need to install the latest (nightly) version of vLLM.
58
+ ```
59
+ pip install -U vllm \
60
+ --pre \
61
+ --extra-index-url https://wheels.vllm.ai/nightly
62
+ ```
63
+
64
+ Then you can use Quantized Eagle Speculative Decoding to run inference on MiniCPM4-8B with vLLM. Use `speculative_config` to set the draft model.
65
+ ```python
66
+ from transformers import AutoTokenizer
67
+ from vllm import LLM, SamplingParams
68
+
69
+ model_name = "openbmb/MiniCPM4-8B-marlin-vLLM"
70
+ prompt = [{"role": "user", "content": "Please recommend 5 tourist attractions in Beijing. "}]
71
+
72
+ tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
73
+ input_text = tokenizer.apply_chat_template(prompt, tokenize=False, add_generation_prompt=True)
74
+
75
+ llm = LLM(
76
+ model=model_name,
77
+ trust_remote_code=True,
78
+ max_num_batched_tokens=32768,
79
+ dtype="bfloat16",
80
+ gpu_memory_utilization=0.8,
81
+ speculative_config={
82
+ "method": "eagle",
83
+ "model": "openbmb/MiniCPM4-8B-marlin-Eagle-vLLM",
84
+ "num_speculative_tokens": 2,
85
+ "max_model_len": 32768,
86
+ },
87
+ )
88
+ sampling_params = SamplingParams(top_p=0.7, temperature=0.7, max_tokens=1024, repetition_penalty=1.02)
89
+
90
+ outputs = llm.generate(prompts=input_text, sampling_params=sampling_params)
91
+
92
+ print(outputs[0].outputs[0].text)
93
+ ```
94
+
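To make the role of `num_speculative_tokens` concrete, here is a toy sketch of one greedy draft-and-verify step. It is not vLLM's or Eagle's implementation: `speculative_step`, `target_next`, and `draft_next` are hypothetical names, the "models" are plain functions from a token list to the next token, and a real engine verifies all draft tokens in a single forward pass of the target model rather than calling it once per token.

```python
def speculative_step(target_next, draft_next, context, num_speculative_tokens=2):
    """Return the tokens emitted by one greedy speculative decoding step."""
    # 1. The cheap draft model proposes a short continuation.
    draft_tokens = []
    ctx = list(context)
    for _ in range(num_speculative_tokens):
        token = draft_next(ctx)
        draft_tokens.append(token)
        ctx.append(token)

    # 2. The target model checks the proposal token by token.
    accepted = []
    ctx = list(context)
    for token in draft_tokens:
        expected = target_next(ctx)
        if token != expected:
            # First mismatch: keep the target's token and stop.
            accepted.append(expected)
            return accepted
        accepted.append(token)
        ctx.append(token)

    # 3. Every draft token matched, so the target contributes one extra token.
    accepted.append(target_next(ctx))
    return accepted


# Toy models: the target counts upward modulo 10; the draft is wrong after a 3.
def toy_target(ctx):
    return (ctx[-1] + 1) % 10


def toy_draft(ctx):
    return 0 if ctx[-1] == 3 else (ctx[-1] + 1) % 10


print(speculative_step(toy_target, toy_draft, [1]))
print(speculative_step(toy_target, toy_draft, [3]))
```

In this greedy variant the output always equals what the target model alone would produce; the draft only changes how many tokens are emitted per step.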
95
+ ### Inference with Quantized MiniCPM4-8B Using [vLLM](https://github.com/vllm-project/vllm)
96
+ For now, you need to install the latest (nightly) version of vLLM.
97
+ ```
98
+ pip install -U vllm \
99
+ --pre \
100
+ --extra-index-url https://wheels.vllm.ai/nightly
101
+ ```
102
+
103
+ Then you can run inference on the quantized MiniCPM4-8B with vLLM.
104
+ ```python
105
+ from transformers import AutoTokenizer
106
+ from vllm import LLM, SamplingParams
107
+
108
+ model_name = "openbmb/MiniCPM4-8B-marlin-vLLM"
109
+ prompt = [{"role": "user", "content": "Please recommend 5 tourist attractions in Beijing. "}]
110
+
111
+ tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
112
+ input_text = tokenizer.apply_chat_template(prompt, tokenize=False, add_generation_prompt=True)
113
+
114
+ llm = LLM(
115
+ model=model_name,
116
+ trust_remote_code=True,
117
+ max_num_batched_tokens=32768,
118
+ dtype="bfloat16",
119
+ gpu_memory_utilization=0.8,
120
+ )
121
+ sampling_params = SamplingParams(top_p=0.7, temperature=0.7, max_tokens=1024, repetition_penalty=1.02)
122
+
123
+ outputs = llm.generate(prompts=input_text, sampling_params=sampling_params)
124
+
125
+ print(outputs[0].outputs[0].text)
126
+ ```
127
+
128
+ ### Inference with [CPM.cu](https://github.com/OpenBMB/cpm.cu)
129
+
130
+ We recommend using [CPM.cu](https://github.com/OpenBMB/cpm.cu) for the inference of MiniCPM4. CPM.cu is a CUDA inference framework developed by OpenBMB that integrates efficient sparse attention, speculative sampling, and quantization techniques, fully leveraging the efficiency advantages of MiniCPM4.
131
+
132
+ You can install CPM.cu by running the following command:
133
+
134
+ ```bash
135
+ git clone https://github.com/OpenBMB/cpm.cu.git --recursive
136
+ cd cpm.cu
137
+ python3 setup.py install
138
+ ```
139
+
140
+ MiniCPM4 natively supports context lengths of up to 32,768 tokens. To reproduce the long-text acceleration effect in the paper, we recommend using the LongRoPE factors that have been validated. Change the `rope_scaling` field in the `config.json` file as follows to enable LongRoPE.
141
+ ```json
142
+ {
143
+ ...,
144
+ "rope_scaling": {
145
+ "rope_type": "longrope",
146
+ "long_factor": [0.9977997200264581, 1.014658295992452, 1.0349680404997148, 1.059429246056193, 1.0888815016813513, 1.1243301355211495, 1.166977103606075, 1.2182568066927284, 1.2798772354275727, 1.3538666751582975, 1.4426259039919596, 1.5489853358570191, 1.6762658237220625, 1.8283407612492941, 2.0096956085876183, 2.225478927469756, 2.481536379650452, 2.784415934557119, 3.1413289096347365, 3.560047844772632, 4.048719380066383, 4.752651957515948, 5.590913044973868, 6.584005926629993, 7.7532214876576155, 9.119754865903639, 10.704443927019176, 12.524994176518703, 14.59739595363613, 16.93214476166354, 19.53823297353041, 22.417131025031697, 25.568260840911098, 28.991144156566317, 32.68408069090375, 36.65174474170465, 40.90396065611201, 45.4664008671033, 50.37147343433591, 55.6804490772103, 61.470816952306556, 67.8622707390618, 75.00516023410414, 83.11898235973767, 92.50044360202462, 103.57086856690864, 116.9492274587385, 118.16074567836519, 119.18497548708795, 120.04810876261652, 120.77352815196981, 121.38182790207875, 121.89094985353891, 122.31638758099915, 122.6714244963338, 122.9673822552567, 123.21386397019609, 123.41898278254268, 123.58957065488238, 123.73136519024158, 123.84917421274221, 123.94701903496814, 124.02825801299717, 124.09569231686116],
147
+ "short_factor": [0.9977997200264581, 1.014658295992452, 1.0349680404997148, 1.059429246056193, 1.0888815016813513, 1.1243301355211495, 1.166977103606075, 1.2182568066927284, 1.2798772354275727, 1.3538666751582975, 1.4426259039919596, 1.5489853358570191, 1.6762658237220625, 1.8283407612492941, 2.0096956085876183, 2.225478927469756, 2.481536379650452, 2.784415934557119, 3.1413289096347365, 3.560047844772632, 4.048719380066383, 4.752651957515948, 5.590913044973868, 6.584005926629993, 7.7532214876576155, 9.119754865903639, 10.704443927019176, 12.524994176518703, 14.59739595363613, 16.93214476166354, 19.53823297353041, 22.417131025031697, 25.568260840911098, 28.991144156566317, 32.68408069090375, 36.65174474170465, 40.90396065611201, 45.4664008671033, 50.37147343433591, 55.6804490772103, 61.470816952306556, 67.8622707390618, 75.00516023410414, 83.11898235973767, 92.50044360202462, 103.57086856690864, 116.9492274587385, 118.16074567836519, 119.18497548708795, 120.04810876261652, 120.77352815196981, 121.38182790207875, 121.89094985353891, 122.31638758099915, 122.6714244963338, 122.9673822552567, 123.21386397019609, 123.41898278254268, 123.58957065488238, 123.73136519024158, 123.84917421274221, 123.94701903496814, 124.02825801299717, 124.09569231686116],
148
+ "original_max_position_embeddings": 32768
149
+ }
150
+ }
151
+ ```
152
+
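As a rough illustration of what the `long_factor` values do, the sketch below assumes the common LongRoPE formulation in which each rotary frequency is divided by its per-dimension factor. `longrope_inv_freq` is a hypothetical helper, and `head_dim` and `base` are assumed values for the toy example, not MiniCPM4's actual settings.

```python
def longrope_inv_freq(factors, head_dim, base=10000.0):
    """Per-pair rotary frequencies after LongRoPE-style rescaling.

    Plain RoPE uses base ** (-2i / head_dim) for pair i; here each
    frequency is additionally divided by factors[i].
    """
    if len(factors) != head_dim // 2:
        raise ValueError("expected one factor per pair of head dimensions")
    return [1.0 / (f * base ** (2 * i / head_dim)) for i, f in enumerate(factors)]


# Toy example with a hypothetical head_dim of 8 (4 frequency pairs).
plain = longrope_inv_freq([1.0, 1.0, 1.0, 1.0], head_dim=8)
scaled = longrope_inv_freq([1.0, 1.0, 2.0, 4.0], head_dim=8)
for p, s in zip(plain, scaled):
    print(f"{p:.6f} -> {s:.6f}")
```

Read this way, the factors close to 1 at the start of the list leave the fast-rotating dimensions almost unchanged, while the factors above 100 at the end slow the slow-rotating dimensions so that positions beyond 32,768 stay distinguishable.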
153
+ After modification, you can run the following command to reproduce the long-context acceleration effect (the script will automatically download the model weights from Hugging Face):
154
+ ```bash
155
+ python3 tests/test_generate.py
156
+ ```
157
+
158
+ For more details about CPM.cu, please refer to [the repo CPM.cu](https://github.com/OpenBMB/cpm.cu).
159
+
160
+ ### Inference with Transformers
161
+ ```python
162
+ from transformers import AutoModelForCausalLM, AutoTokenizer
163
+ import torch
164
+ torch.manual_seed(0)
165
+
166
+ path = 'openbmb/MiniCPM4-8B'
167
+ device = "cuda"
168
+ tokenizer = AutoTokenizer.from_pretrained(path)
169
+ model = AutoModelForCausalLM.from_pretrained(path, torch_dtype=torch.bfloat16, device_map=device, trust_remote_code=True)
170
+
171
+ # You can use the chat interface directly
172
+ # responds, history = model.chat(tokenizer, "Write an article about Artificial Intelligence.", temperature=0.7, top_p=0.7)
173
+ # print(responds)
174
+
175
+ # You can also use the generate interface
176
+ messages = [
177
+ {"role": "user", "content": "Write an article about Artificial Intelligence."},
178
+ ]
179
+ model_inputs = tokenizer.apply_chat_template(messages, return_tensors="pt", add_generation_prompt=True).to(device)
180
+
181
+ model_outputs = model.generate(
182
+ model_inputs,
183
+ max_new_tokens=1024,
184
+ top_p=0.7,
185
+ temperature=0.7
186
+ )
187
+ output_token_ids = [
188
+ model_outputs[i][len(model_inputs[i]):] for i in range(len(model_inputs))
189
+ ]
190
+
191
+ responses = tokenizer.batch_decode(output_token_ids, skip_special_tokens=True)[0]
192
+ print(responses)
193
+ ```
194
+
195
+ MiniCPM4-8B supports `InfLLM v2`, a sparse attention mechanism designed for efficient long-sequence inference. It requires the [infllmv2_cuda_impl](https://github.com/OpenBMB/infllmv2_cuda_impl) library.
196
+
197
+ You can install it by running the following command:
198
+ ```bash
199
+ git clone -b feature_infer https://github.com/OpenBMB/infllmv2_cuda_impl.git
200
+ cd infllmv2_cuda_impl
201
+ git submodule update --init --recursive
202
+ pip install -e . # or python setup.py install
203
+ ```
204
+
205
+ To enable InfLLM v2, you need to add the `sparse_config` field in `config.json`:
206
+ ```json
207
+ {
208
+ ...,
209
+ "sparse_config": {
210
+ "kernel_size": 32,
211
+ "kernel_stride": 16,
212
+ "init_blocks": 1,
213
+ "block_size": 64,
214
+ "window_size": 2048,
215
+ "topk": 64,
216
+ "use_nope": false,
217
+ "dense_len": 8192
218
+ }
219
+ }
220
+ ```
221
+
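If you prefer not to edit `config.json` by hand, a small script can add the field. This is a minimal sketch using only the standard library; `enable_infllm_v2` is a hypothetical helper name, and the values are the ones shown in the JSON above.

```python
import json
from pathlib import Path

# Values copied from the `sparse_config` example above.
SPARSE_CONFIG = {
    "kernel_size": 32,
    "kernel_stride": 16,
    "init_blocks": 1,
    "block_size": 64,
    "window_size": 2048,
    "topk": 64,
    "use_nope": False,
    "dense_len": 8192,
}


def enable_infllm_v2(config_path, sparse_config=SPARSE_CONFIG):
    """Add or overwrite `sparse_config` in a local config.json and return the new config."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    config["sparse_config"] = dict(sparse_config)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Call it with the path to your local copy of the model, for example `enable_infllm_v2("path/to/MiniCPM4-8B/config.json")` (the path is a placeholder). All other fields in the file are left unchanged.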
222
+ These parameters control the behavior of InfLLM v2:
223
+ * `kernel_size` (default: 32): The size of semantic kernels.
224
+ * `kernel_stride` (default: 16): The stride between adjacent kernels.
225
+ * `init_blocks` (default: 1): The number of initial blocks that every query token attends to. This ensures attention to the beginning of the sequence.
226
+ * `block_size` (default: 64): The block size for key-value blocks.
227
+ * `window_size` (default: 2048): The size of the local sliding window.
228
+ * `topk` (default: 64): Specifies that each token computes attention with only the top-k most relevant key-value blocks.
229
+ * `use_nope` (default: false): Whether to use the NOPE technique in block selection for improved performance.
230
+ * `dense_len` (default: 8192): Since Sparse Attention offers limited benefits for short sequences, the model can use standard (dense) attention for shorter texts. The model will use dense attention for sequences with a token length below `dense_len` and switch to sparse attention for sequences exceeding this length. Set this to `-1` to always use sparse attention regardless of sequence length.
231
+
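The following sketch shows how these defaults relate to the "less than 5% of tokens" figure quoted in the introduction. It rests on two labeled assumptions: that the selected blocks, the sliding window, and the initial blocks are counted separately with no overlap (so the result is an upper bound), and that a sequence of exactly `dense_len` tokens still uses dense attention. Both helper functions are hypothetical and do not reflect the library's internals.

```python
def uses_sparse_attention(seq_len, dense_len=8192):
    """Decide between dense and sparse attention for a given sequence length."""
    if dense_len == -1:
        return True  # always sparse, regardless of length
    # Assumption: a sequence of exactly dense_len tokens stays dense.
    return seq_len > dense_len


def max_attended_tokens(topk=64, block_size=64, window_size=2048, init_blocks=1):
    """Upper bound on key-value tokens one query attends to (assumes no overlap)."""
    return topk * block_size + window_size + init_blocks * block_size


seq_len = 131072  # 128K tokens
attended = max_attended_tokens()
print(uses_sparse_attention(seq_len))
print(attended, f"{attended / seq_len:.2%}")
```

With the default values this bound is 6,208 tokens, or about 4.7% of a 128K-token context.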
232
+ MiniCPM4 natively supports context lengths of up to 32,768 tokens. For conversations where the total length (including both input and output) significantly exceeds this limit, we recommend using RoPE scaling techniques for effective handling of long texts. We have validated the model's performance on context lengths of up to 131,072 tokens by modifying the LongRoPE factor.
233
+
234
+ To apply the LongRoPE factors, edit the `rope_scaling` field in the model's `config.json` file:
235
+ ```json
236
+ {
237
+ ...,
238
+ "rope_scaling": {
239
+ "rope_type": "longrope",
240
+ "long_factor": [0.9977997200264581, 1.014658295992452, 1.0349680404997148, 1.059429246056193, 1.0888815016813513, 1.1243301355211495, 1.166977103606075, 1.2182568066927284, 1.2798772354275727, 1.3538666751582975, 1.4426259039919596, 1.5489853358570191, 1.6762658237220625, 1.8283407612492941, 2.0096956085876183, 2.225478927469756, 2.481536379650452, 2.784415934557119, 3.1413289096347365, 3.560047844772632, 4.048719380066383, 4.752651957515948, 5.590913044973868, 6.584005926629993, 7.7532214876576155, 9.119754865903639, 10.704443927019176, 12.524994176518703, 14.59739595363613, 16.93214476166354, 19.53823297353041, 22.417131025031697, 25.568260840911098, 28.991144156566317, 32.68408069090375, 36.65174474170465, 40.90396065611201, 45.4664008671033, 50.37147343433591, 55.6804490772103, 61.470816952306556, 67.8622707390618, 75.00516023410414, 83.11898235973767, 92.50044360202462, 103.57086856690864, 116.9492274587385, 118.16074567836519, 119.18497548708795, 120.04810876261652, 120.77352815196981, 121.38182790207875, 121.89094985353891, 122.31638758099915, 122.6714244963338, 122.9673822552567, 123.21386397019609, 123.41898278254268, 123.58957065488238, 123.73136519024158, 123.84917421274221, 123.94701903496814, 124.02825801299717, 124.09569231686116],
241
+ "short_factor": [0.9977997200264581, 1.014658295992452, 1.0349680404997148, 1.059429246056193, 1.0888815016813513, 1.1243301355211495, 1.166977103606075, 1.2182568066927284, 1.2798772354275727, 1.3538666751582975, 1.4426259039919596, 1.5489853358570191, 1.6762658237220625, 1.8283407612492941, 2.0096956085876183, 2.225478927469756, 2.481536379650452, 2.784415934557119, 3.1413289096347365, 3.560047844772632, 4.048719380066383, 4.752651957515948, 5.590913044973868, 6.584005926629993, 7.7532214876576155, 9.119754865903639, 10.704443927019176, 12.524994176518703, 14.59739595363613, 16.93214476166354, 19.53823297353041, 22.417131025031697, 25.568260840911098, 28.991144156566317, 32.68408069090375, 36.65174474170465, 40.90396065611201, 45.4664008671033, 50.37147343433591, 55.6804490772103, 61.470816952306556, 67.8622707390618, 75.00516023410414, 83.11898235973767, 92.50044360202462, 103.57086856690864, 116.9492274587385, 118.16074567836519, 119.18497548708795, 120.04810876261652, 120.77352815196981, 121.38182790207875, 121.89094985353891, 122.31638758099915, 122.6714244963338, 122.9673822552567, 123.21386397019609, 123.41898278254268, 123.58957065488238, 123.73136519024158, 123.84917421274221, 123.94701903496814, 124.02825801299717, 124.09569231686116],
242
+ "original_max_position_embeddings": 32768
243
+ }
244
+ }
245
+ ```
246
+
247
+ ### Inference with [SGLang](https://github.com/sgl-project/sglang)
248
+
249
+ For now, you need to install our forked version of SGLang.
250
+ ```bash
251
+ git clone -b openbmb https://github.com/OpenBMB/sglang.git
252
+ cd sglang
253
+
254
+ pip install --upgrade pip
255
+ pip install -e "python[all]"
256
+ ```
257
+
258
+ You can start the inference server by running the following command:
259
+ ```bash
260
+ python -m sglang.launch_server --model openbmb/MiniCPM4-8B --trust-remote-code --port 30000 --chat-template chatml
261
+ ```
262
+
263
+ Then you can use the chat interface by running the following Python code:
264
+ ```python
265
+ import openai
266
+
267
+ client = openai.Client(base_url="http://localhost:30000/v1", api_key="None")
268
+
269
+ response = client.chat.completions.create(
270
+ model="openbmb/MiniCPM4-8B",
271
+ messages=[
272
+ {"role": "user", "content": "Write an article about Artificial Intelligence."},
273
+ ],
274
+ temperature=0.7,
275
+ max_tokens=1024,
276
+ )
277
+
278
+ print(response.choices[0].message.content)
279
+ ```
280
+
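The `--chat-template chatml` flag used when launching the server selects the ChatML prompt layout. Here is a minimal sketch of that layout, assuming the conventional `<|im_start|>` and `<|im_end|>` markers. `render_chatml` is a hypothetical helper, and the server's actual template may differ in details such as a default system prompt.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render a list of {"role", "content"} dicts as a ChatML-style prompt string."""
    parts = [
        f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        for message in messages
    ]
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


print(render_chatml([
    {"role": "user", "content": "Write an article about Artificial Intelligence."},
]))
```

The server applies the template for you, so client code only needs to send the `messages` list.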
281
+
282
+ ## Evaluation Results
283
+ On two typical end-side chips, Jetson AGX Orin and RTX 4090, MiniCPM4 demonstrates significantly faster processing speed compared to similar-size models in long text processing tasks. As text length increases, MiniCPM4's efficiency advantage becomes more pronounced. On the Jetson AGX Orin platform, compared to Qwen3-8B, MiniCPM4 achieves approximately 7x decoding speed improvement.
284
+
285
+ ![benchmark](https://github.com/OpenBMB/MiniCPM/blob/main/assets/minicpm4/efficiency.png?raw=true)
286
+
287
+ #### Comprehensive Evaluation
288
+ MiniCPM4 is released in two end-side sizes, 8B and 0.5B parameters, both achieving best-in-class performance in their respective categories.
289
+
290
+ ![benchmark](https://github.com/OpenBMB/MiniCPM/blob/main/assets/minicpm4/benchmark.png?raw=true)
291
+
292
+ #### Long Text Evaluation
293
+ MiniCPM4 is pre-trained on 32K long texts and achieves length extension through YaRN technology. In the 128K long text needle-in-a-haystack task, MiniCPM4 demonstrates outstanding performance.
294
+
295
+ ![long-niah](https://github.com/OpenBMB/MiniCPM/blob/main/assets/minicpm4/128k-niah.png?raw=true)
296
+
297
+ ## Statement
298
+ - As a language model, MiniCPM generates content by learning from a vast amount of text.
299
+ - However, it does not possess the ability to comprehend or express personal opinions or value judgments.
300
+ - Any content generated by MiniCPM does not represent the viewpoints or positions of the model developers.
301
+ - Therefore, when using content generated by MiniCPM, users should take full responsibility for evaluating and verifying it on their own.
302
+
303
+ ## LICENSE
304
+ - This repository and MiniCPM models are released under the [Apache-2.0](https://github.com/OpenBMB/MiniCPM/blob/main/LICENSE) License.
305
+
306
+ ## Citation
307
+ - Please cite our [paper](https://github.com/OpenBMB/MiniCPM/tree/main/report/MiniCPM_4_Technical_Report.pdf) if you find our work valuable.
308
+
309
+ ```bibtex
310
+ @article{minicpm4,
311
+ title={{MiniCPM4}: Ultra-Efficient LLMs on End Devices},
312
+ author={MiniCPM Team},
313
+ year={2025}
314
+ }
315
+ ```