lsw825 committed on
Commit b51274b · verified · 1 Parent(s): 8a112f5

Upload folder using huggingface_hub

Files changed (1)
  1. README.md +50 -61
README.md CHANGED
@@ -1,17 +1,11 @@
1
- <!-- ---
2
- library_name: transformers
3
- --- -->
4
- <!-- markdownlint-disable first-line-h1 -->
5
- <!-- markdownlint-disable html -->
6
- <!-- markdownlint-disable no-duplicate-header -->
7
-
8
-
9
  <div align="center">
10
  <picture>
11
  <img src="figures/kimi-logo.png" width="30%" alt="Kimi K2: Open Agentic Intelligence">
12
  </picture>
13
  </div>
 
14
  <hr>
 
15
  <div align="center" style="line-height:1">
16
  <a href="https://www.kimi.com" target="_blank"><img alt="Chat" src="https://img.shields.io/badge/🤖%20Chat-Kimi%20K2-ff6b6b?color=1783ff&logoColor=white"/></a>
17
  <a href="https://www.moonshot.ai" target="_blank"><img alt="Homepage" src="https://img.shields.io/badge/Homepage-Moonshot%20AI-white?logo=Kimi&logoColor=white"/></a>
@@ -31,8 +25,6 @@ library_name: transformers
31
  <b>📰&nbsp;&nbsp;<a href="https://moonshotai.github.io/Kimi-K2/">Tech Blog</a></b> &nbsp;&nbsp;&nbsp; | &nbsp;&nbsp;&nbsp; <b>📄&nbsp;&nbsp;Paper Link (coming soon)</b>
32
  </p>
33
 
34
- ## 0. Reminder: Remove this after you squash the commit history before release.
35
-
36
  ## 1. Model Introduction
37
 
38
  Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model with 32 billion activated parameters and 1 trillion total parameters. Trained with the Muon optimizer, Kimi K2 achieves exceptional performance across frontier knowledge, reasoning, and coding tasks while being meticulously optimized for agentic capabilities.
@@ -47,11 +39,6 @@ Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model with 32 bi
47
  - **Kimi-K2-Instruct**: The post-trained model best for drop-in, general-purpose chat and agentic experiences. It is a reflex-grade model without long thinking.
48
 
49
 
50
- <p align="center">
51
- TODO this is a banner
52
- <img width="80%" src="figures/logo.svg">
53
- </p>
54
-
55
  ## 2. Model Summary
56
 
57
  <div align="center">
@@ -86,13 +73,13 @@ Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model with 32 bi
86
  <tr>
87
  <th align="center">Benchmark</th>
88
  <th align="center">Metric</th>
89
- <th align="center">Kimi K2 Instruct</th>
90
- <th align="center">DeepSeek-V3-0324</th>
91
- <th align="center">Qwen3-235B-A22B <br><sup>(non-thinking)</sup></th>
92
- <th align="center">Claude Sonnet 4 <br><sup>(w/o extended thinking)</sup></th>
93
- <th align="center">Claude Opus 4 <br><sup>(w/o extended thinking)</sup></th>
94
- <th align="center">GPT-4.1</th>
95
- <th align="center">Gemini 2.5 Flash <br> Preview (05-20)</th>
96
  </tr>
97
  </thead>
98
  <tbody>
@@ -106,7 +93,7 @@ Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model with 32 bi
106
  <td align="center">46.9</td>
107
  <td align="center">37.0</td>
108
  <td align="center">48.5</td>
109
- <td align="center">47.4</t6
110
  <td align="center">44.7</td>
111
  <td align="center">44.7</td>
112
  </tr>
@@ -121,10 +108,11 @@ Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model with 32 bi
121
  <td align="center">19.5</td>
122
  <td align="center">19.5</td>
123
  </tr>
 
124
  <tr>
125
  <td align="center">MultiPL-E</td>
126
  <td align="center">Pass@1</td>
127
- <td align="center"><ins><strong>86.7</strong></ins></td>
128
  <td align="center">83.1</td>
129
  <td align="center">78.2</td>
130
  <td align="center">88.6</td>
@@ -132,6 +120,7 @@ Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model with 32 bi
132
  <td align="center">86.7</td>
133
  <td align="center">85.6</td>
134
  </tr>
 
135
  <tr>
136
  <td align="center">SWE-bench Verified <br/><sup>(Agentless Coding)</sup></td>
137
  <td align="center">Single Patch</td>
@@ -143,18 +132,19 @@ Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model with 32 bi
143
  <td align="center">40.8</td>
144
  <td align="center">32.6</td>
145
  </tr>
146
-
147
  <tr>
148
  <td align="center" rowspan="2">SWE-bench Verified <br/> <sup>(Agentic Coding)</sup></td>
149
  <td align="center">Single Attempt (Acc)</td>
150
  <td align="center"><ins><strong>65.8</strong></ins></td>
151
  <td align="center">38.8</td>
152
  <td align="center">34.4</td>
153
- <td align="center"><strong>72.7</strong></td>
154
  <td align="center">72.5<sup>*</sup></td>
155
  <td align="center">54.6</td>
156
  <td align="center">—</td>
157
  </tr>
 
158
  <tr>
159
  <!--<td align="center">(Agentic Coding)</td>-->
160
  <td align="center">Multiple Attempts (Acc)</td>
@@ -168,7 +158,7 @@ Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model with 32 bi
168
  </tr>
169
 
170
  <tr>
171
- <td align="center" rowspan="2">SWE-bench Multilingual<br /> <sup>(Agentic Coding)</sup></td>
172
  <td align="center">Single Attempt (Acc)</td>
173
  <td align="center"><ins><strong>47.3</strong> </ins></td>
174
  <td align="center">25.8</td>
@@ -178,23 +168,25 @@ Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model with 32 bi
178
  <td align="center">31.5</td>
179
  <td align="center">—</td>
180
  </tr>
 
181
  <tr>
182
- <!--<td align="center">(Agentic Coding)</td>-->
183
  <td align="center">Inhouse Framework (Acc)</td>
184
- <td align="center"><ins><strong>30.0</strong> </ins></td>
185
  <td align="center">—</td>
186
  <td align="center">—</td>
187
  <td align="center">35.5</td>
188
  <td align="center"><strong>43.2</strong></td>
189
- <td align="center">8.30</td>
190
  <td align="center">—</td>
191
  </tr>
 
192
  <tr>
193
- <td align="center">TerminalBench</td>
194
  <td align="center">Acc</td>
195
  <td align="center"><ins><strong>25.0</strong> </ins></td>
196
  <td align="center">16.3</td>
197
- <td align="center">6.60</td>
198
  <td align="center">—</td>
199
  <td align="center">—</td>
200
  <td align="center"><strong>30.3</strong></td>
@@ -291,7 +283,7 @@ Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model with 32 bi
291
  <td align="center">91.2<sup>*</sup></td>
292
  <td align="center">94.0</td>
293
  <td align="center">94.4</td>
294
- <td align="center">92.4/td>
295
  <td align="center">95.4</td>
296
  </tr>
297
  <tr>
@@ -301,7 +293,7 @@ Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model with 32 bi
301
  <td align="center">27.5</td>
302
  <td align="center">11.9</td>
303
  <td align="center">15.9</td>
304
- <td align="center">15.8</td>
305
  <td align="center">19.4</td>
306
  <td align="center">34.7</td>
307
  </tr>
@@ -333,7 +325,7 @@ Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model with 32 bi
333
  <td align="center">Acc</td>
334
  <td align="center"><strong>89.0</strong></td>
335
  <td align="center">84.0</td>
336
- <td align="center">37.7</td>
337
  <td align="center">73.7</td>
338
  <td align="center">59.3</td>
339
  <td align="center">58.5</td>
@@ -377,8 +369,8 @@ Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model with 32 bi
377
  </tr>
378
 
379
  <tr>
380
- <td align="center">Humanity’s Last</td>
381
- <td align="center">(Text Only)</td>
382
  <td align="center">4.7</td>
383
  <td align="center">5.2</td>
384
  <td align="center"><ins><strong>5.7</strong></ins></td>
@@ -491,7 +483,7 @@ Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model with 32 bi
491
  </sup><br/><sup>
492
  • Some data points have been omitted due to prohibitively expensive evaluation costs.
493
  </sup>
494
-
495
  ---
496
 
497
  #### Base model evaluation results
@@ -501,22 +493,22 @@ Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model with 32 bi
501
  | Benchmark | Metric | Shot | Kimi K2 Base | Deepseek-V3-Base | Qwen2.5-72B | Llama 4 Maverick |
502
  |:-------------------:|:----------:|:---------:|:--------------:|:------------------:|:-------------:|:------------------:|
503
  | **General Tasks** | | | | | | |
504
- | MMLU | EM | 5-shot | **87.79** | 87.1 | 86.08 | 84.87 |
505
- | MMLU-pro | EM | 5-shot | **69.17** | 60.59 | 62.8 | 63.47 |
506
- | MMLU-redux-2.0 | EM | 5-shot | **90.17** | 89.53 | 87.77 | 88.18 |
507
- | SimpleQA | Correct | 5-shot | **35.25** | 26.49 | 10.31 | 23.74 |
508
- | TriviaQA | EM | 5-shot | **85.09** | 84.11 | 76.03 | 79.25 |
509
- | GPQA-Diamond | Avg@8 | 5-shot | 48.11 | **50.51** | 40.78 | 49.43 |
510
- | SuperGPQA | EM | 5-shot | **44.67** | 39.2 | 34.23 | 38.84 |
511
  | **Code Tasks** | | | | | | |
512
- | LiveCodeBench v6 | Pass@1 | 1-shot | **26.29** | 22.86 | 21.14 | 25.14 |
513
- | EvalPlus | Pass@1 | - | **80.33** | 65.61 | 66.04 | 65.48 |
514
  | **Mathematics Tasks** | | | | | | |
515
- | MATH | EM | 4-shot | **70.22** | 60.06 | 60.96 | 63.02 |
516
- | GSM8k | EM | 8-shot | **92.12** | 91.66 | 90.37 | 86.35 |
517
  | **Chinese Tasks** | | | | | | |
518
- | C-Eval | EM | 5-shot | **92.5** | 90.04 | 90.86 | 80.91 |
519
- | CSimpleQA | Correct | 5-shot | **77.57** | 72.13 | 50.53 | 53.47 |
520
 
521
  </div>
522
  <sup>
@@ -537,12 +529,12 @@ Our model checkpoints are stored in the block-fp8 format, you can find it on [Hu
537
 
538
  Currently, Kimi-K2 is recommended to run on the following inference engines:
539
 
540
- * vLLM
541
  * SGLang
542
  * KTransformers
543
- * TensorRT-LLM
544
 
545
- Deployment examples for vLLM and SGLang can be found in the [Model Deployment Guide](docs/deploy_guidance.md).
546
 
547
  ---
548
 
@@ -568,7 +560,7 @@ def simple_chat(client: OpenAI, model_name: str):
568
  print(response.choices[0].message.content)
569
  ```
570
 
571
- > [!NOTE]
572
  > The recommended temperature for Kimi-K2-Instruct is `temperature = 0.6`.
573
  > If no special instructions are required, the system prompt above is a good default.
574
 
@@ -576,7 +568,7 @@ def simple_chat(client: OpenAI, model_name: str):
576
 
577
  ### Tool Calling
578
 
579
- Kimi-K2-Instruct has strong tool-calling capabilities.
580
  To enable them, you need to pass the list of available tools in each request, then the model will autonomously decide when and how to invoke them.
581
 
582
  The following example demonstrates calling a weather tool end-to-end:
@@ -645,8 +637,8 @@ def tool_call_with_client(client: OpenAI, model_name: str):
645
  print(choice.message.content)
646
  ```
647
 
648
- The `tool_call_with_client` function implements the pipeline from user query to tool execution.
649
- This pipeline requires the inference engine to support Kimi-K2’s native tool-parsing logic.
650
  For streaming output and manual tool-parsing, see the [Tool Calling Guide](docs/tool_call_guidance.md).
651
 
652
  ---
@@ -655,9 +647,6 @@ For streaming output and manual tool-parsing, see the [Tool Calling Guide](docs/
655
 
656
  Both the code repository and the model weights are released under the [Modified MIT License](LICENSE).
657
 
658
- In short, it is MIT License for most people, but you need to give credit to "Kimi K2" by displaying it prominently in your product, if you have more than 100 million monthly active users or annual revenue exceeding 20 million USD.
659
-
660
-
661
  ---
662
 
663
  ## 7. Contact Us
1
  <div align="center">
2
  <picture>
3
  <img src="figures/kimi-logo.png" width="30%" alt="Kimi K2: Open Agentic Intelligence">
4
  </picture>
5
  </div>
6
+
7
  <hr>
8
+
9
  <div align="center" style="line-height:1">
10
  <a href="https://www.kimi.com" target="_blank"><img alt="Chat" src="https://img.shields.io/badge/🤖%20Chat-Kimi%20K2-ff6b6b?color=1783ff&logoColor=white"/></a>
11
  <a href="https://www.moonshot.ai" target="_blank"><img alt="Homepage" src="https://img.shields.io/badge/Homepage-Moonshot%20AI-white?logo=Kimi&logoColor=white"/></a>
 
25
  <b>📰&nbsp;&nbsp;<a href="https://moonshotai.github.io/Kimi-K2/">Tech Blog</a></b> &nbsp;&nbsp;&nbsp; | &nbsp;&nbsp;&nbsp; <b>📄&nbsp;&nbsp;Paper Link (coming soon)</b>
26
  </p>
27
 
 
 
28
  ## 1. Model Introduction
29
 
30
  Kimi K2 is a state-of-the-art mixture-of-experts (MoE) language model with 32 billion activated parameters and 1 trillion total parameters. Trained with the Muon optimizer, Kimi K2 achieves exceptional performance across frontier knowledge, reasoning, and coding tasks while being meticulously optimized for agentic capabilities.
 
39
  - **Kimi-K2-Instruct**: The post-trained model best for drop-in, general-purpose chat and agentic experiences. It is a reflex-grade model without long thinking.
40
 
41
 
 
 
 
 
 
42
  ## 2. Model Summary
43
 
44
  <div align="center">
 
73
  <tr>
74
  <th align="center">Benchmark</th>
75
  <th align="center">Metric</th>
76
+ <th align="center"><sup>Kimi K2 Instruct</sup></th>
77
+ <th align="center"><sup>DeepSeek-V3-0324</sup></th>
78
+ <th align="center"><sup>Qwen3-235B-A22B <br><sup>(non-thinking)</sup></sup></th>
79
+ <th align="center"><sup>Claude Sonnet 4 <br><sup>(w/o extended thinking)</sup></sup></th>
80
+ <th align="center"><sup>Claude Opus 4 <br><sup>(w/o extended thinking)</sup></sup></th>
81
+ <th align="center"><sup>GPT-4.1</sup></th>
82
+ <th align="center"><sup>Gemini 2.5 Flash <br> Preview (05-20)</sup></th>
83
  </tr>
84
  </thead>
85
  <tbody>
 
93
  <td align="center">46.9</td>
94
  <td align="center">37.0</td>
95
  <td align="center">48.5</td>
96
+ <td align="center">47.4</td>
97
  <td align="center">44.7</td>
98
  <td align="center">44.7</td>
99
  </tr>
 
108
  <td align="center">19.5</td>
109
  <td align="center">19.5</td>
110
  </tr>
111
+
112
  <tr>
113
  <td align="center">MultiPL-E</td>
114
  <td align="center">Pass@1</td>
115
+ <td align="center"><ins><strong>85.7</strong></ins></td>
116
  <td align="center">83.1</td>
117
  <td align="center">78.2</td>
118
  <td align="center">88.6</td>
 
120
  <td align="center">86.7</td>
121
  <td align="center">85.6</td>
122
  </tr>
123
+
124
  <tr>
125
  <td align="center">SWE-bench Verified <br/><sup>(Agentless Coding)</sup></td>
126
  <td align="center">Single Patch</td>
 
132
  <td align="center">40.8</td>
133
  <td align="center">32.6</td>
134
  </tr>
135
+
136
  <tr>
137
  <td align="center" rowspan="2">SWE-bench Verified <br/> <sup>(Agentic Coding)</sup></td>
138
  <td align="center">Single Attempt (Acc)</td>
139
  <td align="center"><ins><strong>65.8</strong></ins></td>
140
  <td align="center">38.8</td>
141
  <td align="center">34.4</td>
142
+ <td align="center"><strong>72.7</strong><sup>*</sup></td>
143
  <td align="center">72.5<sup>*</sup></td>
144
  <td align="center">54.6</td>
145
  <td align="center">—</td>
146
  </tr>
147
+
148
  <tr>
149
  <!--<td align="center">(Agentic Coding)</td>-->
150
  <td align="center">Multiple Attempts (Acc)</td>
 
158
  </tr>
159
 
160
  <tr>
161
+ <td align="center">SWE-bench Multilingual<br /> <sup>(Agentic Coding)</sup></td>
162
  <td align="center">Single Attempt (Acc)</td>
163
  <td align="center"><ins><strong>47.3</strong> </ins></td>
164
  <td align="center">25.8</td>
 
168
  <td align="center">31.5</td>
169
  <td align="center">—</td>
170
  </tr>
171
+
172
  <tr>
173
+ <td align="center" rowspan="2">TerminalBench</td>
174
  <td align="center">Inhouse Framework (Acc)</td>
175
+ <td align="center"><ins><strong>30.0</strong></ins></td>
176
  <td align="center">—</td>
177
  <td align="center">—</td>
178
  <td align="center">35.5</td>
179
  <td align="center"><strong>43.2</strong></td>
180
+ <td align="center">8.3</td>
181
  <td align="center">—</td>
182
  </tr>
183
+
184
  <tr>
185
+ <!--<td align="center">TerminalBench</td>-->
186
  <td align="center">Acc</td>
187
  <td align="center"><ins><strong>25.0</strong> </ins></td>
188
  <td align="center">16.3</td>
189
+ <td align="center">6.6</td>
190
  <td align="center">—</td>
191
  <td align="center">—</td>
192
  <td align="center"><strong>30.3</strong></td>
 
283
  <td align="center">91.2<sup>*</sup></td>
284
  <td align="center">94.0</td>
285
  <td align="center">94.4</td>
286
+ <td align="center">92.4</td>
287
  <td align="center">95.4</td>
288
  </tr>
289
  <tr>
 
293
  <td align="center">27.5</td>
294
  <td align="center">11.9</td>
295
  <td align="center">15.9</td>
296
+ <td align="center">15.9</td>
297
  <td align="center">19.4</td>
298
  <td align="center">34.7</td>
299
  </tr>
 
325
  <td align="center">Acc</td>
326
  <td align="center"><strong>89.0</strong></td>
327
  <td align="center">84.0</td>
328
+ <td align="center">37.7<sup>*</sup></td>
329
  <td align="center">73.7</td>
330
  <td align="center">59.3</td>
331
  <td align="center">58.5</td>
 
369
  </tr>
370
 
371
  <tr>
372
+ <td align="center">Humanity's Last Exam<br><sup>(Text Only)</sup></td>
373
+ <td align="center">-</td>
374
  <td align="center">4.7</td>
375
  <td align="center">5.2</td>
376
  <td align="center"><ins><strong>5.7</strong></ins></td>
 
483
  </sup><br/><sup>
484
  • Some data points have been omitted due to prohibitively expensive evaluation costs.
485
  </sup>
486
+
487
  ---
488
 
489
  #### Base model evaluation results
 
493
  | Benchmark | Metric | Shot | Kimi K2 Base | Deepseek-V3-Base | Qwen2.5-72B | Llama 4 Maverick |
494
  |:-------------------:|:----------:|:---------:|:--------------:|:------------------:|:-------------:|:------------------:|
495
  | **General Tasks** | | | | | | |
496
+ | MMLU | EM | 5-shot | **87.8** | 87.1 | 86.1 | 84.9 |
497
+ | MMLU-pro | EM | 5-shot | **69.2** | 60.6 | 62.8 | 63.5 |
498
+ | MMLU-redux-2.0 | EM | 5-shot | **90.2** | 89.5 | 87.8 | 88.2 |
499
+ | SimpleQA | Correct | 5-shot | **35.3** | 26.5 | 10.3 | 23.7 |
500
+ | TriviaQA | EM | 5-shot | **85.1** | 84.1 | 76.0 | 79.3 |
501
+ | GPQA-Diamond | Avg@8 | 5-shot | 48.1 | **50.5** | 40.8 | 49.4 |
502
+ | SuperGPQA | EM | 5-shot | **44.7** | 39.2 | 34.2 | 38.8 |
503
  | **Code Tasks** | | | | | | |
504
+ | LiveCodeBench v6 | Pass@1 | 1-shot | **26.3** | 22.9 | 21.1 | 25.1 |
505
+ | EvalPlus | Pass@1 | - | **80.3** | 65.6 | 66.0 | 65.5 |
506
  | **Mathematics Tasks** | | | | | | |
507
+ | MATH | EM | 4-shot | **70.2** | 60.1 | 61.0 | 63.0 |
508
+ | GSM8k | EM | 8-shot | **92.1** | 91.7 | 90.4 | 86.3 |
509
  | **Chinese Tasks** | | | | | | |
510
+ | C-Eval | EM | 5-shot | **92.5** | 90.0 | 90.9 | 80.9 |
511
+ | CSimpleQA | Correct | 5-shot | **77.6** | 72.1 | 50.5 | 53.5 |
512
 
513
  </div>
514
  <sup>
 
529
 
530
  Currently, Kimi-K2 is recommended to run on the following inference engines:
531
 
532
+ * vLLM
533
  * SGLang
534
  * KTransformers
535
+ * TensorRT-LLM
536
 
537
+ Deployment examples for vLLM and SGLang can be found in the [Model Deployment Guide](docs/deploy_guidance.md).
538
 
539
  ---
540
 
 
560
  print(response.choices[0].message.content)
561
  ```
562
 
563
+ > [!NOTE]
564
  > The recommended temperature for Kimi-K2-Instruct is `temperature = 0.6`.
565
  > If no special instructions are required, the system prompt above is a good default.
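A minimal sketch of a request using these defaults. The base URL, model id, and system-prompt text below are placeholders for your own deployment, not values from this repository:

```python
# Minimal sketch of a chat request for Kimi-K2-Instruct against an
# OpenAI-compatible endpoint. The request is assembled as a plain dict
# so the defaults are easy to inspect before sending.

def build_chat_request(user_msg: str, model_name: str = "kimi-k2-instruct") -> dict:
    """Assemble chat-completion parameters with the recommended defaults."""
    return {
        "model": model_name,  # placeholder: use the model id your server exposes
        "messages": [
            # Placeholder system prompt -- substitute the default from the
            # full example above.
            {"role": "system", "content": "You are Kimi, an AI assistant."},
            {"role": "user", "content": user_msg},
        ],
        # Recommended sampling temperature for Kimi-K2-Instruct.
        "temperature": 0.6,
    }

# Sending the request (requires a running OpenAI-compatible endpoint):
# from openai import OpenAI
# client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
# response = client.chat.completions.create(**build_chat_request("Hello"))
# print(response.choices[0].message.content)
```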
566
 
 
568
 
569
  ### Tool Calling
570
 
571
+ Kimi-K2-Instruct has strong tool-calling capabilities.
572
  To enable them, you need to pass the list of available tools in each request, then the model will autonomously decide when and how to invoke them.
573
 
574
  The following example demonstrates calling a weather tool end-to-end:
 
637
  print(choice.message.content)
638
  ```
639
 
640
+ The `tool_call_with_client` function implements the pipeline from user query to tool execution.
641
+ This pipeline requires the inference engine to support Kimi-K2’s native tool-parsing logic.
642
  For streaming output and manual tool-parsing, see the [Tool Calling Guide](docs/tool_call_guidance.md).
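The tool round trip above can be sketched as follows. The weather tool, its schema, and the dict-shaped tool calls are illustrative assumptions (real client libraries return tool calls as objects, shown here as plain dicts), not the repository's exact example:

```python
import json

def get_weather(city: str) -> dict:
    # Stand-in implementation for demonstration only.
    return {"city": city, "weather": "sunny"}

# Tool schema passed in each request so the model knows what it may call.
TOOLS = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

TOOL_IMPLS = {"get_weather": get_weather}

def run_tool_calls(tool_calls: list) -> list:
    """Execute each tool call the model requested and build the
    role="tool" messages to append before the follow-up request."""
    messages = []
    for call in tool_calls:
        fn = TOOL_IMPLS[call["function"]["name"]]
        args = json.loads(call["function"]["arguments"])
        messages.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(fn(**args)),
        })
    return messages
```

The loop continues until a response arrives with no tool calls, at which point the final message content is printed.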
643
 
644
  ---
 
647
 
648
  Both the code repository and the model weights are released under the [Modified MIT License](LICENSE).
649
 
 
 
 
650
  ---
651
 
652
  ## 7. Contact Us