Update README.md

> - For **Thinking models**, it is recommended to use `Temperature = 0.6`.
> - For **Instruct models**, it is recommended to use `Temperature = 0.2`.

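As a minimal sketch, the recommendations above can be encoded as the sampling parameters you would unpack into a `generate` call in 🤗 Transformers (the helper name below is hypothetical, not part of the repository):

```python
# Hypothetical helper: maps a Kimi-VL variant name to the recommended
# sampling settings from the bullets above.
def recommended_sampling(variant: str) -> dict:
    if "thinking" in variant.lower():
        return {"do_sample": True, "temperature": 0.6}
    return {"do_sample": True, "temperature": 0.2}

# The resulting dict can be unpacked, e.g. model.generate(**inputs, **params).
params = recommended_sampling("Kimi-VL-A3B-Thinking")
```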
## Performance

As an efficient model, Kimi-VL can robustly handle diverse tasks (fine-grained perception, math, college-level problems, OCR, agent tasks, etc.) across a broad spectrum of input forms (single-image, multi-image, video, long-document, etc.).

### Inference with 🤗 Hugging Face Transformers

> [!Note]
> Recommended prompt for OS agent tasks (the expected output is a point):
> - `Please observe the screenshot, please locate the following elements with action and point.<instruction> [YOUR INSTRUCTION]`

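As an illustrative sketch (the constant and helper names below are our own, not from the repository), the recommended prompt can be assembled by substituting a concrete instruction for the `[YOUR INSTRUCTION]` placeholder:

```python
# Template text taken verbatim from the note above; only the placeholder
# is swapped for a format field.
OS_AGENT_TEMPLATE = (
    "Please observe the screenshot, please locate the following "
    "elements with action and point.<instruction> {instruction}"
)

def build_os_agent_prompt(instruction: str) -> str:
    """Fill the recommended OS-agent prompt with a concrete instruction."""
    return OS_AGENT_TEMPLATE.format(instruction=instruction)

prompt = build_os_agent_prompt("Open the Settings app")
```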
This section shows how to run inference with our model using the `transformers` library. We recommend Python 3.10, `torch>=2.1.0`, and `transformers==4.48.2` as the development environment.

```python