teowu committed

Commit 5012bd6 · verified · 1 Parent(s): b39889c

Update README.md

Files changed (1)
1. README.md +4 -1
README.md CHANGED
@@ -63,7 +63,6 @@ The model adopts an MoE language model, a native-resolution visual encoder (Moon
 > - For **Thinking models**, it is recommended to use `Temperature = 0.6`.
 > - For **Instruct models**, it is recommended to use `Temperature = 0.2`.
 
-
 ## Performance
 
 As an efficient model, Kimi-VL can robustly handle diverse tasks (fine-grained perception, math, college-level problems, OCR, agent, etc.) across a broad spectrum of input forms (single-image, multi-image, video, long-document, etc.).
@@ -132,6 +131,10 @@ Full comparison (GPT-4o included for reference):
 
 ### Inference with 🤗 Hugging Face Transformers
 
+> [!Note]
+> Recommended prompt for OS agent tasks (expected output is a point):
+> - `Please observe the screenshot, please locate the following elements with action and point.<instruction> [YOUR INSTRUCTION]`
+
 We introduce how to use our model at the inference stage with the transformers library. It is recommended to use python=3.10, torch>=2.1.0, and transformers=4.48.2 as the development environment.
 
 ```python
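
Putting the pieces together, the sketch below shows one way the added OS-agent prompt and the `Temperature = 0.2` recommendation for Instruct models could be used with the `transformers` inference path the README describes. It is a minimal sketch, not the code from this commit: the checkpoint id `moonshotai/Kimi-VL-A3B-Instruct`, the screenshot path, and the instruction text are assumed placeholders.

```python
# Minimal sketch (not from this commit): querying Kimi-VL with the
# recommended OS-agent prompt. The checkpoint id, screenshot path, and
# instruction below are hypothetical placeholders.
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_path = "moonshotai/Kimi-VL-A3B-Instruct"  # assumed checkpoint id
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype="auto",
    device_map="auto",
    trust_remote_code=True,  # the repo ships custom modeling/processing code
)
processor = AutoProcessor.from_pretrained(model_path, trust_remote_code=True)

# OS-agent prompt format from the added note; the instruction is a placeholder.
prompt = (
    "Please observe the screenshot, please locate the following elements "
    "with action and point.<instruction> open the Settings app"
)
image = Image.open("screenshot.png")  # placeholder path
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": "screenshot.png"},
            {"type": "text", "text": prompt},
        ],
    }
]
text = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = processor(images=image, text=text, return_tensors="pt").to(model.device)

# Temperature = 0.2 per the README's recommendation for Instruct models.
output_ids = model.generate(**inputs, max_new_tokens=512, do_sample=True, temperature=0.2)
new_tokens = output_ids[:, inputs["input_ids"].shape[1]:]  # strip the prompt tokens
print(processor.batch_decode(new_tokens, skip_special_tokens=True)[0])
```

For a Thinking model, the same call would use `temperature=0.6`, per the recommendation quoted in the first hunk above.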