pipeline_tag: image-text-to-text
---

<h1 style="display: flex; align-items: center; gap: 10px; margin: 0;">
Mistral-Small-3.1-24B-Instruct-2503
<img src="https://www.redhat.com/rhdc/managed-files/Catalog-Validated_model_0.png" alt="Model Icon" width="40" style="margin: 0; padding: 0;" />
</h1>

<a href="https://www.redhat.com/en/products/ai/validated-models" target="_blank" style="margin: 0; padding: 0;">
<img src="https://www.redhat.com/rhdc/managed-files/Validated_badge-Dark.png" alt="Validated Badge" width="250" style="margin: 0; padding: 0;" />
</a>

Building upon Mistral Small 3 (2501), Mistral Small 3.1 (2503) **adds state-of-the-art vision understanding** and enhances **long context capabilities up to 128k tokens** without compromising text performance.
With 24 billion parameters, this model achieves top-tier capabilities in both text and vision tasks.

Learn more about Mistral Small 3.1 in our [blog post](https://mistral.ai/news/mistral-small-3-1/).

<details>
<summary>Deploy on <strong>Red Hat AI Inference Server</strong></summary>

```bash
podman run --rm -it --device nvidia.com/gpu=all -p 8000:8000 \
  --ipc=host \
  --env "HUGGING_FACE_HUB_TOKEN=$HF_TOKEN" \
  --env "HF_HUB_OFFLINE=0" -v ~/.cache/vllm:/home/vllm/.cache \
  --name=vllm \
  registry.access.redhat.com/rhaiis/rh-vllm-cuda \
  vllm serve \
  --tensor-parallel-size 8 \
  --max-model-len 32768 \
  --enforce-eager --model RedHatAI/Mistral-Small-3.1-24B-Instruct-2503
```
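
Once the container is up, it exposes an OpenAI-compatible API on port 8000. A minimal client sketch, assuming the `openai` Python package is installed and the server is reachable on `localhost` (both assumptions, not part of the official instructions):

```python
# Hypothetical client sketch: assumes the podman command above is running
# locally and that `pip install openai` has been done.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # no key needed locally

response = client.chat.completions.create(
    model="RedHatAI/Mistral-Small-3.1-24B-Instruct-2503",
    messages=[{"role": "user", "content": "Summarize vLLM in one sentence."}],
    max_tokens=64,
)
print(response.choices[0].message.content)
```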

See [Red Hat AI Inference Server documentation](https://docs.redhat.com/en/documentation/red_hat_ai_inference_server/) for more details.
</details>

<details>
<summary>Deploy on <strong>Red Hat Enterprise Linux AI</strong></summary>

```bash
# Download the model from the Red Hat registry via docker
# Note: this downloads the model to ~/.cache/instructlab/models unless --model-dir is specified.
ilab model download --repository docker://registry.redhat.io/rhelai1/mistral-small-3-1-24b-instruct-2503:1.5
```

```bash
# Serve the model via ilab
ilab model serve --model-path ~/.cache/instructlab/models/mistral-small-3-1-24b-instruct-2503

# Chat with the model
ilab model chat --model ~/.cache/instructlab/models/mistral-small-3-1-24b-instruct-2503
```
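
`ilab model serve` exposes an OpenAI-compatible endpoint, so the model can also be queried programmatically. A hedged sketch (the default `127.0.0.1:8000` address, the served model name, and the `openai` package are assumptions; adjust to your serve configuration):

```python
# Hypothetical streaming client: assumes `ilab model serve` is running on its
# default local endpoint and that `pip install openai` has been done.
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="EMPTY")

stream = client.chat.completions.create(
    model="mistral-small-3-1-24b-instruct-2503",
    messages=[{"role": "user", "content": "List three languages you support."}],
    stream=True,  # print tokens as they arrive
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```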

See [Red Hat Enterprise Linux AI documentation](https://docs.redhat.com/en/documentation/red_hat_enterprise_linux_ai/1.4) for more details.
</details>

<details>
<summary>Deploy on <strong>Red Hat OpenShift AI</strong></summary>

```yaml
# Setting up vllm server with ServingRuntime
# Save as: vllm-servingruntime.yaml
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: vllm-cuda-runtime # OPTIONAL CHANGE: set a unique name
  annotations:
    openshift.io/display-name: vLLM NVIDIA GPU ServingRuntime for KServe
    opendatahub.io/recommended-accelerators: '["nvidia.com/gpu"]'
  labels:
    opendatahub.io/dashboard: 'true'
spec:
  annotations:
    prometheus.io/port: '8080'
    prometheus.io/path: '/metrics'
  multiModel: false
  supportedModelFormats:
    - autoSelect: true
      name: vLLM
  containers:
    - name: kserve-container
      image: quay.io/modh/vllm:rhoai-2.20-cuda # CHANGE if needed. If AMD: quay.io/modh/vllm:rhoai-2.20-rocm
      command:
        - python
        - -m
        - vllm.entrypoints.openai.api_server
      args:
        - "--port=8080"
        - "--model=/mnt/models"
        - "--served-model-name={{.Name}}"
      env:
        - name: HF_HOME
          value: /tmp/hf_home
      ports:
        - containerPort: 8080
          protocol: TCP
```

```yaml
# Attach model to vllm server. This is an NVIDIA template
# Save as: inferenceservice.yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  annotations:
    openshift.io/display-name: mistral-small-3-1-24b-instruct-2503 # OPTIONAL CHANGE
    serving.kserve.io/deploymentMode: RawDeployment
  name: mistral-small-3-1-24b-instruct-2503 # specify model name. This value will be used to invoke the model in the payload
  labels:
    opendatahub.io/dashboard: 'true'
spec:
  predictor:
    maxReplicas: 1
    minReplicas: 1
    model:
      modelFormat:
        name: vLLM
      name: ''
      resources:
        limits:
          cpu: '2' # this is model specific
          memory: 8Gi # this is model specific
          nvidia.com/gpu: '1' # this is accelerator specific
        requests: # same comment for this block
          cpu: '1'
          memory: 4Gi
          nvidia.com/gpu: '1'
      runtime: vllm-cuda-runtime # must match the ServingRuntime name above
      storageUri: oci://registry.redhat.io/rhelai1/modelcar-mistral-small-3-1-24b-instruct-2503:1.5
    tolerations:
      - effect: NoSchedule
        key: nvidia.com/gpu
        operator: Exists
```

```bash
# Make sure first to be in the project where you want to deploy the model
# oc project <project-name>

# Apply the ServingRuntime
oc apply -f vllm-servingruntime.yaml

# Apply the InferenceService
oc apply -f inferenceservice.yaml
```

```bash
# Replace <inference-service-name> and <cluster-ingress-domain> below:
# - Run `oc get inferenceservice` to find your URL if unsure.

# Call the server using curl:
curl https://<inference-service-name>-predictor-default.<cluster-ingress-domain>/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "mistral-small-3-1-24b-instruct-2503",
    "stream": true,
    "stream_options": {
      "include_usage": true
    },
    "max_tokens": 1,
    "messages": [
      {
        "role": "user",
        "content": "How can a bee fly when its wings are so small?"
      }
    ]
  }'
```
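
The same request can be issued from Python with the `openai` client; a hedged sketch (the placeholder route mirrors the curl example above and must be replaced with your actual endpoint):

```python
# Hypothetical client sketch: assumes the InferenceService above is deployed;
# replace the base_url placeholders with your cluster's route.
from openai import OpenAI

client = OpenAI(
    base_url="https://<inference-service-name>-predictor-default.<cluster-ingress-domain>/v1",
    api_key="EMPTY",
)

response = client.chat.completions.create(
    model="mistral-small-3-1-24b-instruct-2503",
    messages=[{"role": "user", "content": "How can a bee fly when its wings are so small?"}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```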

See [Red Hat OpenShift AI documentation](https://docs.redhat.com/en/documentation/red_hat_openshift_ai/2025) for more details.
</details>

## Key Features
- **Vision:** Vision capabilities enable the model to analyze images and provide insights based on visual content in addition to text (see the sketch after this list).
- **Multilingual:** Supports dozens of languages, including English, French, German, Greek, Hindi, Indonesian, Italian, Japanese, Korean, Malay, Nepali, Polish, Portuguese, Romanian, Russian, Serbian, Spanish, Swedish, Turkish, Ukrainian, Vietnamese, Arabic, Bengali, Chinese, and Farsi.
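
As a quick illustration of the vision capability, here is a hedged sketch of an image-plus-text request against an OpenAI-compatible endpoint (the server URL, served model name, and placeholder image URL are assumptions carried over from the deployment examples above):

```python
# Hypothetical multimodal request: assumes one of the OpenAI-compatible
# servers above is running and can fetch the image URL.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="RedHatAI/Mistral-Small-3.1-24B-Instruct-2503",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            # Placeholder URL: substitute any publicly reachable image.
            {"type": "image_url", "image_url": {"url": "https://example.com/sample.jpg"}},
        ],
    }],
    max_tokens=128,
)
print(response.choices[0].message.content)
```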
 
You can also make use of a ready-to-go [Docker image](https://github.com/vllm-project/vllm/blob/main/Dockerfile) or one from [Docker Hub](https://hub.docker.com/layers/vllm/vllm-openai/latest/images/sha256-de9032a92ffea7b5c007dad80b38fd44aac11eddc31c435f8e52f3b7404bbf39).

#### Server

We recommend that you use Mistral-Small-3.1-24B-Instruct-2503 in a server/client setting.