metadata
language:
  - en
base_model:
  - mistralai/Devstral-Small-2507
pipeline_tag: text-generation
tags:
  - mistral
  - neuralmagic
  - redhat
  - llmcompressor
  - quantized
  - INT8
  - compressed-tensors
license: mit
license_name: mit
name: RedHatAI/Devstral-Small-2507
description: >-
  This model was obtained by quantizing weights and activations of
  Devstral-Small-2507 to INT8 data type.
readme: >-
  https://huggingface.co/RedHatAI/Devstral-Small-2507-quantized.w8a8/main/README.md
tasks:
  - text-to-text
provider: mistralai

Devstral-Small-2507-quantized.w8a8

Model Overview

  • Model Architecture: MistralForCausalLM
    • Input: Text
    • Output: Text
  • Model Optimizations:
    • Activation quantization: INT8
    • Weight quantization: INT8
  • Release Date: 08/29/2025
  • Version: 1.0
  • Model Developers: Red Hat (Neural Magic)

Model Optimizations

This model was obtained by quantizing the weights and activations of Devstral-Small-2507 to the INT8 data type. This optimization reduces the number of bits used to represent weights and activations from 16 to 8, cutting GPU memory requirements by approximately 50%. Weight quantization also reduces disk size requirements by approximately 50%.
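
As an illustration only (not the exact recipe used for this release), a W8A8 checkpoint like this one is typically produced with llm-compressor. The calibration dataset, sample counts, and modifier settings below are assumptions, shown as a minimal sketch:

from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.modifiers.smoothquant import SmoothQuantModifier

model_id = "mistralai/Devstral-Small-2507"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Hypothetical recipe: SmoothQuant followed by GPTQ to INT8 weights and
# activations (W8A8), keeping the output head in higher precision.
recipe = [
    SmoothQuantModifier(smoothing_strength=0.8),
    GPTQModifier(targets="Linear", scheme="W8A8", ignore=["lm_head"]),
]

# Calibration dataset and lengths are placeholders, not the values used for this model.
oneshot(
    model=model,
    dataset="open_platypus",
    recipe=recipe,
    max_seq_length=2048,
    num_calibration_samples=512,
)

model.save_pretrained("Devstral-Small-2507-quantized.w8a8", save_compressed=True)
tokenizer.save_pretrained("Devstral-Small-2507-quantized.w8a8")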

Deployment

This model can be deployed efficiently using the vLLM backend, as shown in the example below.

vllm serve RedHatAI/Devstral-Small-2507-quantized.w8a8 --tensor-parallel-size 1 --tokenizer_mode mistral
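
Once the server is running, it exposes an OpenAI-compatible API. A minimal sketch of querying it, assuming the default local endpoint on port 8000 (the prompt is only an example):

from openai import OpenAI

# Point the client at the local vLLM server started above.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.chat.completions.create(
    model="RedHatAI/Devstral-Small-2507-quantized.w8a8",
    messages=[{"role": "user", "content": "Write a Python function that checks whether a string is a palindrome."}],
    temperature=0.0,
    max_tokens=256,
)
print(completion.choices[0].message.content)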

Evaluation

The model was evaluated on popular coding tasks (HumanEval, HumanEval+, MBPP, MBPP+) via EvalPlus with the vLLM backend (v0.10.1.1). Evaluations use greedy sampling and report pass@1. The command to reproduce the evaluations:

evalplus.evaluate --model "RedHatAI/Devstral-Small-2507-quantized.w8a8" \
                  --dataset [humaneval|mbpp] \
                  --base-url http://localhost:8000/v1 \
                  --backend openai --greedy
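
For reference, with greedy decoding there is a single completion per problem, so pass@1 reduces to the fraction of problems whose generated solution passes all tests. A minimal sketch (the result structure is an assumption, not EvalPlus's actual output format):

# results: one boolean per problem, True if the single greedy completion passed all tests.
def pass_at_1(results: list[bool]) -> float:
    return sum(results) / len(results)

# Example: 89 of 100 problems solved -> pass@1 = 0.89
print(pass_at_1([True] * 89 + [False] * 11))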

Accuracy

Benchmark       Recovery (%)   mistralai/Devstral-Small-2507   RedHatAI/Devstral-Small-2507-quantized.w8a8 (this model)
HumanEval       100.67         89.0                            89.6
HumanEval+      101.48         81.1                            82.3
MBPP             98.71         77.5                            76.5
MBPP+           102.42         66.1                            67.7
Average Score   100.77         78.43                           79.03