Update README.md
README.md
```diff
@@ -13,13 +13,13 @@ language:
 - th
 pipeline_tag: text-generation
 license: apache-2.0
-base_model: Qwen/Qwen3-32B
+base_model: Qwen/Qwen3-8B
 ---
 
 # Qwen3-32B-NVFP4A16
 
 ## Model Overview
-- **Model Architecture:** Qwen/Qwen3-32B
+- **Model Architecture:** Qwen/Qwen3-8B
 - **Input:** Text
 - **Output:** Text
 - **Model Optimizations:**
@@ -28,14 +28,14 @@ base_model: Qwen/Qwen3-32B
 - **Out-of-scope:** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in languages other than English.
 - **Release Date:** 6/25/2025
 - **Version:** 1.0
-- **Model Developers:** RedHatAI
+- **Model Developers:** ELVISIO (Thanks to RedHatAI)
 
-This model is a quantized version of [Qwen/Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B).
+This model is a quantized version of [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B).
 It was evaluated on several tasks to assess its quality in comparison to the unquantized model.
 
 ### Model Optimizations
 
-This model was obtained by quantizing the weights of [Qwen/Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B) to FP4 data type, ready for inference with vLLM>=0.9.1.
+This model was obtained by quantizing the weights of [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) to FP4 data type, ready for inference with vLLM>=0.9.1.
 This optimization reduces the number of bits per parameter from 16 to 4, reducing the disk size and GPU memory requirements by approximately 75%.
 
 Only the weights of the linear operators within transformers blocks are quantized using [LLM Compressor](https://github.com/vllm-project/llm-compressor).
```
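
The card states the checkpoint is ready for inference with vLLM. A minimal deployment sketch follows; the repository id is a hypothetical reconstruction from the card's title and developer field (adjust to the actual repo), and the prompt and sampling settings are only illustrative.

```python
from vllm import LLM, SamplingParams

# Hypothetical repo id, assumed from the card's title and developer field.
model_id = "ELVISIO/Qwen3-32B-NVFP4A16"

# vLLM reads the compressed-tensors quantization config from the checkpoint,
# so no extra quantization flags are needed here.
llm = LLM(model=model_id)

sampling_params = SamplingParams(temperature=0.6, top_p=0.95, max_tokens=256)

outputs = llm.generate(
    ["Give me a short introduction to large language models."],
    sampling_params,
)
print(outputs[0].outputs[0].text)
```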
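
Similarly, here is a minimal sketch of how a weight-only NVFP4 checkpoint like this one is typically produced with [LLM Compressor](https://github.com/vllm-project/llm-compressor), assuming the `NVFP4A16` preset scheme and a data-free oneshot pass. It follows the library's documented flow, not necessarily the commit author's exact script, and the output directory name is a placeholder.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

# Base checkpoint named in the card's frontmatter after this commit.
model_id = "Qwen/Qwen3-8B"
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Quantize only the linear operators inside the transformer blocks to NVFP4
# weights (activations stay 16-bit); keep the output head in full precision.
recipe = QuantizationModifier(targets="Linear", scheme="NVFP4A16", ignore=["lm_head"])

# Weight-only quantization needs no calibration dataset here.
oneshot(model=model, recipe=recipe)

# Save in compressed-tensors format so vLLM can load it directly.
save_dir = "Qwen3-8B-NVFP4A16"  # placeholder output path
model.save_pretrained(save_dir, save_compressed=True)
tokenizer.save_pretrained(save_dir)
```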