ELVISIO committed
Commit 4d81405 · verified · 1 Parent(s): 5c11af4

Update README.md

Files changed (1)
  1. README.md +5 -5
README.md CHANGED
@@ -13,13 +13,13 @@ language:
 - th
 pipeline_tag: text-generation
 license: apache-2.0
-base_model: Qwen/Qwen3-32B
+base_model: Qwen/Qwen3-8B
 ---
 
 # Qwen3-32B-NVFP4A16
 
 ## Model Overview
-- **Model Architecture:** Qwen/Qwen3-32B
+- **Model Architecture:** Qwen/Qwen3-8B
 - **Input:** Text
 - **Output:** Text
 - **Model Optimizations:**
@@ -28,14 +28,14 @@ base_model: Qwen/Qwen3-32B
 - **Out-of-scope:** Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in languages other than English.
 - **Release Date:** 6/25/2025
 - **Version:** 1.0
-- **Model Developers:** RedHatAI
+- **Model Developers:** ELVISIO (Thanks to RedHatAI)
 
-This model is a quantized version of [Qwen/Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B).
+This model is a quantized version of [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B).
 It was evaluated on several tasks to assess its quality in comparison to the unquantized model.
 
 ### Model Optimizations
 
-This model was obtained by quantizing the weights of [Qwen/Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B) to the FP4 data type, ready for inference with vLLM>=0.9.1.
+This model was obtained by quantizing the weights of [Qwen/Qwen3-8B](https://huggingface.co/Qwen/Qwen3-8B) to the FP4 data type, ready for inference with vLLM>=0.9.1.
 This optimization reduces the number of bits per parameter from 16 to 4, reducing the disk size and GPU memory requirements by approximately 75%.
 
 Only the weights of the linear operators within transformer blocks are quantized using [LLM Compressor](https://github.com/vllm-project/llm-compressor).
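
For readers who want to try the quantized checkpoint, here is a minimal offline-inference sketch with vLLM. It assumes a vLLM build (>=0.9.1) with NVFP4 support and that the checkpoint is published as `ELVISIO/Qwen3-32B-NVFP4A16`; neither detail is confirmed by the commit itself.

```python
# Minimal vLLM inference sketch (assumed repo id; adjust to the actual checkpoint path).
from vllm import LLM, SamplingParams

llm = LLM(model="ELVISIO/Qwen3-32B-NVFP4A16")
sampling = SamplingParams(temperature=0.7, top_p=0.8, max_tokens=256)

outputs = llm.generate(["Give me a short introduction to large language models."], sampling)
print(outputs[0].outputs[0].text)
```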
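
A sketch of how such a weight-only NVFP4 checkpoint is typically produced with LLM Compressor follows. The `NVFP4A16` scheme name and the exact API are assumptions based on llm-compressor's quantization-modifier interface, not a script from this card; the base model follows the card title (the diff retargets `base_model` to Qwen/Qwen3-8B, so substitute accordingly).

```python
# Weight-only FP4 quantization sketch with LLM Compressor.
# Assumptions: the "NVFP4A16" preset scheme exists and oneshot() accepts a
# QuantizationModifier recipe; verify against the llm-compressor docs.
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "Qwen/Qwen3-32B"  # card title; the diff points base_model at Qwen/Qwen3-8B
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Quantize only the Linear layers inside the transformer blocks; keep lm_head in 16-bit.
recipe = QuantizationModifier(targets="Linear", scheme="NVFP4A16", ignore=["lm_head"])
oneshot(model=model, recipe=recipe)

save_dir = MODEL_ID.split("/")[-1] + "-NVFP4A16"
model.save_pretrained(save_dir, save_compressed=True)
tokenizer.save_pretrained(save_dir)
```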