Update README.md
README.md
# arcee-ai/SuperNova-Medius-CM-w4a16

## Model Card

**Model Name:** SuperNova Medius Compressed Model (W4A16)

**Model ID:** `arcee-ai/SuperNova-Medius-CM-w4a16`

## Overview

SuperNova Medius CM W4A16 is a quantized version of the arcee-ai/SuperNova-Medius model. It is compressed with GPTQ post-training quantization, which reduces model size and accelerates inference while keeping quality close to the original model. The quantization scheme is W4A16: weights are quantized to 4 bits while activations remain in 16-bit precision, cutting weight memory by roughly 4x relative to a 16-bit checkpoint.
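For a rough sense of the savings, here is a back-of-envelope estimate (illustrative only: it assumes a ~14B-parameter model and ignores quantization scales, zero-points, and activation memory):

```python
# Back-of-envelope weight-memory estimate for W4A16 (illustrative; assumes
# ~14B parameters and ignores scales/zero-points and activations).
params = 14e9
bf16_gib = params * 2 / 2**30   # 16-bit weights: 2 bytes per parameter
w4_gib = params * 0.5 / 2**30   # 4-bit weights: 0.5 bytes per parameter
print(f"bf16: ~{bf16_gib:.0f} GiB, W4A16: ~{w4_gib:.0f} GiB")
```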
## Model Details

- **Base Model:** arcee-ai/SuperNova-Medius
- **Quantization Method:** GPTQ (post-training quantization)
- **Quantization Parameters:**
  - Targets: Linear layers
  - Scheme: W4A16 (weights quantized to 4 bits; activations kept at 16 bits)
  - Ignored Layers: lm_head
  - Dampening Fraction: 0.1
  - Calibration Dataset: neuralmagic/LLM_compression_calibration
  - Number of Calibration Samples: 1024
  - Maximum Sequence Length: 4096
  - Random Seed: 42
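These settings are recorded with the checkpoint in `config.json` by llmcompressor/compressed-tensors, so they can be verified directly. A minimal sketch (the exact field layout depends on the compressed-tensors version):

```python
# Sketch: print the quantization settings stored with the checkpoint.
# Field names follow the compressed-tensors convention and may vary by version.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("arcee-ai/SuperNova-Medius-CM-w4a16")
print(getattr(config, "quantization_config", None))
```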
## Intended Use

This model is designed for developers and researchers who need a smaller, faster version of SuperNova-Medius for inference, especially in environments with limited computational resources.
## How to Use

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# The checkpoint is stored in compressed-tensors format; recent versions of
# transformers load it directly.
tokenizer = AutoTokenizer.from_pretrained("arcee-ai/SuperNova-Medius-CM-w4a16")
model = AutoModelForCausalLM.from_pretrained("arcee-ai/SuperNova-Medius-CM-w4a16")

input_text = "Hello, how are you?"
input_ids = tokenizer.encode(input_text, return_tensors="pt")

output = model.generate(input_ids)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
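Since vllm is among the pinned dependencies below, the checkpoint can likely also be served with vLLM, which supports compressed-tensors W4A16 models. A minimal sketch (the prompt is illustrative):

```python
# Sketch: serve the quantized checkpoint with vLLM.
from vllm import LLM, SamplingParams

llm = LLM(model="arcee-ai/SuperNova-Medius-CM-w4a16")
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Hello, how are you?"], params)
print(outputs[0].outputs[0].text)
```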
## Quantization Details

The quantization process was executed using the following script:
```python
# Quantization Script

import torch
from datasets import load_dataset
from transformers import AutoTokenizer

from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import SparseAutoModelForCausalLM, oneshot
from llmcompressor.transformers.compression.helpers import calculate_offload_device_map

# Parameters
MODEL_ID = "./arcee-ai/SuperNova-Medius"
NUM_CALIBRATION_SAMPLES = 1024
MAX_SEQUENCE_LENGTH = 4096
SEED = 42

# Device map calculation: spread the model across the available GPUs,
# reserving headroom for GPTQ's Hessian buffers.
device_map = calculate_offload_device_map(
    MODEL_ID,
    num_gpus=torch.cuda.device_count(),
    reserve_for_hessians=True,
    torch_dtype=torch.bfloat16,
)

# Load model and tokenizer
model = SparseAutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    device_map=device_map,
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Load and preprocess the calibration dataset
DATASET_ID = "neuralmagic/LLM_compression_calibration"
ds = load_dataset(DATASET_ID)
ds = ds["train"].shuffle(seed=SEED).select(range(NUM_CALIBRATION_SAMPLES))

def preprocess(example):
    # Render each chat sample with the model's chat template so calibration
    # inputs match the inference-time format.
    return {"text": tokenizer.apply_chat_template(example["messages"], tokenize=False)}

ds = ds.map(preprocess)

def tokenize(sample):
    return tokenizer(
        sample["text"],
        padding=False,
        max_length=MAX_SEQUENCE_LENGTH,
        truncation=True,
        add_special_tokens=False,
    )

ds = ds.map(tokenize)

# Quantization recipe: 4-bit weights on all Linear layers, with lm_head
# left in full precision.
recipe = GPTQModifier(
    targets="Linear",
    scheme="W4A16",
    ignore=["lm_head"],
    dampening_frac=0.1,
)

# Apply quantization
oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    oneshot_device=device_map,
    max_seq_length=MAX_SEQUENCE_LENGTH,
    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
    accelerator_config={
        "split_batches": True,
        "dispatch_batches": None,
        "even_batches": True,
        "use_seedable_sampler": True,
        "non_blocking": False,
        "gradient_accumulation_kwargs": None,
        "use_configured_state": False,
    },
)

# Save the quantized model in compressed format
SAVE_DIR = "./arcee-ai/SuperNova-Medius-CM-w4a16"
model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```
## Dependencies

The quantization process was executed with the following package versions:

- **Python Version:** 3.9.x
- **Packages:**
  - torch: 2.5.1
  - transformers: 4.46.2
  - llmcompressor: 0.5.0
  - vllm: 0.6.4
  - datasets: 3.1.0
  - huggingface_hub: 0.24.7
  - compressed-tensors: 0.8.0

A full list of installed packages is available in the requirements.txt file.
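To reproduce this environment, the pinned versions above can be installed directly; a sketch (requirements.txt remains the authoritative list):

```bash
pip install torch==2.5.1 transformers==4.46.2 llmcompressor==0.5.0 \
    vllm==0.6.4 datasets==3.1.0 huggingface_hub==0.24.7 compressed-tensors==0.8.0
```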
## Training Data

The model was quantized using 1,024 samples from the neuralmagic/LLM_compression_calibration dataset. Each sample was rendered with the model's chat template and tokenized to match the model's expected input format (see the script above).
## Evaluation Results

Evaluation metrics comparing the quantized model to the original model will be provided in future updates.
## Limitations and Biases

- **Performance Degradation:** While quantization reduces model size and increases speed, it may introduce slight quality degradation compared to the original model.
- **Inherited Biases:** The model may carry over biases present in the original SuperNova-Medius model. Users should exercise caution and critically evaluate the model's outputs.
## Acknowledgements

- **Original Model:** arcee-ai/SuperNova-Medius
- **Quantization Tools:** LLM Compressor
- **Contributors:** Edward Kim and Jaro Uljanovs
## Citation

If you use this model, please cite:

```bibtex
@misc{SuperNovaMediusCMW4A16,
  author       = {Edward Kim and Jaro Uljanovs},
  title        = {SuperNova Medius Compressed Model W4A16},
  year         = {2024},
  howpublished = {\url{https://huggingface.co/arcee-ai/SuperNova-Medius-CM-w4a16}},
}
```
---

## Model Card Template

# [Model Name]

**Model ID:** [Repository/Model ID]

## Overview

[Provide a concise description of the model, its purpose, and any unique features.]

## Model Details

- **Base Model:** [Link to or name of the base model]
- **Model Architecture:** [Describe the architecture]
- **Quantization Method (if applicable):** [Details about quantization]
- **Training Data:** [Brief description of the dataset(s) used]
- **Parameters:**
  - Targets: [Layer types targeted for quantization]
  - Scheme: [Quantization scheme]
  - Ignored Layers: [Layers excluded from quantization]
  - Dampening Fraction: [Value used if applicable]
  - Calibration Dataset (if applicable): [Dataset used for calibration]
  - Number of Calibration Samples: [Number]
  - Maximum Sequence Length: [Value]
  - Random Seed: [Value]

## Intended Use

[Explain the intended applications and scope of use for the model.]

## How to Use

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("[Model ID]")
model = AutoModelForCausalLM.from_pretrained("[Model ID]")

input_text = "Your input text here."
input_ids = tokenizer.encode(input_text, return_tensors="pt")

output = model.generate(input_ids)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

## [Training and] Quantization Details

[Provide scripts and detailed steps used in training and/or quantization.]

```python
# Example Script

# Imports
import torch
# ... [rest of the script]
```

## Dependencies

- **Python Version:** [Version]
- **Packages:**
  - [Package Name]: [Version]
  - List all critical packages and their versions.

## Training Data

[Provide detailed information about the training data, including sources, preprocessing steps, and any relevant statistics.]

## Evaluation Results

[Present evaluation metrics, benchmarks, and any comparisons with other models.]

## Limitations and Biases

[List known limitations, potential biases, and ethical considerations.]

## Acknowledgements

- **Contributors:** [Names of contributors]
- **Resources:** [Any libraries, datasets, or tools that were instrumental]

## Citation

[Provide citation information.]

```bibtex
@misc{ModelName,
  author       = {[Author Names]},
  title        = {[Model Title]},
  year         = {[Year]},
  howpublished = {\url{[Model URL]}},
}
```

**Note:** This template is designed to provide a comprehensive overview of a machine learning model, facilitating reproducibility and transparency. Feel free to add or remove sections based on the specific needs of your project.