Update README.md
README.md
# arcee-ai/SuperNova-Medius-CM-w4a16

## Model Card

**Model Name:** SuperNova Medius Compressed Model (W4A16)

**Model ID:** `arcee-ai/SuperNova-Medius-CM-w4a16`

## Overview

SuperNova Medius CM W4A16 is a quantized version of the arcee-ai/SuperNova-Medius model. It is compressed with GPTQ post-training quantization, which reduces model size and accelerates inference while keeping quality close to the original model. The quantization scheme is W4A16: weights are quantized to 4 bits while activations remain in 16-bit precision, cutting weight memory by roughly 4x relative to a 16-bit checkpoint.
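For a rough sense of the savings, here is a back-of-envelope estimate (illustrative only: it assumes a ~14B-parameter model and ignores quantization scales, zero-points, and activation memory):

```python
# Back-of-envelope weight-memory estimate for W4A16 (illustrative; assumes
# ~14B parameters and ignores scales/zero-points and activations).
params = 14e9
bf16_gib = params * 2 / 2**30   # 16-bit weights: 2 bytes per parameter
w4_gib = params * 0.5 / 2**30   # 4-bit weights: 0.5 bytes per parameter
print(f"bf16: ~{bf16_gib:.0f} GiB, W4A16: ~{w4_gib:.0f} GiB")
```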
## Model Details

- **Base Model:** arcee-ai/SuperNova-Medius
- **Quantization Method:** GPTQ (post-training quantization)
- **Quantization Parameters:**
  - Targets: Linear layers
  - Scheme: W4A16 (weights quantized to 4 bits; activations kept at 16 bits)
  - Ignored Layers: lm_head
  - Dampening Fraction: 0.1
  - Calibration Dataset: neuralmagic/LLM_compression_calibration
  - Number of Calibration Samples: 1024
  - Maximum Sequence Length: 4096
  - Random Seed: 42
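These settings are recorded with the checkpoint in `config.json` by llmcompressor/compressed-tensors, so they can be verified directly. A minimal sketch (the exact field layout depends on the compressed-tensors version):

```python
# Sketch: print the quantization settings stored with the checkpoint.
# Field names follow the compressed-tensors convention and may vary by version.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("arcee-ai/SuperNova-Medius-CM-w4a16")
print(getattr(config, "quantization_config", None))
```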
## Intended Use

This model is designed for developers and researchers who need a smaller, faster version of SuperNova-Medius for inference, especially in environments with limited computational resources.
## How to Use

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# The checkpoint is stored in compressed-tensors format; recent versions of
# transformers load it directly.
tokenizer = AutoTokenizer.from_pretrained("arcee-ai/SuperNova-Medius-CM-w4a16")
model = AutoModelForCausalLM.from_pretrained("arcee-ai/SuperNova-Medius-CM-w4a16")

input_text = "Hello, how are you?"
input_ids = tokenizer.encode(input_text, return_tensors="pt")

output = model.generate(input_ids)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
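Since vllm is among the pinned dependencies below, the checkpoint can likely also be served with vLLM, which supports compressed-tensors W4A16 models. A minimal sketch (the prompt is illustrative):

```python
# Sketch: serve the quantized checkpoint with vLLM.
from vllm import LLM, SamplingParams

llm = LLM(model="arcee-ai/SuperNova-Medius-CM-w4a16")
params = SamplingParams(temperature=0.7, max_tokens=128)
outputs = llm.generate(["Hello, how are you?"], params)
print(outputs[0].outputs[0].text)
```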
## Quantization Details

The quantization process was executed using the following script:
```python
# Quantization Script

import torch
from datasets import load_dataset
from transformers import AutoTokenizer

from llmcompressor.modifiers.quantization import GPTQModifier
from llmcompressor.transformers import SparseAutoModelForCausalLM, oneshot
from llmcompressor.transformers.compression.helpers import calculate_offload_device_map

# Parameters
MODEL_ID = "./arcee-ai/SuperNova-Medius"
NUM_CALIBRATION_SAMPLES = 1024
MAX_SEQUENCE_LENGTH = 4096
SEED = 42

# Device map calculation: spread the model across the available GPUs,
# reserving headroom for GPTQ's Hessian buffers.
device_map = calculate_offload_device_map(
    MODEL_ID,
    num_gpus=torch.cuda.device_count(),
    reserve_for_hessians=True,
    torch_dtype=torch.bfloat16,
)

# Load model and tokenizer
model = SparseAutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    device_map=device_map,
    torch_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Load and preprocess the calibration dataset
DATASET_ID = "neuralmagic/LLM_compression_calibration"
ds = load_dataset(DATASET_ID)
ds = ds["train"].shuffle(seed=SEED).select(range(NUM_CALIBRATION_SAMPLES))

def preprocess(example):
    # Render each chat sample with the model's chat template so calibration
    # inputs match the inference-time format.
    return {"text": tokenizer.apply_chat_template(example["messages"], tokenize=False)}

ds = ds.map(preprocess)

def tokenize(sample):
    return tokenizer(
        sample["text"],
        padding=False,
        max_length=MAX_SEQUENCE_LENGTH,
        truncation=True,
        add_special_tokens=False,
    )

ds = ds.map(tokenize)

# Quantization recipe: 4-bit weights on all Linear layers, with lm_head
# left in full precision.
recipe = GPTQModifier(
    targets="Linear",
    scheme="W4A16",
    ignore=["lm_head"],
    dampening_frac=0.1,
)

# Apply quantization
oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    oneshot_device=device_map,
    max_seq_length=MAX_SEQUENCE_LENGTH,
    num_calibration_samples=NUM_CALIBRATION_SAMPLES,
    accelerator_config={
        "split_batches": True,
        "dispatch_batches": None,
        "even_batches": True,
        "use_seedable_sampler": True,
        "non_blocking": False,
        "gradient_accumulation_kwargs": None,
        "use_configured_state": False,
    },
)

# Save the quantized model in compressed format
SAVE_DIR = "./arcee-ai/SuperNova-Medius-CM-w4a16"
model.save_pretrained(SAVE_DIR, save_compressed=True)
tokenizer.save_pretrained(SAVE_DIR)
```
## Dependencies

The quantization process was executed with the following package versions:

- **Python Version:** 3.9.x
- **Packages:**
  - torch: 2.5.1
  - transformers: 4.46.2
  - llmcompressor: 0.5.0
  - vllm: 0.6.4
  - datasets: 3.1.0
  - huggingface_hub: 0.24.7
  - compressed-tensors: 0.8.0

A full list of installed packages is available in the requirements.txt file.
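To reproduce this environment, the pinned versions above can be installed directly; a sketch (requirements.txt remains the authoritative list):

```bash
pip install torch==2.5.1 transformers==4.46.2 llmcompressor==0.5.0 \
    vllm==0.6.4 datasets==3.1.0 huggingface_hub==0.24.7 compressed-tensors==0.8.0
```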
## Training Data

The model was quantized using 1,024 samples from the neuralmagic/LLM_compression_calibration dataset. Each sample was rendered with the model's chat template and tokenized to match the model's expected input format (see the script above).
## Evaluation Results

Evaluation metrics comparing the quantized model to the original model will be provided in future updates.
## Limitations and Biases

- **Performance Degradation:** While quantization reduces model size and increases speed, it may introduce slight quality degradation compared to the original model.
- **Inherited Biases:** The model may carry over biases present in the original SuperNova-Medius model. Users should exercise caution and critically evaluate the model's outputs.
## Acknowledgements

- **Original Model:** arcee-ai/SuperNova-Medius
- **Quantization Tools:** LLM Compressor
- **Contributors:** Edward Kim and Jaro Uljanovs
## Citation

If you use this model, please cite:

```bibtex
@misc{SuperNovaMediusCMW4A16,
  author       = {Edward Kim and Jaro Uljanovs},
  title        = {SuperNova Medius Compressed Model W4A16},
  year         = {2024},
  howpublished = {\url{https://huggingface.co/arcee-ai/SuperNova-Medius-CM-w4a16}},
}
```
---

## Model Card Template

# [Model Name]

**Model ID:** [Repository/Model ID]

## Overview

[Provide a concise description of the model, its purpose, and any unique features.]

## Model Details

- **Base Model:** [Link to or name of the base model]
- **Model Architecture:** [Describe the architecture]
- **Quantization Method (if applicable):** [Details about quantization]
- **Training Data:** [Brief description of the dataset(s) used]
- **Parameters:**
  - Targets: [Layer types targeted for quantization]
  - Scheme: [Quantization scheme]
  - Ignored Layers: [Layers excluded from quantization]
  - Dampening Fraction: [Value used if applicable]
  - Calibration Dataset (if applicable): [Dataset used for calibration]
  - Number of Calibration Samples: [Number]
  - Maximum Sequence Length: [Value]
  - Random Seed: [Value]

## Intended Use

[Explain the intended applications and scope of use for the model.]

## How to Use

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("[Model ID]")
model = AutoModelForCausalLM.from_pretrained("[Model ID]")

input_text = "Your input text here."
input_ids = tokenizer.encode(input_text, return_tensors="pt")

output = model.generate(input_ids)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```

## [Training and] Quantization Details

[Provide scripts and detailed steps used in training and/or quantization.]

```python
# Example Script

# Imports
import torch
# ... [rest of the script]
```

## Dependencies

- **Python Version:** [Version]
- **Packages:**
  - [Package Name]: [Version]
  - List all critical packages and their versions.

## Training Data

[Provide detailed information about the training data, including sources, preprocessing steps, and any relevant statistics.]

## Evaluation Results

[Present evaluation metrics, benchmarks, and any comparisons with other models.]

## Limitations and Biases

[List known limitations, potential biases, and ethical considerations.]

## Acknowledgements

- **Contributors:** [Names of contributors]
- **Resources:** [Any libraries, datasets, or tools that were instrumental]

## Citation

[Provide citation information.]

```bibtex
@misc{ModelName,
  author       = {[Author Names]},
  title        = {[Model Title]},
  year         = {[Year]},
  howpublished = {\url{[Model URL]}},
}
```

**Note:** This template is designed to provide a comprehensive overview of a machine learning model, facilitating reproducibility and transparency. Feel free to add or remove sections based on the specific needs of your project.