GusPuffy
/

Legion-V2.1-LLaMa-70B-GPTQ

+---
+base_model:
+- Tarek07/Legion-V2.1-LLaMa-70B
+library_name: transformers
+tags:
+- mergekit
+- merge
+- llmcompressor
+- GPTQ
+license: llama3.3
+datasets:
+- openerotica/erotiquant3
+---
+<p align="center">
+<img width="120px" alt="Sentient Simulations Plumbob" src="https://www.sentientsimulations.com/transparent-plumbob2.png">
+</p>
+<p align="center"><a href="https://www.sentientsimulations.com/">[🏠Sentient Simulations]</a>  |  <a href="https://discord.com/invite/JTjbydmUAp">[Discord]</a>  |  <a href="https://www.patreon.com/SentientSims">[Patreon]</a>
+<hr>
+# Legion-V2.1-LLaMa-70B-GPTQ
+This repository contains a 4-bit GPTQ-quantized version of [Tarek07/Legion-V2.1-LLaMa-70B](https://huggingface.co/Tarek07/Legion-V2.1-LLaMa-70B), produced with [llm-compressor](https://github.com/vllm-project/llm-compressor).
+## Quantization Settings
+| **Attribute**                   | **Value**                                                                          |
+|---------------------------------|------------------------------------------------------------------------------------|
+| **Algorithm**                   | GPTQ                                                                               |
+| **Layers**                      | Linear                                                                             |
+| **Weight Scheme**               | W4A16                                                                              |
+| **Group Size**                  | 128                                                                                |
+| **Calibration Dataset**         | [openerotica/erotiquant3](https://huggingface.co/datasets/openerotica/erotiquant3) |
+| **Calibration Sequence Length** | 4096                                                                               |
+| **Calibration Samples**         | 512                                                                                |
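
The settings in the table above map roughly onto an llm-compressor one-shot run. The following is a minimal sketch, not the script used for this model (the linked `compress.py` is authoritative). `QuantSettings` and `run_gptq` are hypothetical names, `ignore=["lm_head"]` is an assumption, and the `oneshot` import path differs between llm-compressor releases.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuantSettings:
    """Values copied from the Quantization Settings table."""
    targets: str = "Linear"
    scheme: str = "W4A16"
    group_size: int = 128  # implied by the W4A16 preset; kept here for reference only
    max_seq_length: int = 4096
    num_calibration_samples: int = 512

    @property
    def calibration_tokens(self) -> int:
        # At most this many tokens are seen during calibration.
        return self.max_seq_length * self.num_calibration_samples


def run_gptq(model_id: str, dataset, output_dir: str,
             settings: QuantSettings = QuantSettings()) -> None:
    # Imported lazily so the settings can be inspected without llm-compressor installed.
    # Older releases expose this as: from llmcompressor.transformers import oneshot
    from llmcompressor import oneshot
    from llmcompressor.modifiers.quantization import GPTQModifier

    # Assumption: the output head is left unquantized; the model card does not say.
    recipe = GPTQModifier(
        targets=settings.targets,
        scheme=settings.scheme,
        ignore=["lm_head"],
    )
    oneshot(
        model=model_id,
        dataset=dataset,
        recipe=recipe,
        max_seq_length=settings.max_seq_length,
        num_calibration_samples=settings.num_calibration_samples,
        output_dir=output_dir,
    )
```

With the table's values, `QuantSettings().calibration_tokens` is 512 × 4096 = 2,097,152 tokens.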
+### Dataset Preprocessing
+The dataset was preprocessed with the following steps:
+1. Extract and structure the conversation data using role-based templates (`SYSTEM`, `USER`, `ASSISTANT`).
+2. Convert the structured conversations into a tokenized format using the model's tokenizer.
+3. Filter out sequences shorter than 4096 tokens.
+4. Shuffle and select 512 samples for calibration.
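
The four steps above can be sketched in plain Python. This is an illustration under stated assumptions rather than the author's preprocessing code: the `role`/`content` field names, the `"ROLE: text"` rendering, the fixed seed, and the helper names `render_conversation` and `select_calibration_samples` are all hypothetical, and `tokenize` stands in for the model's tokenizer.

```python
import random
from typing import Callable, Dict, List, Sequence

# Role labels named in step 1; the exact template text is an assumption.
ROLE_LABELS = {"system": "SYSTEM", "user": "USER", "assistant": "ASSISTANT"}

Turn = Dict[str, str]


def render_conversation(turns: Sequence[Turn]) -> str:
    """Step 1: flatten a conversation into role-prefixed text."""
    return "\n".join(f"{ROLE_LABELS[t['role']]}: {t['content']}" for t in turns)


def select_calibration_samples(
    conversations: Sequence[Sequence[Turn]],
    tokenize: Callable[[str], List],
    min_tokens: int = 4096,
    num_samples: int = 512,
    seed: int = 0,
) -> List[List]:
    # Step 2: tokenize each rendered conversation.
    tokenized = [tokenize(render_conversation(c)) for c in conversations]
    # Step 3: drop sequences shorter than min_tokens.
    long_enough = [ids for ids in tokenized if len(ids) >= min_tokens]
    # Step 4: shuffle reproducibly, then keep the first num_samples.
    rng = random.Random(seed)
    rng.shuffle(long_enough)
    return long_enough[:num_samples]
```

For a real run, `tokenize` could be something like `lambda text: tokenizer(text)["input_ids"]` with a Hugging Face tokenizer. For a quick check, `str.split` works as a toy tokenizer.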
+## Quantization Process
+The shell and Python scripts used to quantize this model are linked below.
+Two RTX PRO 6000 GPUs with 565 GB of RAM and 300 GB of disk space were rented on RunPod.
+Quantization took approximately 3.5 hours, for a total of \$14.32 in compute costs.
+- [compress.sh](./compress.sh)
+- [compress.py](./compress.py)
+## Acknowledgments
+- Base Model: [Tarek07/Legion-V2.1-LLaMa-70B](https://huggingface.co/Tarek07/Legion-V2.1-LLaMa-70B)
+- Calibration Dataset: [openerotica/erotiquant3](https://huggingface.co/datasets/openerotica/erotiquant3)
+- LLM Compressor: [llm-compressor](https://github.com/vllm-project/llm-compressor)
+- Everyone subscribed to the [Sentient Simulations Patreon](https://www.patreon.com/SentientSims)
+![patreon.PNG](https://huggingface.co/GusPuffy/Llama-3.1-70B-ArliAI-RPMax-v1.3-GPTQ/resolve/main/patreon.PNG)