---
base_model:
- Tarek07/Legion-V2.1-LLaMa-70B
library_name: transformers
tags:
- mergekit
- merge
- llmcompressor
- GPTQ
license: llama3.3
datasets:
- openerotica/erotiquant3
---
<p align="center">
<img width="120px" alt="Sentient Simulations Plumbob" src="https://www.sentientsimulations.com/transparent-plumbob2.png">
</p>
<p align="center"><a href="https://www.sentientsimulations.com/">[🏠Sentient Simulations]</a> | <a href="https://discord.com/invite/JTjbydmUAp">[Discord]</a> | <a href="https://www.patreon.com/SentientSims">[Patreon]</a></p>
<hr>

# Llama-3.1-70B-ArliAI-RPMax-v1.3-GPTQ

This repository contains a 4-bit GPTQ-quantized version of the [Tarek07/Legion-V2.1-LLaMa-70B model](https://huggingface.co/Tarek07/Legion-V2.1-LLaMa-70B), produced with [llm-compressor](https://github.com/vllm-project/llm-compressor).

## Quantization Settings

| **Attribute**                   | **Value**                                                                          |
|---------------------------------|------------------------------------------------------------------------------------|
| **Algorithm**                   | GPTQ                                                                               |
| **Layers**                      | Linear                                                                             |
| **Weight Scheme**               | W4A16                                                                              |
| **Group Size**                  | 128                                                                                |
| **Calibration Dataset**         | [openerotica/erotiquant3](https://huggingface.co/datasets/openerotica/erotiquant3) |
| **Calibration Sequence Length** | 4096                                                                               |
| **Calibration Samples**         | 512                                                                                |
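
The W4A16 scheme stores each group of 128 weights as 4-bit integers sharing one floating-point scale, while activations stay at 16-bit. The following pure-Python sketch illustrates symmetric group-wise 4-bit rounding; it is an illustration of the storage format only, not llm-compressor's actual implementation:

```python
def quantize_group_w4(weights, group_size=128):
    """Symmetric 4-bit group-wise quantization: each group of `group_size`
    weights shares one scale; values are rounded into the int4 range."""
    groups, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        # Symmetric scale maps the largest magnitude in the group to +/-7.
        scale = max(abs(w) for w in group) / 7.0 or 1.0
        groups.append([max(-8, min(7, round(w / scale))) for w in group])
        scales.append(scale)
    return groups, scales

def dequantize_w4(groups, scales):
    """Reconstruct approximate weights from int4 values and group scales."""
    out = []
    for group, scale in zip(groups, scales):
        out.extend(v * scale for v in group)
    return out
```

GPTQ improves on this naive round-to-nearest by choosing rounding directions that minimize each layer's output error on the calibration data, but the packed int4-plus-scales format is the same.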

### Dataset Preprocessing

The dataset was preprocessed with the following steps:
1. Extract and structure the conversation data using role-based templates (`SYSTEM`, `USER`, `ASSISTANT`).
2. Convert the structured conversations into a tokenized format using the model's tokenizer.
3. Filter out sequences shorter than 4096 tokens.
4. Shuffle and select 512 samples for calibration.
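
The steps above can be sketched as follows. The helper names and the plain-text role template are illustrative assumptions, not the repository's actual preprocessing code:

```python
import random

def format_conversation(turns):
    """Flatten role-tagged (role, text) turns into one text block using a
    SYSTEM/USER/ASSISTANT template (the exact template is an assumption)."""
    return "\n".join(f"{role}: {text}" for role, text in turns)

def select_calibration_samples(conversations, tokenize,
                               min_tokens=4096, n_samples=512, seed=42):
    """Tokenize conversations, drop sequences shorter than `min_tokens`,
    then shuffle and keep `n_samples` for calibration."""
    tokenized = [tokenize(format_conversation(c)) for c in conversations]
    long_enough = [ids for ids in tokenized if len(ids) >= min_tokens]
    random.Random(seed).shuffle(long_enough)
    return long_enough[:n_samples]
```

In the real pipeline, `tokenize` would be the model's Hugging Face tokenizer rather than a toy function.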
43
+
44
+ ## Quantization Process
45
+
46
+ View the shell and python script used to quantize this model.
47
+
48
+ 2 rtx pro 6000 with 565GB of ram, 300GB of disk space was rented on runpod.
49
+
50
+ Quantization took approximately 3.5 hours with a total of \$14.32 in compute costs.
51
+
52
+ - [compress.sh](./compress.sh)
53
+ - [compress.py](./compress.py)
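
Based on the settings table above, the llm-compressor recipe for this run would look roughly like the following. This is a sketch reconstructed from the table, not the actual contents of compress.py, and the exact recipe field names may differ between llm-compressor versions:

```yaml
quant_stage:
  quant_modifiers:
    GPTQModifier:
      ignore: ["lm_head"]
      config_groups:
        group_0:
          targets: ["Linear"]
          weights:
            num_bits: 4
            type: "int"
            symmetric: true
            strategy: "group"
            group_size: 128
```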
54
+
55
+ ## Acknowledgments
56
+
57
+ - Base Model: [Tarek07/Legion-V2.1-LLaMa-70B](https://huggingface.co/Tarek07/Legion-V2.1-LLaMa-70B)
58
+ - Calibration Dataset: [openerotica/erotiquant3](https://huggingface.co/datasets/openerotica/erotiquant3)
59
+ - LLM Compressor: [llm-compressor](https://github.com/vllm-project/llm-compressor)
60
+ - Everyone subscribed to the [Sentient Simulations Patreon](https://www.patreon.com/SentientSims)
61
+
62
+ ![patreon.PNG](https://huggingface.co/GusPuffy/Llama-3.1-70B-ArliAI-RPMax-v1.3-GPTQ/resolve/main/patreon.PNG)