---
base_model:
  - Tarek07/Legion-V2.1-LLaMa-70B
library_name: transformers
tags:
  - mergekit
  - merge
  - llmcompressor
  - GPTQ
license: llama3.3
datasets:
  - openerotica/erotiquant3
---


[🏠Sentient Simulations] | [Discord] | [Patreon]


# Legion-V2.1-LLaMa-70B-GPTQ

This repository contains a 4-bit GPTQ-quantized version of the Tarek07/Legion-V2.1-LLaMa-70B model, produced with llm-compressor.
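A W4A16 checkpoint like this one is typically served with vLLM, which supports GPTQ/compressed-tensors weights out of the box. A minimal sketch, assuming a two-GPU host; the repository id below is a placeholder for this repo's actual id:

```python
# Hypothetical serving example -- the repo id and GPU count are
# assumptions, not taken from this README.
from vllm import LLM, SamplingParams

llm = LLM(
    model="your-username/Legion-V2.1-LLaMa-70B-GPTQ",  # placeholder id
    tensor_parallel_size=2,  # shard the 70B model across two GPUs
)
params = SamplingParams(max_tokens=256, temperature=0.8)
outputs = llm.generate(["Write a short scene description."], params)
print(outputs[0].outputs[0].text)
```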

## Quantization Settings

| Attribute | Value |
| --- | --- |
| Algorithm | GPTQ |
| Layers | Linear |
| Weight Scheme | W4A16 |
| Group Size | 128 |
| Calibration Dataset | openerotica/erotiquant3 |
| Calibration Sequence Length | 4096 |
| Calibration Samples | 512 |

## Dataset Preprocessing

The dataset was preprocessed with the following steps:

  1. Extract and structure the conversation data using role-based templates (SYSTEM, USER, ASSISTANT).
  2. Convert the structured conversations into a tokenized format using the model's tokenizer.
  3. Filter out sequences shorter than 4096 tokens.
  4. Shuffle and select 512 samples for calibration.
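The steps above can be sketched in plain Python. The whitespace tokenizer here is a stand-in for the model's real tokenizer (e.g. `AutoTokenizer.from_pretrained` on the base model), and the helper names are illustrative, not from the actual quantization scripts:

```python
import random

MAX_SEQ_LEN = 4096   # calibration sequence length
NUM_SAMPLES = 512    # calibration samples

def format_conversation(turns):
    """Render (role, text) turns with role-based templates (SYSTEM/USER/ASSISTANT)."""
    return "\n".join(f"{role.upper()}: {text}" for role, text in turns)

def tokenize(text):
    """Stand-in tokenizer: whitespace split. Replace with the model tokenizer."""
    return text.split()

def build_calibration_set(conversations, seed=0):
    formatted = [format_conversation(c) for c in conversations]
    tokenized = [tokenize(t) for t in formatted]
    # Step 3: drop sequences shorter than the full calibration length.
    long_enough = [t for t in tokenized if len(t) >= MAX_SEQ_LEN]
    # Step 4: shuffle and take up to 512 samples.
    random.Random(seed).shuffle(long_enough)
    return long_enough[:NUM_SAMPLES]
```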

## Quantization Process

View the shell and Python scripts used to quantize this model.
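The linked scripts are authoritative; as a rough sketch, a one-shot GPTQ run with llm-compressor matching the settings table above might look like the following. Excluding `lm_head` from quantization is an assumption (a common default), not something stated in this README:

```python
# Hedged sketch of a GPTQ one-shot run with llm-compressor
# (W4A16, 512 calibration samples at sequence length 4096).
# Exact arguments in the real scripts may differ.
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

recipe = GPTQModifier(
    targets="Linear",    # quantize all Linear layers
    scheme="W4A16",      # 4-bit weights, 16-bit activations
    ignore=["lm_head"],  # assumption: leave the output head in full precision
)

oneshot(
    model="Tarek07/Legion-V2.1-LLaMa-70B",
    dataset="openerotica/erotiquant3",
    recipe=recipe,
    max_seq_length=4096,
    num_calibration_samples=512,
)
```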

Two RTX PRO 6000 GPUs with 565 GB of RAM and 300 GB of disk space were rented on RunPod.

Quantization took approximately 3.5 hours with a total of $14.32 in compute costs.

## Acknowledgments
