
Uni-TianYan-4bit-gptq

Uni-TianYan-4bit-gptq is a quantized version of uni-tianyan/Uni-TianYan, produced with GPTQ quantization. The model is only 35 GB, compared with 127 GB for the original uni-tianyan/Uni-TianYan, and can run on a single A6000 GPU.
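The size reduction follows mostly from the bits stored per weight. The sketch below is a back-of-the-envelope estimate that assumes the original checkpoint stores its weights in 16-bit precision. It ignores GPTQ metadata (per-group scales and zero points) and any layers kept at higher precision, which is why the real file (35 GB) is somewhat larger than the estimate.

```python
# Rough estimate of checkpoint size after weight quantization.
# Assumption: the original weights are stored in 16-bit (fp16/bf16).

def estimate_quantized_size_gb(original_size_gb: float,
                               original_bits: int = 16,
                               target_bits: int = 4) -> float:
    """Scale the checkpoint size by the ratio of bits per weight.

    Quantization metadata and unquantized layers are not counted,
    so this is a lower bound on the real file size.
    """
    return original_size_gb * target_bits / original_bits


estimate = estimate_quantized_size_gb(127)
print(f"~{estimate:.2f} GB before GPTQ overhead")
```

Note that running the model also needs GPU memory for activations and the KV cache, not just the weights.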

Model Details

  • Quantized by: [email protected]
  • Model type: 4-bit GPTQ-quantized version of uni-tianyan/Uni-TianYan
  • Language(s): English
  • License: Non-Commercial Creative Commons license (CC BY-NC-4.0)

Prompt Template

### Instruction:

<prompt> (without the <>)

### Response:
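Filling the template by hand is easy to get wrong, so a small helper can help. This is a hypothetical sketch: the name `build_prompt` is illustrative, and the exact whitespace (one blank line between the sections, no trailing newline) is an assumption chosen to match the usage example further down.

```python
# Hypothetical helper that fills the card's prompt template.
# Assumption: sections are separated by one blank line, as in the usage example.
PROMPT_TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:"


def build_prompt(instruction: str) -> str:
    """Insert the user's instruction (without angle brackets) into the template."""
    return PROMPT_TEMPLATE.format(instruction=instruction.strip())


print(build_prompt("Who was the first person to walk on the moon?"))
```

Because `str.format` only parses braces in the template itself, instructions that contain `{` or `}` are inserted unchanged.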

Training Dataset

uni-tianyan/Uni-TianYan was quantized with GPTQ, using the Alpaca dataset yahma/alpaca-cleaned as calibration data.

Training Procedure

uni-tianyan/Uni-TianYan was quantized with GPTQ on two NVIDIA L40 48 GB GPUs.
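GPTQ needs only a small set of calibration samples, not a training loop. The sketch below shows how such a run might look with AutoGPTQ's `BaseQuantizeConfig`, `from_pretrained`, `quantize`, and `save_quantized`. It is not the author's script: the helper names, the way Alpaca records are rendered into text, the 128-sample count, and the `group_size`/`desc_act` values are all assumptions. Heavy imports are deferred so the formatting helper can be used without a GPU.

```python
import itertools


def format_alpaca_example(record: dict) -> str:
    """Hypothetical: render one yahma/alpaca-cleaned record
    (instruction / input / output fields) as calibration text."""
    instruction = record["instruction"]
    if record.get("input"):
        instruction = f"{instruction}\n{record['input']}"
    return f"### Instruction:\n{instruction}\n\n### Response:\n{record['output']}"


def quantize(base_model_id: str, out_dir: str, records, n_samples: int = 128) -> None:
    """Hypothetical GPTQ run; settings below are assumptions, not the card's."""
    # Imported lazily so format_alpaca_example works without these packages.
    from transformers import AutoTokenizer
    from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

    tokenizer = AutoTokenizer.from_pretrained(base_model_id)
    # Take the first n_samples records (works for lists and HF Datasets).
    examples = [tokenizer(format_alpaca_example(r))
                for r in itertools.islice(records, n_samples)]

    quantize_config = BaseQuantizeConfig(
        bits=4,          # 4-bit weights, as stated for this model
        group_size=128,  # assumption: a common default
        desc_act=False,  # assumption
    )
    model = AutoGPTQForCausalLM.from_pretrained(base_model_id, quantize_config)
    model.quantize(examples)  # GPTQ, layer by layer, on the calibration set
    model.save_quantized(out_dir, use_safetensors=True)
```

With the `datasets` library, `records` could be `load_dataset("yahma/alpaca-cleaned", split="train")`, since iterating a dataset yields one dict per row. Spreading a model this large across two GPUs would also require a device placement setting that this sketch omits.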

How to Get Started with the Model

First, install AutoGPTQ:

pip install auto_gptq

The following code sample loads the quantized model and generates a response:

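from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

# Load the quantized checkpoint (not the full-precision base model).
model_id = "malhajar/Uni-TianYan-4bit-gptq"
model = AutoGPTQForCausalLM.from_quantized(
    model_id,
    device="cuda:0",
    inject_fused_attention=False,
    use_safetensors=True,
    trust_remote_code=False,
    use_triton=False,
    quantize_config=None,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)

question = "Who was the first person to walk on the moon?"
# Fill the prompt template with the question (f-string substitution).
prompt = f"""### Instruction:
{question}

### Response:"""

# Tokenize and move the inputs to the same GPU as the model.
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cuda:0")
# AutoGPTQ's generate() accepts keyword arguments only.
output = model.generate(input_ids=input_ids, max_new_tokens=128)
response = tokenizer.decode(output[0], skip_special_tokens=True)

print(response)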
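For a decoder-only model, the generated sequence includes the prompt tokens, so the decoded text repeats the instruction before the answer. A minimal post-processing sketch is shown below. The helper name is hypothetical, and stripping a literal `</s>` assumes the Llama-style end-of-sequence token appears when special tokens are not skipped.

```python
# Hypothetical post-processing for the decoded output.
RESPONSE_MARKER = "### Response:"
INSTRUCTION_MARKER = "### Instruction:"


def extract_response(decoded: str) -> str:
    """Return only the model's answer from prompt + generated text."""
    _, sep, answer = decoded.partition(RESPONSE_MARKER)
    if not sep:  # marker missing: return the text as-is
        return decoded.strip()
    # Cut off any new turn the model may have started on its own.
    answer = answer.split(INSTRUCTION_MARKER, 1)[0]
    # Assumption: "</s>" is the end-of-sequence token if it was not skipped.
    return answer.replace("</s>", "").strip()


print(extract_response("### Instruction:\nQ\n\n### Response: Neil Armstrong.</s>"))
```

Splitting on the first marker rather than the last keeps the answer intact even if the model goes on to invent further instruction/response turns.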

Citations

@misc{touvron2023llama,
      title={Llama 2: Open Foundation and Fine-Tuned Chat Models},
      author={Hugo Touvron and Louis Martin and Kevin Stone and Peter Albert and Amjad Almahairi and Yasmine Babaei and Nikolay Bashlykov and others},
      year={2023},
      eprint={2307.09288},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
@misc{frantar2023gptq,
      title={GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers},
      author={Elias Frantar and Saleh Ashkboos and Torsten Hoefler and Dan Alistarh},
      year={2023},
      eprint={2210.17323},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}