open-lilm-v2-q4

This is a quantized version of open-lilm-v2 with no other modifications. Like the original model, it is intended for research or entertainment purposes only.

Warning: Due to the nature of the training data, this model is highly likely to return violent, racist, and discriminatory content. DO NOT USE IN A PRODUCTION ENVIRONMENT.

Model Details

  • Name: open-lilm-v2-q4
  • Quantization: 4-bit
  • Base Model: 0xtaipoian/open-lilm-v2
  • Model size: 3.39B params
  • Tensor types: F32, F16, U8
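
The card does not state which quantization method was used; the saved tensor types (U8 alongside F16 and F32) are consistent with a bitsandbytes-style 4-bit export, but treat that as an assumption. At 4 bits per weight, 3.39B parameters work out to roughly 1.7 GB for the weights alone, before activations and overhead. For reference, a minimal sketch of how the base model could be loaded with on-the-fly 4-bit quantization via bitsandbytes (requires the bitsandbytes and accelerate packages and a GPU):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Assumed setup, not necessarily how this repo was produced:
# NF4 4-bit weight storage, dequantized to fp16 for matmuls.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

base = AutoModelForCausalLM.from_pretrained(
    "0xtaipoian/open-lilm-v2",
    quantization_config=bnb_config,
    device_map="auto",
)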

Usage

This model can be used with Hugging Face's Transformers library:

from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "liemo/open-lilm-v2-q4"

# device_map="auto" places the model on the available GPU(s); requires accelerate
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name)

def chat(messages, temperature=0.9, max_new_tokens=200):
    # Render the chat template and move the prompt to the model's device
    input_ids = tokenizer.apply_chat_template(
        conversation=messages,
        tokenize=True,
        add_generation_prompt=True,
        return_tensors="pt",
    ).to(model.device)
    output_ids = model.generate(
        input_ids,
        max_new_tokens=max_new_tokens,
        temperature=temperature,
        do_sample=True,
    )

    # Optional: print the rendered prompt for inspection
    chatml = tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
    print(chatml)

    # Decode only the newly generated tokens, dropping the prompt
    response = tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True)

    return response

messages = [
    {"role": "user",
     "content": """
    INPUT_CONTENT_HERE
     """}
]

result = chat(messages, max_new_tokens=200, temperature=1)
print(result)
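
Because apply_chat_template accepts the full message history, the same helper can carry a multi-turn conversation by appending the model's reply before the next call. A minimal sketch (the follow-up placeholder is illustrative):

messages = [{"role": "user", "content": "INPUT_CONTENT_HERE"}]
reply = chat(messages)

# Append the assistant reply, then ask a hypothetical follow-up question
messages.append({"role": "assistant", "content": reply})
messages.append({"role": "user", "content": "FOLLOW_UP_QUESTION_HERE"})
print(chat(messages))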
