# Meta-Llama-3-8B-Instruct-ct2-int8
This is a CTranslate2 v4.5.0 int8 conversion of meta-llama/Meta-Llama-3-8B-Instruct, created with:

```sh
ct2-transformers-converter --model meta-llama/Meta-Llama-3-8B-Instruct --output_dir Meta-Llama-3-8B-Instruct-ct2-int8 --quantization int8
```
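The same conversion can also be run from Python through CTranslate2's converter API. A minimal sketch, assuming the original meta-llama weights are accessible (they are gated, so you need an authenticated Hugging Face login):

```python
# Sketch: reproduce the CLI conversion programmatically.
# TransformersConverter downloads/loads the original weights, and
# convert() writes the quantized CTranslate2 model to the output dir.
import ctranslate2.converters

converter = ctranslate2.converters.TransformersConverter(
    "meta-llama/Meta-Llama-3-8B-Instruct"
)
converter.convert("Meta-Llama-3-8B-Instruct-ct2-int8", quantization="int8")
```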
## Downloading
CTranslate2 does not integrate with the Hugging Face Hub, so you will need to download the model files manually:

```sh
huggingface-cli download mike-ravkine/Meta-Llama-3-8B-Instruct-ct2-int8 --local-dir Meta-Llama-3-8B-Instruct-ct2-int8/
```
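If you would rather fetch the files from Python, a minimal sketch using `huggingface_hub.snapshot_download` (assuming the `huggingface_hub` package is installed) does the same thing:

```python
# Sketch: download the converted model via the huggingface_hub API
# instead of the CLI. Assumes `pip install huggingface_hub` has been run.
from huggingface_hub import snapshot_download

model_dir = snapshot_download(
    repo_id="mike-ravkine/Meta-Llama-3-8B-Instruct-ct2-int8",
    local_dir="Meta-Llama-3-8B-Instruct-ct2-int8",
)
print(f"Model files downloaded to {model_dir}")
```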
## Using
Install dependencies (the extras specifier is quoted so shells like zsh don't expand the brackets):

```sh
pip install "transformers[torch]" ctranslate2
```
Sample inference code:

```python
import sys

import ctranslate2
from transformers import AutoTokenizer

model_dir = sys.argv[1]  # path to the downloaded CTranslate2 model
tokenizer_dir = "meta-llama/Meta-Llama-3-8B-Instruct"  # tokenizer comes from the original repo

print("Loading the model...")
generator = ctranslate2.Generator(model_dir, device="cuda")
tokenizer = AutoTokenizer.from_pretrained(tokenizer_dir)

dialog = [{"role": "user", "content": "What is the meaning of life, the universe and everything?"}]
max_generation_length = 512

prompt_string = tokenizer.apply_chat_template(dialog, add_generation_prompt=True, tokenize=False)
# It may seem odd to pass tokenize=False and then tokenize separately, but
# tokenize=True returns only ids, while generate_tokens() expects token strings.
prompt_tokens = tokenizer.tokenize(prompt_string)

# generate_tokens() returns an iterable of step results, so the output
# can be streamed as it is produced.
step_results = generator.generate_tokens(
    prompt_tokens,
    max_length=max_generation_length,
    sampling_temperature=0.6,
    sampling_topk=20,
    sampling_topp=1,
)

for step_result in step_results:
    word = tokenizer.decode([step_result.token_id])
    print(word, end="", flush=True)
print()
```
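Assuming the script above is saved as `generate.py` (a hypothetical filename), run it with the download directory as its only argument:

```sh
python generate.py Meta-Llama-3-8B-Instruct-ct2-int8/
```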