OpenVINO-Mistral
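Both snippets below expect a model already converted to OpenVINO IR. A typical export, sketched here with `optimum-cli` (the weight format shown is an illustrative choice, not a requirement, and assumes `optimum-intel` with the OpenVINO extras is installed):

```shell
# Export the base model to OpenVINO IR with int4 weight compression
optimum-cli export openvino \
  --model TheDrummer/Cydonia-24B-v2.1 \
  --weight-format int4 \
  path-to-your-converted-model
```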
With OpenVINO GenAI:
```python
import openvino_genai as ov_genai

model_dir = "path-to-your-converted-model"
pipe = ov_genai.LLMPipeline(
    model_dir,  # Path to the model directory
    "CPU",      # Device to run on
)

generation_config = ov_genai.GenerationConfig(
    max_new_tokens=128
)

prompt = "We don't even have a chat template so strap in and let it ride!"
result = pipe.generate([prompt], generation_config=generation_config)

perf_metrics = result.perf_metrics
print(f'Load time: {perf_metrics.get_load_time() / 1000:.2f} s')
print(f'Time to first token: {perf_metrics.get_ttft().mean / 1000:.2f} s')
print(f'Time per token: {perf_metrics.get_tpot().mean:.2f} ms/token')
print(f'Throughput: {perf_metrics.get_throughput().mean:.2f} tokens/s')
print(f'Generate duration: {perf_metrics.get_generate_duration().mean / 1000:.2f} s')
print(result)
```
The model can also be run with Optimum-Intel, the OpenVINO integration for Transformers.
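A minimal sketch of the Optimum-Intel path, assuming the same converted model directory as above (the path is a placeholder) and that `optimum-intel` with the OpenVINO extras is installed:

```python
from optimum.intel import OVModelForCausalLM
from transformers import AutoTokenizer

model_dir = "path-to-your-converted-model"  # same IR directory as above

# Load the OpenVINO model and its tokenizer
model = OVModelForCausalLM.from_pretrained(model_dir, device="CPU")
tokenizer = AutoTokenizer.from_pretrained(model_dir)

prompt = "We don't even have a chat template so strap in and let it ride!"
inputs = tokenizer(prompt, return_tensors="pt")

# Cap generation at the same 128 new tokens as the GenAI example
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```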
Base model: TheDrummer/Cydonia-24B-v2.1