Mistral-7B-Instruct-v0.2: Local LLM Model Repository
This repository provides quantized GGUF and ONNX exports of Mistral-7B-Instruct-v0.2, optimized for efficient local inference, especially on resource-constrained devices such as the Raspberry Pi.
📦 GGUF Model (Q8_0)
Filename: mistral-7b-instruct-v0.2.Q8_0-Q8_0.gguf
Format: GGUF (Q8_0)
Best for: llama.cpp, koboldcpp, LM Studio, and similar tools.
Quick Start
```bash
./main -m mistral-7b-instruct-v0.2.Q8_0-Q8_0.gguf -p "Hello, world!"
```
This quantized GGUF model is designed for fast, memory-efficient inference on local hardware, including Raspberry Pi and other edge devices.
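The same file can also be loaded from Python through the llama-cpp-python bindings. The sketch below is a minimal example: the model filename is the one above, while the context size, sampling settings, and the `[INST] ... [/INST]` prompt wrapper (the Mistral-Instruct chat format) are illustrative assumptions you can adjust.

```python
from llama_cpp import Llama

# Load the quantized GGUF file; n_ctx is an illustrative context-window choice.
llm = Llama(
    model_path="mistral-7b-instruct-v0.2.Q8_0-Q8_0.gguf",
    n_ctx=2048,
)

# Mistral-Instruct models expect prompts wrapped in [INST] ... [/INST].
output = llm("[INST] Hello, world! [/INST]", max_tokens=128, temperature=0.7)
print(output["choices"][0]["text"])
```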
📦 ONNX Model
Filename: mistral-7b-instruct-v0.2.onnx
Format: ONNX
Best for: ONNX Runtime, KleidiAI, and compatible frameworks.
Quick Start
```python
import onnxruntime as ort

session = ort.InferenceSession("mistral-7b-instruct-v0.2.onnx")
# Inspect the graph's expected inputs (token IDs, attention mask, past key/values, ...)
print([inp.name for inp in session.get_inputs()])
```
The ONNX export enables efficient inference on CPUs, GPUs, and accelerators, ideal for local deployment.
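If the export was produced with Optimum (as the credits suggest), a higher-level option is to load it through `optimum.onnxruntime` together with the upstream tokenizer. The sketch below assumes a hypothetical local directory containing the ONNX file alongside its `config.json`; adjust the paths to your layout.

```python
from optimum.onnxruntime import ORTModelForCausalLM
from transformers import AutoTokenizer

# Hypothetical local directory holding the ONNX export and its config files.
model_dir = "./mistral-7b-instruct-v0.2-onnx"

tokenizer = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")
model = ORTModelForCausalLM.from_pretrained(model_dir)

# Mistral-Instruct expects the [INST] ... [/INST] prompt format.
inputs = tokenizer("[INST] Hello, world! [/INST]", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```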
Credits
- Base model: Mistral AI
- Quantization: llama.cpp
- ONNX export: Optimum, ONNX Runtime
- Maintainer: Makatia