Mistral-7B v0.1 — 8-bit (bitsandbytes)
This repository provides an 8-bit (bitsandbytes) quantized version of mistralai/Mistral-7B-v0.1
for efficient inference on consumer GPUs. The architecture and tokenizer are unchanged; only the weight format/loader is adapted for memory savings.
Note: This is not an instruction-tuned or fine-tuned model—just an 8-bit loading of the base model for inference.
Model Details
- Developed by: Mistral AI (base model); quantized and shared by the repo maintainer
- Model type: Decoder-only causal language model (7B parameters)
- Languages: primarily English; may generalize to other languages to a limited extent
- License: Apache-2.0 (inherits from base model)
- Fine-tuned from: None; this is the base model loaded in 8-bit (bnb.int8)
Model Sources
- Base model: https://huggingface.co/mistralai/Mistral-7B-v0.1
- This repo: https://huggingface.co/KavinduHansaka/mistral-7b-v0.1-8bit
What’s Included
- Config: config.json, generation_config.json
- Weights (sharded safetensors): model-00001-of-00002.safetensors, model-00002-of-00002.safetensors, model.safetensors.index.json
- Tokenizer files: tokenizer.model, tokenizer.json, tokenizer_config.json, special_tokens_map.json
Uses
Direct Use
- General text generation, drafting, exploration, and research.
- Running on limited-VRAM GPUs (≈8–12 GB) where FP16 is not feasible.
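Back-of-the-envelope (assuming the base model's ~7.24B parameters): FP16 weights alone take about 7.24B × 2 bytes ≈ 14.5 GB, while int8 halves that to ≈ 7.2 GB, before activations and KV cache.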
Downstream Use
- Use as a base for experimentation (prompting, lightweight adapters; see the sketch below) or as a drop-in 8-bit runtime in apps.
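For the adapter route, here is a minimal LoRA sketch over the 8-bit base. It assumes pip install peft and the model object from How to Get Started below; the rank, alpha, and target modules are illustrative, not tuned values from this repo:
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
model = prepare_model_for_kbit_training(model)  # cast norms, enable input grads for k-bit training
lora_cfg = LoraConfig(
    r=8,  # illustrative rank
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # Mistral attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # only adapter weights are trainable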
Out-of-Scope Use
- Safety-critical decisions, disallowed content generation, or applications requiring guaranteed factual correctness.
- Tasks requiring instruction tuning or alignment (this is not an instruct model).
Bias, Risks, and Limitations
- Bias & Safety: Inherits biases and safety limitations of the base model. It may produce inaccurate, offensive, or harmful content.
- Quality vs FP16: 8-bit loading can introduce minor quality regression versus FP16 on some prompts and longer contexts.
- Long Context Behavior: Extremely long generations may degrade faster in quantized modes.
Recommendations
- Add guardrails (filtering, human-in-the-loop) for user-facing deployments.
- Evaluate on your domain tasks before production use (a quick perplexity probe is sketched under Suggested dependencies below).
- Consider an instruction-tuned variant if you need chat-style behavior out of the box.
How to Get Started
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline

REPO_ID = "KavinduHansaka/mistral-7b-v0.1-8bit"

tok = AutoTokenizer.from_pretrained(REPO_ID, use_fast=True)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token  # Mistral ships no pad token; reuse EOS
tok.padding_side = "left"  # left padding is needed for batched generation

model = AutoModelForCausalLM.from_pretrained(
    REPO_ID,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # bitsandbytes int8
)

gen = pipeline("text-generation", model=model, tokenizer=tok)  # model is already dispatched; no device_map here
print(gen("Explain attention mechanisms simply.", max_new_tokens=200)[0]["generated_text"])
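With left padding set as above, the same objects handle batched generation; a short sketch (prompts and sampling settings are illustrative):
prompts = ["Define perplexity in one sentence.", "What does a tokenizer do?"]
batch = tok(prompts, return_tensors="pt", padding=True).to(model.device)
out = model.generate(**batch, max_new_tokens=80, do_sample=True, temperature=0.7)
print(tok.batch_decode(out, skip_special_tokens=True))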
Suggested dependencies
pip install -U "transformers>=4.41" "accelerate>=0.28" "bitsandbytes>=0.43.3" sentencepiece
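To act on the evaluation recommendation above, a quick perplexity probe reusing tok and model from the snippet; sample_text is a placeholder for your own held-out text:
import torch
sample_text = "Replace this with held-out text from your domain."
enc = tok(sample_text, return_tensors="pt").to(model.device)
with torch.no_grad():
    loss = model(**enc, labels=enc["input_ids"]).loss  # shifted cross-entropy over the sequence
print(f"perplexity ≈ {torch.exp(loss).item():.2f}")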
Training Details
- Training data / procedure: N/A (no additional training; this is the original Mistral-7B v0.1 loaded with 8-bit weights).
- Precision: Runtime bnb.int8 quantization for inference (not GPTQ/AWQ/QLoRA).
- Speeds & Sizes: Depends on GPU; typical 8–12 GB VRAM allows single-prompt generations with moderate lengths.
Technical Specifications
Architecture
- Decoder-only Transformer, ~7B parameters (see base model for specifics).
Compute
- Hardware: Single NVIDIA GPU recommended (≥8 GB VRAM).
- Software: Python, transformers, accelerate, bitsandbytes, sentencepiece, and a CUDA toolkit/driver matching your GPU.
Citation
If you use this model, please cite Mistral AI and the libraries you rely on:
@software{mistral_7b_v0_1,
  title  = {Mistral 7B v0.1},
  author = {Mistral AI},
  year   = {2023},
  url    = {https://huggingface.co/mistralai/Mistral-7B-v0.1}
}