Mistral-7B v0.1 — 8-bit (bitsandbytes)

This repository provides an 8-bit (bitsandbytes) quantized version of mistralai/Mistral-7B-v0.1 for efficient inference on consumer GPUs. The architecture and tokenizer are unchanged; only the weight format/loader is adapted for memory savings.

Note: This is not an instruction-tuned or fine-tuned model; it is simply the base model loaded in 8-bit for inference.


Model Details

  • Developed by: Mistral AI (base model); quantized and shared by the repo maintainer
  • Model type: Decoder-only causal language model (7B parameters)
  • Languages: Primarily English; may generalize to other languages to a limited extent
  • License: Apache-2.0 (inherits from base model)
  • Finetuned from: None; this is the unmodified base model loaded in 8-bit (bnb.int8)

Model Sources

  • Base model: https://huggingface.co/mistralai/Mistral-7B-v0.1

What’s Included

  • config.json, generation_config.json
  • Weights (sharded safetensors):
    • model-00001-of-00002.safetensors
    • model-00002-of-00002.safetensors
    • model.safetensors.index.json
  • Tokenizer files:
    • tokenizer.model, tokenizer.json, tokenizer_config.json, special_tokens_map.json

Uses

Direct Use

  • General text generation, drafting, exploration, and research.
  • Running on limited-VRAM GPUs (≈8–12 GB) where FP16 is not feasible.
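
A quick way to confirm that the 8-bit weights fit such a budget is to check the model's reported footprint after loading; a minimal sketch (loading exactly as in How to Get Started below):

from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model = AutoModelForCausalLM.from_pretrained(
    "KavinduHansaka/mistral-7b-v0.1-8bit",
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)
# get_memory_footprint() reports the bytes held by the model's parameters and buffers.
print(f"Weight footprint: {model.get_memory_footprint() / 1024**3:.2f} GiB")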

Downstream Use

  • Use as a base for experimentation (prompting, lightweight adapters) or as a drop-in 8-bit runtime in apps.
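
For example, a lightweight LoRA adapter can be trained on top of the frozen 8-bit weights. The sketch below assumes the peft library is installed (it is not part of this repo), and the LoRA hyperparameters are illustrative only:

from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

base = AutoModelForCausalLM.from_pretrained(
    "KavinduHansaka/mistral-7b-v0.1-8bit",
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)
base = prepare_model_for_kbit_training(base)  # freezes the int8 weights and prepares the model for k-bit training

lora = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,   # illustrative hyperparameters
    target_modules=["q_proj", "v_proj"],      # Mistral attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()            # only the adapter weights are trainable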

Out-of-Scope Use

  • Safety-critical decision making, generation of disallowed content, or applications requiring guaranteed factual accuracy.
  • Tasks requiring instruction tuning or alignment (this is not an instruct model).

Bias, Risks, and Limitations

  • Bias & Safety: Inherits biases and safety limitations of the base model. It may produce inaccurate, offensive, or harmful content.
  • Quality vs FP16: 8-bit loading can introduce minor quality regression versus FP16 on some prompts and long contexts; a quick way to measure this on your own data is sketched below.
  • Long Context Behavior: Extremely long generations may degrade faster in quantized modes.
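
To quantify any regression on your own data, compare perplexity between the 8-bit load and the FP16 base weights; a minimal sketch (the sample text and the FP16 comparison are placeholders to adapt):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

REPO_ID = "KavinduHansaka/mistral-7b-v0.1-8bit"
SAMPLE = "Replace this with a held-out paragraph from your own domain."

tok = AutoTokenizer.from_pretrained(REPO_ID)

def perplexity(model, text):
    # Perplexity = exp(mean token cross-entropy); lower is better.
    enc = tok(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        loss = model(**enc, labels=enc["input_ids"]).loss
    return torch.exp(loss).item()

int8_model = AutoModelForCausalLM.from_pretrained(
    REPO_ID, device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
)
print("8-bit perplexity:", perplexity(int8_model, SAMPLE))

# Reference point (needs roughly 15 GB of VRAM): the FP16 base weights.
# fp16_model = AutoModelForCausalLM.from_pretrained(
#     "mistralai/Mistral-7B-v0.1", device_map="auto", torch_dtype=torch.float16)
# print("FP16 perplexity:", perplexity(fp16_model, SAMPLE))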

Recommendations

  • Add guardrails (filtering, human-in-the-loop) for user-facing deployments.
  • Evaluate on your domain tasks before production use.
  • Consider an instruction-tuned variant if you need chat-style behavior out of the box.

How to Get Started

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline

REPO_ID = "KavinduHansaka/mistral-7b-v0.1-8bit"

tok = AutoTokenizer.from_pretrained(REPO_ID, use_fast=True)
if tok.pad_token is None:
    tok.pad_token = tok.eos_token  # the base tokenizer ships no pad token; reuse EOS
tok.padding_side = "left"  # left padding is the usual choice for decoder-only generation

model = AutoModelForCausalLM.from_pretrained(
    REPO_ID,
    device_map="auto",
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # bitsandbytes 8-bit
    torch_dtype="auto",  # dtype for the non-quantized modules
)

# The model is already placed by device_map="auto", so the pipeline needs no device argument.
gen = pipeline("text-generation", model=model, tokenizer=tok)
print(gen("Explain attention mechanisms simply.", max_new_tokens=200)[0]["generated_text"])
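
For interactive use, the same model can stream tokens as they are generated; a minimal sketch reusing the model and tok objects above:

from transformers import TextStreamer

streamer = TextStreamer(tok, skip_prompt=True)
inputs = tok("Explain attention mechanisms simply.", return_tensors="pt").to(model.device)
_ = model.generate(**inputs, max_new_tokens=200, streamer=streamer)  # prints tokens as they arrive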

Suggested dependencies

pip install -U "transformers>=4.41" "accelerate>=0.28" "bitsandbytes>=0.43.3" sentencepiece

Training Details

  • Training data / procedure: N/A (no additional training; this is the original Mistral-7B v0.1 loaded with 8-bit weights).
  • Precision: Runtime bnb.int8 quantization applied by bitsandbytes at load time (not GPTQ/AWQ/QLoRA); the layer check below shows how to verify this.
  • Speeds & Sizes: Throughput depends on the GPU; 8–12 GB of VRAM is typically enough for single-prompt generation at moderate lengths.
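
To verify that the int8 path is active at runtime (as opposed to a statically quantized GPTQ/AWQ checkpoint), inspect a linear layer of the loaded model; a minimal sketch reusing the model object from How to Get Started:

import bitsandbytes as bnb

# With 8-bit loading, nn.Linear layers are replaced by bnb.nn.Linear8bitLt.
q_proj = model.model.layers[0].self_attn.q_proj
print(type(q_proj))                             # expect a bitsandbytes Linear8bitLt module
print(isinstance(q_proj, bnb.nn.Linear8bitLt))  # expect True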

Technical Specifications

Architecture

  • Decoder-only Transformer, ~7B parameters (see base model for specifics).

Compute

  • Hardware: Single NVIDIA GPU recommended (≥8 GB VRAM).
  • Software: Python, transformers, accelerate, bitsandbytes, sentencepiece, CUDA toolkit/driver for your GPU.
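
A quick pre-flight check of the environment (a minimal sketch; bitsandbytes 8-bit kernels require a CUDA-capable GPU):

import torch
import transformers

print("torch:", torch.__version__, "| transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name} with {props.total_memory / 1024**3:.1f} GiB VRAM")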

Citation

If you use this model, please cite Mistral AI and the libraries you rely on:

@software{mistral_7b_v0_1,
  title  = {Mistral 7B v0.1},
  author = {Mistral AI},
  year   = {2023},
  url    = {https://huggingface.co/mistralai/Mistral-7B-v0.1}
}