OGAI-8x7B-8bit-32k: 8-bit Quantized Oil & Gas AI Model with Extended Context
Model Description
OGAI-8x7B-8bit-32k is an 8-bit quantized version of the OGAI-8x7B model with a 32K token context window. This quantized model retains most of the capabilities of the original model while significantly reducing memory requirements, making it ideal for deployment in environments with memory constraints.
The model is based on a LoRA fine-tuned Mixtral-8x7B model, specifically engineered for oil and gas applications with a focus on drilling processes. The quantization to 8-bit precision offers a balanced approach between model size reduction and maintaining high-quality outputs for domain-specific tasks.
- Developed by: GainEnergy AI Team
- Model type: 8-bit Quantized Causal Language Model (Instruction Following)
- Language: English
- License: MIT
- Finetuned from model: GainEnergy/ogai-8x7b
- Quantization method: 8-bit (Int8)
- Context length: 32,768 tokens
Quantization Details
This model was quantized from the full-precision OGAI-8x7B using 8-bit quantization with the following configuration:
from transformers import BitsAndBytesConfig

quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,                     # store weights in Int8 via bitsandbytes
    llm_int8_enable_fp32_cpu_offload=True  # let modules that exceed GPU memory fall back to CPU in FP32
)
The 8-bit quantization roughly halves the model's memory footprint compared to FP16 (about a 4x reduction versus FP32), while preserving approximately 95-98% of the original model's performance on oil and gas engineering tasks.
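As a back-of-the-envelope check (the ~46.7B total parameter count for Mixtral-8x7B is an assumption; embeddings and quantization overhead shift the exact figures):

# Rough footprint estimate: bytes per parameter times parameter count
params = 46.7e9                  # approx. total parameters of Mixtral-8x7B (assumption)
fp16_gb = params * 2 / 1024**3   # FP16: 2 bytes per parameter -> ~87 GB
int8_gb = params * 1 / 1024**3   # Int8: 1 byte per parameter  -> ~43 GB
print(f"FP16: ~{fp16_gb:.0f} GB, Int8: ~{int8_gb:.0f} GB")

This is also why the llm_int8_enable_fp32_cpu_offload flag above matters in practice: layers that do not fit in VRAM can spill to system RAM.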
Key Capabilities
- Drilling Calculations & Optimization: Computes complex well trajectories, mud weight calculations, hydraulics, and casing designs (see the worked example after this list).
- Engineering Knowledge Integration: Retains knowledge from oil & gas technical literature, drilling reports, and proprietary engineering datasets.
- Intelligent Document Processing: Supports knowledge retrieval for drilling workflows, regulatory compliance, and field operation manuals.
- High-Context Reasoning: The extended 32K token context window allows the model to retain context across long drilling plans, technical discussions, and simulation outputs.
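To make the first capability concrete, much of the underlying arithmetic is standard drilling math, for example the hydrostatic-pressure relation P (psi) = 0.052 × mud weight (ppg) × true vertical depth (ft). A minimal reference implementation, useful for spot-checking model outputs (names are illustrative):

def hydrostatic_pressure_psi(mud_weight_ppg: float, tvd_ft: float) -> float:
    """Hydrostatic pressure in psi from mud weight (ppg) and TVD (ft)."""
    return 0.052 * mud_weight_ppg * tvd_ft

# 12.5 ppg mud at 10,000 ft TVD
print(hydrostatic_pressure_psi(12.5, 10_000))  # 6500.0 psi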
Usage
Basic Usage
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
import torch
# Configure quantization
quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_enable_fp32_cpu_offload=True
)
# Load tokenizer and model
model_id = "GainEnergy/ogai-8x7b-8bit-32k"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=quantization_config
)
# Example prompt for drilling engineering
prompt = "Calculate the required casing depth for a well with a pore pressure of 12.5 ppg."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=200)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
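If the tokenizer ships a chat template (an assumption; the plain-text prompt above works regardless), the instruction format can be applied automatically:

# Build the prompt through the tokenizer's chat template, if one is defined
messages = [
    {"role": "user", "content": "Calculate the required casing depth for a well with a pore pressure of 12.5 ppg."}
]
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt"
).to(model.device)
outputs = model.generate(input_ids, max_new_tokens=200)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))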
Utilizing the Extended Context Window
To make the most of the 32K context window, you can input longer documents for analysis:
# Load long document (e.g., drilling report, technical specifications)
with open("long_drilling_report.txt", "r") as f:
    long_document = f.read()
# Append a question at the end
prompt = f"{long_document}\n\nBased on the above document, what are the key risk factors identified for this drilling operation?"
# Process with appropriate truncation to fit within context
inputs = tokenizer(
    prompt,
    return_tensors="pt",
    truncation=True,
    max_length=32000  # leave headroom below 32,768 for generation
).to(model.device)
outputs = model.generate(
    **inputs,
    max_new_tokens=768,
    do_sample=True,
    temperature=0.7,
    top_p=0.9
)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
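Before relying on truncation, it helps to know how much of the document actually fits; a quick check using the tokenizer already loaded above:

# Count tokens in the raw document before building the prompt
n_tokens = len(tokenizer(long_document)["input_ids"])
print(f"Document: {n_tokens} tokens (window: 32,768)")
if n_tokens > 32000:
    print("Document exceeds the budget and will be truncated; consider chunking it.")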
Hardware Requirements
Due to the quantization, this model requires less GPU memory than the full-precision version:
- Minimum: CUDA-capable GPU with 16GB VRAM (relying on the FP32 CPU offload shown in the configuration above)
- Recommended: CUDA-capable GPU with 24GB+ VRAM for comfortable usage with the 32K context window
- System RAM: 32GB+
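A quick pre-flight check against these numbers (plain torch APIs, nothing model-specific):

import torch

if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"{props.name}: {props.total_memory / 1024**3:.1f} GB VRAM")
else:
    print("No CUDA device found; 8-bit bitsandbytes loading requires a CUDA GPU.")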
Limitations
- Performance tradeoff: While 8-bit quantization preserves most capabilities, there may be slight reductions in accuracy for complex numerical computations compared to the full-precision model.
- Domain specificity: The model is focused on oil and gas drilling engineering and may not perform well for other domains.
- Expert validation: Outputs should be validated by domain experts before application in real-world engineering scenarios.
- Knowledge cutoff: The model's knowledge is limited to data available up to 2025.
Comparison with Other Variants
| Model Variant | Precision | Context Length | Memory Requirements | Performance Retention | Ideal Use Case |
|---|---|---|---|---|---|
| OGAI-8x7B | Full (16-bit) | 32K | 64GB+ VRAM | 100% (baseline) | High-precision engineering calculations |
| OGAI-8x7B-8bit-32k | 8-bit (Int8) | 32K | 16-24GB VRAM | ~95-98% | Balanced approach for deployment |
| OGAI-8x7B-4bit | 4-bit (NF4) | 32K | 8-16GB VRAM | ~90-95% | Highly constrained environments |
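For reference, the 4-bit row corresponds to an NF4 bitsandbytes configuration along these lines (a generic sketch of the quantization settings, not the packaging of the separate OGAI-8x7B-4bit checkpoint; the compute dtype is an assumption):

import torch
from transformers import BitsAndBytesConfig

quantization_config_4bit = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # NormalFloat4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16   # assumption; torch.float16 also works
)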
Citation
@misc{ogai8x7b2025,
  title={OGAI-8x7B: An AI Model for Oil \& Gas Drilling Engineering},
  author={GainEnergy AI Team},
  year={2025},
  howpublished={Hugging Face Models}
}
Acknowledgments
This model builds upon the OGAI-8x7B base model and extends its capabilities through quantization and context length expansion. Special thanks to Mistral AI for the Mixtral architecture that powers this model.