FM-1976/Gaia-LLM-8B-Q4_K_M-GGUF

This model was converted to GGUF format from my2000cup/Gaia-LLM-8B using llama.cpp via ggml.ai's GGUF-my-repo space. Refer to the original model card for more details on the model.

This model is a fine-tuned version of ../pretrained/Qwen3-4B on the wikipedia_zh, petro_books, datasets001, datasets002, datasets003, datasets004, and datasets006 datasets.

Model description

Gaia-Petro-LLM is a large language model specialized in the oil and gas industry, fine-tuned from Qwen/Qwen3-4B. It was further pre-trained on a curated 20GB corpus of petroleum engineering texts, including technical documents, academic papers, and domain literature. The model is designed to support domain experts, researchers, and engineers in petroleum-related tasks, providing high-quality, domain-specific language understanding and generation.

Model Details

  • Base Model: Qwen/Qwen3-8B
  • Domain: Oil & Gas / Petroleum Engineering
  • Corpus Size: ~20GB (petroleum engineering)
  • Languages: Primarily Chinese; domain-specific English supported
  • Repository: my2000cup/Gaia-LLM-8B

Intended uses & limitations

  • Technical Q&A in petroleum engineering
  • Document summarization for oil & gas reports
  • Knowledge extraction from unstructured domain texts
  • Education & training in oil & gas technologies

  • Not suitable for general-domain tasks outside oil & gas.
  • May not be up to date with the latest industry developments (post-2023).
  • Not to be used for critical, real-time decision-making without expert review.

Training and evaluation data

The model was further pre-trained on an in-house text corpus (~20GB) collected from:

  • Wikipedia (Chinese, petroleum-related entries)
  • Open petroleum engineering books and literature
  • Technical standards and manuals

Context Windows

From the original tokenizer configuration, the model inherits the same context window of 128k tokens:

"model_max_length": 131072

Unless you have a really powerful GPU with plenty of VRAM, when running the model with llama.cpp try a smaller context window of 12k first, or offload only part of the layers to the GPU so that more VRAM stays available for the context window.


Use with llama.cpp

Install llama.cpp through brew (works on Mac and Linux)

brew install llama.cpp

Invoke the llama.cpp server or the CLI.

CLI:

llama-cli --hf-repo FM-1976/Gaia-LLM-8B-Q4_K_M-GGUF --hf-file gaia-llm-8b-q4_k_m.gguf -p "The meaning to life and the universe is"

Server:

llama-server --hf-repo FM-1976/Gaia-LLM-8B-Q4_K_M-GGUF --hf-file gaia-llm-8b-q4_k_m.gguf -c 2048
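
Once the server is running (by default on http://localhost:8080), you can query its OpenAI-compatible chat endpoint. A minimal sketch, assuming the default host/port and no API key:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "What is the function of a blowout preventer?"}
        ],
        "temperature": 0.7
      }'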

Note: You can also use this checkpoint directly through the usage steps listed in the llama.cpp repo.

Step 1: Clone llama.cpp from GitHub.

git clone https://github.com/ggerganov/llama.cpp

Step 2: Move into the llama.cpp folder and build it with the LLAMA_CURL=1 flag along with other hardware-specific flags (for example, LLAMA_CUDA=1 for Nvidia GPUs on Linux).

cd llama.cpp && LLAMA_CURL=1 make
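
On recent llama.cpp checkouts the Makefile build has been deprecated in favour of CMake; if make fails for you, the equivalent CMake build is roughly the following sketch (include or drop -DGGML_CUDA=ON depending on your hardware):

# Configure and build; binaries (llama-cli, llama-server) end up in build/bin/
cmake -B build -DLLAMA_CURL=ON -DGGML_CUDA=ON
cmake --build build --config Release -j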

Step 3: Run inference through the main binary.

./llama-cli --hf-repo FM-1976/Gaia-LLM-8B-Q4_K_M-GGUF --hf-file gaia-llm-8b-q4_k_m.gguf -p "The meaning to life and the universe is"

or run llama-server with a 12k context window:

./llama-server --hf-repo FM-1976/Gaia-LLM-8B-Q4_K_M-GGUF --hf-file gaia-llm-8b-q4_k_m.gguf -c 12288

Use the flag -ngl XX, where XX is the number of layers to offload to the GPU.

Example of running the model locally, assuming you have already downloaded the GGUF file. The command below loads all 36 layers onto the GPU; it is better to start with a smaller number, though:

./llama-server -m gaia-llm-8b-q4_k_m.gguf -c 12288 -ngl 36
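
A more conservative run that offloads only part of the model and keeps the rest in system RAM could look like the following (20 is just a placeholder; tune it to your available VRAM):

./llama-server -m gaia-llm-8b-q4_k_m.gguf -c 12288 -ngl 20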

About the original model

Qwen3-8B


Qwen3 Highlights

Qwen3 is the latest generation of large language models in the Qwen series, offering a comprehensive suite of dense and mixture-of-experts (MoE) models. Built upon extensive training, Qwen3 delivers groundbreaking advancements in reasoning, instruction-following, agent capabilities, and multilingual support, with the following key features:

  • Unique support for seamless switching between thinking mode (for complex logical reasoning, math, and coding) and non-thinking mode (for efficient, general-purpose dialogue) within a single model, ensuring optimal performance across various scenarios (see the example after this list).
  • Significant enhancement of its reasoning capabilities, surpassing the previous QwQ (in thinking mode) and Qwen2.5 instruct models (in non-thinking mode) on mathematics, code generation, and commonsense logical reasoning.
  • Superior human preference alignment, excelling in creative writing, role-playing, multi-turn dialogues, and instruction following, to deliver a more natural, engaging, and immersive conversational experience.
  • Expertise in agent capabilities, enabling precise integration with external tools in both thinking and non-thinking modes and achieving leading performance among open-source models in complex agent-based tasks.
  • Support of 100+ languages and dialects with strong capabilities for multilingual instruction following and translation.
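
As a small illustration of the thinking/non-thinking switch mentioned in the first bullet, the Qwen3 documentation describes /think and /no_think soft switches that can be appended to a user message. A sketch against the llama-server chat endpoint shown earlier (endpoint and switch behaviour assumed from the llama.cpp and Qwen3 docs):

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "Briefly explain hydraulic fracturing. /no_think"}
        ]
      }'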

Model Overview

Qwen3-8B has the following features:

  • Type: Causal Language Models
  • Training Stage: Pretraining & Post-training
  • Number of Parameters: 8.2B
  • Number of Parameters (Non-Embedding): 6.95B
  • Number of Layers: 36
  • Number of Attention Heads (GQA): 32 for Q and 8 for KV
  • Context Length: 32,768 tokens natively and 131,072 tokens with YaRN (see the example below).

For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our blog, GitHub, and Documentation.
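
The 131,072-token figure relies on YaRN rope scaling rather than the native context. In llama.cpp this can be enabled with the rope-scaling flags; a sketch following the scaling factors suggested in the Qwen3 card (only worth doing if you really need prompts beyond 32k, and it consumes a lot of VRAM):

./llama-server -m gaia-llm-8b-q4_k_m.gguf -c 131072 --rope-scaling yarn --rope-scale 4 --yarn-orig-ctx 32768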

Model tree for FM-1976/Gaia-LLM-8B-Q4_K_M-GGUF

  • Base model: Qwen/Qwen3-8B-Base
  • Fine-tuned: Qwen/Qwen3-8B
  • Adapter: this model