This model is a fine-tuned version of mistralai/Mistral-7B-v0.1
for generating SPARQL queries from English natural language questions, specifically targeting the Wikidata knowledge graph.
## Model Details

### Model Description
This model was fine-tuned using QLoRA with 4-bit quantization. It takes an English natural-language question and the corresponding entity/relationship context as input and aims to produce a SPARQL query for Wikidata. The model is part of experiments investigating continual multilingual pre-training.
- Developed by: Julio Cesar Perez Duran
- Funded by: DFKI
- Model type: Decoder-only Transformer-based language model
- Language: en (English)
- License: MIT
- Finetuned from model: mistralai/Mistral-7B-v0.1
## Bias, Risks, and Limitations
- Context-reliant: Performance depends on the provided entity/relationship context mappings.
- Output format: The model generates extraneous text after the SPARQL query, requiring post-processing (extracting the content within ```sparql ... ``` delimiters, as the `extract_sparql` helper in the example below does).
## How to Get Started with the Model
The following Python script provides an example of how to load the model and tokenizer to generate a SPARQL query.
```python
import re

import torch
from transformers import AutoTokenizer, BitsAndBytesConfig
from peft import AutoPeftModelForCausalLM

# Model ID for the Mistral English v2 model
model_id = "julioc-p/mistral_txt_sparql_en_v2"

# 4-bit quantization configuration (same settings as used for fine-tuning)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
)

model = AutoPeftModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = tokenizer.pad_token_id


def extract_sparql(text):
    """Extract the SPARQL query from the model's raw output."""
    # v2 models wrap the query in ```sparql ... ``` fences, so this is the main path.
    code_block_match = re.search(r"```(?:sparql)?\s*(.*?)\s*```", text, re.DOTALL | re.IGNORECASE)
    if code_block_match:
        text_to_search = code_block_match.group(1)
    else:
        # Fallback: search the raw text when no fenced block is present.
        text_to_search = text
    match = re.search(r"(SELECT|ASK|CONSTRUCT|DESCRIBE).*?\}", text_to_search, re.DOTALL | re.IGNORECASE)
    if match:
        return match.group(0).strip()
    return ""


question = "Who was Barnard College's American female employee?"
example_context_json_str = """
{
  "entities": {
    "Barnard College": "Q167733",
    "American": "Q30",
    "female": "Q6581072",
    "employee": "Q5"
  },
  "relationships": {
    "instance of": "P31",
    "employer": "P108",
    "gender": "P21",
    "country of citizenship": "P27"
  }
}
"""

# The system prompt is kept verbatim (including its typos) to match the
# format the model saw during fine-tuning.
system_message_template = """You are an expert text to SparQL query translator. Users will ask you questions in English and you will generate a SparQL query based on the provided context encloses in ```sparql <respose_query>```.
CONTEXT:
{context}"""

messages = [
    {"role": "system", "content": system_message_template.format(context=example_context_json_str)},
    {"role": "user", "content": question},
]

input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)
# Single unpadded sequence, so the attention mask is all ones.
attention_mask = torch.ones_like(input_ids)

# Generate the output
with torch.no_grad():
    outputs = model.generate(
        input_ids=input_ids,
        attention_mask=attention_mask,
        max_new_tokens=512,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        pad_token_id=tokenizer.pad_token_id,
    )

# Decode only the newly generated tokens (everything after the prompt).
generated = outputs[0][input_ids.shape[-1]:]
assistant_response = tokenizer.decode(generated, skip_special_tokens=True).strip()
cleaned_sparql = extract_sparql(assistant_response)

print(f"Question: {question}")
print(f"Context: {example_context_json_str}")
print(f"Generated SPARQL: {cleaned_sparql}")
print(f"Assistant's raw response: {assistant_response}")
```
## Training Data
The model was fine-tuned on a subset of the `julioc-p/Question-Sparql` dataset: 80,000 English training examples, each with a `context` field containing Wikidata entity and relationship ID mappings.
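For reference, the sketch below shows one way to load and filter this data with the `datasets` library. The split name and the `language` column used for filtering are assumptions; consult the dataset card for the actual schema.

```python
# Minimal sketch: load the Question-Sparql dataset and keep English rows.
# The split name and `language` column are assumptions; see the dataset card
# (https://huggingface.co/datasets/julioc-p/Question-Sparql) for the schema.
from datasets import load_dataset

ds = load_dataset("julioc-p/Question-Sparql", split="train")
en_ds = ds.filter(lambda ex: ex["language"] == "en")  # assumed column name
print(en_ds[0])  # one question / context / SPARQL example
```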
### Training Hyperparameters

The following hyperparameters were used for fine-tuning; a hedged configuration sketch follows the list.
- LoRA configuration (v2 models):
  - `r` (LoRA rank): 256
  - `lora_alpha`: 128
  - `lora_dropout`: 0.05
  - `target_modules`: `"all-linear"`
- Training arguments (v2 models):
  - `num_train_epochs`: 3
  - Effective batch size: 6
  - `optim`: `"adamw_torch_fused"`
  - `learning_rate`: 2e-4
  - `fp16`: True
  - `max_grad_norm`: 0.3
  - `warmup_ratio`: 0.03
  - `lr_scheduler_type`: `"constant"`
  - `packing`: True
  - NEFTune `noise_alpha`: 5
- BitsAndBytesConfig (v2 models):
  - `load_in_4bit`: True
  - `bnb_4bit_quant_type`: `"nf4"`
  - `bnb_4bit_compute_dtype`: `torch.float16`
  - `bnb_4bit_use_double_quant`: True
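For illustration only, the listed values map onto `peft` and `transformers` configuration objects roughly as sketched below (trl 0.8.x style, where `packing` is passed to `SFTTrainer` rather than a config). The output path and the per-device/accumulation split of the effective batch size of 6 are assumptions, not the exact training script.

```python
# A hedged sketch of the fine-tuning configuration described above.
from peft import LoraConfig
from transformers import TrainingArguments

peft_config = LoraConfig(
    r=256,
    lora_alpha=128,
    lora_dropout=0.05,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="mistral_txt_sparql_en_v2",  # hypothetical path
    num_train_epochs=3,
    per_device_train_batch_size=3,  # assumption: 3 x 2 accumulation = 6 effective
    gradient_accumulation_steps=2,  # assumption
    optim="adamw_torch_fused",
    learning_rate=2e-4,
    fp16=True,
    max_grad_norm=0.3,
    warmup_ratio=0.03,
    lr_scheduler_type="constant",
    neftune_noise_alpha=5,
)
# These configs would then be passed to trl's SFTTrainer together with the
# model, tokenizer, dataset, and packing=True.
```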
### Speeds, Sizes, Times

- Fine-tuning took approximately 19-20 hours on a single NVIDIA V100 GPU.
## Evaluation

### Testing Data, Factors & Metrics

#### Testing Data
- QALD-10 test set (English): a standardized benchmark; 394 English questions were evaluated for this model.
- v2 test set (English): 10,000 held-out English examples from the `julioc-p/Question-Sparql` dataset, including context.
#### Metrics

The QALD standard macro-averaged F1-score, precision, and recall. Non-executable queries score P = R = F1 = 0.
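As an illustration, the sketch below computes these macro-averaged scores from per-question answer sets. Treating a both-empty gold/prediction pair as fully correct is an assumption consistent with the "Both Empty" category in the results below; edge-case conventions vary across QALD evaluation scripts.

```python
# Hedged sketch of macro-averaged precision/recall/F1 over answer sets.
# A non-executable query simply yields an empty prediction set, which
# scores P = R = F1 = 0 against a non-empty gold set, as stated above.
def prf1(gold: set, pred: set):
    if not gold and not pred:
        return 1.0, 1.0, 1.0  # assumption: both empty counts as correct
    inter = len(gold & pred)
    p = inter / len(pred) if pred else 0.0
    r = inter / len(gold) if gold else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def macro_scores(pairs):
    """pairs: iterable of (gold_set, predicted_set), one per question."""
    ps, rs, f1s = zip(*(prf1(g, p) for g, p in pairs))
    n = len(ps)
    return sum(ps) / n, sum(rs) / n, sum(f1s) / n
```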
### Results
On QALD-10 (English, N=394):
- Macro F1-Score: 0.2846
- Macro Precision: 0.6612
- Macro Recall: 0.2844
- Executable Queries: 99.75% (393/394)
- Correctness (Exact Match + Both Empty): 27.41% (108/394)
- Correct (Exact Match): 25.89% (102/394)
- Correct (Both Empty): 1.52% (6/394)
On v2 Test Set (English, N=10000):
- Macro F1-Score: 0.8285
- Macro Precision: 0.9104
- Macro Recall: 0.8292
- Executable Queries: 99.63% (9963/10000)
- Correctness (Exact Match + Both Empty): 82.73% (8273/10000)
- Correct (Exact Match): 74.55% (7455/10000)
- Correct (Both Empty): 8.18% (818/10000)
## Environmental Impact
- Hardware Type: 1 x NVIDIA V100 32GB GPU
- Hours used: Approx. 19-20 hours for fine-tuning.
- Cloud Provider: DFKI HPC Cluster
- Compute Region: Germany
- Carbon Emitted: Approx. 2.96 kg CO2eq.
## Technical Specifications

### Compute Infrastructure

#### Hardware
- NVIDIA V100 GPU (32 GB RAM)
- Approx. 60 GB system RAM
#### Software
- Slurm, NVIDIA Enroot, CUDA 11.8.0
- Python, Hugging Face `transformers`, `peft` (0.13.2), `bitsandbytes`, `trl`, PyTorch
## More Information
- Thesis GitHub: https://github.com/julioc-p/cross-lingual-transferability-thesis
- Dataset: https://huggingface.co/datasets/julioc-p/Question-Sparql
## Framework versions
- PEFT 0.13.2
- Transformers 4.39.3
- BitsAndBytes 0.43.0
- trl 0.8.6
- PyTorch 2.1.0