Model Card for SPARQL-verifier

Model Details

Model Description

This model was trained on synthetic correct and intentionally incorrect natural language translations of SPARQL queries from the LC-QuAD 2.0 dataset.

It can be used as a verifier and quality metric for pairs of a SPARQL query and its natural language translation. It predicts a correctness score between 0 (wrong) and 1 (correct) that indicates how likely the model considers the translation to be semantically similar to the SPARQL query.

Details can be found in our paper: Q-NL Verifier: Leveraging Synthetic Data for Robust Knowledge Graph Question Answering

Input formatting

The SPARQL query passed to the model must be formatted as follows: all IRIs, named entities, and predicates must be enclosed in square brackets and should have an interpretable name (e.g. [fatherOf] instead of [pred12]) so that they are meaningful to the model. Here is an example:

SPARQL: SELECT DISTINCT ?uri WHERE { [Robert F. Kennedy] [child] ?daughter . ?daughter  [sex or gender] [female] . ?daughter  [spouse] ?uri .}
Natural Language Translation: Who is the spouse of robert kennedys female child ?
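The bracketed format above can be produced from a raw query with a small preprocessing step. The following is a minimal sketch, not part of the released code: the function name bracket_labels, the ex: prefix, and the identifier-to-label mapping are all hypothetical and only illustrate the idea of replacing each identifier with its human-readable label in square brackets.

```python
import re

# Hypothetical identifier-to-label mapping (assumption for illustration only).
# In practice the labels would be looked up in the knowledge graph.
LABELS = {
    "ex:e1": "Robert F. Kennedy",
    "ex:e2": "female",
    "ex:p1": "child",
    "ex:p2": "sex or gender",
    "ex:p3": "spouse",
}

# Matches full IRIs in angle brackets or prefixed names such as ex:p1.
# This simple pattern does not skip string literals that contain a colon.
TOKEN = re.compile(r"<[^<>\s]+>|\b[A-Za-z][\w-]*:[A-Za-z_]\w*")


def bracket_labels(query: str, labels: dict) -> str:
    """Replace every identifier in the query with its label in square brackets."""

    def replace(match: re.Match) -> str:
        token = match.group(0)
        if token not in labels:
            # Fail loudly: an unlabeled identifier would be meaningless to the model.
            raise KeyError(f"no label available for {token}")
        return f"[{labels[token]}]"

    return TOKEN.sub(replace, query)


raw = (
    "SELECT DISTINCT ?uri WHERE { ex:e1 ex:p1 ?daughter . "
    "?daughter ex:p2 ex:e2 . ?daughter ex:p3 ?uri . }"
)
formatted = bracket_labels(raw, LABELS)
print(formatted)
```

Raising an error on unknown identifiers is a design choice of this sketch; falling back to the raw identifier would keep the pipeline running but would feed the model uninterpretable names.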

Usage with SentenceTransformers

With SentenceTransformers installed, the model can be loaded as a CrossEncoder and used like this:

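from sentence_transformers import CrossEncoder

# Replace 'model_name' with the identifier or local path of this model.
model = CrossEncoder('model_name', max_length=512)
# Each input is a (SPARQL query, natural language translation) pair; one score is returned per pair.
scores = model.predict([('SPARQL query', 'Natural Language Translation')])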
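Since the model is intended as a verifier, one typical follow-up is to keep only the pairs whose score passes a cutoff. The sketch below assumes a hypothetical helper keep_verified and a default threshold of 0.5; neither comes from the paper or the repository, and a suitable threshold should be chosen on held-out data. The scoring function is passed in as an argument so that the helper works with model.predict or any other callable returning one score per pair.

```python
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]  # (SPARQL query, natural language translation)


def keep_verified(
    pairs: Sequence[Pair],
    predict: Callable[[List[Pair]], Sequence[float]],
    threshold: float = 0.5,  # assumed default, not a recommended value
) -> List[Tuple[Pair, float]]:
    """Return the pairs whose correctness score is at least `threshold`."""
    scores = predict(list(pairs))
    return [
        (pair, float(score))
        for pair, score in zip(pairs, scores)
        if float(score) >= threshold
    ]
```

With the model loaded as shown above, a call would look like keep_verified(pairs, model.predict, threshold=0.8), where pairs is a list of (SPARQL query, translation) tuples.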

Developed by: Tim Schwabe
Finetuned from model: joe32140/ModernBERT-base-msmarco

Model Sources

Repository: https://github.com/TimEricSchwabe/Q-NL-Verifier/tree/main