
🧾 Model Overview

  • Model Name: Phi-4-mini-instruct Model for Grounding
  • Task: Claim-Document Consistency Classification (Grounding)
  • Architecture: Full-parameter fine-tuned (SFT) version of microsoft/phi-4-mini-instruct
  • Framework: PyTorch (Hugging Face Transformers)
  • Input Type: Instruction-style text prompt
  • Output Type: Binary classification ("Yes" -> grounded / "No" -> ungrounded)

🎯 Intended Use

This model is designed to determine whether a natural language claim is consistent with a given document.

Example Applications:

  • ✅ Fact-checking pipelines
  • ✅ RAG output verification
  • ✅ QA validation systems
  • ✅ News and document analysis
  • ✅ Source-grounded generation tasks
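As a sketch of how a grounding check might slot into one of these pipelines, e.g. RAG output verification (the `verify_claim` helper and the minimal stand-in template are illustrative assumptions, not part of this repository; the real prompt template appears below):

```python
# Sketch: gating a generated answer on a grounding check.
# `pipe` is any callable that takes chat messages and returns "Yes"/"No",
# such as the text-generation pipeline shown later in this card.

# Stand-in for the full PROMPT_TEMPLATE defined in the Input Format section.
PROMPT_TEMPLATE = "Document:\n{doc}\n\nClaim:\n{claim}\n\nAnswer Yes or No."

def verify_claim(doc: str, claim: str, pipe) -> bool:
    """Return True when the model judges the claim grounded in the document."""
    messages = [{"role": "user", "content": PROMPT_TEMPLATE.format(doc=doc, claim=claim)}]
    answer = pipe(messages)[0]["generated_text"].strip().lower()
    return answer.startswith("yes")

# Usage with a stub model that always answers "Yes":
stub_pipe = lambda messages: [{"generated_text": "Yes"}]
print(verify_claim("Paris is in France.", "Paris is a French city.", stub_pipe))  # True
```

In a real pipeline, an ungrounded verdict would trigger regeneration, citation removal, or a human review step.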

🧩 Input Format

The model expects an instruction-formatted prompt with both the document and the claim inserted:

🔀 Prompt Template:

PROMPT_TEMPLATE = '''
You are tasked with determining whether a given claim is consistent with the information provided in a document. Consistency means that all information in the claim is supported by the document. If any part of the claim contradicts or is not substantiated by the document, it should be considered inconsistent.

Analyze the claim in relation to the information provided in the document. Consider the following:
1. Does the document explicitly support all parts of the claim?
2. Is there any information in the claim that contradicts the document?
3. Does the claim contain any details not mentioned in the document?

Before providing your reasoning, give your final answer as either "Yes" (the claim is consistent with the document) or "No" (the claim is not consistent with the document). The reasoning should follow the final answer.

The answer should begin with a single word: "Yes" or "No".

---

First, carefully read the following document:

<DOCUMENT>
{doc}
</DOCUMENT>

Now, consider this claim:

<CLAIM>
{claim}
</CLAIM>

What is your answer?'''

📊 Evaluation [BAcc]

Qualifire grounding benchmark: https://huggingface.co/datasets/qualifire/grounding-benchmark

LLM-AggreFact benchmark: https://huggingface.co/datasets/lytang/LLM-AggreFact

Results:

| Model | Avg BAcc | Latency | Params | AggreFact-CNN | AggreFact-XSum | TofuEval-MediaS | TofuEval-MeetB | Wice | Reveal | ClaimVerify | FactCheck-GPT | grounding-benchmark-general | grounding-benchmark-logical | grounding-benchmark-temporal | grounding-benchmark-mathematical | Creator |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Paladin-large | 83.48 | ~0.29sec | 14B | 64.01 | 74.77 | 74.76 | 79.56 | 78.63 | 90.77 | 80.14 | 79.96 | 91.97 | 98.2 | 91 | 98 | Qualifire |
| Gemini-2.5-flash | 80.59 | ~2sec | - | 69.67 | 70.92 | 76.5 | 82.06 | 80.25 | 89.18 | 77.67 | 74.91 | 75.07 | 88.9 | 92 | 90 | Google |
| Gemini-2.0-flash | 79.95 | ~2sec | - | 71.77 | 71.46 | 75.6 | 77.76 | 81.81 | 90.93 | 79.47 | 75.11 | 79.52 | 95 | 90 | 71 | Google |
| Paladin-mini | 79.31 | ~0.06sec | 3.8B | 59.81 | 71.05 | 69.25 | 71.91 | 71.63 | 89.44 | 75.32 | 76.26 | 91.97 | 97.1 | 82 | 96 | Qualifire |
| Bespoke-MiniCheck-7B | 77.87 | ~0.1sec | 7B | 65.5 | 77.8 | 76 | 78.3 | 83 | 88 | 75.3 | 77.7 | 84.02 | 92.8 | 90 | 46 | MiniCheck |
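The Avg BAcc column is the unweighted mean of the twelve per-benchmark balanced-accuracy scores. Recomputing it for the Paladin-mini row as a sanity check:

```python
# Paladin-mini's twelve per-benchmark BAcc scores, copied from the table above.
scores = [59.81, 71.05, 69.25, 71.91, 71.63, 89.44,
          75.32, 76.26, 91.97, 97.1, 82, 96]

# Unweighted mean across benchmarks.
avg = sum(scores) / len(scores)
print(round(avg, 2))  # 79.31
```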

Interested in Paladin-large? Reach out to us

βš™οΈ How to Use

Load the model

The model is loaded with Hugging Face's text-generation pipeline and generates a single token, "Yes" or "No":

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

name_of_model = "qualifire/context-grounding-paladin-mini"

model = AutoModelForCausalLM.from_pretrained(
    name_of_model,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",  # requires flash-attn; omit if unavailable
    cache_dir="model/",
    # revision=model_commit,  # optionally pin a specific commit
    device_map="auto",
)

tokenizer = AutoTokenizer.from_pretrained(name_of_model)

pipe = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=1,        # the answer is a single "Yes"/"No" token
    return_full_text=False,
    do_sample=False,         # deterministic decoding
)

Example:

doc_example = "The office's opening hours are from 9 AM to 6 PM every day."
claim_example = "The office opens at 10 AM on Sunday."

example_prompt_with_inputs = PROMPT_TEMPLATE.format(doc=doc_example, claim=claim_example)

messages = [
    {"role": "user", "content": example_prompt_with_inputs},
]

result = pipe(messages)
label_pred = result[0]['generated_text'].strip()
print(label_pred)

Output (the claim contradicts the document's stated hours, so it is ungrounded):

'No'
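If a confidence score is wanted alongside the label, one option (an illustrative sketch, not an API of this repository) is to compare the model's logits for the "Yes" and "No" first tokens, which in practice would come from `model.generate(..., output_scores=True, return_dict_in_generate=True)` at the first generated position. The logit values below are made up for illustration:

```python
import math

def yes_no_confidence(yes_logit: float, no_logit: float) -> dict:
    """Softmax over the two answer tokens' logits -> label plus score.

    The logits are hypothetical here; real ones would be read from the
    first step of the model's generation scores.
    """
    p_yes = math.exp(yes_logit) / (math.exp(yes_logit) + math.exp(no_logit))
    label = "grounded" if p_yes >= 0.5 else "ungrounded"
    score = p_yes if p_yes >= 0.5 else 1.0 - p_yes
    return {"label": label, "score": score}

print(yes_no_confidence(1.2, 4.5))  # 'ungrounded' with score > 0.9
```

Restricting the softmax to the two answer tokens keeps the score calibrated to the binary decision rather than the full vocabulary.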


⚠️ Known Limitations

  • Prompt Format Dependence: Performance depends heavily on using the PROMPT_TEMPLATE shown above.
  • Limited Reasoning Depth: Performance may degrade on complex multi-hop grounding.
  • Label Ambiguity: The model verifies consistency with the document, not factual truth.

📜 Ethical Considerations

  • Misinformation Risk: Model assesses consistency with the document, not factual truth. The document itself could contain misinformation.
  • Responsible Use: Requires human oversight for critical applications.
  • Data Privacy: Be mindful of data handling when using sensitive inputs.

This model is a version of the approach described in the paper "Paladin‑mini: A Compact and Efficient Grounding Model Excelling in Real‑World Scenarios":

@misc{ivry2025paladinmini,
  title        = {Paladin‑mini: A Compact and Efficient Grounding Model Excelling in Real‑World Scenarios},
  author       = {Dror Ivry and Oran Nahum},
  year         = {2025},
  eprint       = {2506.20384},
  archivePrefix= {arXiv},
  primaryClass = {cs.AI}
}