Llama3-8B-Instruct Fine-tuned for QED (Question-Explanation-Data)
Model Description
This model is a fine-tuned version of meta-llama/Meta-Llama-3-8B-Instruct, adapted for the QED (Question-Explanation-Data) task. It is trained to provide structured explanations for question answering by generating three components in a single structured output: the direct answer, the supporting sentence, and referential entity mappings between the question and that sentence.
Task Overview
The QED task, introduced in "QED: A Framework and Dataset for Explanations in Question Answering" (Lamm et al., 2020), requires models to perform three subtasks (a small worked example follows this list):
- Answer Extraction: Identify the shortest span from a passage that directly answers a given question
- Evidence Selection: Select the single sentence from the passage that best entails or implies the answer
- Referential Mapping: Establish connections between entities mentioned in the question and their corresponding references in the selected sentence
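As a small worked example (constructed here for illustration, not taken from the QED dataset), suppose the question is "who founded microsoft" and the passage contains the sentence "Microsoft was founded by Bill Gates and Paul Allen in 1975." A QED-style explanation would then be:

```json
{
  "answer": "Bill Gates and Paul Allen",
  "selected_sentence": "Microsoft was founded by Bill Gates and Paul Allen in 1975.",
  "referential_equalities": [
    {
      "question_reference": "microsoft",
      "sentence_reference": "Microsoft",
      "bridge": false
    }
  ]
}
```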
Fine-tuning Details
- Base Model: meta-llama/Meta-Llama-3-8B-Instruct
- Fine-tuning Method: LoRA (Low-Rank Adaptation) with rank=16, alpha=32 (see the configuration sketch after this list)
- Quantization: 4-bit quantization for memory efficiency
- Training Strategy: Few-shot learning with "random_two" example prompting
- Training Data: Curated subset of QED training examples
- Output Format: Structured JSON containing answer, selected_sentence, and referential_equalities
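A minimal sketch of this LoRA + 4-bit setup, assuming the Hugging Face transformers, peft, and bitsandbytes libraries. The target modules, LoRA dropout, and NF4 quantization type are assumptions; only rank=16, alpha=32, and 4-bit quantization are stated above:

```python
# Sketch of the LoRA + 4-bit configuration described above (peft + bitsandbytes).
import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit quantization of the frozen base weights for memory efficiency.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",              # assumption: NF4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,  # assumption: bf16 compute dtype
)

base_model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    quantization_config=bnb_config,
    device_map="auto",
)

# LoRA adapters with the stated rank/alpha; target modules and dropout are assumptions.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()
```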
Performance Improvements
The fine-tuned model shows significant improvements over the base model on the QED evaluation metrics:
| Metric | Base Model (Zero-shot) | Fine-tuned Model | Improvement (pts) |
|---|---|---|---|
| Exact Match Accuracy | 0.9% | 11.8% | +10.9 |
| Answer Accuracy | 82.0% | 86.4% | +4.4 |
| All Mention F1 | 5.5% | 38.4% | +32.9 |
| Question Mention F1 | 6.0% | 47.6% | +41.6 |
| Context Mention F1 | 5.0% | 29.2% | +24.2 |

All results are reported at a 0.5 F1 overlap threshold with non-strict matching.
Training Code & Methodology
This model was trained using our comprehensive QED fine-tuning framework available on GitHub:
QED Fine-Tuning Framework
Usage
The model expects input in a specific format and outputs structured JSON:
```python
# Input format
prompt = """
Title: [Document Title]
Question: [Your Question]
Passage: [Context Passage]
You are an expert at extracting answers and structured explanations from text.
Your response MUST be **valid JSON only** (no extra commentary).
Task
====
Given:
• a **title** for the passage,
• a **question** about the passage, and
• the **context passage** itself,
produce an explanation object with three parts:
1. "answer" – the **shortest span** from the passage that fully answers the question.
2. "selected_sentence" – the **single sentence** in the passage that entails or implies the answer.
3. "referential_equalities" – a list of mappings between phrases in the question and phrases in the selected sentence
that refer to the **same real-world entity/event**.
• Each mapping has two keys:
- "question_reference": the exact phrase from the question (**must be a contiguous substring from the question,
not from the context or title**).
- "sentence_reference": the exact phrase from the selected sentence (**must be a contiguous substring from the selected sentence,
not from the question or title**), or "" (empty string if the entire sentence is the referent).
▸ Use **""** for "sentence_reference" when the entity/event is not named by any specific phrase in the sentence –
i.e. the entire sentence acts as the referent (a *bridge* to the whole sentence).
This corresponds to the (start = end = -1) convention in the QED dataset.
Output format
=============
Return **only** JSON in this exact schema:
{
"answer": "<string from passage>",
"selected_sentence": "<string from passage>",
"referential_equalities": [
{
"question_reference": "<string from question only>",
"sentence_reference": "<string from selected_sentence only, or "">",
"bridge": "<false if not a bridge; otherwise, a string explaining the bridge connection, e.g., 'in', 'for', 'of', 'at', 'on'>"
}
...
]
}
"""
Expected output format:

```json
{
"answer": "<shortest span from passage>",
"selected_sentence": "<sentence that entails the answer>",
"referential_equalities": [
{
"question_reference": "<entity from question>",
"sentence_reference": "<corresponding entity from sentence>",
"bridge": false
}
]
}
```
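Putting the two formats together, a minimal inference sketch might look like the following. The checkpoint id DenisRz/llama3_8b_instruct_qed is taken from the model tree at the end of this card; the generation settings are assumptions, and if the checkpoint is published as a LoRA adapter rather than merged weights, load it with peft's AutoPeftModelForCausalLM instead:

```python
# Minimal inference sketch: load the fine-tuned model, send the prompt built
# above, and parse the JSON explanation object out of the response.
import json

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "DenisRz/llama3_8b_instruct_qed"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

# Wrap the prompt in the Llama-3 chat template (the model is instruction-tuned).
messages = [{"role": "user", "content": prompt}]  # `prompt` as built above
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512, do_sample=False)
response = tokenizer.decode(
    output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True
)

explanation = json.loads(response)  # {"answer": ..., "selected_sentence": ..., "referential_equalities": [...]}
print(explanation["answer"])
```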
Evaluation
Evaluated on the QED development set with official metrics across multiple overlap thresholds (0.5-0.9). The model shows consistent improvements in all measured aspects of the QED task, particularly excelling at entity reference mapping and answer extraction.
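The exact scoring is performed by the official QED evaluation code; purely to illustrate what an "overlap threshold" means here, the sketch below applies a token-level F1 threshold to a predicted versus gold mention span (the function names are mine, not the evaluation script's):

```python
# Illustrative only: token-overlap F1 between a predicted and a gold span,
# thresholded the way the table above describes (e.g. >= 0.5 counts as a match).
from collections import Counter

def overlap_f1(pred_span: str, gold_span: str) -> float:
    pred_tokens = pred_span.lower().split()
    gold_tokens = gold_span.lower().split()
    if not pred_tokens or not gold_tokens:
        return 0.0
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

def is_match(pred_span: str, gold_span: str, threshold: float = 0.5) -> bool:
    # Non-strict matching: any overlap F1 at or above the threshold counts.
    return overlap_f1(pred_span, gold_span) >= threshold

print(is_match("the Eiffel Tower", "Eiffel Tower"))  # True at the 0.5 threshold
```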
Training Details
- Dataset: QED training subset with careful example curation
- Learning Rate: 5e-6 with warmup ratio of 0.2
- Batch Size: Effective batch size of 16 through gradient accumulation
- Optimizer: Paged AdamW 8-bit for memory efficiency
- Evaluation: Multi-threshold validation (0.5-0.9 F1 overlap)
- Epochs: 3 (a matching TrainingArguments sketch follows this list)
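A TrainingArguments configuration consistent with these values might look as follows. This is a sketch, not the exact training script; the per-device batch size and gradient accumulation split is an assumption (only the effective batch size of 16 is stated above):

```python
# Sketch of a TrainingArguments setup matching the listed hyperparameters.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="llama3_8b_instruct_qed",
    learning_rate=5e-6,
    warmup_ratio=0.2,
    per_device_train_batch_size=2,   # assumption: 2 x 8 accumulation = effective batch of 16
    gradient_accumulation_steps=8,
    num_train_epochs=3,
    optim="paged_adamw_8bit",        # paged AdamW 8-bit for memory efficiency
    logging_steps=10,
    save_strategy="epoch",
)
```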
Applications
This model is particularly suitable for:
- Educational question answering systems requiring explanations
- Research applications needing interpretable QA
- Systems where answer provenance and entity tracking are important
- Building more transparent and accountable AI assistants
Citation
Please cite the original QED work when using this model:
```bibtex
@article{lamm2020qed,
  title={QED: A Framework and Dataset for Explanations in Question Answering},
  author={Lamm, Matthew and Palomaki, Jennimaria and Alberti, Chris and Andor, Daniel and Choi, Eunsol and Baldini Soares, Livio and Collins, Michael},
  journal={arXiv preprint arXiv:2010.13806},
  year={2020}
}
```
Model tree for DenisRz/llama3_8b_instruct_qed
- Base model: meta-llama/Meta-Llama-3-8B-Instruct