Model Card for Llama-3-8B-CR-DPO
Model Description
This model card describes Llama-3-8B-CR-DPO, a LoRA fine-tune of Llama-3-8B-Instruct trained with Confidence Reasoning Direct Preference Optimization (CR-DPO), as introduced in the paper Enhancing Large Language Models' Situated Faithfulness to External Contexts. CR-DPO calibrates an LLM's trust in external contexts by aligning its confidence in internal knowledge with the reliability of the external information.
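A minimal usage sketch is given below, assuming the weights are released as a LoRA adapter that can be attached to Llama-3-8B-Instruct with `peft`; the adapter repository id and the prompt format are illustrative placeholders, not the published ones.

```python
# Illustrative loading example; "your-org/Llama-3-8B-CR-DPO" is a placeholder repo id.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "meta-llama/Meta-Llama-3-8B-Instruct"
adapter_id = "your-org/Llama-3-8B-CR-DPO"  # hypothetical adapter repository

tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.bfloat16, device_map="auto"
)
model = PeftModel.from_pretrained(model, adapter_id)  # attach the CR-DPO LoRA adapter

# Placeholder prompt: provide the question together with the external context.
messages = [{
    "role": "user",
    "content": "Question: ...\nContext: ...\nReason about your confidence, then answer.",
}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output_ids = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```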
Method
The model learns verbalized confidence reasoning by optimizing a preference between pairs of self-sampled reasoning paths. When the model's internal answer is incorrect, it is shown a correct external context and asked to reason about why the context is right and its own answer is wrong, forming the preferred reasoning path; misleading it with an incorrect external context instead yields the rejected path. The same process is applied when the model's internal answer is correct, comparing internal and external reasoning. To enhance reasoning diversity, dual sampling is used to generate two reasoning-path pairs with varied prompts and in-context examples. Additionally, a negative log-likelihood term is incorporated into the DPO loss to further optimize the reasoning (a sketch of this objective is given below). The training data is constructed from TriviaQA, NaturalQA, PopQA, and RedditQA.
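As a concrete reference for the objective above, the following is a minimal sketch of a DPO loss augmented with a negative log-likelihood term on the preferred reasoning path. The function name, the hyperparameter values (`beta`, `nll_weight`), and the use of sequence-level log-probabilities are illustrative assumptions, not the paper's exact implementation.

```python
# Sketch of a DPO loss with an added NLL term on the preferred (chosen) reasoning path.
import torch.nn.functional as F

def cr_dpo_loss(policy_chosen_logps, policy_rejected_logps,
                reference_chosen_logps, reference_rejected_logps,
                beta=0.1, nll_weight=1.0):
    """All inputs are sequence-level log-probabilities summed over response tokens."""
    # Standard DPO term: prefer the chosen reasoning path over the rejected one,
    # measured relative to a frozen reference model.
    chosen_rewards = beta * (policy_chosen_logps - reference_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - reference_rejected_logps)
    preference_loss = -F.logsigmoid(chosen_rewards - rejected_rewards)

    # Extra negative log-likelihood term on the chosen path, as described above.
    nll_loss = -policy_chosen_logps

    return (preference_loss + nll_weight * nll_loss).mean()
```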
Other Info
- Finetuned from model: Llama-3-8B-Instruct
- Repository: GitHub Link
- Paper: Enhancing Large Language Models' Situated Faithfulness to External Contexts
BibTeX:
@article{Huang2024EnhancingLL,
  title={Enhancing Large Language Models' Situated Faithfulness to External Contexts},
  author={Yukun Huang and Sanxing Chen and Hongyi Cai and Bhuwan Dhingra},
  journal={ArXiv},
  year={2024},
  volume={abs/2410.14675},
  url={https://api.semanticscholar.org/CorpusID:273482717}
}