
Model Card for Llama-3-8B-CR-DPO

Model Description

This model card describes Llama-3-8B-CR-DPO, a LoRA fine-tune of Llama-3-8B trained with Confidence Reasoning Direct Preference Optimization (CR-DPO), as introduced in the paper Enhancing Large Language Models' Situated Faithfulness to External Contexts. CR-DPO calibrates an LLM's trust in external contexts by aligning its confidence in internal knowledge with the reliability of external information.

Method

The model learns verbalized confidence reasoning by optimizing preferences between pairs of self-sampled reasoning paths. When the model's internal answer is incorrect, it is shown a correct external context and asked to reason about why the context is right and its own answer is wrong, yielding a preferred reasoning path; conversely, misleading it with an incorrect external context yields a rejected reasoning path. The same process is applied when the model's internal answer is correct, comparing reasoning over internal and external knowledge. To enhance reasoning diversity, dual sampling generates two reasoning-path pairs with varied prompts and in-context examples. Additionally, a negative log-likelihood term on the preferred path is added to the DPO loss to further optimize reasoning. The training data is constructed from TriviaQA, NaturalQA, PopQA, and RedditQA.
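The objective described above combines a standard DPO preference term over the chosen and rejected reasoning paths with a negative log-likelihood term on the preferred path. A minimal scalar sketch (not the paper's implementation; `beta` and `nll_weight` are assumed hyperparameter names, and inputs are summed sequence log-probabilities):

```python
import math

def _sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def cr_dpo_loss(logp_chosen: float, logp_rejected: float,
                ref_logp_chosen: float, ref_logp_rejected: float,
                beta: float = 0.1, nll_weight: float = 1.0) -> float:
    """DPO preference loss plus an NLL term on the preferred reasoning path.

    logp_* are log-probabilities of each reasoning path under the policy;
    ref_logp_* are the same quantities under the frozen reference model.
    """
    # Implicit reward margin between chosen and rejected paths.
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    dpo_term = -math.log(_sigmoid(margin))
    # Extra supervision: negative log-likelihood of the preferred path.
    nll_term = -logp_chosen
    return dpo_term + nll_weight * nll_term
```

In practice this would operate on batched token-level log-probabilities from the policy and a frozen reference model; the scalar form only illustrates how the NLL term augments the preference objective.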

Other Info

BibTeX:

@article{Huang2024EnhancingLL,
  title={Enhancing Large Language Models' Situated Faithfulness to External Contexts},
  author={Yukun Huang and Sanxing Chen and Hongyi Cai and Bhuwan Dhingra},
  journal={ArXiv},
  year={2024},
  volume={abs/2410.14675},
  url={https://api.semanticscholar.org/CorpusID:273482717}
}
Model size: 8.03B params · Tensor type: BF16 · Format: Safetensors