Model Card for Llama-3.1-8B-Instruct-CARE

Model Description

Llama-3.1-8B-Instruct-CARE is an 8B-parameter instruction-tuned language model based on meta-llama/Meta-Llama-3.1-8B-Instruct, enhanced with native retrieval-augmented reasoning capabilities through the CARE (Context-Aware Retrieval-Enhanced reasoning) framework. The model is trained to improve context fidelity and reduce hallucinations by explicitly integrating in-context evidence into its reasoning process.

Key Features:

  • Native retrieval-augmented reasoning: Dynamically identifies and incorporates relevant evidence from input context
  • Improved context fidelity: Significantly better adherence to provided context, especially when it contradicts parametric knowledge
  • Enhanced multi-hop reasoning: Superior performance on complex reasoning tasks requiring evidence integration
  • Structured reasoning output: Generates reasoning chains with explicit evidence citations using <think> and <retrieval> tags

Model Details

  • Model Type: Causal Language Model (Enhanced with Retrieval-Augmented Reasoning)
  • Base Model: meta-llama/Meta-Llama-3.1-8B-Instruct
  • Parameters: 8B total
  • Architecture: Transformer with Grouped-Query Attention (GQA) and Rotary Position Embeddings (RoPE)
  • Context Length: 128,000 tokens
  • Training Framework: Two-phase training (SFT + Reinforcement Learning with GRPO)

Training Process

The model was trained using a novel two-phase approach:

Phase 1 - Supervised Fine-Tuning (SFT):

  • Dataset: 7,739 instances from HotpotQA with retrieval-augmented reasoning chains
  • Purpose: Establish evidence integration patterns and reasoning format
  • Training: 3 epochs with LoRA (r=8, α=16) and the AdamW optimizer (see the configuration sketch below)
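
For reference, the following is a minimal LoRA setup that mirrors the reported hyperparameters, written with the Hugging Face peft library. The target modules, dropout, and other settings are illustrative assumptions, not the released training recipe:

from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Assumed LoRA configuration matching the reported r=8, alpha=16;
# target modules and dropout are illustrative guesses, not the official setup.
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,                                         # assumption
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],   # assumption
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()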

Phase 2 - Reinforcement Learning:

  • Method: Group Relative Policy Optimization (GRPO)
  • Curriculum Learning: Gradual transition from DROP (easy) to MS MARCO (hard)
  • Rewards: Accuracy + Format + Retrieval consistency (see the reward sketch after this list)
  • Training: 350 steps with multi-aspect reward optimization
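
The reward used during GRPO combines the three aspects above. Below is a minimal, illustrative sketch of how such components could be computed; the function names, checks, and equal weighting are assumptions, not the exact reward used in training:

import re

def format_reward(response: str) -> float:
    # Check the expected structure: a <think>...</think> block and a final "Answer:".
    has_think = bool(re.search(r"<think>.*?</think>", response, re.DOTALL))
    return 1.0 if (has_think and "Answer:" in response) else 0.0

def retrieval_reward(response: str, context: str) -> float:
    # Fraction of <retrieval> spans that actually appear verbatim in the provided context.
    spans = re.findall(r"<retrieval>(.*?)</retrieval>", response, re.DOTALL)
    if not spans:
        return 0.0
    return sum(1 for s in spans if s.strip() in context) / len(spans)

def accuracy_reward(response: str, gold_answer: str) -> float:
    # Exact match on the final answer; the real reward likely uses a softer metric.
    answer = response.split("Answer:")[-1].strip()
    return 1.0 if answer.lower() == gold_answer.lower() else 0.0

def total_reward(response: str, context: str, gold_answer: str) -> float:
    # Equal weighting is an assumption; the actual weights are not documented here.
    return (accuracy_reward(response, gold_answer)
            + format_reward(response)
            + retrieval_reward(response, context)) / 3.0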

System Prompt

The model uses an enhanced system prompt that enables structured reasoning with evidence retrieval:

You are a helpful assistant. You FIRST think about the reasoning process as an internal monologue and then provide the final answer. The reasoning process MUST BE enclosed within <think> </think> tags. WITHIN the thinking process, make reference to the relevant texts in the prompt that provide critical information to move the reasoning process forward. The referenced texts MUST BE enclosed within <retrieval> </retrieval> tags, and MUST BE placed within the reasoning process only. The final answer MUST BE put at the end of the response after "Answer:".

Note: This system prompt is automatically applied when using the default chat template.
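
If you construct prompts without the bundled chat template (for example, with a different serving stack), the same system prompt can be passed explicitly as a system message. A minimal sketch, assuming the standard chat messages format:

CARE_SYSTEM_PROMPT = (
    "You are a helpful assistant. You FIRST think about the reasoning process as an "
    "internal monologue and then provide the final answer. The reasoning process MUST BE "
    "enclosed within <think> </think> tags. WITHIN the thinking process, make reference to "
    "the relevant texts in the prompt that provide critical information to move the reasoning "
    "process forward. The referenced texts MUST BE enclosed within <retrieval> </retrieval> "
    "tags, and MUST BE placed within the reasoning process only. The final answer MUST BE put "
    'at the end of the response after "Answer:".'
)

messages = [
    # Supply the system prompt explicitly instead of relying on the template default
    {"role": "system", "content": CARE_SYSTEM_PROMPT},
    {"role": "user", "content": "Your question here.\n\nContext: your context here."},
]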

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_name = "sheryc/Llama-3.1-8B-Instruct-CARE"

# Load the model in half precision and map it across available devices
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Example usage
context = """John went to the movies with his mom last week. They watched the latest superhero movie, which was quite popular. The ticket price was $15. According to the local cinema's website, ticket prices range from $10 to $12 for regular screenings and from $13 to $16 for special releases."""

question = "Was the ticket price John's mom paid for the movie reasonable?"

messages = [
    {"role": "user", "content": f"{question}\n\nContext:{context}"}
]

tokenized_chat = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
)

generated_ids = model.generate(tokenized_chat.to(model.device), max_new_tokens=512)
# Decode only the newly generated tokens and drop special tokens
output_text = tokenizer.decode(generated_ids[0][tokenized_chat.shape[-1]:], skip_special_tokens=True)
print(output_text)

Expected Output Format:

<think>
The context states John watched the latest superhero movie. <retrieval>The ticket price was $15.</retrieval> The context provides price ranges: <retrieval>ticket prices range from $10 to $12 for regular screenings and from $13 to $16 for special releases.</retrieval> Since this was a popular latest superhero movie, it likely qualifies as a special release. Therefore, the $15 price falls within the $13-$16 range for special releases.
</think>

Answer: Yes, the ticket price was reasonable.
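
For downstream use, the tagged output can be parsed with a small helper. A minimal sketch; the helper name and regular expressions are illustrative, not part of the model release:

import re

def parse_care_output(text: str) -> dict:
    # Split a CARE-style response into reasoning, cited evidence spans, and the final answer.
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    reasoning = think.group(1).strip() if think else ""
    evidence = [s.strip() for s in re.findall(r"<retrieval>(.*?)</retrieval>", reasoning, re.DOTALL)]
    answer = re.search(r"Answer:\s*(.*)", text, re.DOTALL)
    return {
        "reasoning": reasoning,
        "evidence": evidence,
        "answer": answer.group(1).strip() if answer else "",
    }

parsed = parse_care_output(output_text)   # output_text from the Usage example above
print(parsed["answer"])                   # e.g. "Yes, the ticket price was reasonable."
print(parsed["evidence"])                 # e.g. ["The ticket price was $15.", ...]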

Training Data

  • SFT Phase: HotpotQA with labeled supporting facts (7,739 instances)
  • RL Phase:
    • DROP dataset (77,409 training instances) - Easy curriculum phase
    • MS MARCO - Hard curriculum phase
  • Evaluation: LongBench, CofCA, and other QA benchmarks

License

This model is licensed under the Llama 3.1 Community License. Please refer to the original Llama 3.1 license terms.

Citation

@inproceedings{wang2025care,
  title={Improving Context Fidelity via Native Retrieval-Augmented Reasoning},
  author={Wang, Suyuchen and Wang, Jinlin and Wang, Xinyu and Li, Shiqi and Tang, Xiangru and Hong, Sirui and Chang, Xiao-Wen and Wu, Chenglin and Liu, Bang},
  booktitle={Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
  year={2025}
}

If you use this model, please also cite the original Llama 3 series paper.

Contact

For questions about the model or to report issues, please visit the CARE project homepage or contact the authors.
