Model Card for Qwen2.5-7B-Instruct-CARE

Model Description

Qwen2.5-7B-Instruct-CARE is a 7.61B-parameter instruction-tuned language model based on Qwen2.5-7B-Instruct, enhanced with native retrieval-augmented reasoning capabilities through the CARE (Context-Aware Retrieval-Enhanced reasoning) framework. The model has been specifically trained to improve context fidelity and reduce hallucinations by learning to explicitly integrate in-context evidence into its reasoning process.

Key Features:

  • Native retrieval-augmented reasoning: Dynamically identifies and incorporates relevant evidence from input context
  • Improved context fidelity: Significantly better adherence to provided context, especially when it contradicts parametric knowledge
  • Enhanced multi-hop reasoning: Superior performance on complex reasoning tasks requiring evidence integration
  • Structured reasoning output: Generates reasoning chains with explicit evidence citations using <think> and <retrieval> tags

Model Details

  • Model Type: Causal Language Model (Enhanced with Retrieval-Augmented Reasoning)
  • Base Model: Qwen2.5-7B-Instruct
  • Parameters: 7.61B total, 6.53B non-embedding
  • Architecture: Transformer with RoPE, SwiGLU, RMSNorm, and Attention QKV bias
  • Context Length: 131,072 tokens (full), 8,192 tokens (generation)
  • Training Framework: Two-phase training (SFT + Reinforcement Learning with GRPO)

Training Process

The model was trained using a novel two-phase approach:

Phase 1 - Supervised Fine-Tuning (SFT):

  • Dataset: 7,739 instances from HotpotQA with retrieval-augmented reasoning chains
  • Purpose: Establish evidence integration patterns and reasoning format
  • Training: 3 epochs with LoRA (r=8, α=16), AdamW optimizer
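
The SFT configuration can be approximated with the peft library. The following is a minimal sketch, not the exact CARE training script: the target modules and dropout are assumptions, since the card only specifies r=8, α=16, AdamW, and 3 epochs.

import torch
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-7B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

lora_config = LoraConfig(
    r=8,                # LoRA rank reported in this card
    lora_alpha=16,      # LoRA alpha reported in this card
    lora_dropout=0.05,  # assumption; not specified in the card
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable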

Phase 2 - Reinforcement Learning:

  • Method: Group Relative Policy Optimization (GRPO)
  • Curriculum Learning: Gradual transition from DROP (easy) to MS MARCO (hard)
  • Rewards: Accuracy + Format + Retrieval consistency
  • Training: 350 steps with multi-aspect reward optimization
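
The exact GRPO reward is defined in the CARE paper; the sketch below only illustrates how the three reward aspects listed above (accuracy, format, retrieval consistency) could be scored for a single response. The matching rules and the equal weighting are assumptions.

import re

def multi_aspect_reward(response: str, gold_answer: str, context: str) -> float:
    # Format: the response must contain a <think>...</think> block and a final "Answer:".
    has_think = re.search(r"<think>.*?</think>", response, flags=re.DOTALL) is not None
    has_answer = "Answer:" in response
    format_reward = 1.0 if (has_think and has_answer) else 0.0

    # Accuracy: exact match against the gold answer (real setups often use normalized EM or F1).
    predicted = response.split("Answer:")[-1].strip() if has_answer else ""
    accuracy_reward = 1.0 if predicted.lower() == gold_answer.lower() else 0.0

    # Retrieval consistency: every <retrieval> span must appear verbatim in the input context.
    spans = re.findall(r"<retrieval>(.*?)</retrieval>", response, flags=re.DOTALL)
    retrieval_reward = 1.0 if spans and all(s.strip() in context for s in spans) else 0.0

    # Equal weighting is an assumption, not the published reward.
    return (format_reward + accuracy_reward + retrieval_reward) / 3.0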

System Prompt

The model uses an enhanced system prompt that enables structured reasoning with evidence retrieval:

You are Qwen, created by Alibaba Cloud. You are a helpful assistant. You FIRST think about the reasoning process as an internal monologue and then provide the final answer. The reasoning process MUST BE enclosed within <think> </think> tags. WITHIN the thinking process, make reference to the relevant texts in the prompt that provide critical information to move the reasoning process forward. The referenced texts MUST BE enclosed within <retrieval> </retrieval> tags, and MUST BE placed within the reasoning process only. The final answer MUST BE put at the end of the response after "Answer:".

Note: This system prompt is automatically applied when using the default chat template.
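
If you serve the model through a stack that does not apply the bundled chat template, the same prompt can be passed explicitly as a system message. The snippet below simply copies the prompt text from above:

CARE_SYSTEM_PROMPT = (
    "You are Qwen, created by Alibaba Cloud. You are a helpful assistant. "
    "You FIRST think about the reasoning process as an internal monologue and "
    "then provide the final answer. The reasoning process MUST BE enclosed within "
    "<think> </think> tags. WITHIN the thinking process, make reference to the "
    "relevant texts in the prompt that provide critical information to move the "
    "reasoning process forward. The referenced texts MUST BE enclosed within "
    "<retrieval> </retrieval> tags, and MUST BE placed within the reasoning "
    "process only. The final answer MUST BE put at the end of the response "
    'after "Answer:".'
)

messages = [
    {"role": "system", "content": CARE_SYSTEM_PROMPT},
    {"role": "user", "content": "Your question here\n\nContext: ..."},
]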

Usage

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "sheryc/Qwen2.5-7B-Instruct-CARE"

# Load the model in half precision and spread it across available devices
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Example usage
context = """John went to the movies with his mom last week. They watched the latest superhero movie, which was quite popular. The ticket price was $15. According to the local cinema's website, ticket prices range from $10 to $12 for regular screenings and from $13 to $16 for special releases."""

question = "Was the ticket price John's mom paid for the movie reasonable?"

messages = [
    {"role": "user", "content": f"{question}\n\nContext: {context}"}
]

# The default chat template automatically prepends the CARE system prompt
tokenized_chat = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt"
)

generated_ids = model.generate(tokenized_chat.to(model.device), max_new_tokens=512)
# Decode only the newly generated tokens
output_text = tokenizer.decode(generated_ids[0][tokenized_chat.shape[-1]:], skip_special_tokens=True)

Expected Output Format:

<think>
The context states John watched the latest superhero movie. <retrieval>The ticket price was $15.</retrieval> The context provides price ranges: <retrieval>ticket prices range from $10 to $12 for regular screenings and from $13 to $16 for special releases.</retrieval> Since this was a popular latest superhero movie, it likely qualifies as a special release. Therefore, the $15 price falls within the $13-$16 range for special releases.
</think>

Answer: Yes, the ticket price was reasonable.
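
Because the reasoning chain is delimited by fixed tags, the response is easy to post-process. The helper below is not part of the release; it is a small sketch that extracts the reasoning, the cited evidence, and the final answer from the output_text produced by the usage example above:

import re

def parse_care_output(text: str) -> dict:
    # Reasoning is the content of the <think>...</think> block.
    think = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    reasoning = think.group(1).strip() if think else ""
    # Cited evidence spans are the <retrieval>...</retrieval> segments inside the reasoning.
    evidence = [s.strip() for s in re.findall(r"<retrieval>(.*?)</retrieval>", reasoning, flags=re.DOTALL)]
    # The final answer follows the "Answer:" marker.
    answer = text.split("Answer:")[-1].strip() if "Answer:" in text else ""
    return {"reasoning": reasoning, "evidence": evidence, "answer": answer}

parsed = parse_care_output(output_text)
print(parsed["evidence"])  # e.g., ['The ticket price was $15.', ...]
print(parsed["answer"])    # e.g., 'Yes, the ticket price was reasonable.'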

Training Data

  • SFT Phase: HotpotQA with labeled supporting facts (7,739 instances); see the sketch after this list
  • RL Phase:
    • DROP dataset (77,409 training instances) - Easy curriculum phase
    • MS MARCO - Hard curriculum phase
  • Evaluation: LongBench, CofCA, and other QA benchmarks
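
The card does not spell out how HotpotQA's labeled supporting facts become retrieval-tagged reasoning chains; the actual pipeline is described in the CARE paper. Purely as a hypothetical illustration, supporting-fact sentences could be wrapped in <retrieval> tags like this (the data schema here is a simplification):

def supporting_facts_to_spans(context: dict, supporting_facts: list) -> list:
    """Wrap supporting-fact sentences in <retrieval> tags.
    context maps paragraph titles to lists of sentences;
    supporting_facts is a list of (title, sentence_index) pairs."""
    spans = []
    for title, sent_id in supporting_facts:
        sentences = context.get(title, [])
        if 0 <= sent_id < len(sentences):
            spans.append(f"<retrieval>{sentences[sent_id].strip()}</retrieval>")
    return spans

# Toy example
context = {"Cinema": ["Tickets cost $15.", "The cinema opened in 1990."]}
print(supporting_facts_to_spans(context, [("Cinema", 0)]))
# ['<retrieval>Tickets cost $15.</retrieval>']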

License

Please refer to the original Qwen2.5 license terms.

Citation

@inproceedings{wang2025care,
  title={Improving Context Fidelity via Native Retrieval-Augmented Reasoning},
  author={Wang, Suyuchen and Wang, Jinlin and Wang, Xinyu and Li, Shiqi and Tang, Xiangru and Hong, Sirui and Chang, Xiao-Wen and Wu, Chenglin and Liu, Bang},
  booktitle={Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing},
  year={2025}
}

@misc{qwen2.5,
  title={Qwen2.5: A Party of Foundation Models},
  url={https://qwenlm.github.io/blog/qwen2.5/},
  author={Qwen Team},
  month={September},
  year={2024}
}

Contact

For questions about the model or to report issues, please visit the CARE project homepage or contact the authors.
