---
language: en
license: mit
tags:
- recommendation
- ranking
- personalization
- xgboost
- xgbranker
- recipe
- cold-start
datasets:
- your-username/recipe-cleaned-dataset
model-index:
- name: Personalized Recipe Ranking Models
  results:
  - task:
      type: recommendation
      name: Personalized Recipe Ranking
    dataset:
      name: Food.com (Cleaned)
      type: your-username/recipe-cleaned-dataset
    metrics:
    - type: ndcg@5
      value: 0.44
    - type: ndcg@10
      value: 0.44
---

# Model Card: Personalized Recipe Ranking Models

## Overview

This project implements a personalized recipe recommendation system using two model categories:

1. **Scratch-trained baseline**: a simple rule-based + embedding-matching ranker trained on a synthetic preference dataset (no user-specific rules).
2. **Rule-enhanced cold-start models**: five separate XGBRanker models trained with more complex rule-based preference signals and user-specific interaction patterns (user1–user5).

The goal is to evaluate how different user profiles affect ranking behavior and recommendation diversity, even when overall NDCG scores are lower than the baseline's.

---

## Model Category 1: Scratch-trained Baseline

### Purpose

Provide a simple cold-start recommendation baseline that matches ingredients and ranks recipes without personalization. It uses parent–child ingredient overlap and a few numeric features (e.g., protein, cost, cooking time).

### Data Sources

- Cleaned Food.com dataset (~180k recipes)
- 10,000 synthetic preference samples generated via uniform random selection

### Training Details

- Model type: **XGBRanker** (`objective='rank:pairwise'`)
- Features: ~1,000 numeric ingredient-parent ratio features plus basic nutrition/time features
- Train/test split: 80/20 (by recipe ID)
- Evaluation metrics: NDCG@5, NDCG@10

### Evaluation

The baseline achieves **very high NDCG scores (95%+)** because training and evaluation rely on synthetic signals that align perfectly with the ranking structure.
### Intended Use

Serve as a **sanity check** and upper bound for ranking performance, not for deployment.

### Limitations

- Unrealistically clean preference structure
- No user differentiation
- Inflated metrics due to synthetic evaluation

---

## Model Category 2: Rule-enhanced Cold-start Models (User1–User5)

### Purpose

Capture user-specific dietary preferences and ranking heuristics using richer rule sets, leading to more diverse recommendation patterns across different users.

### Data Sources

- Cleaned Food.com dataset (~180k recipes)
- 5,000 cold-start synthetic interactions per user profile
- Additional unselected (negative) samples included to simulate realistic cold-start scenarios

### Model

- Model type: **XGBRanker** (scratch-trained)
- Training objective: `rank:pairwise`
- Feature space:
  - Ingredient-parent coverage ratios (~1,000 parent nodes)
  - Nutrition features: protein, calories, cost, cooking time
  - User preference weights: protein/time/cost
  - Dietary tag filters and exclusion rules

### Training Setup

- Train/valid/test split: 70/15/15 by recipe ID per profile
- No fine-tuning between profiles; each profile is trained independently
- Evaluation metrics: NDCG@5 and NDCG@10

### Evaluation Results

| User Profile | NDCG@5 | NDCG@10 |
|--------------|--------|---------|
| user1        | 0.4400 | 0.4400  |
| user2        | 0.4342 | 0.4342  |
| user3        | 0.4179 | 0.4179  |
| user4        | 0.1651 | 0.1651  |
| user5        | 0.4607 | 0.4607  |

**Note:** User4 has very restrictive dietary preferences, resulting in very few matching recipes and an inherently lower achievable NDCG.

Although these NDCG values are lower than the baseline's, this is expected for several reasons:

- The cold-start datasets contain a large proportion of unselected recipes, leading to sparse positive signals.
- More complex preference rules increase variability and reduce alignment with NDCG's single-label relevance assumptions.
- The models now produce more differentiated ranking behaviors across user profiles, which aligns with the intended personalization goals.

---

## Model Selection Justification

- **XGBRanker** was chosen for all models due to its effectiveness on structured tabular data, fast training time, and compatibility with large feature spaces (1,000+ ingredients).
- The **baseline model** acts as a clean control, providing an upper bound on achievable NDCG under idealized preferences.
- The **rule-enhanced models** trade some raw NDCG performance for greater personalization fidelity, which is critical in multi-user recommendation contexts.

---

## Evaluation Methodology

- Metrics: NDCG@5 and NDCG@10 on held-out cold-start samples
- Each user model is evaluated independently
- Negative samples are retained to approximate real-world recommendation class imbalance

---

## Intended Uses and Limitations

**Intended Uses**

- Multi-profile recipe recommendation
- Studying personalization behavior under sparse feedback
- Cold-start scenarios for new users

**Limitations**

- Synthetic user interactions do not perfectly reflect real-world feedback
- NDCG is not well aligned with multi-rule personalization behavior
- User4 performance is limited by the scarcity of relevant recipes

---

## Risks and Bias

The models are trained on the Food.com dataset, which has known biases:

- **Regional bias**: Western and American cuisines dominate the dataset, leading to potential under-representation of other regions.
- **Popularity bias**: Highly rated or frequently interacted-with recipes are over-represented.
- **Cold-start leakage risk**: Although user interactions are synthetic, overlapping ingredient-parent structures between train/test may create mild information leakage, potentially inflating baseline metrics.
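The inflation effect noted above, and the gap between the baseline's 95%+ scores and the per-user results, can be made concrete with a small sketch. This uses scikit-learn's `ndcg_score` as a stand-in for the project's actual evaluation code (an assumption): identical model scores yield very different NDCG@5 depending on how sparse the positive labels are.

```python
import numpy as np
from sklearn.metrics import ndcg_score

# One query with a single positive among many negatives, mimicking the
# sparse cold-start label structure of the per-user datasets.
y_true_sparse = np.array([[1, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

# A positive-dense query, closer to the synthetic baseline setup where
# labels align closely with the ranking structure.
y_true_dense = np.array([[1, 1, 1, 1, 0, 0, 0, 0, 0, 0]])

# The same imperfect model scores for both queries.
y_score = np.array([[0.3, 0.9, 0.5, 0.8, 0.1, 0.2, 0.7, 0.4, 0.6, 0.0]])

sparse_ndcg = ndcg_score(y_true_sparse, y_score, k=5)
dense_ndcg = ndcg_score(y_true_dense, y_score, k=5)
print(f"sparse positives: NDCG@5 = {sparse_ndcg:.3f}")
print(f"dense positives:  NDCG@5 = {dense_ndcg:.3f}")
```

With a single positive, one mis-ranked item can drive NDCG@5 to zero; with dense, well-aligned positives the same scorer looks far stronger. This is one mechanism behind both the baseline's inflated metrics and user4's low ceiling.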
These biases may affect recommendation diversity and fairness across different cuisines or dietary groups.

---

## Cost and Latency

All models are based on **XGBRanker**, which runs efficiently on CPU:

- **Inference latency**: approximately 1–5 ms per recipe for ranking (measured on a laptop CPU, single thread)
- **Training cost**: training each user-profile model on 5,000 interactions takes less than 2 minutes on CPU

The approach is designed for real-time personalization in lightweight interfaces (e.g., Hugging Face Spaces).

---

## Usage Disclosure

**Intended Uses**

- Academic and educational research on personalized recommendation
- Cold-start personalization experiments
- Recipe recommendation for diverse dietary profiles

**Not Intended For**

- Medical or dietary decision-making
- Real-world deployment without additional bias mitigation
- High-stakes personalization where fairness across demographic groups is critical

---

## Citation

Tang, Xinxuan. *Personalized Recipe Ranking Models*. 2025.