Model Card: Personalized Recipe Ranking Models

Overview

This project implements a personalized recipe recommendation system using two model categories:

  1. Scratch-trained baseline: A simple rule-based + embedding matching ranker trained on a synthetic preference dataset (no user-specific rules).
  2. Rule-enhanced cold-start models: Five separate XGBRanker models trained with more complex rule-based preference signals and user-specific interaction patterns (user1–user5).

The goal is to evaluate how different user profiles affect ranking behavior and recommendation diversity, even when overall NDCG scores are lower than the baseline.


Model Category 1: Scratch-trained Baseline

Purpose

Provide a simple cold-start recommendation baseline that matches ingredients and ranks recipes without personalization. It uses parent–child ingredient overlap and a few numeric features (e.g., protein, cost, cooking time).

Data Sources

  • Cleaned Food.com dataset (~180k recipes)
  • 10,000 synthetic preference samples generated via uniform random selection

Training Details

  • Model type: XGBRanker (objective='rank:pairwise')
  • Features: ~1000 numeric ingredient-parent ratio features + basic nutrition/time features
  • Train/test split: 80/20 (by recipe ID)
  • Evaluation metric: NDCG@5, NDCG@10

Evaluation

The baseline achieves very high NDCG scores (above 0.95) because training and evaluation rely on synthetic signals that align perfectly with the ranking structure.

Intended Use

Serve as a sanity check and upper bound for ranking performance, not for deployment.

Limitations

  • Unrealistically clean preference structure
  • No user differentiation
  • Inflated metrics due to synthetic evaluation

Model Category 2: Rule-enhanced Cold-Start Models (User1–User5)

Purpose

Capture user-specific dietary preferences and ranking heuristics using richer rule sets, leading to more diverse recommendation patterns across different users.

Data Sources

  • Cleaned Food.com dataset (~180k recipes)
  • 5,000 cold-start synthetic interactions per user profile
  • Additional unselected (negative) samples included to simulate realistic cold-start scenarios

Model

  • Model type: XGBRanker (scratch-trained)
  • Training objective: rank:pairwise
  • Feature space:
    • Ingredient-parent coverage ratios (~1000 parent nodes)
    • Nutrition features: protein, calories, cost, cooking time
    • User preference weights: protein/time/cost
    • Dietary tag filters and exclusion rules
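A minimal sketch of how an ingredient-parent coverage ratio might be computed. The `parent_of` mapping and ingredient names below are illustrative, not the project's actual ontology:

```python
# Hypothetical child -> parent ingredient mapping (not the real ~1000-node ontology)
parent_of = {"cheddar": "cheese", "mozzarella": "cheese", "basil": "herb"}

def parent_coverage(recipe_ingredients, parents):
    """Fraction of a recipe's ingredients that fall under each parent node."""
    counts = {p: 0 for p in parents}
    for ing in recipe_ingredients:
        p = parent_of.get(ing)
        if p in counts:
            counts[p] += 1
    total = len(recipe_ingredients) or 1  # guard against empty recipes
    return {p: c / total for p, c in counts.items()}

features = parent_coverage(["cheddar", "mozzarella", "flour"], ["cheese", "herb"])
# features["cheese"] == 2/3, features["herb"] == 0.0
```

Each parent node contributes one ratio feature, which is how the feature space scales to roughly 1000 dimensions.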

Training Setup

  • Train/valid/test split: 70/15/15 by recipe ID per profile
  • No fine-tuning between profiles; each profile trained independently
  • Evaluation metric: NDCG@5 and NDCG@10
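Splitting by recipe ID (rather than by row) keeps every interaction with a given recipe in a single partition. One way to sketch a deterministic 70/15/15 split is hash-bucketing by ID; this is an illustrative approach, not necessarily the project's exact mechanism:

```python
import hashlib

def split_bucket(recipe_id, train=0.70, valid=0.15):
    """Deterministically assign a recipe ID to train/valid/test by hashing it."""
    digest = hashlib.md5(str(recipe_id).encode()).hexdigest()
    frac = (int(digest, 16) % 10_000) / 10_000  # uniform-ish value in [0, 1)
    if frac < train:
        return "train"
    if frac < train + valid:
        return "valid"
    return "test"

buckets = [split_bucket(i) for i in range(1000)]
```

Because the assignment depends only on the ID, re-running the pipeline can never move a recipe across partitions, which avoids leakage between profiles trained at different times.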

Evaluation Results

  User Profile    NDCG@5    NDCG@10
  user1           0.4400    0.4400
  user2           0.4342    0.4342
  user3           0.4179    0.4179
  user4           0.1651    0.1651
  user5           0.4607    0.4607

Note: User4 has highly restrictive dietary preferences, resulting in few matching recipes and an inherently lower achievable NDCG.


Although these NDCG values are lower than the baseline, this is expected for several reasons:

  • The cold-start datasets contain a large proportion of unselected recipes, leading to sparse positive signals.
  • More complex preference rules increase variability and reduce alignment with NDCG’s single-label relevance assumptions.
  • The models now produce more differentiated ranking behaviors across user profiles, which aligns with the intended personalization goals.

Model Selection Justification

  • XGBRanker was chosen for all models due to its effectiveness on structured tabular data, fast training time, and compatibility with large feature spaces (1000+ ingredients).
  • The baseline model acts as a clean control, providing an upper bound on achievable NDCG under idealized preferences.
  • The rule-enhanced models trade some raw NDCG performance for greater personalization fidelity, which is critical in multi-user recommendation contexts.

Evaluation Methodology

  • Metric: NDCG@5 and NDCG@10 on held-out cold-start samples
  • Each user model evaluated independently
  • Negative samples retained to approximate real-world recommendation class imbalance
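For reference, NDCG@k over graded relevance labels can be computed as follows. This is the standard formulation, not code taken from the project:

```python
import math

def ndcg_at_k(ranked_relevances, k):
    """ranked_relevances: graded relevance of items in the model's ranked order."""
    def dcg(rels):
        # Discounted cumulative gain over the top-k positions.
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(ranked_relevances, reverse=True))
    return dcg(ranked_relevances) / ideal if ideal > 0 else 0.0

perfect = ndcg_at_k([3, 2, 1, 0], k=5)   # ideal ordering -> 1.0
reversed_ = ndcg_at_k([0, 1, 2, 3], k=5)  # worst ordering -> below 1.0
```

Note that groups with no positive items have an undefined ideal DCG; the sketch above reports 0 in that case, which matters here because the retained negative samples make all-negative groups possible.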

Intended Uses and Limitations

Intended Uses

  • Multi-profile recipe recommendation
  • Studying personalization behaviors under sparse feedback
  • Cold-start scenarios for new users

Limitations

  • Synthetic user interactions do not perfectly reflect real-world feedback
  • NDCG is not well aligned with multi-rule personalization behavior
  • User4 performance is limited by scarcity of relevant recipes

Risks and Bias

The models are trained on the Food.com dataset, which has known biases:

  • Regional bias: Western and American cuisines dominate the dataset, leading to potential under-representation of other regions.
  • Popularity bias: Highly rated or frequently interacted recipes are over-represented.
  • Cold-start leakage risk: Although user interactions are synthetic, overlapping ingredient-parent structures between train/test may create mild information leakage, potentially inflating baseline metrics.

These biases may affect recommendation diversity and fairness across different cuisines or dietary groups.


Cost and Latency

All models are based on XGBRanker, which runs efficiently on CPU:

  • Inference latency: Approximately 1–5 ms per recipe for ranking (measured on a laptop CPU, single thread).
  • Training cost: Training each user profile model on 5,000 interactions takes less than 2 minutes on CPU.

The approach is designed for real-time personalization in lightweight interfaces (e.g., Hugging Face Spaces).


Usage Disclosure

Intended Uses

  • Academic and educational research on personalized recommendation
  • Cold-start personalization experiments
  • Recipe recommendation for diverse dietary profiles

Not Intended For

  • Medical or dietary decision-making
  • Real-world deployment without additional bias mitigation
  • High-stakes personalization where fairness across demographic groups is critical

Citation

Tang, Xinxuan. Personalized Recipe Ranking Models. 2025.
