---
language: en
license: mit
tags:
- recommendation
- ranking
- personalization
- xgboost
- xgbranker
- recipe
- cold-start
datasets:
- your-username/recipe-cleaned-dataset
model-index:
- name: Personalized Recipe Ranking Models
  results:
  - task:
      type: recommendation
      name: Personalized Recipe Ranking
    dataset:
      name: Food.com (Cleaned)
      type: your-username/recipe-cleaned-dataset
    metrics:
    - type: ndcg@5
      value: 0.44
    - type: ndcg@10
      value: 0.44
---

# Model Card: Personalized Recipe Ranking Models

## Overview

This project implements a personalized recipe recommendation system using two model categories:

1. **Scratch-trained baseline**: a simple rule-based + embedding-matching ranker trained on a synthetic preference dataset (no user-specific rules).
2. **Rule-enhanced cold-start models**: five separate XGBRanker models trained with more complex rule-based preference signals and user-specific interaction patterns (user1–user5).

The goal is to evaluate how different user profiles affect ranking behavior and recommendation diversity, even when overall NDCG scores are lower than the baseline's.

---

## Model Category 1: Scratch-trained Baseline

### Purpose

Provide a simple cold-start recommendation baseline that matches ingredients and ranks recipes without personalization. It uses parent–child ingredient overlap and a few numeric features (e.g., protein, cost, cooking time).

### Data Sources

- Cleaned Food.com dataset (~180k recipes)
- 10,000 synthetic preference samples generated via uniform random selection

### Training Details

- Model type: **XGBRanker** (`objective='rank:pairwise'`)
- Features: ~1,000 numeric ingredient-parent ratio features plus basic nutrition/time features
- Train/test split: 80/20 (by recipe ID)
- Evaluation metrics: NDCG@5, NDCG@10

### Evaluation

The baseline achieves **very high NDCG scores (95%+)** because training and evaluation rely on synthetic signals that align perfectly with the ranking structure.
### Intended Use

Serve as a **sanity check** and upper bound for ranking performance, not for deployment.

### Limitations

- Unrealistically clean preference structure
- No user differentiation
- Inflated metrics due to synthetic evaluation

---

## Model Category 2: Rule-enhanced Cold-start Models (User1–User5)

### Purpose

Capture user-specific dietary preferences and ranking heuristics using richer rule sets, leading to more diverse recommendation patterns across different users.

### Data Sources

- Cleaned Food.com dataset (~180k recipes)
- 5,000 cold-start synthetic interactions per user profile
- Additional unselected (negative) samples included to simulate realistic cold-start scenarios

### Model

- Model type: **XGBRanker** (scratch-trained)
- Training objective: `rank:pairwise`
- Feature space:
  - Ingredient-parent coverage ratios (~1,000 parent nodes)
  - Nutrition features: protein, calories, cost, cooking time
  - User preference weights: protein/time/cost
  - Dietary tag filters and exclusion rules

### Training Setup

- Train/valid/test split: 70/15/15 by recipe ID per profile
- No fine-tuning between profiles; each profile is trained independently
- Evaluation metrics: NDCG@5 and NDCG@10

### Evaluation Results

| User Profile | NDCG@5 | NDCG@10 |
|--------------|--------|---------|
| user1        | 0.4400 | 0.4400  |
| user2        | 0.4342 | 0.4342  |
| user3        | 0.4179 | 0.4179  |
| user4        | 0.1651 | 0.1651  |
| user5        | 0.4607 | 0.4607  |

**Note:** User4 has very restrictive dietary preferences, resulting in very few matching recipes and an inherently lower achievable NDCG.

Although these NDCG values are lower than the baseline's, this is expected for several reasons:

- The cold-start datasets contain a large proportion of unselected recipes, leading to sparse positive signals.
- More complex preference rules increase variability and reduce alignment with NDCG's single-label relevance assumptions.
- The models now produce more differentiated ranking behaviors across user profiles, which aligns with the intended personalization goals.

---

## Model Selection Justification

- **XGBRanker** was chosen for all models due to its effectiveness on structured tabular data, fast training time, and compatibility with large feature spaces (1,000+ ingredients).
- The **baseline model** acts as a clean control, providing an upper bound on achievable NDCG under idealized preferences.
- The **rule-enhanced models** trade some raw NDCG performance for greater personalization fidelity, which is critical in multi-user recommendation contexts.

---

## Evaluation Methodology

- Metrics: NDCG@5 and NDCG@10 on held-out cold-start samples
- Each user model is evaluated independently
- Negative samples are retained to approximate real-world recommendation class imbalance

---

## Intended Uses and Limitations

**Intended Uses**

- Multi-profile recipe recommendation
- Studying personalization behavior under sparse feedback
- Cold-start scenarios for new users

**Limitations**

- Synthetic user interactions do not perfectly reflect real-world feedback
- NDCG is not well aligned with multi-rule personalization behavior
- User4 performance is limited by the scarcity of relevant recipes

---

## Risks and Bias

The models are trained on the Food.com dataset, which has known biases:

- **Regional bias**: Western and American cuisines dominate the dataset, leading to potential under-representation of other regions.
- **Popularity bias**: Highly rated or frequently interacted-with recipes are over-represented.
- **Cold-start leakage risk**: Although user interactions are synthetic, overlapping ingredient-parent structures between train/test may create mild information leakage, potentially inflating baseline metrics.
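The inflation effect noted above, and the gap between the baseline's 95%+ scores and the per-user results, can be made concrete with a small sketch. This uses scikit-learn's `ndcg_score` as a stand-in for the project's actual evaluation code (an assumption): identical model scores yield very different NDCG@5 depending on how sparse the positive labels are.

```python
import numpy as np
from sklearn.metrics import ndcg_score

# One query with a single positive among many negatives, mimicking the
# sparse cold-start label structure of the per-user datasets.
y_true_sparse = np.array([[1, 0, 0, 0, 0, 0, 0, 0, 0, 0]])

# A positive-dense query, closer to the synthetic baseline setup where
# labels align closely with the ranking structure.
y_true_dense = np.array([[1, 1, 1, 1, 0, 0, 0, 0, 0, 0]])

# The same imperfect model scores for both queries.
y_score = np.array([[0.3, 0.9, 0.5, 0.8, 0.1, 0.2, 0.7, 0.4, 0.6, 0.0]])

sparse_ndcg = ndcg_score(y_true_sparse, y_score, k=5)
dense_ndcg = ndcg_score(y_true_dense, y_score, k=5)
print(f"sparse positives: NDCG@5 = {sparse_ndcg:.3f}")
print(f"dense positives:  NDCG@5 = {dense_ndcg:.3f}")
```

With a single positive, one mis-ranked item can drive NDCG@5 to zero; with dense, well-aligned positives the same scorer looks far stronger. This is one mechanism behind both the baseline's inflated metrics and user4's low ceiling.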
These biases may affect recommendation diversity and fairness across different cuisines or dietary groups.

---

## Cost and Latency

All models are based on **XGBRanker**, which runs efficiently on CPU:

- **Inference latency**: approximately 1–5 ms per recipe for ranking (measured on a laptop CPU, single thread)
- **Training cost**: training each user-profile model on 5,000 interactions takes less than 2 minutes on CPU

The approach is designed for real-time personalization in lightweight interfaces (e.g., Hugging Face Spaces).

---

## Usage Disclosure

**Intended Uses**

- Academic and educational research on personalized recommendation
- Cold-start personalization experiments
- Recipe recommendation for diverse dietary profiles

**Not Intended For**

- Medical or dietary decision-making
- Real-world deployment without additional bias mitigation
- High-stakes personalization where fairness across demographic groups is critical

---

## Citation

Tang, Xinxuan. *Personalized Recipe Ranking Models*. 2025.