Rating Prediction Model for CMU Landmarks
Model Description
This is an off-the-shelf scikit-learn Random Forest regressor that predicts a landmark's rating from its features. The model helps validate existing ratings and can predict ratings for new landmarks based on their properties.
Model Details
Model Type
- Architecture: Random Forest Regressor (scikit-learn)
- Training: Off-the-shelf with feature engineering
- Input: Landmark features (indoor/outdoor, dwell time, classes, geographic location)
- Output: Predicted rating (0-5 scale)
Training Data
- Dataset: 100+ CMU landmarks with existing ratings
- Features:
  - Indoor/outdoor classification (binary)
  - Normalized dwell time (0-1 scale)
  - Multi-hot encoded landmark classes
  - Geographic distance from CMU center
- Target: Existing landmark ratings (0-5 scale)
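The four feature groups above could be encoded roughly as follows. This is a minimal sketch, not the code in model.py: the field names come from the usage example in this card, while `CMU_CENTER`, `MAX_DWELL_MINUTES`, `KNOWN_CLASSES`, and the use of haversine distance are assumptions made for illustration.

```python
import math

# Assumed constants for illustration only; the model's real values are not stated in this card.
CMU_CENTER = (40.4433, -79.9436)   # hypothetical campus center (lat, lon)
MAX_DWELL_MINUTES = 120.0          # hypothetical cap used to normalize dwell time
KNOWN_CLASSES = ["Culture", "Research", "Food", "Nature"]  # hypothetical class vocabulary


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def encode_landmark(landmark):
    """Turn one landmark dict into a flat numeric feature vector."""
    # Binary indoor/outdoor flag
    indoor = 1.0 if landmark["indoor/outdoor"] == "indoor" else 0.0
    # Dwell time scaled to the 0-1 range, clipped at the assumed cap
    dwell = min(landmark["time taken to explore"] / MAX_DWELL_MINUTES, 1.0)
    # Multi-hot encoding: one slot per known class
    classes = set(landmark["Class"])
    multi_hot = [1.0 if c in classes else 0.0 for c in KNOWN_CLASSES]
    # Distance from the assumed campus center
    geo = landmark["geocoord"]
    distance = haversine_km(geo["lat"], geo["lon"], *CMU_CENTER)
    return [indoor, dwell, *multi_hot, distance]
```

The rating itself is deliberately left out of the vector, since it is the prediction target rather than an input.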
Training Procedure
- Feature extraction and preprocessing
- Random Forest training with 100 estimators
- Cross-validation for performance estimation
- Feature importance analysis
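The procedure above maps onto standard scikit-learn calls. The sketch below is an illustration under stated assumptions, not the project's training script: the data is synthetic placeholder data, and the 5-fold split, the MAE scoring choice, and `random_state=0` are assumptions. Only the 100 estimators come from this card.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Synthetic placeholder data standing in for encoded landmark features;
# this is NOT the CMU landmarks dataset.
rng = np.random.default_rng(0)
X = rng.random((100, 6))                                   # 100 landmarks, 6 features
y = np.clip(5 * X[:, 1] + rng.normal(0, 0.3, 100), 0, 5)   # ratings on a 0-5 scale

# Random Forest with 100 estimators, as listed in the training procedure
model = RandomForestRegressor(n_estimators=100, random_state=0)

# Cross-validation (5 folds assumed); scikit-learn reports negated MAE
# so that higher is better, hence the sign flip.
cv_mae = -cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")

# Fit on all data, then read the impurity-based feature importances.
model.fit(X, y)
importances = model.feature_importances_

print("CV MAE per fold:", np.round(cv_mae, 3))
print("Feature importances:", np.round(importances, 3))
```

Impurity-based importances sum to 1 across features, so they are best read as relative shares rather than absolute effect sizes.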
Intended Use
Primary Use Cases
- Validating existing landmark ratings
- Predicting ratings for new landmarks
- Understanding feature importance in rating prediction
- Quality assurance for landmark database
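For the validation and quality-assurance use cases, one possible approach is to flag landmarks whose stored rating differs sharply from the prediction. The helper below is hypothetical and not part of model.py; the function name and the 1.0-point threshold are assumptions, and `predict` stands for any callable that maps a landmark dict to a rating.

```python
def flag_rating_discrepancies(landmarks, predict, threshold=1.0):
    """Return (id, existing, predicted) for landmarks whose existing rating
    differs from the prediction by more than `threshold` rating points."""
    flagged = []
    for lm in landmarks:
        predicted = predict(lm)
        if abs(lm["rating"] - predicted) > threshold:
            flagged.append((lm["id"], lm["rating"], predicted))
    return flagged
```

With a trained predictor, this could be called as `flag_rating_discrepancies(landmarks, predictor.predict_rating)`, and the flagged entries reviewed by hand.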
Out-of-Scope Use Cases
- Predicting user-specific ratings
- Real-time rating updates
- Cross-campus rating predictions
Performance Metrics
- Mean Absolute Error (MAE): ~0.3-0.5 on validation set
- Mean Squared Error (MSE): ~0.2-0.4 on validation set
- Feature Importance: Dwell time and class types are most predictive
Model Performance
Training Score: 0.85-0.90
Cross-validation Score: 0.75-0.80
Mean Absolute Error: 0.35
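For reference, the two error metrics reported above can be computed as in this small sketch. The ratings used here are made-up illustrative values, not results from this model.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted ratings."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average squared difference; penalizes large errors more heavily."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Illustrative values only
y_true = [4.0, 3.5, 5.0, 2.0]
y_pred = [4.5, 3.0, 4.5, 2.5]

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
print(f"MAE: {mae:.2f}, MSE: {mse:.2f}")
```

MAE is expressed in rating points on the 0-5 scale, while MSE is in squared rating points, so the two figures are not directly comparable.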
Limitations and Bias
- Training Data: Based on existing ratings, may inherit rating biases
- Feature Limitations: Limited to available landmark metadata
- Geographic Scope: Trained only on CMU landmarks
- Static Model: Does not adapt to changing user preferences
Ethical Considerations
- Bias: May perpetuate existing rating biases in training data
- Transparency: Feature importance is available for explainability
- Fairness: Predictions based on objective landmark features
How to Use
from model import RatingPredictor, load_model_from_data

# Load and train model from landmarks data
predictor = load_model_from_data('data/landmarks.json')

# Predict rating for a landmark
landmark = {
    'id': 'example-landmark',
    'indoor/outdoor': 'indoor',
    'time taken to explore': 45,
    'Class': ['Culture', 'Research'],
    'geocoord': {'lat': 40.4433, 'lon': -79.9436},
    'rating': 4.5
}
predicted_rating = predictor.predict_rating(landmark)
print(f"Predicted rating: {predicted_rating:.2f}")

# Get feature importance
importance = predictor.get_feature_importance()
for feature, imp in sorted(importance.items(), key=lambda x: x[1], reverse=True)[:5]:
    print(f"{feature}: {imp:.3f}")
Model Files
- model.py: Main model implementation
- README.md: This model card
Feature Importance
The model identifies these as the most important features:
- Dwell Time: Strong predictor of landmark rating
- Class Types: Certain classes (Culture, Research) correlate with higher ratings
- Indoor/Outdoor: Indoor landmarks tend to have different rating patterns
- Geographic Location: Distance from campus center affects ratings
Citation
@misc{cmu-explorer-rating-predictor,
  title={Rating Prediction Model for CMU Landmarks},
  author={Yash Sakhale and Faiyaz Azam},
  year={2025},
  url={https://huggingface.co/spaces/ysakhale/Tartan-Explore}
}
Model Card Contact
For questions about this model, please refer to the CMU Explorer Space.