Rating Prediction Model for CMU Landmarks

Model Description

This is an off-the-shelf Random Forest regressor that predicts landmark ratings from landmark features. The model helps validate existing ratings and can predict ratings for new landmarks based on their properties.

Model Details

Model Type

  • Architecture: Random Forest Regressor (scikit-learn)
  • Training: Off-the-shelf scikit-learn estimator trained with custom feature engineering
  • Input: Landmark features (indoor/outdoor, dwell time, classes, geographic location)
  • Output: Predicted rating (0-5 scale)

Training Data

  • Dataset: 100+ CMU landmarks with existing ratings
  • Features (encoded as sketched after this list):
    • Indoor/outdoor classification (binary)
    • Normalized dwell time (0-1 scale)
    • Multi-hot encoded landmark classes
    • Geographic distance from CMU center
  • Target: Existing landmark ratings (0-5 scale)
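
The sketch below illustrates how these features could be assembled into a single vector. It is a minimal illustration, not the authoritative code (which lives in model.py); the class vocabulary, dwell-time cap, and campus-center coordinates are assumptions.

import math

# Illustrative feature encoding; the constants below are assumptions for this sketch.
CLASS_VOCAB = ['Culture', 'Research', 'Recreation', 'Dining']  # assumed class vocabulary
CMU_CENTER = (40.4433, -79.9436)  # approximate campus center (lat, lon)
MAX_DWELL_MINUTES = 120.0         # assumed cap for 0-1 normalization

def encode_landmark(landmark):
    # Binary indoor/outdoor flag
    indoor = 1.0 if landmark['indoor/outdoor'] == 'indoor' else 0.0
    # Dwell time normalized to the [0, 1] range
    dwell = min(landmark['time taken to explore'] / MAX_DWELL_MINUTES, 1.0)
    # Multi-hot encoding of landmark classes
    classes = [1.0 if c in landmark['Class'] else 0.0 for c in CLASS_VOCAB]
    # Distance (in degrees) from the campus center
    dlat = landmark['geocoord']['lat'] - CMU_CENTER[0]
    dlon = landmark['geocoord']['lon'] - CMU_CENTER[1]
    distance = math.hypot(dlat, dlon)
    return [indoor, dwell, *classes, distance]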

Training Procedure

  • Feature extraction and preprocessing
  • Random Forest training with 100 estimators
  • Cross-validation for performance estimation
  • Feature importance analysis
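
A minimal sketch of the training step, assuming scikit-learn's standard API (model.py wraps the equivalent logic inside RatingPredictor); X is the encoded feature matrix and y the existing 0-5 ratings:

from sklearn.ensemble import RandomForestRegressor

def train_rating_model(X, y, seed=42):
    # Random Forest with 100 estimators, as described above
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(X, y)
    # Per-feature importances feed the feature importance analysis
    return model, model.feature_importances_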

Intended Use

Primary Use Cases

  • Validating existing landmark ratings
  • Predicting ratings for new landmarks
  • Understanding feature importance in rating prediction
  • Quality assurance for landmark database

Out-of-Scope Use Cases

  • Predicting user-specific ratings
  • Real-time rating updates
  • Cross-campus rating predictions

Performance Metrics

  • Mean Absolute Error (MAE): ~0.3-0.5 on validation set
  • Mean Squared Error (MSE): ~0.2-0.4 on validation set
  • Feature Importance: Dwell time and class types are most predictive

Model Performance

  • Training Score: 0.85-0.90
  • Cross-validation Score: 0.75-0.80
  • Mean Absolute Error: 0.35
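
The sketch below shows how such estimates could be reproduced with scikit-learn cross-validation, assuming the same feature matrix X and ratings y as above (the 5-fold split is an assumption):

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

def estimate_metrics(X, y, folds=5):
    model = RandomForestRegressor(n_estimators=100, random_state=42)
    # scikit-learn reports errors as negated scores, hence the sign flips
    mae = -cross_val_score(model, X, y, cv=folds, scoring='neg_mean_absolute_error').mean()
    mse = -cross_val_score(model, X, y, cv=folds, scoring='neg_mean_squared_error').mean()
    r2 = cross_val_score(model, X, y, cv=folds, scoring='r2').mean()
    return mae, mse, r2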

Limitations and Bias

  • Training Data: Based on existing ratings, may inherit rating biases
  • Feature Limitations: Limited to available landmark metadata
  • Geographic Scope: Trained only on CMU landmarks
  • Static Model: Does not adapt to changing user preferences

Ethical Considerations

  • Bias: May perpetuate existing rating biases in training data
  • Transparency: Feature importance is available for explainability
  • Fairness: Predictions based on objective landmark features

How to Use

from model import RatingPredictor, load_model_from_data

# Load and train model from landmarks data
predictor = load_model_from_data('data/landmarks.json')

# Predict rating for a landmark
landmark = {
    'id': 'example-landmark',
    'indoor/outdoor': 'indoor',
    'time taken to explore': 45,
    'Class': ['Culture', 'Research'],
    'geocoord': {'lat': 40.4433, 'lon': -79.9436},
    'rating': 4.5
}

predicted_rating = predictor.predict_rating(landmark)
print(f"Predicted rating: {predicted_rating:.2f}")

# Get feature importance
importance = predictor.get_feature_importance()
for feature, imp in sorted(importance.items(), key=lambda x: x[1], reverse=True)[:5]:
    print(f"{feature}: {imp:.3f}")

Model Files

  • model.py: Main model implementation
  • README.md: This model card

Feature Importance

The model identifies these as the most important features:

  1. Dwell Time: Strong predictor of landmark rating
  2. Class Types: Certain classes (Culture, Research) correlate with higher ratings
  3. Indoor/Outdoor: Indoor landmarks tend to have different rating patterns
  4. Geographic Location: Distance from campus center affects ratings
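
The per-feature importances returned by get_feature_importance() can be rolled up into these four groups. The column-name patterns below are assumptions about how the features are labeled; adjust them to the actual keys returned by the model.

from collections import defaultdict

def group_importances(importance):
    # Aggregate per-column importances into the four groups listed above
    groups = defaultdict(float)
    for feature, imp in importance.items():
        if feature.startswith('class_'):   # assumed prefix for multi-hot class columns
            groups['Class Types'] += imp
        elif 'dwell' in feature:           # assumed dwell-time column name
            groups['Dwell Time'] += imp
        elif 'indoor' in feature:          # assumed indoor/outdoor flag name
            groups['Indoor/Outdoor'] += imp
        else:
            groups['Geographic Location'] += imp
    return dict(groups)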

Citation

@misc{cmu-explorer-rating-predictor,
  title={Rating Prediction Model for CMU Landmarks},
  author={Yash Sakhale and Faiyaz Azam},
  year={2025},
  url={https://huggingface.co/spaces/ysakhale/Tartan-Explore}
}

Model Card Contact

For questions about this model, please refer to the CMU Explorer Space.
