Rating Prediction Model for CMU Landmarks

Model Description

This is an off-the-shelf Random Forest regressor that predicts landmark ratings from landmark features. The model helps validate existing ratings and can predict ratings for new landmarks based on their properties.

Model Details

Model Type

  • Architecture: Random Forest Regressor (scikit-learn)
  • Training: Off-the-shelf scikit-learn estimator trained with custom feature engineering
  • Input: Landmark features (indoor/outdoor, dwell time, classes, geographic location)
  • Output: Predicted rating (0-5 scale)

Training Data

  • Dataset: 100+ CMU landmarks with existing ratings
  • Features (encoded as sketched after this list):
    • Indoor/outdoor classification (binary)
    • Normalized dwell time (0-1 scale)
    • Multi-hot encoded landmark classes
    • Geographic distance from CMU center
  • Target: Existing landmark ratings (0-5 scale)
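
The sketch below illustrates how these features could be assembled into a single vector. It is a minimal illustration, not the authoritative code (which lives in model.py); the class vocabulary, dwell-time cap, and campus-center coordinates are assumptions.

import math

# Illustrative feature encoding; the constants below are assumptions for this sketch.
CLASS_VOCAB = ['Culture', 'Research', 'Recreation', 'Dining']  # assumed class vocabulary
CMU_CENTER = (40.4433, -79.9436)  # approximate campus center (lat, lon)
MAX_DWELL_MINUTES = 120.0         # assumed cap for 0-1 normalization

def encode_landmark(landmark):
    # Binary indoor/outdoor flag
    indoor = 1.0 if landmark['indoor/outdoor'] == 'indoor' else 0.0
    # Dwell time normalized to the [0, 1] range
    dwell = min(landmark['time taken to explore'] / MAX_DWELL_MINUTES, 1.0)
    # Multi-hot encoding of landmark classes
    classes = [1.0 if c in landmark['Class'] else 0.0 for c in CLASS_VOCAB]
    # Distance (in degrees) from the campus center
    dlat = landmark['geocoord']['lat'] - CMU_CENTER[0]
    dlon = landmark['geocoord']['lon'] - CMU_CENTER[1]
    distance = math.hypot(dlat, dlon)
    return [indoor, dwell, *classes, distance]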

Training Procedure

  • Feature extraction and preprocessing
  • Random Forest training with 100 estimators
  • Cross-validation for performance estimation
  • Feature importance analysis
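
A minimal sketch of the training step, assuming scikit-learn's standard API (model.py wraps the equivalent logic inside RatingPredictor); X is the encoded feature matrix and y the existing 0-5 ratings:

from sklearn.ensemble import RandomForestRegressor

def train_rating_model(X, y, seed=42):
    # Random Forest with 100 estimators, as described above
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(X, y)
    # Per-feature importances feed the feature importance analysis
    return model, model.feature_importances_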

Intended Use

Primary Use Cases

  • Validating existing landmark ratings
  • Predicting ratings for new landmarks
  • Understanding feature importance in rating prediction
  • Quality assurance for landmark database

Out-of-Scope Use Cases

  • Predicting user-specific ratings
  • Real-time rating updates
  • Cross-campus rating predictions

Performance Metrics

  • Mean Absolute Error (MAE): ~0.3-0.5 on validation set
  • Mean Squared Error (MSE): ~0.2-0.4 on validation set
  • Feature Importance: Dwell time and class types are most predictive

Model Performance

  • Training Score: 0.85-0.90
  • Cross-validation Score: 0.75-0.80
  • Mean Absolute Error: 0.35
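
The sketch below shows how such estimates could be reproduced with scikit-learn cross-validation, assuming the same feature matrix X and ratings y as above (the 5-fold split is an assumption):

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

def estimate_metrics(X, y, folds=5):
    model = RandomForestRegressor(n_estimators=100, random_state=42)
    # scikit-learn reports errors as negated scores, hence the sign flips
    mae = -cross_val_score(model, X, y, cv=folds, scoring='neg_mean_absolute_error').mean()
    mse = -cross_val_score(model, X, y, cv=folds, scoring='neg_mean_squared_error').mean()
    r2 = cross_val_score(model, X, y, cv=folds, scoring='r2').mean()
    return mae, mse, r2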

Limitations and Bias

  • Training Data: Based on existing ratings, may inherit rating biases
  • Feature Limitations: Limited to available landmark metadata
  • Geographic Scope: Trained only on CMU landmarks
  • Static Model: Does not adapt to changing user preferences

Ethical Considerations

  • Bias: May perpetuate existing rating biases in training data
  • Transparency: Feature importance is available for explainability
  • Fairness: Predictions based on objective landmark features

How to Use

from model import RatingPredictor, load_model_from_data

# Load and train model from landmarks data
predictor = load_model_from_data('data/landmarks.json')

# Predict rating for a landmark
landmark = {
    'id': 'example-landmark',
    'indoor/outdoor': 'indoor',
    'time taken to explore': 45,
    'Class': ['Culture', 'Research'],
    'geocoord': {'lat': 40.4433, 'lon': -79.9436},
    'rating': 4.5
}

predicted_rating = predictor.predict_rating(landmark)
print(f"Predicted rating: {predicted_rating:.2f}")

# Get feature importance
importance = predictor.get_feature_importance()
for feature, imp in sorted(importance.items(), key=lambda x: x[1], reverse=True)[:5]:
    print(f"{feature}: {imp:.3f}")

Model Files

  • model.py: Main model implementation
  • README.md: This model card

Feature Importance

The model identifies these as the most important features:

  1. Dwell Time: Strong predictor of landmark rating
  2. Class Types: Certain classes (Culture, Research) correlate with higher ratings
  3. Indoor/Outdoor: Indoor landmarks tend to have different rating patterns
  4. Geographic Location: Distance from campus center affects ratings
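
The per-feature importances returned by get_feature_importance() can be rolled up into these four groups. The column-name patterns below are assumptions about how the features are labeled; adjust them to the actual keys returned by the model.

from collections import defaultdict

def group_importances(importance):
    # Aggregate per-column importances into the four groups listed above
    groups = defaultdict(float)
    for feature, imp in importance.items():
        if feature.startswith('class_'):   # assumed prefix for multi-hot class columns
            groups['Class Types'] += imp
        elif 'dwell' in feature:           # assumed dwell-time column name
            groups['Dwell Time'] += imp
        elif 'indoor' in feature:          # assumed indoor/outdoor flag name
            groups['Indoor/Outdoor'] += imp
        else:
            groups['Geographic Location'] += imp
    return dict(groups)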

Citation

@misc{cmu-explorer-rating-predictor,
  title={Rating Prediction Model for CMU Landmarks},
  author={Yash Sakhale and Faiyaz Azam},
  year={2025},
  url={https://huggingface.co/spaces/ysakhale/Tartan-Explore}
}

Model Card Contact

For questions about this model, please refer to the CMU Explorer Space.
