Model Card for CLIP-based Aesthetic Predictor

A simple multilayer perceptron (MLP) that runs on CLIP image embeddings to predict the "aesthetic quality" of an image, i.e. how much people like it on average.

Trained by Christoph Schuhmann and adapted for the Vision Data Curation project.

For more information see: https://github.com/christophschuhmann/improved-aesthetic-predictor

Model Details

Model Type: Aesthetic score regression model
Input: OpenAI CLIP embeddings (vit_l14_pn_quick_gelu_openai-clip)
Output: A score between 0 and 10, where higher values correspond to more aesthetic images

Original authorship: Adapted from Christoph Schuhmann's MLP Aesthetic Score Predictor

Model Usage

This predictor is a regressor that operates on CLIP image embeddings rather than raw pixels, so the CLIP backbone must be run first. To do so with the Birder framework:

```sh
# Download the CLIP backbone
python -m birder.tools download-model vit_l14_pn_quick_gelu_openai-clip

# Run prediction on a dataset
python -m birder.scripts.predict \
    -n vit_l14_pn_quick_gelu \
    -t openai-clip \
    --simple-crop \
    --gpu \
    --parallel \
    --batch-size 256 \
    --chunk-size 50000 \
    --amp \
    --amp-dtype bfloat16 \
    --save-logits \
    --suffix optional-dataset-name \
    path/to/dataset

# The aesthetic predictor can now be run on the saved logits
```
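The card stops before the scoring step itself. The following is a minimal PyTorch sketch of it, assuming the adapted model keeps the upstream repository's layer layout (768 → 1024 → 128 → 64 → 16 → 1, with dropout and no activations) and its L2 normalization of each embedding before scoring. `AestheticMLP`, `score_embeddings`, and the weights path are hypothetical names, and reading the saved logits back from disk depends on the file format Birder writes, so random tensors stand in for them here.

```python
import torch
from torch import nn


class AestheticMLP(nn.Module):
    """Hypothetical regression head, assumed to mirror the upstream layer layout.

    The upstream model has its ReLU activations commented out, so none are used here.
    """

    def __init__(self, input_size: int = 768) -> None:
        super().__init__()
        self.layers = nn.Sequential(
            nn.Linear(input_size, 1024),
            nn.Dropout(0.2),
            nn.Linear(1024, 128),
            nn.Dropout(0.2),
            nn.Linear(128, 64),
            nn.Dropout(0.1),
            nn.Linear(64, 16),
            nn.Linear(16, 1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.layers(x)


@torch.inference_mode()
def score_embeddings(model: nn.Module, embeddings: torch.Tensor) -> torch.Tensor:
    """Return one aesthetic score per row of an (N, input_size) embedding tensor."""
    model.eval()  # disables dropout so scores are deterministic
    # Assumption: embeddings are L2-normalized before scoring, as upstream does.
    normalized = nn.functional.normalize(embeddings.float(), p=2, dim=-1)
    return model(normalized).squeeze(-1)


if __name__ == "__main__":
    model = AestheticMLP()
    # Hypothetical: load the trained weights here, for example
    # model.load_state_dict(torch.load("path/to/aesthetic_mlp.pt", map_location="cpu"))
    embeddings = torch.randn(4, 768)  # stand-in for the saved CLIP embeddings
    print(score_embeddings(model, embeddings))
```

Since the upstream layers have no activations between them, the stack reduces to a single affine map at inference time. It is kept layer by layer here so that upstream-style state dict keys such as `layers.0.weight` should line up if real weights are loaded.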

Intended Use

Primary use case: Ranking or filtering images by aesthetic appeal, dataset curation, and training data selection.

Recommended scope: Research, dataset preparation, and large-scale data analysis.

Not intended for: Measuring artistic merit, cultural value, or the taste preferences of specific individuals.
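For the ranking and filtering use case, a small stdlib-only sketch of selecting images by score follows. The `select_images` helper, the example scores, and the 5.0 threshold are all hypothetical; a suitable cutoff depends on the dataset and should be chosen by inspecting the score distribution.

```python
from typing import Mapping, Optional


def select_images(
    scores: Mapping[str, float], threshold: float, top_k: Optional[int] = None
) -> list:
    """Return image paths scoring at or above `threshold`, best first, optionally capped at `top_k`."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    kept = [path for path, score in ranked if score >= threshold]
    return kept if top_k is None else kept[:top_k]


if __name__ == "__main__":
    # Hypothetical scores; real values would come from the aesthetic predictor.
    scores = {"a.jpg": 6.1, "b.jpg": 4.2, "c.jpg": 5.5, "d.jpg": 7.0}
    print(select_images(scores, threshold=5.0))           # ['d.jpg', 'a.jpg', 'c.jpg']
    print(select_images(scores, threshold=5.0, top_k=2))  # ['d.jpg', 'a.jpg']
```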

Citation

```bibtex
@misc{christophschuhmann2022improved-aesthetic-predictor,
  author = {Christoph Schuhmann},
  title = {MLP Aesthetic Score Predictor},
  year = {2022},
  url = {https://github.com/christophschuhmann/improved-aesthetic-predictor},
  note = {Accessed: August 22, 2025},
}
```
