---
title: GEO-Bench Leaderboard
emoji: 🏆
colorFrom: purple
colorTo: green
sdk: docker
pinned: false
---
# 🏆 GEO-Bench Leaderboard
The GEO-Bench leaderboard tracks performance of geospatial foundation models on various benchmark datasets using the GEO-Bench benchmarking framework.
## 1. How to Submit New Results
### 1.1. Create New Submission Directory
Create a new folder in the `new_submission` top directory:
```
geobench_leaderboard/
└── new_submission/
    ├── results_and_parameters.csv
    └── additional_info.json
```
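For example, a minimal set of shell commands matching the tree above (assuming the two files have already been prepared) would be:

```sh
# Create the submission directory and copy in the two required files
mkdir -p new_submission
cp results_and_parameters.csv additional_info.json new_submission/
```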
### 1.2. Add Results and Parameters Details
Add a CSV file (`results_and_parameters.csv`) with the columns below. Please note that if terratorch-iterate is used for experiments, this table may be created automatically upon completion of an experiment. Please see `examples/results_and_parameters.csv` for an example.
- `backbone`: backbone used for the experiment (e.g. Prithvi-EO-V2 600M)
- `dataset`: some or all of the GEO-Bench datasets. Please see the Info page to learn more.
- `Metric`: the type of metric used for evaluation. Depending on the dataset, this may be one of the following: `Overall_Accuracy`, `Multilabel_F1_Score`, `Multiclass_Jaccard_Index`
- `experiment_name`: if terratorch-iterate is used, this will be the experiment_name used in MLFlow. Otherwise, a unique name may be used for all results relating to a single backbone
- `partition name`: denotes the amount of data used. May be one of the following: `1.00x train` for 100%, `0.10x train` for 10%, `0.50x train` for 50%, `0.20x train` for 20%, `0.01x train` for 1%
- `batch_size_selection`: denotes whether the batch size was fixed during hyperparameter optimization. May be `fixed` or `optimized`
- `early_stop_patience`: early stopping patience used for the trainer
- `n_trials`: number of trials used for hyperparameter optimization
- `Seed`: random seed used for the repeated experiment. 10 random seeds must be used for each backbone/dataset combination
- `batch_size`: batch size used for the repeated experiments for each backbone/dataset combination
- `weight_decay`: weight decay used for the repeated experiments for each backbone/dataset combination
- `lr`: learning rate used for the repeated experiments for each backbone/dataset combination. Obtained from hyperparameter optimization (HPO)
- `test metric`: metric value obtained from running the backbone on the dataset during the repeated experiment. Please see the Info page to learn more.
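For illustration, a single data row in this CSV might look like the sketch below. The values shown are placeholders rather than real results, and the exact column order should follow `examples/results_and_parameters.csv`:

```csv
backbone,dataset,Metric,experiment_name,partition name,batch_size_selection,early_stop_patience,n_trials,Seed,batch_size,weight_decay,lr,test metric
Prithvi-EO-V2 600M,m-eurosat,Overall_Accuracy,prithvi_600_geobench,1.00x train,fixed,10,16,42,16,0.05,0.0001,0.95
```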
### 1.3. Add Additional Information
Create a JSON file (`additional_info.json`) with information about your submission and any new models that will be included. The JSON file MUST have the same file name and contain the same keys as the `examples/additional_info.json` file.
### 1.4. Submit PR
- Fork the repository
- Add your results following the structure above and, in the PR comments, add more details about your submission
- Create a pull request to `main`
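A typical workflow for the last two steps might look like the sketch below (the fork URL, branch name, and commit message are placeholders):

```sh
# Clone your fork and create a branch for the submission
git clone <your-fork-url>
cd geobench_leaderboard
git checkout -b add-new-results

# Stage the new submission folder, commit, and push
git add new_submission/
git commit -m "Add new GEO-Bench results"
git push origin add-new-results

# Finally, open a pull request against main and describe the submission in the PR comments
```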
## 2. Benchmarking with TerraTorch-Iterate
The TerraTorch-Iterate library, based on TerraTorch, leverages MLFlow for experiment logging, Optuna for hyperparameter optimization, and Ray for parallelization. It includes functionality to easily perform both hyperparameter tuning and repeated experiments in the manner prescribed by the GEO-Bench protocol.
### 2.1 Installation
Please see TerraTorch-Iterate for installation instructions.
### 2.2 Running benchmark experiments
**On existing models:** To run experiments on an existing model, prepare a custom config file specifying the model and dataset parameters. To compare the performance of multiple models, define a config file with a unique experiment name for each model being compared. Please see the `examples` folder for sample config files. Each config file (experiment) can then be executed with the following command:

```sh
terratorch iterate --hpo --repeat --config <config-file>
```
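For example, to compare several backbones, the same command can be run once per config file (the file names below are hypothetical):

```sh
terratorch iterate --hpo --repeat --config configs/prithvi_eo_v2_600.yaml
terratorch iterate --hpo --repeat --config configs/dofa_vit_300.yaml
```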
**On new models:** New models can be evaluated by first onboarding them to the TerraTorch library. Once onboarded, benchmarking may be conducted as outlined above.
### 2.3 Summarizing and plotting results
**Extract results and parameters:** To extract results and hyperparameters, please run the script below. The resulting `results_and_parameters.csv` file can be submitted to the GEO-Bench Leaderboard as described above:
```python
from benchmark.utils import get_results_and_parameters, extract_parameters, get_logger

# Note: SEGMENTATION_BASE_TASKS, CLASSIFICATION_BASE_TASKS and REPEATED_SEEDS_DEFAULT
# are constants provided by the benchmark package; import them in your environment.
logger = get_logger()
storage_uri = "results/hpo_exp_results"  # storage_uri from config
list_of_experiments = ["early_stopping_10_prithvi_600", "early_stopping_10_prithvi_600_tl", "early_stopping_10_dofa_vit_300"]

# Get results and parameters from the MLFlow logs
results_and_parameters = get_results_and_parameters(
    storage_uri=storage_uri,
    logger=logger,
    experiments=list_of_experiments,
    task_names=SEGMENTATION_BASE_TASKS + CLASSIFICATION_BASE_TASKS,
    num_repetitions=REPEATED_SEEDS_DEFAULT,
)
```
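If the returned object is a pandas DataFrame (an assumption here; please check the TerraTorch-Iterate documentation), it can be written directly to the CSV expected by the submission process:

```python
# Write the results in the format expected for leaderboard submission (see section 1.2)
results_and_parameters.to_csv("results_and_parameters.csv", index=False)
```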