---
title: GEO-Bench Leaderboard
emoji: 🏆
colorFrom: purple
colorTo: green
sdk: docker
pinned: false
---
# 🏆 GEO-Bench Leaderboard
The GEO-Bench leaderboard tracks performance of geospatial foundation models on various benchmark datasets using the GEO-Bench benchmarking framework.
## 1. How to Submit New Results
### 1.1. Create New Submission Directory
Create a new folder in the `new_submission` top directory:
```
geobench_leaderboard/
└── new_submission/
    ├── results_and_parameters.csv
    └── additional_info.json
```
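For example, a minimal set of shell commands matching the tree above (assuming the two files have already been prepared) would be:

```sh
# Create the submission directory and copy in the two required files
mkdir -p new_submission
cp results_and_parameters.csv additional_info.json new_submission/
```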
### 1.2. Add Results and Parameters Details
Add a CSV file (`results_and_parameters.csv`) with the columns below. Please note that if terratorch-iterate is used for experiments, this table may be created automatically upon completion of an experiment. Please see `examples/results_and_parameters.csv` for an example.
- `backbone`: backbone used for the experiment (e.g. Prithvi-EO-V2 600M)
- `dataset`: some or all of the GEO-Bench datasets. Please see the Info page to learn more.
- `Metric`: the type of metric used for evaluation. Depending on the dataset, this may be one of the following: `Overall_Accuracy`, `Multilabel_F1_Score`, `Multiclass_Jaccard_Index`
- `experiment_name`: if terratorch-iterate is used, this will be the experiment_name used in MLFlow. Otherwise, a unique name may be used for all results relating to a single backbone
- `partition name`: denotes the amount of data used. May be one of the following: `1.00x train` for 100%, `0.10x train` for 10%, `0.50x train` for 50%, `0.20x train` for 20%, `0.01x train` for 1%
- `batch_size_selection`: denotes whether the batch size was fixed during hyperparameter optimization. May be `fixed` or `optimized`
- `early_stop_patience`: early stopping patience used for the trainer
- `n_trials`: number of trials used for hyperparameter optimization
- `Seed`: random seed used for the repeated experiment. 10 random seeds must be used for each backbone/dataset combination
- `batch_size`: batch size used for the repeated experiments for each backbone/dataset combination
- `weight_decay`: weight decay used for the repeated experiments for each backbone/dataset combination
- `lr`: learning rate used for the repeated experiments for each backbone/dataset combination. Obtained from hyperparameter optimization (HPO)
- `test metric`: metric value obtained from running the backbone on the dataset during the repeated experiment. Please see the Info page to learn more.
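For illustration, a single data row in this CSV might look like the sketch below. The values shown are placeholders rather than real results, and the exact column order should follow `examples/results_and_parameters.csv`:

```csv
backbone,dataset,Metric,experiment_name,partition name,batch_size_selection,early_stop_patience,n_trials,Seed,batch_size,weight_decay,lr,test metric
Prithvi-EO-V2 600M,m-eurosat,Overall_Accuracy,prithvi_600_geobench,1.00x train,fixed,10,16,42,16,0.05,0.0001,0.95
```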
### 1.3. Add Additional Information
Create a JSON file (`additional_info.json`) with information about your submission and any new models that will be included. The JSON file MUST have the same file name and contain the same keys as the `examples/additional_info.json` file.
### 1.4. Submit PR
- Fork the repository
- Add your results following the structure above and, in the PR comments, add more details about your submission
- Create a pull request to `main`
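A typical workflow for the last two steps might look like the sketch below (the fork URL, branch name, and commit message are placeholders):

```sh
# Clone your fork and create a branch for the submission
git clone <your-fork-url>
cd geobench_leaderboard
git checkout -b add-new-results

# Stage the new submission folder, commit, and push
git add new_submission/
git commit -m "Add new GEO-Bench results"
git push origin add-new-results

# Finally, open a pull request against main and describe the submission in the PR comments
```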
## 2. Benchmarking with TerraTorch-Iterate
The TerraTorch-Iterate library, based on TerraTorch, leverages MLFlow for experiment logging, Optuna for hyperparameter optimization, and Ray for parallelization. It includes functionality to easily perform both hyperparameter tuning and repeated experiments in the manner prescribed by the GEO-Bench protocol.
### 2.1 Installation
Please see TerraTorch-Iterate for installation instructions.
### 2.2 Running benchmark experiments
**On existing models:** To run experiments on an existing model, prepare a custom config file specifying the model and dataset parameters. To compare the performance of multiple models, define a config file with a unique experiment name for each model being compared. Please see the `examples` folder for sample config files. Each config file (experiment) can then be executed with the following command:

```sh
terratorch iterate --hpo --repeat --config <config-file>
```
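For example, to compare several backbones, the same command can be run once per config file (the file names below are hypothetical):

```sh
terratorch iterate --hpo --repeat --config configs/prithvi_eo_v2_600.yaml
terratorch iterate --hpo --repeat --config configs/dofa_vit_300.yaml
```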
**On new models:** New models can be evaluated by first onboarding them to the TerraTorch library. Once onboarded, benchmarking may be conducted as outlined above.
### 2.3 Summarizing and plotting results
**Extract results and parameters:** To extract results and hyperparameters, please run the script below. The resulting `results_and_parameters.csv` file can be submitted to the GEO-Bench Leaderboard as described above:
```python
from benchmark.utils import get_results_and_parameters, extract_parameters, get_logger

# Note: SEGMENTATION_BASE_TASKS, CLASSIFICATION_BASE_TASKS and REPEATED_SEEDS_DEFAULT
# are constants provided by the benchmark package; import them in your environment.
logger = get_logger()
storage_uri = "results/hpo_exp_results"  # storage_uri from config
list_of_experiments = ["early_stopping_10_prithvi_600", "early_stopping_10_prithvi_600_tl", "early_stopping_10_dofa_vit_300"]

# Get results and parameters from the MLFlow logs
results_and_parameters = get_results_and_parameters(
    storage_uri=storage_uri,
    logger=logger,
    experiments=list_of_experiments,
    task_names=SEGMENTATION_BASE_TASKS + CLASSIFICATION_BASE_TASKS,
    num_repetitions=REPEATED_SEEDS_DEFAULT,
)
```
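If the returned object is a pandas DataFrame (an assumption here; please check the TerraTorch-Iterate documentation), it can be written directly to the CSV expected by the submission process:

```python
# Write the results in the format expected for leaderboard submission (see section 1.2)
results_and_parameters.to_csv("results_and_parameters.csv", index=False)
```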