Commit 5f5ec4c · Parent(s): e59eb64

Update ReadMe

Files changed:
- .gitattributes +1 -0
- README.md +130 -13
.gitattributes CHANGED

```diff
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+*.png filter=lfs diff=lfs merge=lfs -text
```
README.md CHANGED (@@ -1,35 +1,152 @@); the updated file follows:
---
license: apache-2.0
library_name: terratorch
tags:
- Earth Observation
- Foundation Model
- IBM
- ESA
---

# TerraMind 1.0 base

TerraMind is the first multimodal any-to-any generative foundation model for Earth Observation, jointly developed by IBM, ESA, and Forschungszentrum Jülich.

[Figure: TerraMind model overview]

## Architecture

TerraMind uses a dual-scale transformer-based encoder-decoder architecture, simultaneously processing pixel-level and token-level data. The model was pre-trained on 500B tokens from 9M spatiotemporally aligned multimodal samples from the TerraMesh dataset.

Modality-specific patch embeddings allow direct processing of raw inputs, while modality-specific FSQ-VAEs are used for image tokenization. For sequence-like modalities such as coordinates, an adapted WordPiece tokenizer is employed. During pre-training, TerraMind leverages masked token reconstruction, learning complex cross-modal correlations to generate high-quality latent representations.
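
To make the patch-embedding idea concrete, here is a toy sketch in plain PyTorch. It is illustrative only, not TerraMind's actual implementation: each modality gets its own projection into a shared embedding space before one transformer processes all tokens jointly. Dimensions mirror the usage examples below.

```python
# Toy sketch only: modality-specific patch embeddings feeding one shared encoder
# (12-band S2L2A, 2-band S1GRD, 224x224 inputs, 16px patches).
import torch
import torch.nn as nn

embed_dim, patch = 768, 16
patch_embeds = nn.ModuleDict({
    'S2L2A': nn.Conv2d(12, embed_dim, kernel_size=patch, stride=patch),
    'S1GRD': nn.Conv2d(2, embed_dim, kernel_size=patch, stride=patch),
})
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(embed_dim, nhead=12, batch_first=True), num_layers=2
)

inputs = {'S2L2A': torch.randn(1, 12, 224, 224), 'S1GRD': torch.randn(1, 2, 224, 224)}
tokens = torch.cat(
    [patch_embeds[m](x).flatten(2).transpose(1, 2) for m, x in inputs.items()], dim=1
)  # (1, 392, 768): 196 patch tokens per modality
latent = encoder(tokens)
```
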
## Evaluation

[Figure: PANGAEA benchmark results]

We benchmarked TerraMind against other geospatial foundation models using the PANGAEA benchmark. TerraMind consistently achieved state-of-the-art performance, surpassing existing models in various downstream tasks such as land use segmentation, water body mapping, and vegetation assessments. The evaluation highlights its effectiveness in handling diverse Earth Observation scenarios. We present additional experiments in our [pre-print](https://arxiv.org/abs/2504.11171).

## Usage

TerraMind is fully integrated into the fine-tuning package [TerraTorch](https://ibm.github.io/terratorch/). This makes it easy to initialize the pre-trained model or fine-tune it via PyTorch Lightning. The weights are automatically downloaded from Hugging Face.

### Fine-tuning

You can fine-tune TerraMind with a config using TerraTorch:

```shell
terratorch fit -c terramind_config.yaml
```

For testing the fine-tuned TerraMind model, run:

```shell
terratorch test -c terramind_config.yaml --ckpt_path path/to/your/checkpoint.ckpt
```
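
The CLI reads a PyTorch Lightning style config. If you prefer staying in Python, the same setup can be sketched with TerraTorch's task classes. The following is a hypothetical sketch: `SemanticSegmentationTask`, `EncoderDecoderFactory`, and the argument names are assumptions based on general TerraTorch conventions, not taken from this model card, so verify them against the TerraTorch docs.

```python
# Hypothetical sketch only: task, factory, and argument names are assumptions,
# not confirmed by this model card.
import lightning.pytorch as pl
from terratorch.tasks import SemanticSegmentationTask

task = SemanticSegmentationTask(
    model_factory="EncoderDecoderFactory",  # assumed factory name
    model_args={
        "backbone": "terramind_v1_base",
        "backbone_pretrained": True,
        "backbone_modalities": ["S2L2A"],
        "decoder": "UNetDecoder",  # assumed decoder choice
        "decoder_channels": [512, 256, 128, 64],
        "num_classes": 2,
    },
    loss="ce",
)

trainer = pl.Trainer(max_epochs=10, accelerator="auto")
trainer.fit(task, datamodule=datamodule)  # datamodule: your LightningDataModule (placeholder)
```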

We provide config examples and notebooks with step-by-step explanations at https://github.com/IBM/terramind.

### Backbone

Alternatively, you can build the backbone with the following code and use it in your custom pipeline.

```python
from terratorch import BACKBONE_REGISTRY

model = BACKBONE_REGISTRY.build(
    'terramind_v1_base',
    pretrained=True,
    modalities=['S2L2A', 'S1GRD']
)
```

The model supports the following raw inputs, which you can specify in `modalities`: S2L2A, S2L1C, S1GRD, S1RTC, DEM, RGB. If your data does not use all bands of a modality, you can specify a subset with `bands={'S2L2A': ['BLUE', 'GREEN', 'RED', 'NIR_NARROW', 'SWIR_1', 'SWIR_2']}`. You can pass the inputs as a dict to the model. If a tensor is passed directly, the model assumes it is the first defined modality. TerraMind can also handle missing input modalities.

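For instance, a minimal sketch of building the backbone with the band subset listed above (same build call as before, plus `bands`):

```python
# Build an S2L2A-only backbone on a six-band subset; band names are taken
# from the text above.
model_subset = BACKBONE_REGISTRY.build(
    'terramind_v1_base',
    pretrained=True,
    modalities=['S2L2A'],
    bands={'S2L2A': ['BLUE', 'GREEN', 'RED', 'NIR_NARROW', 'SWIR_1', 'SWIR_2']}
)
```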

```python
import torch

# Dummy inputs for illustration
s2l2a_tensor = torch.randn(8, 12, 224, 224)  # B, 12, 224, 224
s1grd_tensor = torch.randn(8, 2, 224, 224)   # B, 2, 224, 224

output = model(
    {
        'S2L2A': s2l2a_tensor,
        'S1GRD': s1grd_tensor,
    }
)

output.shape  # B, 196, 768
```

The model outputs patch embeddings for each input modality. By default, the patch embeddings are averaged over all modalities to reduce the output size. You can specify another `merge_method` from `'mean'`, `'max'`, `'concat'`, `'dict'`, and `None` (a sketch follows this list):

- `mean` and `max` are applied per patch over all image modality embeddings.
- `concat` stacks all image modalities along the embedding dimension and returns one embedding per patch.
- `dict` returns all tokens split by modality in a dictionary.
- `None` returns the tokens without further processing.
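
For example, a sketch of requesting per-modality embeddings with `merge_method='dict'` (the parameter and its values come from the text above; this assumes `merge_method` is passed at build time):

```python
# Return embeddings separately per modality instead of averaging them.
model_dict = BACKBONE_REGISTRY.build(
    'terramind_v1_base',
    pretrained=True,
    modalities=['S2L2A', 'S1GRD'],
    merge_method='dict'
)

out = model_dict({'S2L2A': s2l2a_tensor, 'S1GRD': s1grd_tensor})
out['S2L2A'].shape  # B, 196, 768 (one embedding sequence per modality)
```
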
### Thinking in Modalities

TerraMind introduces a new Thinking-in-Modalities (TiM) approach, where other modalities are predicted as intermediate steps. The fine-tuned encoder then uses both the raw inputs and the generated modalities.

Use TiM models in TerraTorch by adding `_tim` to the model name:

```python
from terratorch import BACKBONE_REGISTRY

model = BACKBONE_REGISTRY.build(
    'terramind_v1_base_tim',
    pretrained=True,
    modalities=['S2L2A', 'S1GRD'],
    tim_modalities=['LULC']  # optional, defaults to LULC (land-use land-cover)
)
```
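
A hedged sketch of a forward pass with the TiM backbone, assuming it accepts the same dict inputs as the base backbone (the input tensor is a stand-in):

```python
import torch

s2l2a_tensor = torch.randn(1, 12, 224, 224)  # stand-in S2L2A input

# Assumption: the TiM backbone shares the base backbone's forward interface.
# Internally it first predicts the LULC modality, then encodes the raw input
# together with the generated modality (per the TiM description above).
features = model({'S2L2A': s2l2a_tensor})
```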

If you use TiM models, we recommend using the [pre-training statistics](https://github.com/IBM/terratorch/blob/a4ca8df7c7f22ddf469f372e1099157d2d7beeb2/terratorch/models/backbones/terramind/model/terramind_register.py#L111) for standardization.

### Generations

TerraMind can perform any-to-any generation based on varying combinations of inputs.

[Figure: any-to-any generation examples]

Build the full TerraMind model (including de-tokenizer steps) from the `FULL_MODEL_REGISTRY`:

```python
from terratorch import FULL_MODEL_REGISTRY

model = FULL_MODEL_REGISTRY.build(
    'terramind_v1_base_generate',
    pretrained=False,
    modalities=['S2L2A'],
    output_modalities=['S1GRD', 'LULC'],
    timesteps=10,  # define the number of diffusion steps
    standardize=True,  # automatically applies the standardization values to the input and output
)
```

As with the backbone, pass multiple modalities as a dict or a single modality directly as a tensor; the model returns the generated images as a dict of tensors. Note that these generations are not reconstructions but "mental images" representing how the model imagines the modality. You can control the generation via the number of diffusion steps (`timesteps`), which you can pass to the constructor or the forward function.
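
A minimal sketch of a generation call (the input tensor is a stand-in; the S1GRD output shape is an assumption based on the two-band S1GRD inputs described above):

```python
import torch

s2l2a_tensor = torch.randn(1, 12, 224, 224)  # B, 12, 224, 224 stand-in input
generated = model(s2l2a_tensor)  # a single modality can be passed directly as a tensor
# generated = model(s2l2a_tensor, timesteps=50)  # or override the diffusion steps per call

generated.keys()  # one generated image per output modality: 'S1GRD', 'LULC'
generated['S1GRD'].shape  # assumed: B, 2, 224, 224
```
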
We provide an example notebook for generations at https://github.com/IBM/terramind.

## Feedback

Your feedback is invaluable to us.
Please share it with us by starting a discussion in this HF repository or submitting an issue to [TerraMind](https://github.com/IBM/terramind) on GitHub.

## Citation

If you use TerraMind in your research, please cite the [TerraMind](https://arxiv.org/abs/2504.11171) pre-print.

```text
@article{jakubik2025terramind,
  title={TerraMind: Large-Scale Generative Multimodality for Earth Observation},
  author={Jakubik, Johannes and Yang, Felix and Blumenstiel, Benedikt and Scheurer, Erik and Sedona, Rocco and Maurogiovanni, Stefano and Bosmans, Jente and Dionelis, Nikolaos and Marsocci, Valerio and Kopp, Niklas and others},
  journal={arXiv preprint arXiv:2504.11171},
  year={2025}
}
```