blumenstiel committed
Commit 5f5ec4c · Parent: e59eb64

Update ReadMe

Files changed (2)
  1. .gitattributes +1 -0
  2. README.md +130 -13
.gitattributes CHANGED
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ *.png filter=lfs diff=lfs merge=lfs -text
README.md CHANGED
@@ -1,35 +1,152 @@
  ---
  library_name: terratorch
  ---
  # TerraMind 1.0 base
- Model weights of the V1.0 base version of TerraMind.

- Paper: https://ibm.box.com/s/yee2v9lgb4et9w7phehj3h4elgqxqto7 (Do not distribute!)

- ## Pre-training

- The model was pre-trained for 500B tokens on the TerraMesh dataset with 9M aligned multi-modal samples (S2L1C, S2L2A, S1GRD/S1RTC, DEM, NDVI, LULC, RGB, Coords). The pre-training targets are tokens from pre-trained FSQ-VAEs.

  ## Usage

- Download and set up the `TerraMind-Finetuning` repository (see https://huggingface.co/FAST-EO/TerraMind_Finetuning).
- Checkpoints are available via this private repository. You need to set your `HF_TOKEN` via an env variable. The model is then automatically downloaded by TerraTorch.
  ```shell
- export HF_TOKEN=<your-token>
  ```

- You can finetune TerraMind via TerraTorch by just calling `terramind` instead of `terratorch` (this registers the TerraMind models to the backbone registry before calling the TerraTorch CLI).
  ```shell
- terramind fit -c config.yaml
  ```

- Alternatively, you can register the model with `import terramind` and build the model via TerraTorch in your python code.
  ```python
- import terramind
  from terratorch import BACKBONE_REGISTRY
- model = BACKBONE_REGISTRY.build('terramind_v01_base', pretrained=True, modalities=['S2L2A', 'S1GRD', 'DEM', 'RGB'])
  ```

- You find a detailed usage description at https://huggingface.co/FAST-EO/TerraMind_Finetuning.
  ---
+ license: apache-2.0
  library_name: terratorch
+ tags:
+ - Earth Observation
+ - Foundation Model
+ - IBM
+ - ESA
  ---
  # TerraMind 1.0 base

+ TerraMind is the first multimodal any-to-any generative foundation model for Earth Observation, jointly developed by IBM, ESA, and Forschungszentrum Jülich.
+
+ ![terramind_architecture.png](assets/terramind_architecture.png)
+
+ ## Architecture
+
+ TerraMind uses a dual-scale transformer-based encoder-decoder architecture that simultaneously processes pixel-level and token-level data.
+ The model was pre-trained on 500B tokens from 9M spatiotemporally aligned multimodal samples of the TerraMesh dataset.
+
+ Modality-specific patch embeddings allow direct processing of raw inputs, while modality-specific FSQ-VAEs are used for image tokenization.
+ For sequence-like modalities such as coordinates, an adapted WordPiece tokenizer is employed.
+ During pre-training, TerraMind leverages masked token reconstruction, learning complex cross-modal correlations that yield high-quality latent representations.
+
+ ## Evaluation
+
+ ![terramind_generations.png](assets/terramind_generations.png)
+
+ We benchmarked TerraMind against other geospatial foundation models on the PANGAEA benchmark.
+ TerraMind consistently achieved state-of-the-art performance, surpassing existing models in downstream tasks such as land-use segmentation, water-body mapping, and vegetation assessment.
+ The evaluation highlights its effectiveness across diverse Earth Observation scenarios.
+ We present additional experiments in our [pre-print](https://arxiv.org/abs/2504.11171).

  ## Usage

+ TerraMind is fully integrated into the fine-tuning package [TerraTorch](https://ibm.github.io/terratorch/).
+ This makes it easy to initialize the pre-trained model or fine-tune it via PyTorch Lightning.
+ The weights are automatically downloaded from Hugging Face.
+
+ ### Fine-tuning
+
+ You can fine-tune TerraMind with a config file using TerraTorch:
+
  ```shell
+ terratorch fit -c terramind_config.yaml
  ```

+ To test the fine-tuned TerraMind model, run:
  ```shell
+ terratorch test -c terramind_config.yaml --ckpt_path path/to/your/checkpoint.ckpt
+ ```
+
+ We provide config examples and notebooks with step-by-step explanations at https://github.com/IBM/terramind.
+
+ ### Backbone
+
+ Alternatively, you can build the backbone with the following code and use it in your custom pipeline.
+
+ ```python
+ from terratorch import BACKBONE_REGISTRY
+ model = BACKBONE_REGISTRY.build(
+     'terramind_v1_base',
+     pretrained=True,
+     modalities=['S2L2A', 'S1GRD']
+ )
+ ```
+
+ The model supports the following raw inputs, which you can specify in `modalities`: S2L2A, S2L1C, S1GRD, S1RTC, DEM, and RGB.
+ If your data does not use all bands of a modality, you can specify a subset with `bands={'S2L2A': ['BLUE', 'GREEN', 'RED', 'NIR_NARROW', 'SWIR_1', 'SWIR_2']}`.
+ You can pass the inputs as a dict to the model. If a tensor is passed directly, the model assumes it is the first defined modality.
+ TerraMind can also handle missing input modalities.
+
+ ```python
+ output = model(
+     {
+         'S2L2A': s2l2a_tensor,  # B, 12, 224, 224
+         'S1GRD': s1grd_tensor,  # B, 2, 224, 224
+     }
+ )
+
+ output.shape  # B, 196, 768
  ```
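
+ For instance, a minimal sketch combining the `bands` subset and the direct-tensor input described above (assuming the same build API as in the examples on this card):
+
+ ```python
+ import torch
+ from terratorch import BACKBONE_REGISTRY
+
+ # Build with a band subset for S2L2A, using the band names from the example above.
+ model = BACKBONE_REGISTRY.build(
+     'terramind_v1_base',
+     pretrained=True,
+     modalities=['S2L2A', 'S1GRD'],
+     bands={'S2L2A': ['BLUE', 'GREEN', 'RED', 'NIR_NARROW', 'SWIR_1', 'SWIR_2']},
+ )
+
+ # A tensor passed directly is treated as the first defined modality (S2L2A, here 6 bands).
+ s2l2a_tensor = torch.randn(1, 6, 224, 224)  # dummy input for illustration
+ output = model(s2l2a_tensor)
+ ```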

+ The model outputs patch embeddings for each input modality. By default, the patch embeddings are averaged over all modalities to reduce the output size.
+ You can specify another `merge_method` from `'mean'`, `'max'`, `'concat'`, `'dict'`, and `None` (see the sketch after this list):
+ - `mean` and `max` are applied per patch over all image modality embeddings.
+ - `concat` stacks all image modalities along the embedding dimension and returns one embedding per patch.
+ - `dict` returns all tokens split by modality in a dictionary.
+ - `None` returns the tokens without further processing.
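+
+ A minimal sketch of the `dict` option, assuming `merge_method` is passed through the same build API as above (dummy tensor shapes follow the earlier examples):
+
+ ```python
+ model = BACKBONE_REGISTRY.build(
+     'terramind_v1_base',
+     pretrained=True,
+     modalities=['S2L2A', 'S1GRD'],
+     merge_method='dict',
+ )
+
+ # s2l2a_tensor: B, 12, 224, 224; s1grd_tensor: B, 2, 224, 224
+ output = model({'S2L2A': s2l2a_tensor, 'S1GRD': s1grd_tensor})
+ output['S2L2A'].shape  # patch embeddings of a single modality, e.g. B, 196, 768
+ ```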
+
+ ### Thinking in Modalities
+
+ TerraMind introduces a new Thinking-in-Modalities (TiM) approach, where other modalities are predicted as intermediate steps.
+ The fine-tuned encoder then uses both the raw inputs and the generated modalities.
+
+ Use TiM models in TerraTorch by adding `_tim` to the model name:
  ```python
  from terratorch import BACKBONE_REGISTRY
+ model = BACKBONE_REGISTRY.build(
+     'terramind_v1_base_tim',
+     pretrained=True,
+     modalities=['S2L2A', 'S1GRD'],
+     tim_modalities=['LULC']  # optional, defaults to LULC (land-use land-cover)
+ )
+ ```
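+
+ The TiM backbone is called like the plain backbone; a minimal forward-pass sketch, assuming the same input interface as above:
+
+ ```python
+ # The TiM modalities (here LULC) are generated internally as the intermediate "thinking" step.
+ output = model({
+     'S2L2A': s2l2a_tensor,  # B, 12, 224, 224
+     'S1GRD': s1grd_tensor,  # B, 2, 224, 224
+ })
+ ```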
+
+ If you use TiM models, we recommend using the [pre-training statistics](https://github.com/IBM/terratorch/blob/a4ca8df7c7f22ddf469f372e1099157d2d7beeb2/terratorch/models/backbones/terramind/model/terramind_register.py#L111) for standardization.
+
+ ### Generations
+
+ TerraMind can perform any-to-any generation based on varying combinations of inputs.
+
+ ![terramind_generations.png](assets/terramind_generations.png)
+
+ Build the full TerraMind model (including de-tokenizer steps) from the `FULL_MODEL_REGISTRY`:
+
+ ```python
+ from terratorch import FULL_MODEL_REGISTRY
+
+ model = FULL_MODEL_REGISTRY.build(
+     'terramind_v1_base_generate',
+     pretrained=True,
+     modalities=['S2L2A'],
+     output_modalities=['S1GRD', 'LULC'],
+     timesteps=10,  # define diffusion steps
+     standardize=True,  # automatically applies the standardization values to the input and output
+ )
  ```
+ Like the backbone, the model accepts multiple modalities as a dict or a single modality as a tensor, and it returns the generated images as a dict of tensors.
+ Note: these generations are not reconstructions but "mental images" representing how the model imagines the modality.
+ You can control the generation via the number of diffusion steps (`timesteps`), which can be passed to the constructor or the forward function.
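+
+ For example, a minimal sketch with a dummy input (passing `timesteps` in the forward call follows the description above):
+
+ ```python
+ import torch
+
+ s2l2a_tensor = torch.randn(1, 12, 224, 224)  # dummy S2L2A input for illustration
+ generated = model(s2l2a_tensor, timesteps=50)  # overrides the constructor's timesteps
+ generated['S1GRD']  # generated S1GRD "mental image"
+ generated['LULC']   # generated land-use land-cover map
+ ```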
+
+ We provide an example notebook for generations at https://github.com/IBM/terramind.
+
+ ## Feedback
+
+ Your feedback is invaluable to us.
+ Please share it by starting a discussion in this HF repository or by submitting an issue to [TerraMind](https://github.com/IBM/terramind) on GitHub.
+
+ ## Citation
+
+ If you use TerraMind in your research, please cite the [TerraMind](https://arxiv.org/abs/2504.11171) pre-print.

+ ```text
+ @article{jakubik2025terramind,
+   title={TerraMind: Large-Scale Generative Multimodality for Earth Observation},
+   author={Jakubik, Johannes and Yang, Felix and Blumenstiel, Benedikt and Scheurer, Erik and Sedona, Rocco and Maurogiovanni, Stefano and Bosmans, Jente and Dionelis, Nikolaos and Marsocci, Valerio and Kopp, Niklas and others},
+   journal={arXiv preprint arXiv:2504.11171},
+   year={2025}
+ }
+ ```