Commit 5f5ec4c · Parent(s): e59eb64

Update ReadMe

Files changed:
- .gitattributes +1 -0
- README.md +130 -13
.gitattributes CHANGED

```diff
@@ -33,3 +33,4 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+*.png filter=lfs diff=lfs merge=lfs -text
```
README.md CHANGED (@@ -1,35 +1,152 @@); the updated file follows:
---
license: apache-2.0
library_name: terratorch
tags:
- Earth Observation
- Foundation Model
- IBM
- ESA
---

# TerraMind 1.0 base

TerraMind is the first multimodal any-to-any generative foundation model for Earth Observation, jointly developed by IBM, ESA, and Forschungszentrum Jülich.

[Figure: TerraMind model overview]

## Architecture

TerraMind uses a dual-scale transformer-based encoder-decoder architecture, simultaneously processing pixel-level and token-level data. The model was pre-trained on 500B tokens from 9M spatiotemporally aligned multimodal samples from the TerraMesh dataset.

Modality-specific patch embeddings allow direct processing of raw inputs, while modality-specific FSQ-VAEs are used for image tokenization. For sequence-like modalities such as coordinates, an adapted WordPiece tokenizer is employed. During pre-training, TerraMind leverages masked token reconstruction, learning complex cross-modal correlations to generate high-quality latent representations.
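
To make the patch-embedding idea concrete, here is a toy sketch in plain PyTorch. It is illustrative only, not TerraMind's actual implementation: each modality gets its own projection into a shared embedding space before one transformer processes all tokens jointly. Dimensions mirror the usage examples below.

```python
# Toy sketch only: modality-specific patch embeddings feeding one shared encoder
# (12-band S2L2A, 2-band S1GRD, 224x224 inputs, 16px patches).
import torch
import torch.nn as nn

embed_dim, patch = 768, 16
patch_embeds = nn.ModuleDict({
    'S2L2A': nn.Conv2d(12, embed_dim, kernel_size=patch, stride=patch),
    'S1GRD': nn.Conv2d(2, embed_dim, kernel_size=patch, stride=patch),
})
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(embed_dim, nhead=12, batch_first=True), num_layers=2
)

inputs = {'S2L2A': torch.randn(1, 12, 224, 224), 'S1GRD': torch.randn(1, 2, 224, 224)}
tokens = torch.cat(
    [patch_embeds[m](x).flatten(2).transpose(1, 2) for m, x in inputs.items()], dim=1
)  # (1, 392, 768): 196 patch tokens per modality
latent = encoder(tokens)
```
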
## Evaluation

[Figure: PANGAEA benchmark results]

We benchmarked TerraMind against other geospatial foundation models using the PANGAEA benchmark. TerraMind consistently achieved state-of-the-art performance, surpassing existing models in various downstream tasks such as land use segmentation, water body mapping, and vegetation assessments. The evaluation highlights its effectiveness in handling diverse Earth Observation scenarios. We present additional experiments in our [pre-print](https://arxiv.org/abs/2504.11171).

## Usage

TerraMind is fully integrated into the fine-tuning package [TerraTorch](https://ibm.github.io/terratorch/). This makes it easy to initialize the pre-trained model or fine-tune it via PyTorch Lightning. The weights are automatically downloaded from Hugging Face.

### Fine-tuning

You can fine-tune TerraMind with a config using TerraTorch:

```shell
terratorch fit -c terramind_config.yaml
```

For testing the fine-tuned TerraMind model, run:

```shell
terratorch test -c terramind_config.yaml --ckpt_path path/to/your/checkpoint.ckpt
```
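
The CLI reads a PyTorch Lightning style config. If you prefer staying in Python, the same setup can be sketched with TerraTorch's task classes. The following is a hypothetical sketch: `SemanticSegmentationTask`, `EncoderDecoderFactory`, and the argument names are assumptions based on general TerraTorch conventions, not taken from this model card, so verify them against the TerraTorch docs.

```python
# Hypothetical sketch only: task, factory, and argument names are assumptions,
# not confirmed by this model card.
import lightning.pytorch as pl
from terratorch.tasks import SemanticSegmentationTask

task = SemanticSegmentationTask(
    model_factory="EncoderDecoderFactory",  # assumed factory name
    model_args={
        "backbone": "terramind_v1_base",
        "backbone_pretrained": True,
        "backbone_modalities": ["S2L2A"],
        "decoder": "UNetDecoder",  # assumed decoder choice
        "decoder_channels": [512, 256, 128, 64],
        "num_classes": 2,
    },
    loss="ce",
)

trainer = pl.Trainer(max_epochs=10, accelerator="auto")
trainer.fit(task, datamodule=datamodule)  # datamodule: your LightningDataModule (placeholder)
```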

We provide config examples and notebooks with step-by-step explanations at https://github.com/IBM/terramind.

### Backbone

Alternatively, you can build the backbone with the following code and use it in your custom pipeline.

```python
from terratorch import BACKBONE_REGISTRY

model = BACKBONE_REGISTRY.build(
    'terramind_v1_base',
    pretrained=True,
    modalities=['S2L2A', 'S1GRD']
)
```

The model supports the following raw inputs, which you can specify in `modalities`: S2L2A, S2L1C, S1GRD, S1RTC, DEM, RGB. If your data does not use all bands of a modality, you can specify a subset with `bands={'S2L2A': ['BLUE', 'GREEN', 'RED', 'NIR_NARROW', 'SWIR_1', 'SWIR_2']}`. You can pass the inputs as a dict to the model. If a tensor is passed directly, the model assumes it is the first defined modality. TerraMind can also handle missing input modalities.

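For instance, a minimal sketch of building the backbone with the band subset listed above (same build call as before, plus `bands`):

```python
# Build an S2L2A-only backbone on a six-band subset; band names are taken
# from the text above.
model_subset = BACKBONE_REGISTRY.build(
    'terramind_v1_base',
    pretrained=True,
    modalities=['S2L2A'],
    bands={'S2L2A': ['BLUE', 'GREEN', 'RED', 'NIR_NARROW', 'SWIR_1', 'SWIR_2']}
)
```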

```python
import torch

# Dummy inputs for illustration
s2l2a_tensor = torch.randn(8, 12, 224, 224)  # B, 12, 224, 224
s1grd_tensor = torch.randn(8, 2, 224, 224)   # B, 2, 224, 224

output = model(
    {
        'S2L2A': s2l2a_tensor,
        'S1GRD': s1grd_tensor,
    }
)

output.shape  # B, 196, 768
```

The model outputs patch embeddings for each input modality. By default, the patch embeddings are averaged over all modalities to reduce the output size. You can specify another `merge_method` from `'mean'`, `'max'`, `'concat'`, `'dict'`, and `None` (a sketch follows this list):

- `mean` and `max` are applied per patch over all image modality embeddings.
- `concat` stacks all image modalities along the embedding dimension and returns one embedding per patch.
- `dict` returns all tokens split by modality in a dictionary.
- `None` returns the tokens without further processing.
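
For example, a sketch of requesting per-modality embeddings with `merge_method='dict'` (the parameter and its values come from the text above; this assumes `merge_method` is passed at build time):

```python
# Return embeddings separately per modality instead of averaging them.
model_dict = BACKBONE_REGISTRY.build(
    'terramind_v1_base',
    pretrained=True,
    modalities=['S2L2A', 'S1GRD'],
    merge_method='dict'
)

out = model_dict({'S2L2A': s2l2a_tensor, 'S1GRD': s1grd_tensor})
out['S2L2A'].shape  # B, 196, 768 (one embedding sequence per modality)
```
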
### Thinking in Modalities

TerraMind introduces a new Thinking-in-Modalities (TiM) approach, where other modalities are predicted as intermediate steps. The fine-tuned encoder then uses both the raw inputs and the generated modalities.

Use TiM models in TerraTorch by adding `_tim` to the model name:

```python
from terratorch import BACKBONE_REGISTRY

model = BACKBONE_REGISTRY.build(
    'terramind_v1_base_tim',
    pretrained=True,
    modalities=['S2L2A', 'S1GRD'],
    tim_modalities=['LULC']  # optional, defaults to LULC (land-use land-cover)
)
```
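
A hedged sketch of a forward pass with the TiM backbone, assuming it accepts the same dict inputs as the base backbone (the input tensor is a stand-in):

```python
import torch

s2l2a_tensor = torch.randn(1, 12, 224, 224)  # stand-in S2L2A input

# Assumption: the TiM backbone shares the base backbone's forward interface.
# Internally it first predicts the LULC modality, then encodes the raw input
# together with the generated modality (per the TiM description above).
features = model({'S2L2A': s2l2a_tensor})
```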

If you use TiM models, we recommend using the [pre-training statistics](https://github.com/IBM/terratorch/blob/a4ca8df7c7f22ddf469f372e1099157d2d7beeb2/terratorch/models/backbones/terramind/model/terramind_register.py#L111) for standardization.

### Generations

TerraMind can perform any-to-any generation based on varying combinations of inputs.

[Figure: any-to-any generation examples]

Build the full TerraMind model (including de-tokenizer steps) from the `FULL_MODEL_REGISTRY`:

```python
from terratorch import FULL_MODEL_REGISTRY

model = FULL_MODEL_REGISTRY.build(
    'terramind_v1_base_generate',
    pretrained=False,
    modalities=['S2L2A'],
    output_modalities=['S1GRD', 'LULC'],
    timesteps=10,  # define the number of diffusion steps
    standardize=True,  # automatically applies the standardization values to the input and output
)
```

As with the backbone, pass multiple modalities as a dict or a single modality directly as a tensor; the model returns the generated images as a dict of tensors. Note that these generations are not reconstructions but "mental images" representing how the model imagines the modality. You can control the generation via the number of diffusion steps (`timesteps`), which you can pass to the constructor or the forward function.
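
A minimal sketch of a generation call (the input tensor is a stand-in; the S1GRD output shape is an assumption based on the two-band S1GRD inputs described above):

```python
import torch

s2l2a_tensor = torch.randn(1, 12, 224, 224)  # B, 12, 224, 224 stand-in input
generated = model(s2l2a_tensor)  # a single modality can be passed directly as a tensor
# generated = model(s2l2a_tensor, timesteps=50)  # or override the diffusion steps per call

generated.keys()  # one generated image per output modality: 'S1GRD', 'LULC'
generated['S1GRD'].shape  # assumed: B, 2, 224, 224
```
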
We provide an example notebook for generations at https://github.com/IBM/terramind.

## Feedback

Your feedback is invaluable to us.
Please share it with us by starting a discussion in this HF repository or submitting an issue to [TerraMind](https://github.com/IBM/terramind) on GitHub.

## Citation

If you use TerraMind in your research, please cite the [TerraMind](https://arxiv.org/abs/2504.11171) pre-print.

```text
@article{jakubik2025terramind,
  title={TerraMind: Large-Scale Generative Multimodality for Earth Observation},
  author={Jakubik, Johannes and Yang, Felix and Blumenstiel, Benedikt and Scheurer, Erik and Sedona, Rocco and Maurogiovanni, Stefano and Bosmans, Jente and Dionelis, Nikolaos and Marsocci, Valerio and Kopp, Niklas and others},
  journal={arXiv preprint arXiv:2504.11171},
  year={2025}
}
```