Update README.md
Added ViSNet hyperparameter description
# ViSNet

## Reference

Yusong Wang, Tong Wang, Shaoning Li, Xinheng He, Mingyu Li, Zun Wang, Nanning Zheng, Bin Shao, and Tie-Yan Liu.
Enhancing geometric representations for molecules with equivariant vector-scalar interactive message passing.
Nature Communications, 15(1), January 2024. ISSN: 2041-1723.
URL: https://dx.doi.org/10.1038/s41467-023-43720-2.

## Hyperparameters, model configurations and training strategies
### Model architecture

| Parameter           | Value     | Description                                                               |
|---------------------|-----------|---------------------------------------------------------------------------|
| `num_layers`        | `4`       | Number of ViSNet layers.                                                  |
| `num_channels`      | `128`     | Number of channels.                                                       |
| `l_max`             | `2`       | Highest harmonic order included in the spherical harmonics expansion.     |
| `num_heads`         | `8`       | Number of heads in the attention block.                                   |
| `num_rbf`           | `32`      | Number of radial basis functions in the embedding block.                  |
| `trainable_rbf`     | `False`   | Whether to add learnable weights to the radial embedding basis functions. |
| `activation`        | `silu`    | Activation function for the output block.                                 |
| `attn_activation`   | `silu`    | Activation function for the attention block.                              |
| `vecnorm_type`      | `None`    | Type of the vector norm.                                                  |
| `atomic_energies`   | `average` | Treatment of the atomic energies.                                         |
| `avg_num_neighbors` | `None`    | Mean number of neighbors.                                                 |
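For intuition on the radial embedding, `num_rbf = 32` distance features with a 5 Å cutoff can be sketched as below. The exp-normal basis shown is the one used in the ViSNet paper (via PhysNet), but the exact functional form in this implementation is an assumption, not a quote of its code:

```python
import numpy as np

def expnorm_rbf(r, num_rbf=32, cutoff=5.0):
    """Expand distances r (in Angstrom) into num_rbf radial basis values
    of the exp-normal form exp(-beta * (exp(-r) - mu_k)^2). This mirrors
    the basis described in the ViSNet paper; the basis actually used
    here is an assumption."""
    # Centres mu_k are spaced uniformly in exp(-r), between exp(-cutoff) and 1.
    start = np.exp(-cutoff)
    means = np.linspace(start, 1.0, num_rbf)
    beta = ((2.0 / num_rbf) * (1.0 - start)) ** -2  # shared width parameter
    r = np.asarray(r)[..., None]
    return np.exp(-beta * (np.exp(-r) - means) ** 2)

features = expnorm_rbf(np.array([1.0, 2.5, 4.9]))
print(features.shape)  # (3, 32)
```

Each distance becomes a smooth 32-dimensional feature vector with entries in [0, 1], which the embedding block then mixes (with learnable weights only if `trainable_rbf` were `True`).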
### Training

| Parameter                 | Value  | Description                                       |
|---------------------------|--------|---------------------------------------------------|
| `num_epochs`              | `220`  | Number of epochs to run.                          |
| `ema_decay`               | `0.99` | The EMA decay rate.                               |
| `eval_num_graphs`         | `None` | Number of validation set graphs to evaluate on.   |
| `use_ema_params_for_eval` | `True` | Whether to use the EMA parameters for evaluation. |
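With `ema_decay = 0.99` and `use_ema_params_for_eval = True`, an exponential moving average of the weights is maintained alongside the raw parameters and used at evaluation time. A minimal JAX sketch of that update (the bookkeeping in the actual training loop is an assumption):

```python
import jax
import jax.numpy as jnp

def update_ema(ema_params, params, decay=0.99):
    """One EMA step, ema <- decay * ema + (1 - decay) * params,
    applied leaf-wise over arbitrary parameter pytrees."""
    return jax.tree_util.tree_map(
        lambda e, p: decay * e + (1.0 - decay) * p, ema_params, params
    )

params = {"w": jnp.ones(3)}
ema = {"w": jnp.zeros(3)}
ema = update_ema(ema, params)  # each entry moves 1% of the way to params
```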
### Optimizer

| Parameter                         | Value           | Description                                                      |
|-----------------------------------|-----------------|------------------------------------------------------------------|
| `init_learning_rate`              | `0.0001`        | Initial learning rate.                                           |
| `peak_learning_rate`              | `0.0001`        | Peak learning rate.                                              |
| `final_learning_rate`             | `0.0001`        | Final learning rate.                                             |
| `weight_decay`                    | `0`             | Weight decay.                                                    |
| `warmup_steps`                    | `4000`          | Number of optimizer warm-up steps.                               |
| `transition_steps`                | `360000`        | Number of optimizer transition steps.                            |
| `grad_norm`                       | `500`           | Gradient norm used for gradient clipping.                        |
| `num_gradient_accumulation_steps` | `1`             | Steps to accumulate before taking an optimizer step.             |
| `algorithm`                       | `optax.amsgrad` | The AMSGrad optimizer.                                           |
| `b1`                              | `0.9`           | Exponential decay rate to track first moment of past gradients.  |
| `b2`                              | `0.999`         | Exponential decay rate to track second moment of past gradients. |
| `eps`                             | `1e-8`          | Constant applied to denominator outside the square root.         |
| `eps_root`                        | `0.0`           | Constant applied to denominator inside the square root.          |
### Huber Loss Energy weight schedule

| Parameter              | Value                               | Description                                                                                          |
|------------------------|-------------------------------------|------------------------------------------------------------------------------------------------------|
| `schedule`             | `optax.piecewise_constant_schedule` | Piecewise constant schedule with scaled jumps at specific boundaries.                                |
| `init_value`           | `40`                                | Initial value.                                                                                       |
| `boundaries_and_scale` | `{115: 25}`                         | Dictionary of `{step: scale}` where `scale` is multiplied into the schedule value at the given step. |
### Huber Loss Force weight schedule

| Parameter              | Value                               | Description                                                                                          |
|------------------------|-------------------------------------|------------------------------------------------------------------------------------------------------|
| `schedule`             | `optax.piecewise_constant_schedule` | Piecewise constant schedule with scaled jumps at specific boundaries.                                |
| `init_value`           | `1000`                              | Initial value.                                                                                       |
| `boundaries_and_scale` | `{115: 0.04}`                       | Dictionary of `{step: scale}` where `scale` is multiplied into the schedule value at the given step. |
### Dataset

| Parameter               | Value | Description                                 |
|-------------------------|-------|---------------------------------------------|
| `graph_cutoff_angstrom` | `5`   | Graph cutoff distance (in Å).               |
| `max_n_node`            | `32`  | Maximum number of nodes allowed in a batch. |
| `max_n_edge`            | `288` | Maximum number of edges allowed in a batch. |
| `batch_size`            | `16`  | Number of graphs in a batch.                |

This model was trained on the [SPICE2_curated dataset](https://huggingface.co/datasets/InstaDeepAI/SPICE2-curated).
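To illustrate what `graph_cutoff_angstrom` controls: a molecular graph's edge list connects every ordered pair of atoms closer than the cutoff. This is a generic NumPy sketch of the idea, not the data pipeline used in training:

```python
import numpy as np

def radius_graph(positions, cutoff=5.0):
    """Return (senders, receivers) index arrays for all ordered atom
    pairs i != j with |r_i - r_j| < cutoff (positions in Angstrom)."""
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    within = (dist < cutoff) & ~np.eye(len(positions), dtype=bool)
    senders, receivers = np.nonzero(within)
    return senders, receivers

# Three collinear atoms 3 Angstrom apart: the outer pair is 6 Angstrom
# apart, so only the four nearest-neighbour ordered edges survive.
pos = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [6.0, 0.0, 0.0]])
senders, receivers = radius_graph(pos)
print(len(senders))  # 4
```

During batching, graphs are then packed and padded up to the `max_n_node` and `max_n_edge` limits above.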
## How to Use

For complete usage instructions and more information, please refer to our [documentation](https://instadeep.github.io/mlip).
## License summary

1. The Licensed Models are **only** available under this License for Non-Commercial Purposes.