# ViSNet
## Reference
Yusong Wang, Tong Wang, Shaoning Li, Xinheng He, Mingyu Li, Zun Wang, Nanning Zheng, Bin Shao, and Tie-Yan Liu.
Enhancing geometric representations for molecules with equivariant vector-scalar interactive message passing.
Nature Communications, 15(1), January 2024. ISSN: 2041-1723. 
URL: https://dx.doi.org/10.1038/s41467-023-43720-2.
## Hyperparameters, model configurations and training strategies
### Model architecture
| Parameter          | Value    | Description                                                              |
|--------------------|----------|--------------------------------------------------------------------------|
| `num_layers`       | `4`      | Number of ViSNet layers.                                                 |
| `num_channels`     | `128`    | Number of channels.                                                      |
| `l_max`            | `2`      | Highest harmonic order included in the Spherical Harmonics series.       |
| `num_heads`        | `8`      | Number of heads in the attention block.                                  |
| `num_rbf`          | `32`     | Number of radial basis functions in the embedding block.                 |
| `trainable_rbf`    | `False`  | Whether to add learnable weights to the radial embedding basis functions.|
| `activation`       | `silu`   | Activation function for the output block.                                |
| `attn_activation`  | `silu`   | Activation function for the attention block.                             |
| `vecnorm_type`     | `None`   | Type of the vector norm.                                                 |
| `atomic_energies`  | `average`| Treatment of the atomic energies.                                        |
| `avg_num_neighbors`| `None`   | Mean number of neighbors.                                                |
### Training
| Parameter                | Value  | Description                                      |
|--------------------------|--------|--------------------------------------------------|
| `num_epochs`             | `220`  | Number of epochs to run.                         |
| `ema_decay`              | `0.99` | The EMA decay rate.                              |
| `eval_num_graphs`        | `None` | Number of validation set graphs to evaluate on.  |
| `use_ema_params_for_eval`| `True` | Whether to use the EMA parameters for evaluation.|
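
To illustrate what `ema_decay` controls, here is a minimal sketch of an exponential moving average (EMA) over model parameters. The function name `ema_update`, the dictionary layout, and the absence of bias correction are assumptions made for illustration; the source does not describe how the EMA is implemented.

```python
def ema_update(ema_params, params, decay=0.99):
    """Return a new EMA dict: decay * old average + (1 - decay) * current value.

    Hypothetical helper; the real training code may debias or store
    parameters differently.
    """
    return {
        name: decay * ema_params[name] + (1.0 - decay) * params[name]
        for name in params
    }


# Toy example with scalar "parameters"; real parameters would be arrays.
ema = {"w": 1.0, "b": 0.0}
params = {"w": 2.0, "b": 1.0}
ema = ema_update(ema, params)
print(ema)
```

With `use_ema_params_for_eval` set to `True`, validation would use the averaged values (`ema` in this sketch) rather than the raw parameters.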
### Optimizer
| Parameter                        | Value          | Description                                                     |
|----------------------------------|----------------|-----------------------------------------------------------------|
| `init_learning_rate`             | `0.0001`       | Initial learning rate.                                          |
| `peak_learning_rate`             | `0.0001`       | Peak learning rate.                                             |
| `final_learning_rate`            | `0.0001`       | Final learning rate.                                            |
| `weight_decay`                   | `0`            | Weight decay.                                                   |
| `warmup_steps`                   | `4000`         | Number of optimizer warm-up steps.                              |
| `transition_steps`               | `360000`       | Number of optimizer transition steps.                           |
| `grad_norm`                      | `500`          | Gradient norm used for gradient clipping.                       |
| `num_gradient_accumulation_steps`| `1`            | Steps to accumulate before taking an optimizer step.            |
| `algorithm`                      | `optax.amsgrad`| The AMSGrad optimizer.                                          |
| `b1`                             | `0.9`          | Exponential decay rate to track first moment of past gradients. |
| `b2`                             | `0.999`        | Exponential decay rate to track second moment of past gradients.|
| `eps`                            | `1e-8`         | Constant applied to denominator outside the square root.        |
| `eps_root`                       | `0.0`          | Constant applied to denominator inside the square root.         |
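
The sketch below illustrates two of the settings above in plain Python: the learning-rate schedule and gradient clipping. It assumes a linear warm-up followed by a linear transition, and clipping by global norm; neither the schedule shape nor the clipping variant is stated in the source, and the function names are hypothetical.

```python
import math


def learning_rate(step, init=1e-4, peak=1e-4, final=1e-4,
                  warmup_steps=4000, transition_steps=360000):
    """Assumed schedule: linear warm-up to `peak`, then linear move to `final`."""
    if step < warmup_steps:
        frac = step / warmup_steps
        return init + (peak - init) * frac
    frac = min((step - warmup_steps) / transition_steps, 1.0)
    return peak + (final - peak) * frac


def clip_by_global_norm(grads, max_norm=500.0):
    """Rescale a flat list of gradients so its L2 norm is at most `max_norm`."""
    norm = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grads]


# With init, peak and final all equal, the rate never changes.
rates = [learning_rate(s) for s in (0, 2000, 4000, 100000, 364000)]
print(rates)

# A gradient with norm 5000 is scaled down to norm 500.
clipped = clip_by_global_norm([3000.0, 4000.0])
print(clipped)
```

Because the initial, peak and final learning rates in the table are identical, any schedule that interpolates between them stays at `0.0001` for the whole run, so `warmup_steps` and `transition_steps` have no visible effect under this reading.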
### Huber Loss Energy weight schedule
| Parameter             | Value                              | Description                                                                                     |
|-----------------------|------------------------------------|-------------------------------------------------------------------------------------------------|
| `schedule`            | `optax.piecewise_constant_schedule`| Piecewise constant schedule with scaled jumps at specific boundaries.                           |
| `init_value`          | `40`                               | Initial value.                                                                                  |
| `boundaries_and_scale`| `{115: 25}`                        | Dictionary of {step: scale} where scale is multiplied into the schedule value at the given step.|
### Huber Loss Force weight schedule
| Parameter             | Value                              | Description                                                                                     |
|-----------------------|------------------------------------|-------------------------------------------------------------------------------------------------|
| `schedule`            | `optax.piecewise_constant_schedule`| Piecewise constant schedule with scaled jumps at specific boundaries.                           |
| `init_value`          | `1000`                             | Initial value.                                                                                  |
| `boundaries_and_scale`| `{115: 0.04}`                      | Dictionary of {step: scale} where scale is multiplied into the schedule value at the given step.|
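
The two weight schedules above can be traced with a small pure-Python stand-in for a piecewise constant schedule. This is a sketch, not the library code: `piecewise_constant` is a hypothetical helper, it assumes each scale takes effect from its boundary step onward, and the unit of the boundary `115` is not stated in the source. In optax itself the corresponding keyword argument is spelled `boundaries_and_scales`.

```python
def piecewise_constant(init_value, boundaries_and_scales):
    """Build a schedule that multiplies the value by `scale` once `step` reaches each boundary."""
    def schedule(step):
        value = init_value
        for boundary, scale in sorted(boundaries_and_scales.items()):
            if step >= boundary:
                value *= scale
        return value
    return schedule


energy_weight = piecewise_constant(40, {115: 25})
force_weight = piecewise_constant(1000, {115: 0.04})

for step in (0, 114, 115, 219):
    print(step, energy_weight(step), force_weight(step))
```

Read this way, the energy weight moves from 40 to 40 × 25 = 1000 at the boundary while the force weight moves from 1000 to 1000 × 0.04 = 40, so the loss emphasizes forces first and energies afterwards.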
### Dataset
| Parameter                   | Value | Description                                |
|-----------------------------|-------|--------------------------------------------|
| `graph_cutoff_angstrom`     | `5`   | Graph cutoff distance (in Å).              |
| `max_n_node`                | `32`  | Maximum number of nodes allowed in a batch.|
| `max_n_edge`                | `288` | Maximum number of edges allowed in a batch.|
| `batch_size`                | `16`  | Number of graphs in a batch.               |

This model was trained on the [SPICE2_curated dataset](https://huggingface.co/datasets/InstaDeepAI/SPICE2-curated).
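
As an illustration of the `graph_cutoff_angstrom` setting in the table above, the following sketch connects every pair of atoms closer than the cutoff. The helper `build_edges`, the use of directed edges, the strict `<` comparison, and the absence of periodic boundary conditions are all assumptions; the source does not describe how graphs are built.

```python
import math


def build_edges(positions, cutoff=5.0):
    """Return directed edges (i, j) for all atom pairs closer than `cutoff` (in Å)."""
    edges = []
    for i, p in enumerate(positions):
        for j, q in enumerate(positions):
            if i != j and math.dist(p, q) < cutoff:
                edges.append((i, j))
    return edges


# Three atoms on a line: the first two are 1.5 Å apart, the third is far away.
positions = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (7.0, 0.0, 0.0)]
edges = build_edges(positions)
print(edges)
```

In this toy system only the first two atoms are connected, since the third lies 5.5 Å and 7 Å from the others.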
## How to Use
For complete usage instructions and more information, please refer to our [documentation](https://instadeep.github.io/mlip).

## License summary

1. The Licensed Models are **only** available under this License for Non-Commercial Purposes.
2. You are permitted to reproduce, publish, share and adapt the Output generated by the Licensed Model only for Non-Commercial Purposes and in accordance with this License.
3. You may **not** use the Licensed Models or any of their Outputs in connection with:
    1. any Commercial Purposes, unless agreed by Us under a separate licence;
    2. to train, improve or otherwise influence the functionality or performance of any other third-party derivative model that is commercial or intended for a Commercial Purpose and is similar to the Licensed Models;
    3. to create models distilled or derived from the Outputs of the Licensed Models, unless such models are for Non-Commercial Purposes and open-sourced under the same license as the Licensed Models; or
    4. in violation of any applicable laws and regulations.