Update README.md
Added ViSNet hyperparameter description
# ViSNet

## Reference

Yusong Wang, Tong Wang, Shaoning Li, Xinheng He, Mingyu Li, Zun Wang, Nanning Zheng, Bin Shao, and Tie-Yan Liu.
Enhancing geometric representations for molecules with equivariant vector-scalar interactive message passing.
Nature Communications, 15(1), January 2024. ISSN: 2041-1723.
URL: https://dx.doi.org/10.1038/s41467-023-43720-2.

## Hyperparameters, model configurations and training strategies
### Model architecture

| Parameter           | Value     | Description                                                               |
|---------------------|-----------|---------------------------------------------------------------------------|
| `num_layers`        | `4`       | Number of ViSNet layers.                                                  |
| `num_channels`      | `128`     | Number of channels.                                                       |
| `l_max`             | `2`       | Highest harmonic order included in the spherical harmonics expansion.     |
| `num_heads`         | `8`       | Number of heads in the attention block.                                   |
| `num_rbf`           | `32`      | Number of radial basis functions in the embedding block.                  |
| `trainable_rbf`     | `False`   | Whether to add learnable weights to the radial embedding basis functions. |
| `activation`        | `silu`    | Activation function for the output block.                                 |
| `attn_activation`   | `silu`    | Activation function for the attention block.                              |
| `vecnorm_type`      | `None`    | Type of the vector norm.                                                  |
| `atomic_energies`   | `average` | Treatment of the atomic energies.                                         |
| `avg_num_neighbors` | `None`    | Mean number of neighbors.                                                 |
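For intuition on the radial embedding, `num_rbf = 32` distance features with a 5 Å cutoff can be sketched as below. The exp-normal basis shown is the one used in the ViSNet paper (via PhysNet), but the exact functional form in this implementation is an assumption, not a quote of its code:

```python
import numpy as np

def expnorm_rbf(r, num_rbf=32, cutoff=5.0):
    """Expand distances r (in Angstrom) into num_rbf radial basis values
    of the exp-normal form exp(-beta * (exp(-r) - mu_k)^2). This mirrors
    the basis described in the ViSNet paper; the basis actually used
    here is an assumption."""
    # Centres mu_k are spaced uniformly in exp(-r), between exp(-cutoff) and 1.
    start = np.exp(-cutoff)
    means = np.linspace(start, 1.0, num_rbf)
    beta = ((2.0 / num_rbf) * (1.0 - start)) ** -2  # shared width parameter
    r = np.asarray(r)[..., None]
    return np.exp(-beta * (np.exp(-r) - means) ** 2)

features = expnorm_rbf(np.array([1.0, 2.5, 4.9]))
print(features.shape)  # (3, 32)
```

Each distance becomes a smooth 32-dimensional feature vector with entries in [0, 1], which the embedding block then mixes (with learnable weights only if `trainable_rbf` were `True`).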
### Training

| Parameter                 | Value  | Description                                       |
|---------------------------|--------|---------------------------------------------------|
| `num_epochs`              | `220`  | Number of epochs to run.                          |
| `ema_decay`               | `0.99` | The EMA decay rate.                               |
| `eval_num_graphs`         | `None` | Number of validation set graphs to evaluate on.   |
| `use_ema_params_for_eval` | `True` | Whether to use the EMA parameters for evaluation. |
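With `ema_decay = 0.99` and `use_ema_params_for_eval = True`, an exponential moving average of the weights is maintained alongside the raw parameters and used at evaluation time. A minimal JAX sketch of that update (the bookkeeping in the actual training loop is an assumption):

```python
import jax
import jax.numpy as jnp

def update_ema(ema_params, params, decay=0.99):
    """One EMA step, ema <- decay * ema + (1 - decay) * params,
    applied leaf-wise over arbitrary parameter pytrees."""
    return jax.tree_util.tree_map(
        lambda e, p: decay * e + (1.0 - decay) * p, ema_params, params
    )

params = {"w": jnp.ones(3)}
ema = {"w": jnp.zeros(3)}
ema = update_ema(ema, params)  # each entry moves 1% of the way to params
```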
### Optimizer

| Parameter                         | Value           | Description                                                      |
|-----------------------------------|-----------------|------------------------------------------------------------------|
| `init_learning_rate`              | `0.0001`        | Initial learning rate.                                           |
| `peak_learning_rate`              | `0.0001`        | Peak learning rate.                                              |
| `final_learning_rate`             | `0.0001`        | Final learning rate.                                             |
| `weight_decay`                    | `0`             | Weight decay.                                                    |
| `warmup_steps`                    | `4000`          | Number of optimizer warm-up steps.                               |
| `transition_steps`                | `360000`        | Number of optimizer transition steps.                            |
| `grad_norm`                       | `500`           | Gradient norm used for gradient clipping.                        |
| `num_gradient_accumulation_steps` | `1`             | Steps to accumulate before taking an optimizer step.             |
| `algorithm`                       | `optax.amsgrad` | The AMSGrad optimizer.                                           |
| `b1`                              | `0.9`           | Exponential decay rate to track first moment of past gradients.  |
| `b2`                              | `0.999`         | Exponential decay rate to track second moment of past gradients. |
| `eps`                             | `1e-8`          | Constant applied to denominator outside the square root.         |
| `eps_root`                        | `0.0`           | Constant applied to denominator inside the square root.          |
### Huber Loss Energy weight schedule

| Parameter              | Value                               | Description                                                                                          |
|------------------------|-------------------------------------|------------------------------------------------------------------------------------------------------|
| `schedule`             | `optax.piecewise_constant_schedule` | Piecewise constant schedule with scaled jumps at specific boundaries.                                |
| `init_value`           | `40`                                | Initial value.                                                                                       |
| `boundaries_and_scale` | `{115: 25}`                         | Dictionary of `{step: scale}` where `scale` is multiplied into the schedule value at the given step. |
### Huber Loss Force weight schedule

| Parameter              | Value                               | Description                                                                                          |
|------------------------|-------------------------------------|------------------------------------------------------------------------------------------------------|
| `schedule`             | `optax.piecewise_constant_schedule` | Piecewise constant schedule with scaled jumps at specific boundaries.                                |
| `init_value`           | `1000`                              | Initial value.                                                                                       |
| `boundaries_and_scale` | `{115: 0.04}`                       | Dictionary of `{step: scale}` where `scale` is multiplied into the schedule value at the given step. |
### Dataset

| Parameter               | Value | Description                                 |
|-------------------------|-------|---------------------------------------------|
| `graph_cutoff_angstrom` | `5`   | Graph cutoff distance (in Å).               |
| `max_n_node`            | `32`  | Maximum number of nodes allowed in a batch. |
| `max_n_edge`            | `288` | Maximum number of edges allowed in a batch. |
| `batch_size`            | `16`  | Number of graphs in a batch.                |

This model was trained on the [SPICE2_curated dataset](https://huggingface.co/datasets/InstaDeepAI/SPICE2-curated).
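To illustrate what `graph_cutoff_angstrom` controls: a molecular graph's edge list connects every ordered pair of atoms closer than the cutoff. This is a generic NumPy sketch of the idea, not the data pipeline used in training:

```python
import numpy as np

def radius_graph(positions, cutoff=5.0):
    """Return (senders, receivers) index arrays for all ordered atom
    pairs i != j with |r_i - r_j| < cutoff (positions in Angstrom)."""
    diff = positions[:, None, :] - positions[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)
    within = (dist < cutoff) & ~np.eye(len(positions), dtype=bool)
    senders, receivers = np.nonzero(within)
    return senders, receivers

# Three collinear atoms 3 Angstrom apart: the outer pair is 6 Angstrom
# apart, so only the four nearest-neighbour ordered edges survive.
pos = np.array([[0.0, 0.0, 0.0], [3.0, 0.0, 0.0], [6.0, 0.0, 0.0]])
senders, receivers = radius_graph(pos)
print(len(senders))  # 4
```

During batching, graphs are then packed and padded up to the `max_n_node` and `max_n_edge` limits above.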
## How to Use

For complete usage instructions and more information, please refer to our [documentation](https://instadeep.github.io/mlip).
## License summary

1. The Licensed Models are **only** available under this License for Non-Commercial Purposes.