Upload trained model folder

Files changed (4)

README.md CHANGED

@@ -1,3 +1,17 @@
----
-license: apache-2.0
----

+---
+library_name: moeob
+license: mit
+pipeline_tag: text-generation
+tags:
+- byte-level
+- experimental
+- mixture-of-experts
+- model_hub_mixin
+- pytorch_model_hub_mixin
+- summary-then-generate
+---
+This model has been pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration:
+- Code: https://github.com/enosislabs/moeob
+- Paper: [More Information Needed]
+- Docs: [More Information Needed]

config.json ADDED

+{
+  "vocab_size": 256,
+  "hidden_dim": 64,
+  "patch_encoder_dim": 32,
+  "n_layers": 3,
+  "n_heads": 4,
+  "n_experts": 4,
+  "top_k_experts": 2,
+  "expert_hidden_mult": 2,
+  "patch_encoder_layers": 1,
+  "max_seq_length": 512,
+  "max_patches": 64,
+  "learning_rate": 0.001,
+  "weight_decay": 0.01,
+  "beta1": 0.9,
+  "beta2": 0.98,
+  "epsilon": 1e-08,
+  "batch_size": 256,
+  "micro_batch_size": 4,
+  "gradient_clip": 0.5,
+  "load_balance_coefficient": 0.01,
+  "dropout": 0.0,
+  "use_prenorm": true,
+  "entropy_threshold": 0.4,
+  "min_patch_size": 4,
+  "max_patch_size": 32
+}
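
A few quantities follow from these hyperparameters by simple arithmetic. The sketch below is illustrative only. It assumes that `batch_size` is the effective batch reached by accumulating micro-batches of `micro_batch_size`, and that `expert_hidden_mult` scales `hidden_dim` to give each expert's inner width; the commit confirms neither reading. `derive_settings` is a hypothetical helper, not part of moeob.

```python
import json

# Subset of the config.json added in this commit (values copied verbatim).
CONFIG_TEXT = """
{
  "hidden_dim": 64,
  "n_heads": 4,
  "n_experts": 4,
  "top_k_experts": 2,
  "expert_hidden_mult": 2,
  "batch_size": 256,
  "micro_batch_size": 4,
  "min_patch_size": 4,
  "max_patch_size": 32
}
"""


def derive_settings(config: dict) -> dict:
    """Validate basic consistency and compute derived sizes (assumed semantics)."""
    if config["hidden_dim"] % config["n_heads"] != 0:
        raise ValueError("hidden_dim must be divisible by n_heads")
    if not 1 <= config["top_k_experts"] <= config["n_experts"]:
        raise ValueError("top_k_experts must lie in [1, n_experts]")
    if config["min_patch_size"] > config["max_patch_size"]:
        raise ValueError("min_patch_size must not exceed max_patch_size")
    if config["batch_size"] % config["micro_batch_size"] != 0:
        raise ValueError("batch_size must be a multiple of micro_batch_size")
    return {
        # Per-head width in multi-head attention.
        "head_dim": config["hidden_dim"] // config["n_heads"],
        # Assumed: inner width of each expert's feed-forward block.
        "expert_hidden_dim": config["hidden_dim"] * config["expert_hidden_mult"],
        # Assumed: micro-batches accumulated per optimizer step.
        "grad_accum_steps": config["batch_size"] // config["micro_batch_size"],
    }


settings = derive_settings(json.loads(CONFIG_TEXT))
print(settings)
```

Under those assumptions, the values in this commit give a head dimension of 16, an expert inner width of 128, and 64 accumulation steps per optimizer update.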

pytorch_model.bin ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:e675277d54fabd402bbdda525961c5371a7e2a928fce4baec3a377be98ecc917
+size 1254791
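
The three lines above are a Git LFS pointer, not the weights themselves: they record the spec version, the SHA-256 of the real file, and its size in bytes. A downloaded `pytorch_model.bin` can be checked against the pointer with the standard library alone. This is a minimal sketch; `parse_lfs_pointer` and `file_matches_pointer` are hypothetical helper names.

```python
import hashlib
import os

# Pointer text from this commit, without the leading "+" diff markers.
POINTER_TEXT = """\
version https://git-lfs.github.com/spec/v1
oid sha256:e675277d54fabd402bbdda525961c5371a7e2a928fce4baec3a377be98ecc917
size 1254791
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algorithm, _, digest = fields["oid"].partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported oid algorithm: {algorithm!r}")
    return {
        "version": fields["version"],
        "sha256": digest,
        "size": int(fields["size"]),
    }


def file_matches_pointer(path: str, pointer: dict) -> bool:
    """Return True if the file's size and SHA-256 match the pointer."""
    if os.path.getsize(path) != pointer["size"]:
        return False
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Hash in 1 MiB chunks so large weight files need not fit in memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == pointer["sha256"]


pointer = parse_lfs_pointer(POINTER_TEXT)
```

Calling `file_matches_pointer("pytorch_model.bin", pointer)` on a local copy returns `True` only when both the size and the hash agree with the pointer.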

training_metrics.json ADDED

The diff for this file is too large to render. See raw diff