try it out

#20

by dmaran - opened 27 days ago

base: refs/heads/main ← from: refs/pr/20


+49

-62

Files changed (3)

README.md +6 -12
config.json +43 -47
model.safetensors +0 -3

README.md CHANGED

@@ -1,25 +1,19 @@
 ---
 license: apache-2.0
-pipeline_tag: text-to-speech
 language:
 - en
 tags:
-- model_hub_mixin
-- pytorch_model_hub_mixin
-widget:
-- text: "[S1] Dia is an open weights text to dialogue model. [S2] You get full control over scripts and voices. [S1] Wow. Amazing. (laughs) [S2] Try it now on Git hub or Hugging Face."
-  example_title: "Dia intro"
-- text: "[S1] Oh fire! Oh my goodness! What's the procedure? What to we do people? The smoke could be coming through an air duct! [S2] Oh my god! Okay.. it's happening. Everybody stay calm! [S1] What's the procedure... [S2] Everybody stay fucking calm!!!... Everybody fucking calm down!!!!! [S1] No! No! If you touch the handle, if its hot there might be a fire down the hallway!"
-  example_title: "Panic protocol"
 ---
 <center>
 <a href="https://github.com/nari-labs/dia">
 <img src="https://github.com/nari-labs/dia/raw/main/dia/static/images/banner.png">
 </a>
 </center>
-Dia is a 1.6B parameter text to speech model created by Nari Labs. It was pushed to the Hub using the [PytorchModelHubMixin](https://huggingface.co/docs/huggingface_hub/package_reference/mixins#huggingface_hub.PyTorchModelHubMixin) integration.
 Dia **directly generates highly realistic dialogue from a transcript**. You can condition the output on audio, enabling emotion and tone control. The model can also produce nonverbal communications like laughter, coughing, clearing throat, etc.
@@ -28,7 +22,7 @@ To accelerate research, we are providing access to pretrained model checkpoints
 We also provide a [demo page](https://yummy-fir-7a4.notion.site/dia) comparing our model to [ElevenLabs Studio](https://elevenlabs.io/studio) and [Sesame CSM-1B](https://github.com/SesameAILabs/csm).
 - (Update) We have a ZeroGPU Space running! Try it now [here](https://huggingface.co/spaces/nari-labs/Dia-1.6B). Thanks to the HF team for the support :)
-- Join our [discord server](https://discord.gg/bJq6vjRRKv) for community support and access to new features.
 - Play with a larger version of Dia: generate fun conversations, remix content, and share with friends. 🔮 Join the [waitlist](https://tally.so/r/meokbo) for early access.
 ## ⚡️ Quickstart
@@ -120,7 +114,7 @@ By using this model, you agree to uphold relevant legal standards and ethical re
 ## 🤝 Contributing
 We are a tiny team of 1 full-time and 1 part-time research-engineers. We are extra-welcome to any contributions!
-Join our [Discord Server](https://discord.gg/bJq6vjRRKv) for discussions.
 ## 🤗 Acknowledgements

 ---
 license: apache-2.0
 language:
 - en
 tags:
+- Text-to-Speech
+pipeline_tag: text-to-speech
+library_name: dia-tts
 ---
 <center>
 <a href="https://github.com/nari-labs/dia">
 <img src="https://github.com/nari-labs/dia/raw/main/dia/static/images/banner.png">
 </a>
 </center>
+Dia is a 1.6B parameter text to speech model created by Nari Labs.
 Dia **directly generates highly realistic dialogue from a transcript**. You can condition the output on audio, enabling emotion and tone control. The model can also produce nonverbal communications like laughter, coughing, clearing throat, etc.
 We also provide a [demo page](https://yummy-fir-7a4.notion.site/dia) comparing our model to [ElevenLabs Studio](https://elevenlabs.io/studio) and [Sesame CSM-1B](https://github.com/SesameAILabs/csm).
 - (Update) We have a ZeroGPU Space running! Try it now [here](https://huggingface.co/spaces/nari-labs/Dia-1.6B). Thanks to the HF team for the support :)
+- Join our [discord server](https://discord.gg/pgdB5YRe) for community support and access to new features.
 - Play with a larger version of Dia: generate fun conversations, remix content, and share with friends. 🔮 Join the [waitlist](https://tally.so/r/meokbo) for early access.
 ## ⚡️ Quickstart
 ## 🤝 Contributing
 We are a tiny team of 1 full-time and 1 part-time research-engineers. We are extra-welcome to any contributions!
+Join our [Discord Server](https://discord.gg/pgdB5YRe) for discussions.
 ## 🤗 Acknowledgements

config.json CHANGED

@@ -1,50 +1,46 @@
 {
-  "data": {
-    "audio_bos_value": 1026,
-    "audio_eos_value": 1024,
-    "audio_length": 3072,
-    "audio_pad_value": 1025,
-    "channels": 9,
-    "delay_pattern": [
-      0,
-      8,
-      9,
-      10,
-      11,
-      12,
-      13,
-      14,
-      15
-    ],
-    "text_length": 1024,
-    "text_pad_value": 0
-  },
-  "model": {
-    "decoder": {
-      "cross_head_dim": 128,
-      "cross_query_heads": 16,
-      "gqa_head_dim": 128,
-      "gqa_query_heads": 16,
-      "kv_heads": 4,
-      "n_embd": 2048,
-      "n_hidden": 8192,
-      "n_layer": 18
     },
-    "dropout": 0.0,
-    "encoder": {
-      "head_dim": 128,
-      "n_embd": 1024,
-      "n_head": 16,
-      "n_hidden": 4096,
-      "n_layer": 12
-    },
-    "normalization_layer_epsilon": 1e-05,
-    "rope_max_timescale": 10000,
-    "rope_min_timescale": 1,
-    "src_vocab_size": 256,
-    "tgt_vocab_size": 1028,
-    "weight_dtype": "float32"
-  },
-  "training": {},
-  "version": "0.1"
 }

 {
+    "version": "0.1",
+    "model": {
+        "encoder": {
+            "n_layer": 12,
+            "n_embd": 1024,
+            "n_hidden": 4096,
+            "n_head": 16,
+            "head_dim": 128
+        },
+        "decoder": {
+            "n_layer": 18,
+            "n_embd": 2048,
+            "n_hidden": 8192,
+            "gqa_query_heads": 16,
+            "cross_query_heads": 16,
+            "kv_heads": 4,
+            "gqa_head_dim": 128,
+            "cross_head_dim": 128
+        },
+        "src_vocab_size": 256,
+        "tgt_vocab_size": 1028,
+        "dropout": 0.0
     },
+    "training": {},
+    "data": {
+        "text_length": 1024,
+        "audio_length": 3072,
+        "channels": 9,
+        "text_pad_value": 0,
+        "audio_eos_value": 1024,
+        "audio_pad_value": 1025,
+        "audio_bos_value": 1026,
+        "delay_pattern": [
+            0,
+            8,
+            9,
+            10,
+            11,
+            12,
+            13,
+            14,
+            15
+        ]
+    }
 }
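
Because the config.json hunk reorders keys and changes the indentation, the line-based diff marks almost every line as changed. A key-by-key comparison is easier to review. The following is a minimal sketch, not part of the PR: the helper names `flatten` and `semantic_diff` are hypothetical, and the two dictionaries are transcribed from the removed and added sides of the hunk above.

```python
# Shared sub-sections, transcribed from the diff (identical on both sides).
DATA = {
    "audio_bos_value": 1026, "audio_eos_value": 1024, "audio_length": 3072,
    "audio_pad_value": 1025, "channels": 9,
    "delay_pattern": [0, 8, 9, 10, 11, 12, 13, 14, 15],
    "text_length": 1024, "text_pad_value": 0,
}
ENCODER = {"head_dim": 128, "n_embd": 1024, "n_head": 16,
           "n_hidden": 4096, "n_layer": 12}
DECODER = {"cross_head_dim": 128, "cross_query_heads": 16, "gqa_head_dim": 128,
           "gqa_query_heads": 16, "kv_heads": 4, "n_embd": 2048,
           "n_hidden": 8192, "n_layer": 18}

# Removed side of the hunk.
OLD = {
    "data": DATA,
    "model": {
        "decoder": DECODER, "dropout": 0.0, "encoder": ENCODER,
        "normalization_layer_epsilon": 1e-05,
        "rope_max_timescale": 10000, "rope_min_timescale": 1,
        "src_vocab_size": 256, "tgt_vocab_size": 1028,
        "weight_dtype": "float32",
    },
    "training": {},
    "version": "0.1",
}

# Added side of the hunk.
NEW = {
    "version": "0.1",
    "model": {
        "encoder": ENCODER, "decoder": DECODER,
        "src_vocab_size": 256, "tgt_vocab_size": 1028, "dropout": 0.0,
    },
    "training": {},
    "data": DATA,
}


def flatten(node, prefix=""):
    """Map a nested dict to {dotted.path: value}; lists and empty dicts stay leaves."""
    flat = {}
    for key, value in node.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def semantic_diff(old, new):
    """Return (removed, added, changed) key paths, ignoring key order and whitespace."""
    a, b = flatten(old), flatten(new)
    removed = sorted(set(a) - set(b))
    added = sorted(set(b) - set(a))
    changed = sorted(k for k in set(a) & set(b) if a[k] != b[k])
    return removed, added, changed


removed, added, changed = semantic_diff(OLD, NEW)
print("removed:", removed)
print("added:  ", added)
print("changed:", changed)
```

Under these assumptions, the only semantic difference is that four keys disappear from `model` (`normalization_layer_epsilon`, `rope_max_timescale`, `rope_min_timescale`, `weight_dtype`); no value changes. Whether the loading code falls back to defaults for those keys is not visible in this diff.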

model.safetensors DELETED

@@ -1,3 +0,0 @@
-version https://git-lfs.github.com/spec/v1
-oid sha256:caba289b60f6d7d1e58fc744f4dc25aae88995fcca46be3d05e220b971486a26
-size 6444682848
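
The three deleted lines are a Git LFS pointer, not the weights themselves: each line is a `key value` pair giving the spec version, the content hash, and the size in bytes. A small sketch of reading such a pointer follows; the function name `parse_lfs_pointer` is hypothetical, and the pointer text is copied from the hunk above with the leading `-` diff markers removed.

```python
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:caba289b60f6d7d1e58fc744f4dc25aae88995fcca46be3d05e220b971486a26
size 6444682848
"""


def parse_lfs_pointer(text):
    """Parse a Git LFS pointer file into its version, hash algorithm, digest, and size."""
    # Each non-empty line is "<key> <value>", separated by a single space.
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),  # size of the real file, in bytes
    }


pointer = parse_lfs_pointer(POINTER)
print(pointer["hash_algo"], pointer["size"])
```

The `size` field shows that the removed file was about 6.44 GB, which is roughly what 1.6B parameters stored as 4-byte `float32` values would occupy.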