Run with CUDA

#1
by mlfresh - opened

Is there a way to run with CUDA? It automatically selects CPU when I launch with ggc s2, and I couldn't find a way to change it. Thanks.

ggc s2 has auto-detection logic; you could edit the inference code to force the device to cuda, for example here:

    model = Dia.from_pretrained("callgg/dia-f16", compute_dtype="float16", device=device)

On line 38, set device="cuda".
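
A minimal sketch of that edit (just an illustration; it assumes the script can import torch and that from_pretrained accepts a device string, as the line above suggests):

    import torch
    from dia.model import Dia

    # pick CUDA when a GPU is visible to torch, otherwise fall back to CPU
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = Dia.from_pretrained("callgg/dia-f16", compute_dtype="float16", device=device)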

Thanks, I realise there was an issue with my torch install; that's the reason it didn't use CUDA automatically.
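
(For reference, a quick check that the installed torch build can actually see the GPU; if the first line prints False, the wheel is most likely a CPU-only build:)

    import torch

    print(torch.cuda.is_available())  # False usually means a CPU-only torch build
    print(torch.version.cuda)         # CUDA version torch was built against; None on CPU-only builds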

How do I run a specific quantized model?

You would need to rebuild the inference, or possibly build a new engine, but there has been no manpower to work on this project/task recently.

So the current engine can only run inference on the fp16 model? You quantized the others, but there is no way to run inference on them, right?

The two new models work faster; try them first. This one will be handled later.

Did you try this? Not sure whether it works or not.

I used your model with the following code:

    from dia.model import Dia

    model = Dia.from_pretrained("callgg/dia-f16", compute_dtype="float16")

    text = "[S1] Lens is a deep-tech AI company redefining how large language models think, reason, and interact with the world. Today’s portfolio performance aligns with my core belief: exceptional businesses with enduring growth narratives outperform over time. The robust gains in Alphabet (GOOG, GOOGL) reflect its dominance in digital advertising and accelerating momentum in AI-driven cloud services. "

    # path to your prompt audio
    prompt_path = "mark.mp3"

    # generate, supplying the prompt path
    output = model.generate(
        text,
        audio_prompt=prompt_path,
        verbose=True,
    )

    model.save_audio("lens/test1.mp3", output)

And yes, the inference was a bit faster. But I'm interested in using the quantized versions you uploaded, so basically I need to build a new engine in order to use them, right? I want to use Dia for streaming. I have tested the mlx quantized version and it is much faster, but still not fast enough for streaming. I also tried the mmwillet one, but it is slower with the 4-bit model. Here is the output if you're interested:

    $ ./build/bin/Release/tts-cli.exe \
      --model-path Dia_Q4.gguf \
      --prompt "Today's portfolio performance aligns with my core belief: exceptional businesses with enduring growth narratives outperform over time. The robust gains in Alphabet (GOOG, GOOGL) reflect its dominance in digital advertising and accelerating momentum in AI-driven cloud services." \
      --save-path ./test.wav
    Writing audio file: ./test.wav
    |======================================|
    Num Channels: 1
    Num Samples Per Channel: 513536
    Sample Rate: 44100
    Bit Depth: 16
    Length in Seconds: 11.6448
    |======================================|
    total time = 124307.80 ms

Thank you for your help.