Run with CUDA

#1
by mlfresh - opened

Is there a way to run with CUDA? It automatically selects CPU when I launch with ggc s2, and I couldn't find a way to change it. Thanks.

ggc s2 has auto-detection logic; you could edit the inference code to force the device to cuda, for example here:

    model = Dia.from_pretrained("callgg/dia-f16", compute_dtype="float16", device=device)

On line 38, set device="cuda".
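
A minimal sketch of that edit (just an illustration; it assumes the script can import torch and that from_pretrained accepts a device string, as the line above suggests):

    import torch
    from dia.model import Dia

    # pick CUDA when a GPU is visible to torch, otherwise fall back to CPU
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model = Dia.from_pretrained("callgg/dia-f16", compute_dtype="float16", device=device)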

Thanks, I realise there was an issue with my torch install; that's the reason it didn't use CUDA automatically.
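
(For reference, a quick check that the installed torch build can actually see the GPU; if the first line prints False, the wheel is most likely a CPU-only build:)

    import torch

    print(torch.cuda.is_available())  # False usually means a CPU-only torch build
    print(torch.version.cuda)         # CUDA version torch was built against; None on CPU-only builds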

How do I run a specific quantized model?

You would need to rebuild the inference, or possibly build a new engine, but there has been no manpower to work on this project/task recently.

So the current engine can only run inference on the fp16 model? You quantized the others, but there is no way to run inference on them, right?

The two new models work faster; try them first. This one will be handled later.

Did you try this? Not sure whether it works or not.

I used your model with the following code:

    from dia.model import Dia

    model = Dia.from_pretrained("callgg/dia-f16", compute_dtype="float16")

    text = "[S1] Lens is a deep-tech AI company redefining how large language models think, reason, and interact with the world. Today’s portfolio performance aligns with my core belief: exceptional businesses with enduring growth narratives outperform over time. The robust gains in Alphabet (GOOG, GOOGL) reflect its dominance in digital advertising and accelerating momentum in AI-driven cloud services. "

    # path to your prompt audio
    prompt_path = "mark.mp3"

    # generate, supplying the prompt path
    output = model.generate(
        text,
        audio_prompt=prompt_path,
        verbose=True,
    )

    model.save_audio("lens/test1.mp3", output)

And yes, the inference was a bit faster. But I'm interested in using the quantized versions you uploaded, so basically I need to build a new engine in order to use them, right? I want to use Dia for streaming. I have tested the mlx quantized version and it is much faster, but still not fast enough for streaming. I also tried the mmwillet one, but it is slower with the 4-bit model. Here is the output if you're interested:

    $ ./build/bin/Release/tts-cli.exe \
      --model-path Dia_Q4.gguf \
      --prompt "Today's portfolio performance aligns with my core belief: exceptional businesses with enduring growth narratives outperform over time. The robust gains in Alphabet (GOOG, GOOGL) reflect its dominance in digital advertising and accelerating momentum in AI-driven cloud services." \
      --save-path ./test.wav
    Writing audio file: ./test.wav
    |======================================|
    Num Channels: 1
    Num Samples Per Channel: 513536
    Sample Rate: 44100
    Bit Depth: 16
    Length in Seconds: 11.6448
    |======================================|
    total time = 124307.80 ms

Thank you for your help.