microsoft/VibeVoice-1.5B · The github repo is deleted

5 days ago

I just found that the VibeVoice GitHub repo has been deleted. Does anyone know what happened?

5 days ago

There is hope that the final version of 7B might be released soon since the version that was published here was called VibeVoice-Large-Preview if I remember correctly, and they might have decided to remove that preview version to prevent any confusion.

But that's just a guess, and, honestly, if that's the decision they made, they should at least have posted some information about the upcoming release on the now missing pages.

The lack of information is totally unprofessional and we should expect better from such a large and profitable corporation.

ACloudCenter

5 days ago

Not sure, but I just modified my space to comment out the Large file path on initial load. That way my space isn't just broken. I do wish they would have simply commented saying it's redacted.
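
A more defensive variant of the same idea, sketched under assumptions: instead of commenting a path out by hand, the app could try each checkpoint at startup and skip the ones that fail, so one missing model does not break the whole Space. The function name `load_available_models`, the `loader` callback, the `demo_loader` stand-in, and the Large placeholder path are all hypothetical; in a real Space the loader would wrap whatever `from_pretrained` call the app already uses.

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger(__name__)


def load_available_models(
    candidates: Dict[str, str],
    loader: Callable[[str], Any],
) -> Dict[str, Any]:
    """Load every candidate that works and skip the ones that fail.

    `candidates` maps a display name to a checkpoint path or Hub id.
    `loader` is whatever callable the app uses to load a checkpoint.
    """
    loaded: Dict[str, Any] = {}
    for name, path in candidates.items():
        try:
            loaded[name] = loader(path)
        except Exception as exc:  # broad on purpose: any failure means "unavailable"
            logger.warning("Skipping %s (%s): %s", name, path, exc)
    return loaded


def demo_loader(path: str) -> str:
    """Stand-in loader for illustration; pretends the Large checkpoint is gone."""
    if "Large" in path:
        raise FileNotFoundError(path)
    return f"model loaded from {path}"


models = load_available_models(
    {
        "1.5B": "microsoft/VibeVoice-1.5B",
        "Large": "placeholder/VibeVoice-Large",  # hypothetical path, not a real repo id
    },
    demo_loader,
)
print(sorted(models))  # only the checkpoints that actually loaded
```

The UI can then build its model dropdown from the keys of `models`, so a removed checkpoint simply disappears from the list.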

arnrightnow

5 days ago

> There is hope that the final version of 7B might be released soon since the version that was published here was called VibeVoice-Large-Preview if I remember correctly, and they might have decided to remove that preview version to prevent any confusion.
>
> But that's just a guess, and, honestly, if that's the decision they made, they should at least have posted some information about the upcoming release on the now missing pages.
>
> The lack of information is totally unprofessional and we should expect better from such a large and profitable corporation.

I haven't seen any previous preview-to-official release process that involved making the entire GitHub repo private. This one is very fishy.

augmentedrealitycat

5 days ago

Very fishy indeed.

New information I have acquired since my post here: the preview version had already been replaced by a real "final" version of VibeVoice-7b, and that's the version that was available recently before Microsoft removed it. So my little scenario about an upcoming final version really did not make any sense - it was just a guess, and it was wrong.

PsiPi

4 days ago

This comment has been hidden (marked as Resolved)

mrfakename

4 days ago

I put up a fork/reupload of the repo here: https://github.com/vibevoice-community/VibeVoice

Happy to add anyone who wants to contribute as a contributor + merge PRs, and hopefully the official MSFT repo will come back soon :)

DigitalSpaceport

4 days ago

edited 4 days ago

Fully agree with you this lack of information is odd.

> There is hope that the final version of 7B might be released soon since the version that was published here was called VibeVoice-Large-Preview if I remember correctly, and they might have decided to remove that preview version to prevent any confusion.
>
> But that's just a guess, and, honestly, if that's the decision they made, they should at least have posted some information about the upcoming release on the now missing pages.
>
> The lack of information is totally unprofessional and we should expect better from such a large and profitable corporation.

mrfakename

4 days ago

Giving WizardLM vibes...

urroxyz

4 days ago

edited 4 days ago

Just inference with transformers. I like it better that way, anyway.

!pip install git+https://github.com/huggingface/transformers.git@refs/pull/40546/head

import torch
import numpy as np
import soundfile as sf
from transformers import VibeVoiceForConditionalGenerationInference, VibeVoiceProcessor

# 1. Load Model and Processor
# Using the specified 1.5B checkpoint
model_id = "microsoft/VibeVoice-1.5B"
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if torch.cuda.is_available() and torch.cuda.is_bf16_supported() else torch.float32

model = VibeVoiceForConditionalGenerationInference.from_pretrained(
    model_id,
    torch_dtype=dtype,
    device_map=device
)
processor = VibeVoiceProcessor.from_pretrained(model_id)

# 2. Prepare Inputs
# Provide the conversational script.
script = """
Speaker 1: VibeVoice integrates seamlessly into the Transformers library.
Speaker 2: Yes, this makes it incredibly easy to use. We can just load the processor and model from the Hub.
Speaker 1: Exactly. Then we prepare the text script and provide paths to our voice samples.
Speaker 2: And finally, call the generate method. It's that simple.
"""

voice_sample_paths = ["audio.wav", "audio1.wav"]

# The processor combines the text and audio into the format required by the model.
inputs = processor(
    text=[script],
    voice_samples=[voice_sample_paths],
    return_tensors="pt",
    padding=True,
)

# Move inputs to the correct device
inputs = {key: val.to(device) if isinstance(val, torch.Tensor) else val for key, val in inputs.items()}


# 3. Generate Audio
# FIX: Pass the tokenizer explicitly to the generate method.
output = model.generate(
    **inputs,
    tokenizer=processor.tokenizer,  # This was the missing argument
    cfg_scale=1.3,
    max_new_tokens=None,
)

# 4. Save the Output
# The output contains the generated audio in the `speech_outputs` attribute.
generated_speech = output.speech_outputs[0]
# Use the sampling rate from the processor's configuration
processor_sampling_rate = processor.audio_processor.sampling_rate

processor.save_audio(generated_speech, "generated_podcast_1.5B.wav", sampling_rate=processor_sampling_rate)

print("Audio saved to generated_podcast_1.5B.wav")
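
Before calling the processor it can help to sanity-check the inputs. The sketch below counts the distinct `Speaker N:` tags in a script and compares that to the number of voice samples. It assumes one reference sample per speaker, which matches the two-speaker, two-file example above but is not something the thread confirms as a hard requirement. The helper names `speaker_ids` and `check_voice_samples` are made up for illustration and are not part of transformers.

```python
import re
from typing import List

# Matches tags such as "Speaker 1:" at the start of a line.
SPEAKER_TAG = re.compile(r"^Speaker (\d+):", re.MULTILINE)


def speaker_ids(script: str) -> List[int]:
    """Return the distinct speaker numbers found in the script, sorted ascending."""
    return sorted({int(n) for n in SPEAKER_TAG.findall(script)})


def check_voice_samples(script: str, voice_sample_paths: List[str]) -> None:
    """Raise ValueError if the sample count differs from the speaker count.

    Assumption: one reference voice sample per speaker.
    """
    ids = speaker_ids(script)
    if len(ids) != len(voice_sample_paths):
        raise ValueError(
            f"script has {len(ids)} speaker(s) {ids} but "
            f"{len(voice_sample_paths)} voice sample(s) were given"
        )


script = """
Speaker 1: First line.
Speaker 2: Second line.
Speaker 1: Third line.
"""
check_voice_samples(script, ["audio.wav", "audio1.wav"])  # passes silently
```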

gregory-fanous

4 days ago

edited 1 day ago

Was the streaming version released and did anyone get a copy of that model?

Here's my fork that was used for the merged PR of MPS support in the demos: MPS patch
For some reason it says mine was forked from another user but it was from Microsoft directly. Can't wait to hear what happened.

urroxyz

4 days ago

edited 4 days ago

No, it was not, but it may still be in the future.

There are various copies of the large model on the Hub. I also believe it was never taken down from ModelScope.

mrfakename

4 days ago

edited 4 days ago

Nice, also forked the repo here:
GH: https://github.com/vibevoice-community/VibeVoice/
HF: https://huggingface.co/vibevoice
Happy to merge any PRs or add people as contributors/collaborators :)

urroxyz

4 days ago

I love you, Mr. Fake!

Timvov

4 days ago

For everyone trying to find the now-deleted resources, I've managed to track down a few working links. Here's what the community has preserved:

GitHub Code (Pre-Deletion Backup): The full source code from the official repo before it was taken down is mirrored here. It's deployable:
https://github.com/shijincai/VibeVoice

7B Model Weights (ModelScope): The complete 7B model weights are available for download from ModelScope:
https://modelscope.cn/models/microsoft/VibeVoice-Large

Live Online Demo: If you just want to test the model's capabilities without the hassle of a local setup, someone has put up a direct online service here:
https://vibevoice.info/

Hope this helps everyone out!

ACloudCenter

4 days ago

Awesome everyone!

I have a space here as well, with a slightly different flavor. A script generator is coming later today that auto-formats speaker tags and generates scripts from a single prompt.

https://huggingface.co/spaces/ACloudCenter/Conference-Generator-VibeVoice
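
For readers wondering what auto-formatting speaker tags could look like, here is a minimal sketch. It is not the code behind the Space above; it only assumes the `Speaker N: text` line format used in the inference example earlier in this thread. The function name `format_script` and the `(name, text)` input shape are illustrative choices.

```python
from typing import Dict, Iterable, List, Tuple


def format_script(turns: Iterable[Tuple[str, str]]) -> str:
    """Turn (speaker_name, text) pairs into "Speaker N: text" lines.

    Speakers are numbered from 1 in order of first appearance.
    """
    speaker_numbers: Dict[str, int] = {}
    lines: List[str] = []
    for name, text in turns:
        clean = " ".join(text.split())  # collapse newlines and repeated spaces
        if not clean:
            continue  # skip empty turns without assigning a speaker number
        number = speaker_numbers.setdefault(name, len(speaker_numbers) + 1)
        lines.append(f"Speaker {number}: {clean}")
    return "\n".join(lines)


formatted = format_script(
    [
        ("Host", "Welcome  back."),
        ("Guest", "Thanks!\n"),
        ("Host", "Let's start."),
    ]
)
print(formatted)
```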

gaieges

3 days ago

Looks like the repo is back up: https://github.com/microsoft/VibeVoice

mrfakename

3 days ago

> Looks like the repo is back up: https://github.com/microsoft/VibeVoice

But the code is not. The repo now says:

> VibeVoice is an open-source research framework intended to advance collaboration in the speech synthesis community. After release, we discovered instances where the tool was used in ways inconsistent with the stated intent. Since responsible use of AI is one of Microsoft’s guiding principles, we have disabled this repo until we are confident that out-of-scope use is no longer possible.

arnrightnow

3 days ago

edited 3 days ago

It's funny that they just deleted some simple wrapper code and then state that this thing is "disabled". I mean, you didn't prevent or stop anything...

PsiPi

3 days ago

Yup. Oh well. C'est la vie.

PsiPi

3 days ago

> Just inference with transformers. I like it better that way, anyway.
> !pip install git+https://github.com/huggingface/transformers.git@refs/pull/40546/head
> ...snipped...
> processor.save_audio(generated_speech, "generated_podcast_1.5B.wav", sampling_rate=processor_sampling_rate)
> print("Audio saved to generated_podcast_1.5B.wav")

Since it works via transformers, it is not clear exactly how they intend to put the cat back in the bag.

mrfakename

3 days ago

Also since it is MIT licensed...

urroxyz

3 days ago

It's an unfortunate censorship imposed by God-knows-who.

It won't work, but it sucks nevertheless.

ACloudCenter

3 days ago

edited 3 days ago

Ehhh. It's just bizarre, but I think we're all set at this point. Once you let the rabbit out of the hat you can't put it back in. Microsoft would do well to have an Open Source Developer Advocate that can handle these types of issues before they begin and communicate with the community. It was definitely a different team publishing this and it was clear from the PRs that they were unfamiliar with publishing to HF.

EDIT- Went back and read through their responsible usage and fair point about MIT licensing from PsiPi. I've become too invested at this point lol. At least we're all back up and working now.

"To mitigate the risks of misuse, we have:
Embedded an audible disclaimer (e.g. “This segment was generated by AI”) automatically into every synthesized audio file.
Added an imperceptible watermark to generated audio so third parties can verify VibeVoice provenance. Please see contact information at the end of this model card.
Logged inference requests (hashed) for abuse pattern detection and publishing aggregated statistics quarterly.
Users are responsible for sourcing their datasets legally and ethically. This may include securing appropriate rights and/or anonymizing data prior to use with VibeVoice. Users are reminded to be mindful of data privacy concerns."
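
The quoted text does not say how requests are hashed. Purely as an illustration, and not Microsoft's implementation, hashed logging in general can be sketched with a keyed SHA-256 digest from the Python standard library, so that the log stores a fingerprint of each request rather than the text itself. The function name `hash_request`, the key, and the record layout are all hypothetical.

```python
import hashlib
import hmac


def hash_request(text: str, secret_key: bytes) -> str:
    """Return a keyed SHA-256 digest of the request text as 64 hex characters."""
    return hmac.new(secret_key, text.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical key for illustration only; a real key would be kept secret.
key = b"example-key-for-illustration-only"

# The log record keeps the digest, not the original request text.
record = {"request_hash": hash_request("Speaker 1: Hello.", key)}
print(record)
```

Identical requests produce identical digests, which is what makes repeated-abuse patterns countable without storing what was actually said.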

"If the team receives reports of undesired behavior or identifies issues independently, we will update this repository with appropriate mitigations. "

PsiPi

3 days ago

edited 3 days ago

fwiw. They have such policies, feels a lot like you are teaching egg-blowing classes for multibillionaire corporate grannies.
There, was that sufficiently abstractly hostile in an "internet" way? I never can tell. :)

Saves anyone else saying similar things, hopefully.

ACloudCenter

3 days ago

Fair point. Let me edit the mention about the gating MIT.

PsiPi

3 days ago

> Fair point. Let me edit the mention about the gating MIT.

I did the same, just leaving the now even more abstract faux-hater comment to utterly baffle future haters.

Probably a win

theo77186

3 days ago

There are a few reuploads of the source code repository, and the large weights are still up on ModelScope, so there is probably no point in trying to put the genie back in the bottle. And the weights and inference code are MIT-licensed anyway.

gregory-fanous

1 day ago

This comment has been hidden (marked as Resolved)