The GitHub repo is deleted

#30
by wcy1122 - opened

I just found that the VibeVoice GitHub repo has been deleted. Does anyone know what happened?

There is hope that the final version of 7B might be released soon since the version that was published here was called VibeVoice-Large-Preview if I remember correctly, and they might have decided to remove that preview version to prevent any confusion.

But that's just a guess, and, honestly, if that's the decision they made, they should at least have posted some information about the upcoming release on the now missing pages.

The lack of information is totally unprofessional and we should expect better from such a large and profitable corporation.

Not sure, but I just modified my Space to comment out the Large file path on initial load, so the Space isn't simply broken. I do wish they had just left a comment saying it's been redacted.
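For anyone else whose Space depends on the removed checkpoint, the same workaround can be written as a fallback instead of a commented-out line. This is a minimal sketch under stated assumptions: `pick_first_available` and the `exists` callback are hypothetical names, and the two model ids are the ones mentioned in this thread. In a real Space the callback could wrap a Hub lookup; none is called here so the example runs offline.

```python
from typing import Callable, Iterable, Optional


def pick_first_available(
    candidates: Iterable[str],
    exists: Callable[[str], bool],
) -> Optional[str]:
    """Return the first model id for which `exists` is true, else None."""
    for model_id in candidates:
        try:
            if exists(model_id):
                return model_id
        except Exception:
            # Treat lookup errors (network, auth) as "not available".
            continue
    return None


# Hypothetical availability check: pretend only the 1.5B repo is reachable.
reachable = {"microsoft/VibeVoice-1.5B"}
chosen = pick_first_available(
    ["microsoft/VibeVoice-Large", "microsoft/VibeVoice-1.5B"],
    exists=lambda model_id: model_id in reachable,
)
print(chosen)
```

If every candidate is missing, the function returns `None`, so the Space can show a notice instead of crashing on load.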

I haven't seen any previous preview-to-official release process that involved making the entire GitHub repo private. This one is very fishy.

Very fishy indeed.

New information I have acquired since my post here: the preview version had already been replaced by a real "final" version of VibeVoice-7b, and that's the version that was available recently before Microsoft removed it. So my little scenario about an upcoming final version really did not make any sense - it was just a guess, and it was wrong.

I put up a fork/reupload of the repo here: https://github.com/vibevoice-community/VibeVoice

Happy to add anyone who wants to contribute as a contributor + merge PRs, and hopefully the official MSFT repo will come back soon :)

Fully agree with you: this lack of information is odd.

Giving WizardLM vibes...

Just inference with transformers. I like it better that way, anyway.

!pip install git+https://github.com/huggingface/transformers.git@refs/pull/40546/head
import torch
import numpy as np
import soundfile as sf
from transformers import VibeVoiceForConditionalGenerationInference, VibeVoiceProcessor

# 1. Load Model and Processor
# Using the specified 1.5B checkpoint
model_id = "microsoft/VibeVoice-1.5B"
device = "cuda" if torch.cuda.is_available() else "cpu"
dtype = torch.bfloat16 if torch.cuda.is_available() and torch.cuda.is_bf16_supported() else torch.float32

model = VibeVoiceForConditionalGenerationInference.from_pretrained(
    model_id,
    torch_dtype=dtype,
    device_map=device
)
processor = VibeVoiceProcessor.from_pretrained(model_id)

# 2. Prepare Inputs
# Provide the conversational script.
script = """
Speaker 1: VibeVoice integrates seamlessly into the Transformers library.
Speaker 2: Yes, this makes it incredibly easy to use. We can just load the processor and model from the Hub.
Speaker 1: Exactly. Then we prepare the text script and provide paths to our voice samples.
Speaker 2: And finally, call the generate method. It's that simple.
"""

voice_sample_paths = ["audio.wav", "audio1.wav"]

# The processor combines the text and audio into the format required by the model.
inputs = processor(
    text=[script],
    voice_samples=[voice_sample_paths],
    return_tensors="pt",
    padding=True,
)

# Move inputs to the correct device
inputs = {key: val.to(device) if isinstance(val, torch.Tensor) else val for key, val in inputs.items()}


# 3. Generate Audio
# FIX: Pass the tokenizer explicitly to the generate method.
output = model.generate(
    **inputs,
    tokenizer=processor.tokenizer,  # This was the missing argument
    cfg_scale=1.3,
    max_new_tokens=None,
)

# 4. Save the Output
# The output contains the generated audio in the `speech_outputs` attribute.
generated_speech = output.speech_outputs[0]
# Use the sampling rate from the processor's configuration
processor_sampling_rate = processor.audio_processor.sampling_rate

processor.save_audio(generated_speech, "generated_podcast_1.5B.wav", sampling_rate=processor_sampling_rate)

print("Audio saved to generated_podcast_1.5B.wav")
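Before calling the processor, it can help to check that the script and the voice samples line up. The sketch below is an illustration only: `parse_script` and `check_voice_count` are hypothetical helpers, not part of Transformers or VibeVoice, and the rule "one voice sample per distinct speaker" is an assumption inferred from the example above (two speakers, two audio paths).

```python
import re
from typing import List, Tuple

# Matches lines such as "Speaker 1: some text".
SPEAKER_LINE = re.compile(r"^Speaker\s+(\d+)\s*:\s*(.+)$")


def parse_script(script: str) -> List[Tuple[int, str]]:
    """Split a script into (speaker_number, text) turns, skipping blank lines."""
    turns = []
    for line_no, raw in enumerate(script.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        match = SPEAKER_LINE.match(line)
        if match is None:
            raise ValueError(f"line {line_no} has no 'Speaker N:' tag: {line!r}")
        turns.append((int(match.group(1)), match.group(2)))
    return turns


def check_voice_count(script: str, voice_sample_paths: List[str]) -> int:
    """Assumed rule: one voice sample per distinct speaker in the script."""
    speakers = {speaker for speaker, _ in parse_script(script)}
    if len(speakers) != len(voice_sample_paths):
        raise ValueError(
            f"{len(speakers)} speakers but {len(voice_sample_paths)} voice samples"
        )
    return len(speakers)


example = """
Speaker 1: Hello there.
Speaker 2: Hi, good to see you.
"""
print(check_voice_count(example, ["audio.wav", "audio1.wav"]))
```

A mismatch raises `ValueError` early, which is cheaper to debug than a failure inside generation.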

Was the streaming version released and did anyone get a copy of that model?

Here's my fork that was used for the merged PR of MPS support in the demos: MPS patch
For some reason it says mine was forked from another user but it was from Microsoft directly. Can't wait to hear what happened.

No, it was not, but it may still be in the future.

There are various copies of the large model on the Hub. I also believe it was never taken down from ModelScope.

Nice, also forked the repo here:
GH: https://github.com/vibevoice-community/VibeVoice/
HF: https://huggingface.co/vibevoice
Happy to merge any PRs or add people as contributors/collaborators :)

I love you, Mr. Fake!

For everyone trying to find the now-deleted resources, I've managed to track down a few working links. Here's what the community has preserved:

GitHub Code (Pre-Deletion Backup): The full source code from the official repo before it was taken down is mirrored here. It's deployable:
https://github.com/shijincai/VibeVoice

7B Model Weights (ModelScope): The complete 7B model weights are available for download from ModelScope:
https://modelscope.cn/models/microsoft/VibeVoice-Large

Live Online Demo: If you just want to test the model's capabilities without the hassle of a local setup, someone has put up a direct online service here:
https://vibevoice.info/

Hope this helps everyone out!

Awesome everyone!

I have a Space here as well, with a slightly different flavor. I'm adding a script generator later today that auto-formats speaker tags and generates scripts from a single prompt.

https://huggingface.co/spaces/ACloudCenter/Conference-Generator-VibeVoice
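Auto-formatting speaker tags could look roughly like the following. This is a sketch of one possible approach, not the code used in the Space above: `tag_speakers` is a hypothetical helper that assumes the input uses `Name: text` lines and numbers speakers in order of first appearance, producing the `Speaker N:` format shown in the inference example in this thread.

```python
from typing import Dict, List


def tag_speakers(dialogue: str) -> str:
    """Rewrite 'Name: text' lines as 'Speaker N: text' (N by first appearance)."""
    speaker_ids: Dict[str, int] = {}
    out: List[str] = []
    for raw in dialogue.splitlines():
        line = raw.strip()
        if not line:
            continue
        name, sep, text = line.partition(":")
        if not sep:
            # No "Name:" prefix: append to the previous turn, or drop the
            # line if there is no previous turn yet.
            if out:
                out[-1] += " " + line
            continue
        key = name.strip().lower()
        number = speaker_ids.setdefault(key, len(speaker_ids) + 1)
        out.append(f"Speaker {number}: {text.strip()}")
    return "\n".join(out)


dialogue = """
Alice: Welcome back.
Bob: Glad to be here.
Alice: Let's start.
"""
print(tag_speakers(dialogue))
```

The first-colon split is deliberately naive; text that contains a colon before any speaker name (a timestamp, for example) would need a stricter pattern.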

Looks like the repo is back up: https://github.com/microsoft/VibeVoice

But the code is not

VibeVoice is an open-source research framework intended to advance collaboration in the speech synthesis community. After release, we discovered instances where the tool was used in ways inconsistent with the stated intent. Since responsible use of AI is one of Microsoft’s guiding principles, we have disabled this repo until we are confident that out-of-scope use is no longer possible.

It's funny that they just deleted some simple wrapper code and then stated that this thing is "disabled". I mean, you didn't prevent or stop anything...

Yup. Oh well. C'est la vie.

Just inference with transformers. I like it better that way, anyway.

!pip install git+https://github.com/huggingface/transformers.git@refs/pull/40546/head

...snipped...
processor.save_audio(generated_speech, "generated_podcast_1.5B.wav", sampling_rate=processor_sampling_rate)
print("Audio saved to generated_podcast_1.5B.wav")


Since it works via transformers, it is not clear exactly how they intend to put the cat back in the bag.

Also since it is MIT licensed...

It's an unfortunate censorship imposed by God-knows-who.

It won't work, but it sucks nevertheless.

Ehhh. It's just bizarre, but I think we're all set at this point. Once you let the rabbit out of the hat you can't put it back in. Microsoft would do well to have an Open Source Developer Advocate that can handle these types of issues before they begin and communicate with the community. It was definitely a different team publishing this and it was clear from the PRs that they were unfamiliar with publishing to HF.

EDIT- Went back and read through their responsible usage and fair point about MIT licensing from PsiPi. I've become too invested at this point lol. At least we're all back up and working now.

"To mitigate the risks of misuse, we have:
Embedded an audible disclaimer (e.g. “This segment was generated by AI”) automatically into every synthesized audio file.
Added an imperceptible watermark to generated audio so third parties can verify VibeVoice provenance. Please see contact information at the end of this model card.
Logged inference requests (hashed) for abuse pattern detection and publishing aggregated statistics quarterly.
Users are responsible for sourcing their datasets legally and ethically. This may include securing appropriate rights and/or anonymizing data prior to use with VibeVoice. Users are reminded to be mindful of data privacy concerns."

"If the team receives reports of undesired behavior or identifies issues independently, we will update this repository with appropriate mitigations."

fwiw. They have such policies, feels a lot like you are teaching egg-blowing classes for multibillionaire corporate grannies.
There, was that sufficiently abstractly hostile in an "internet" way? I never can tell. :)

Saves anyone else saying similar things, hopefully.

Fair point. Let me edit the mention about the gating MIT.

I did the same, just leaving the now even more abstract faux-hater comment to utterly baffle future haters.

Probably a win

There are a few reuploads of the source code repository, and the large weights are still up on ModelScope, so there is probably no point in trying to put the genie back in the bottle. And the weights and inference code are MIT anyway.
