How to use the AFWhisper to extract audio features
#2, opened by GuoGuoBan
Thank you for your contribution. But how can I use the AFWhisper to extract audio features?
You can load the weights with the Whisper model classes and then follow exactly the same steps as you would with Whisper.
I am using the transformers package to load the Whisper model weights, as shown at https://huggingface.co/openai/whisper-large-v3. Could you please provide an example of how to load your model weights? Thank you very much!
I believe it is something like this:
import librosa
from transformers import WhisperProcessor
from transformers.models.whisper.modeling_whisper import WhisperEncoder
# Load the AFWhisper encoder weights from the sound_tower subfolder
encoder = WhisperEncoder.from_pretrained("nvidia/audio-flamingo-3", subfolder="sound_tower")
# Use the Whisper large-v3 processor to compute the log-mel input features
processor = WhisperProcessor.from_pretrained("openai/whisper-large-v3")
# Load the audio, resampled to 16 kHz
audio_array, sampling_rate = librosa.load("test.mp3", sr=16000)
inputs = processor(audio_array, sampling_rate=sampling_rate, return_tensors="pt")
# Run the encoder and print the shape of the extracted features
encoder_outputs = encoder(inputs.input_features)
print(f"Encoder output shape: {encoder_outputs.last_hidden_state.shape}")
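One thing to keep in mind when using these features: the Whisper processor pads (or truncates) every clip to a fixed 30-second window by default, so part of the encoder output may correspond to padding rather than audio. The sketch below estimates how many encoder frames cover the real audio. It assumes AFWhisper keeps the standard Whisper front end (16 kHz input, hop length of 160 samples, 30-second window, and a stride-2 convolution in the encoder); those constants and the helper name `valid_encoder_frames` are illustrative assumptions, not something confirmed in this thread.

```python
# Assumed Whisper front-end constants (not confirmed for AFWhisper here).
SAMPLE_RATE = 16000    # Hz, matches sr=16000 in the snippet above
HOP_LENGTH = 160       # samples per log-mel frame (10 ms)
CHUNK_SECONDS = 30     # the processor pads or truncates to this window
CONV_STRIDE = 2        # the encoder downsamples mel frames by a factor of 2


def valid_encoder_frames(num_samples: int) -> int:
    """Estimate how many encoder frames cover real audio rather than padding."""
    max_samples = SAMPLE_RATE * CHUNK_SECONDS
    # Clips longer than the window are truncated; negative lengths are invalid.
    clipped = min(max(num_samples, 0), max_samples)
    # Ceiling division: a partially filled frame still contains some audio.
    mel_frames = -(-clipped // HOP_LENGTH)
    return -(-mel_frames // CONV_STRIDE)


# A 10-second clip at 16 kHz has 160000 samples.
print(valid_encoder_frames(160000))
```

With the snippet above, something like `n = valid_encoder_frames(len(audio_array))` followed by `encoder_outputs.last_hidden_state[:, :n]` would keep only the frames that overlap the audio, for example before averaging them into a single clip-level vector.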
It works for me. Thank you!