wav2vec2-xlsr-ft-cy-en

An acoustic encoder model for Welsh and English speech recognition, fine-tuned from facebook/wav2vec2-large-xlsr-53 using transcribed spontaneous speech from techiaith/banc-trawsgrifiadau-bangor (v24.01), together with Welsh and English speech data derived from version 16.1 of the Common Voice datasets (techiaith/commonvoice_16_1_en_cy).

Usage

The wav2vec2-xlsr-ft-cy-en model can be used directly as follows:

import torch
import librosa

from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

processor = Wav2Vec2Processor.from_pretrained("techiaith/wav2vec2-xlsr-ft-cy-en")
model = Wav2Vec2ForCTC.from_pretrained("techiaith/wav2vec2-xlsr-ft-cy-en")

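# hypothetical path; replace with your own recording
audio_file = "speech.wav"

# load the audio and resample it to the 16 kHz rate the model expects
audio, rate = librosa.load(audio_file, sr=16000)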

inputs = processor(audio, sampling_rate=16_000, return_tensors="pt", padding=True)

with torch.no_grad():
    logits = model(inputs.input_values, attention_mask=inputs.attention_mask).logits

# greedy decoding
predicted_ids = torch.argmax(logits, dim=-1)

print("Prediction:", processor.batch_decode(predicted_ids))
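
The greedy decoding step above takes the highest-scoring token in each audio frame and leaves it to processor.batch_decode to merge repeated tokens and drop the CTC blank. The sketch below illustrates that collapse rule on a toy example. The function name, the three-entry vocabulary, and the choice of id 0 for the blank are assumptions made for illustration, not the model's real vocabulary, and word-delimiter handling is not covered.

```python
from itertools import groupby


def ctc_greedy_decode(logits, vocab, blank_id=0):
    """Greedy CTC decoding for a single utterance (illustrative sketch).

    logits:   list of per-frame score lists, shape (frames, vocab_size)
    vocab:    list mapping token id -> character (toy vocabulary, assumed)
    blank_id: id of the CTC blank token (assumed to be 0 here)
    """
    # 1. pick the highest-scoring token id in each frame (argmax)
    frame_ids = [max(range(len(frame)), key=frame.__getitem__) for frame in logits]
    # 2. merge consecutive runs of the same id into one
    collapsed = [token_id for token_id, _ in groupby(frame_ids)]
    # 3. drop blank tokens and map the remaining ids to characters
    return "".join(vocab[i] for i in collapsed if i != blank_id)


toy_vocab = ["<pad>", "c", "i"]
toy_logits = [
    [0.1, 0.8, 0.1],    # "c"
    [0.1, 0.7, 0.2],    # "c" again, merged with the previous frame
    [0.9, 0.05, 0.05],  # blank
    [0.1, 0.1, 0.8],    # "i"
    [0.2, 0.1, 0.7],    # "i" again, merged
]
print(ctc_greedy_decode(toy_logits, toy_vocab))  # ci
```

A blank between two identical tokens keeps them separate, which is how CTC can emit doubled letters.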

Model size: 315M params (Safetensors, tensor type F32)