Wav2Vec2-Conformer-Large-960h with Relative Position Embeddings + 4-gram
This model is identical to Facebook's wav2vec2-conformer-rel-pos-large-960h-ft, but is augmented with an English 4-gram language model. The 4-gram.arpa.gz file from LibriSpeech's official n-gram language models is used.
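Roughly speaking, the 4-gram is combined with the acoustic model at decoding time (shallow fusion): each candidate transcription is scored by its CTC log-probability plus a weighted LM log-probability and a word-insertion bonus. The sketch below assumes that scoring rule. The `Hypothesis` type, the weights `alpha` and `beta`, and all scores are hypothetical. Real decoders (for example pyctcdecode, which the Transformers `Wav2Vec2ProcessorWithLM` wraps) apply the rule incrementally to partial hypotheses during beam search rather than to finished transcriptions as done here.

```python
from typing import List, NamedTuple


class Hypothesis(NamedTuple):
    text: str
    acoustic_logprob: float  # hypothetical log P(text | audio) from the CTC model
    lm_logprob: float        # hypothetical log P(text) from the n-gram LM


def fused_score(h: Hypothesis, alpha: float, beta: float) -> float:
    """Shallow-fusion score: acoustic + alpha * LM + beta * word count."""
    num_words = len(h.text.split())
    return h.acoustic_logprob + alpha * h.lm_logprob + beta * num_words


def pick_best(hyps: List[Hypothesis], alpha: float = 0.5, beta: float = 1.5) -> Hypothesis:
    """Return the hypothesis with the highest fused score (weights are illustrative)."""
    return max(hyps, key=lambda h: fused_score(h, alpha, beta))


# Hypothetical candidates: the acoustics slightly prefer "I SCREAM",
# but the language model strongly prefers "ICE CREAM".
candidates = [
    Hypothesis("I SCREAM", acoustic_logprob=-3.0, lm_logprob=-9.0),
    Hypothesis("ICE CREAM", acoustic_logprob=-3.2, lm_logprob=-6.0),
]

print(pick_best(candidates, alpha=0.0, beta=0.0).text)  # acoustic score only: I SCREAM
print(pick_best(candidates).text)                       # LM fused in: ICE CREAM
```

With `alpha=0` and `beta=0` the ranking falls back to the acoustic score alone; once the LM term is weighted in, the more fluent word sequence wins even though its acoustic score is slightly lower.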
Evaluation
The following snippet evaluates patrickvonplaten/wav2vec2-conformer-rel-pos-large-960h-ft-4-gram on LibriSpeech's "other" test split; replace "other" with "clean" to evaluate on the "clean" test split.
```python
from datasets import load_dataset
from transformers import AutoModelForCTC, AutoProcessor
import torch
from jiwer import wer

model_id = "patrickvonplaten/wav2vec2-conformer-rel-pos-large-960h-ft-4-gram"

# Use "clean" instead of "other" to evaluate on the test-clean split.
librispeech_eval = load_dataset("librispeech_asr", "other", split="test")

model = AutoModelForCTC.from_pretrained(model_id).to("cuda")
processor = AutoProcessor.from_pretrained(model_id)

def map_to_pred(batch):
    # LibriSpeech audio is sampled at 16 kHz, the rate the model expects.
    inputs = processor(batch["audio"]["array"], sampling_rate=16_000, return_tensors="pt")
    inputs = {k: v.to("cuda") for k, v in inputs.items()}

    with torch.no_grad():
        logits = model(**inputs).logits

    # Decoding with the processor applies the 4-gram LM; it expects NumPy logits.
    transcription = processor.batch_decode(logits.cpu().numpy()).text[0]
    batch["transcription"] = transcription
    return batch

result = librispeech_eval.map(map_to_pred, remove_columns=["audio"])

# jiwer returns WER as a fraction; multiply by 100 to compare with the results table.
print(wer(result["text"], result["transcription"]))
```
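As far as we understand, `jiwer.wer` pools errors over the whole split when given lists: (substitutions + deletions + insertions) divided by the total number of reference words, returned as a fraction. The self-contained sketch below reimplements that definition with a word-level Levenshtein distance so the metric is easy to inspect. `word_errors` and `corpus_wer` are hypothetical helper names, and jiwer's default text normalization is only approximated here by `str.split()`.

```python
from typing import List


def word_errors(reference: str, hypothesis: str) -> int:
    """Minimum substitutions + deletions + insertions turning reference into hypothesis."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the edit distance between the previous reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i - 1]
                curr[j - 1] + 1,     # insertion of hyp[j - 1]
                prev[j - 1] + cost,  # substitution, or match when cost == 0
            )
        prev = curr
    return prev[len(hyp)]


def corpus_wer(references: List[str], hypotheses: List[str]) -> float:
    """Pooled WER: total word errors divided by total reference words."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_words = sum(len(r.split()) for r in references)
    if total_words == 0:
        raise ValueError("references contain no words")
    errors = sum(word_errors(r, h) for r, h in zip(references, hypotheses))
    return errors / total_words
```

For example, `100 * corpus_wer(["THE CAT SAT", "ON THE MAT"], ["THE CAT SAT", "ON A MAT"])` counts one substitution over six reference words, about 16.7, in the same percent units as the results table.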
Result (WER, %):

| "clean" | "other" |
|---|---|
| 1.94 | 3.54 |
Evaluation results
- Test WER on LibriSpeech (clean) test set (self-reported): 1.940
- Test WER on LibriSpeech (other) test set (self-reported): 3.540