arxiv:1905.11235

CIF: Continuous Integrate-and-Fire for End-to-End Speech Recognition

Published on May 27, 2019

Authors:

Abstract

AI-generated summary: A novel continuous integrate-and-fire mechanism for sequence transduction in ASR achieves competitive performance, including a new state-of-the-art result on Mandarin telephone ASR.

In this paper, we propose a novel soft and monotonic alignment mechanism for sequence transduction. It is inspired by the integrate-and-fire model in spiking neural networks and is employed in an encoder-decoder framework composed of continuous functions, hence the name Continuous Integrate-and-Fire (CIF). Applied to ASR, CIF not only involves concise computation but also supports online recognition and acoustic boundary positioning, which makes it suitable for a variety of ASR scenarios. Several support strategies are also proposed to alleviate problems unique to CIF-based models. With these methods acting jointly, the CIF-based model achieves competitive performance. Notably, it reaches a word error rate (WER) of 2.86% on the test-clean set of LibriSpeech and sets a new state-of-the-art result on a Mandarin telephone ASR benchmark.
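To make the integrate-and-fire idea concrete, here is a minimal NumPy sketch of the forward accumulation for a single utterance. It assumes the per-frame weights `alphas` have already been produced by the encoder (for example through a sigmoid) and uses a firing threshold of 1.0; the function name `cif` and its signature are illustrative, not the authors' code. Each frame's weight is added to an accumulator. When the accumulator reaches the threshold, the frame's weight is split so that one part completes the current integrated embedding and the remainder starts the next one.

```python
import numpy as np


def cif(hidden, alphas, threshold=1.0):
    """Continuous integrate-and-fire over one utterance (illustrative sketch).

    hidden: array of shape (T, D), encoder outputs for T frames.
    alphas: array of shape (T,), non-negative per-frame weights
            (assumed to be computed elsewhere, e.g. by a sigmoid).
    Returns (fired, boundaries): fired has shape (N, D), one integrated
    embedding per firing; boundaries holds the frame index of each firing.
    """
    hidden = np.asarray(hidden, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    dim = hidden.shape[1]
    fired, boundaries = [], []
    acc_weight = 0.0            # weight integrated since the last firing
    acc_state = np.zeros(dim)   # weighted sum of frames since the last firing
    for t, (h, a) in enumerate(zip(hidden, alphas)):
        remaining = a
        # Fire whenever the accumulated weight reaches the threshold.
        # With every alpha below the threshold this fires at most once per frame.
        while acc_weight + remaining >= threshold:
            part = threshold - acc_weight      # share of this frame that completes the unit
            fired.append(acc_state + part * h)
            boundaries.append(t)
            remaining -= part
            acc_weight = 0.0
            acc_state = np.zeros(dim)
        # The leftover weight starts integrating the next unit.
        acc_weight += remaining
        acc_state = acc_state + remaining * h
    # Leftover weight at the end of the utterance is simply dropped here.
    out = np.stack(fired) if fired else np.zeros((0, dim))
    return out, boundaries
```

The returned frame indices mark where firings happen, which is how CIF can locate acoustic boundaries. Each step uses only the current frame and the running accumulator, so the loop can also consume frames as they arrive (online). This sketch deliberately leaves out the paper's support strategies, such as how weights are adjusted during training or how residual weight at the end of an utterance is treated.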
