---
title: Real-time Speech Transcription
emoji: 🎙️
colorFrom: blue
colorTo: purple
sdk: gradio
sdk_version: 4.19.2
app_file: gradio_app.py
pinned: false
---

# Real-time Transcription with FastRTC

This project implements real-time audio transcription using FastRTC and Gradio, deployed on Hugging Face Spaces.

## Features

- Real-time audio transcription
- Voice Activity Detection (VAD)
- Web-based interface using Gradio
- Deployed on Hugging Face Spaces

## Prerequisites

- Python 3.10 or higher
- Hugging Face account and token
- Git

## Setup

1. Clone the repository:

```bash
git clone https://github.com/sofi444/realtime-transcription-fastrtc
cd realtime-transcription-fastrtc
```

2. Create a `.env` file with your Hugging Face credentials:

```
HUGGINGFACE_TOKEN=your_token_here
HUGGINGFACE_USERNAME=your_username_here
```

3. Install dependencies:

```bash
pip install -r requirements.txt
```

## Deployment

1. Make sure you have set up your `.env` file with the required credentials.

2. Run the deployment script:

```bash
python deploy.py
```

The script will:
- Check for required environment variables
- Install dependencies
- Log in to Hugging Face
- Create a new Space
- Deploy your application

3. Once deployed, your application will be available at:

```
https://huggingface.co/spaces/<your-username>/realtime-transcription
```

## Local Development

To run the application locally:

```bash
python app.py
```

The application will be available at `http://localhost:7860`.

## Troubleshooting

If you encounter issues during deployment:

1. Check that your Hugging Face token is valid and has the necessary permissions
2. Ensure all dependencies are installed correctly
3. Verify that your `.env` file contains the correct credentials
4. Check the Hugging Face Spaces logs for any deployment errors

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Technical Details

- Uses FastRTC for WebRTC streaming
- Powered by the Whisper large-v3-turbo model
- Voice Activity Detection for optimal transcription
- FastAPI backend with WebSocket support

## Environment Variables

The following environment variables can be configured:

- `MODEL_ID`: Hugging Face model ID (default: `openai/whisper-large-v3-turbo`)
- `APP_MODE`: Set to `deployed` for Hugging Face Spaces
- `UI_MODE`: Set to `fastapi` for the custom UI

## Credits

- [FastRTC](https://fastrtc.org/) for WebRTC streaming
- [Whisper](https://github.com/openai/whisper) for speech recognition
- [Hugging Face](https://huggingface.co/) for model hosting

**System Requirements**
- python >= 3.10
- ffmpeg

## Installation

### Step 1: Clone the repository

```bash
git clone https://github.com/sofi444/realtime-transcription-fastrtc
cd realtime-transcription-fastrtc
```

### Step 2: Set up environment

Choose your preferred package manager:
#### 📦 Using UV (recommended)

[Install `uv`](https://docs.astral.sh/uv/getting-started/installation/)

```bash
uv venv --python 3.11 && source .venv/bin/activate
uv pip install -r requirements.txt
```
#### 🐍 Using pip

```bash
python -m venv .venv && source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt
```
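With either package manager, you can sanity-check the install before moving on. The snippet below is a minimal, hypothetical helper (not shipped with the repo); it assumes `requirements.txt` pulls in `fastrtc`, `gradio`, and `transformers`:

```python
# check_env.py -- hypothetical sanity check, not part of the repo
import importlib
import sys

# The project requires Python >= 3.10 (see System Requirements above).
if sys.version_info < (3, 10):
    sys.exit("Python 3.10+ is required")

# Assumed core dependencies from requirements.txt.
for name in ("fastrtc", "gradio", "transformers"):
    try:
        importlib.import_module(name)
        print(f"{name}: OK")
    except ImportError:
        print(f"{name}: missing -- re-run the install step above")
```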
### Step 3: Install ffmpeg
#### 🍎 macOS

```bash
brew install ffmpeg
```
#### 🐧 Linux (Ubuntu/Debian)

```bash
sudo apt update
sudo apt install ffmpeg
```
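On either platform, you can confirm that ffmpeg ended up on your `PATH` where the audio tooling can find it. A quick check using only the standard library:

```python
# Quick check that ffmpeg is installed and visible to Python.
import shutil
import subprocess

ffmpeg = shutil.which("ffmpeg")
if ffmpeg is None:
    raise SystemExit("ffmpeg not found on PATH -- revisit Step 3")

# Print the first line of `ffmpeg -version` as confirmation.
out = subprocess.run([ffmpeg, "-version"], capture_output=True, text=True)
print(out.stdout.splitlines()[0])
```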
### Step 4: Configure environment

Create a `.env` file in the project root:

```env
UI_MODE=fastapi
APP_MODE=local
SERVER_NAME=localhost
```

- **UI_MODE**: controls which interface to use. If set to `gradio`, the app launches with Gradio's default UI. If set to anything else (e.g. `fastapi`), it uses the `index.html` file in the root directory for the UI, which you can customise as you like (default: `fastapi`).
- **APP_MODE**: ignore this if running only locally. If you're deploying, e.g. on Spaces, you need to configure a TURN server; in that case, set it to `deployed` and follow the instructions [here](https://fastrtc.org/deployment/) (default: `local`).
- **MODEL_ID**: HF model identifier for the ASR model you want to use (see [here](https://huggingface.co/models?pipeline_tag=automatic-speech-recognition&sort=trending)) (default: `openai/whisper-large-v3-turbo`)
- **SERVER_NAME**: host to bind to (default: `localhost`)
- **PORT**: port number (default: `7860`)

### Step 5: Launch the application

```bash
python main.py
```

Click on the URL that pops up (e.g. `http://localhost:7860`) to start using the app!

### Whisper

Choose the Whisper model version you want to use. See them all [here](https://huggingface.co/models?pipeline_tag=automatic-speech-recognition&sort=trending&search=whisper) - you can of course also use a non-Whisper ASR model.

On MPS, I can run `whisper-large-v3-turbo` without problems. This is my current favourite as it's lightweight, performant and multilingual!

Adjust the parameters as you like, but remember that for real-time use we want a batch size of 1 (i.e. start transcribing as soon as a chunk is available). If you want to transcribe different languages, set the language parameter to the target language; otherwise Whisper defaults to translating to English (even if you set `transcribe` as the task).
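For reference, here is a minimal sketch of how such a pipeline can be set up with the `transformers` library. This is illustrative, not a copy of `main.py`: the audio file name, device selection, and `language` value are assumptions you should adapt.

```python
# Illustrative ASR pipeline setup -- a sketch, not the project's main.py.
import torch
from transformers import pipeline

MODEL_ID = "openai/whisper-large-v3-turbo"  # default model from the config above

# Pick the best available device (MPS on Apple Silicon, else CUDA, else CPU).
device = "mps" if torch.backends.mps.is_available() else (
    "cuda" if torch.cuda.is_available() else "cpu"
)

asr = pipeline(
    "automatic-speech-recognition",
    model=MODEL_ID,
    device=device,
    torch_dtype=torch.float16 if device != "cpu" else torch.float32,
)

# batch_size=1: transcribe each chunk as soon as it is available.
# Set "language" explicitly if you don't want English output.
result = asr(
    "sample.wav",  # assumed: any local audio file
    batch_size=1,
    generate_kwargs={"task": "transcribe", "language": "english"},
)
print(result["text"])
```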