Steven Zheng PRO

Steveeeeeeen

AI & ML interests

speech & audio

Recent Activity

updated a dataset 1 day ago

Steveeeeeeen/whisper-leaderboard-evals

updated a Space 3 days ago

Steveeeeeeen/open_asr_leaderboard_longform

updated a dataset 3 days ago

Steveeeeeeen/leaderboard_longform

upvoted an article 5 days ago

Make your ZeroGPU Spaces go brrr with PyTorch ahead-of-time compilation

Article • 4 authors • 7 days ago • 44

upvoted a paper 10 days ago

TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Language Modeling

Paper • 2508.16790 • Published 17 days ago • 7

upvoted an article 15 days ago

Vision Language Model Alignment in TRL ⚡️

Article • 5 authors • Aug 7 • 78

upvoted a collection 20 days ago

mEUltilingual speechLLM projectors

Multilingual projectors trained with SLAM-ASR for EU languages. • 1 item • Updated Jul 10 • 5

upvoted a paper 21 days ago

Overcoming Data Scarcity in Multi-Dialectal Arabic ASR via Whisper Fine-Tuning

Paper • 2506.02627 • Published Jun 3 • 2

upvoted an article about 2 months ago

How to Run a Hugging Face Model in JAX (Part 1)

Article • 1 author • Jul 20 • 23

upvoted 3 articles 2 months ago

How Much Power does a SOTA Open Video Model Use? ⚡🎥

Article • 3 authors • Jul 2 • 15

Gemma 3n fully available in the open-source ecosystem!

Article • 8 authors • Jun 26 • 116

Common Pitfalls in Sharing Open Source Models on Hugging Face (and How to Dodge Them)

Article • 3 authors • Jul 1 • 21

upvoted 2 papers 3 months ago

SmolVLA: A Vision-Language-Action Model for Affordable and Efficient Robotics

Paper • 2506.01844 • Published Jun 2 • 131

LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models

Paper • 2505.19223 • Published May 25 • 8

upvoted a collection 4 months ago

Releases 23 May

34 items • Updated May 26 • 8

upvoted a paper 4 months ago

This Time is Different: An Observability Perspective on Time Series Foundation Models

Paper • 2505.14766 • Published May 20 • 40

upvoted an article 4 months ago

NVIDIA Cosmos Now Available On Hugging Face For Physical AI Reasoning

Article • 2 authors • May 19 • 26

upvoted 3 papers 4 months ago

Qwen3 Technical Report

Paper • 2505.09388 • Published May 14 • 287

The Audio-Visual BatVision Dataset for Research on Sight and Sound

Paper • 2303.07257 • Published Mar 13, 2023 • 1

NeRAF: 3D Scene Infused Neural Radiance and Acoustic Fields

Paper • 2405.18213 • Published May 28, 2024 • 1

upvoted an article 4 months ago

Falcon-Edge: A series of powerful, universal, fine-tunable 1.58bit language models.

Article • 10 authors • May 15 • 36

upvoted a paper 4 months ago

Qwen2.5-Omni Technical Report

Paper • 2503.20215 • Published Mar 26 • 165

upvoted an article 4 months ago

Blazingly fast whisper transcriptions with Inference Endpoints

Article • 6 authors • May 13 • 75