Zhisheng Zheng

zhisheng01

https://zhishengzheng.com/

zhisheng147

AI & ML interests

LLM, Speech and Audio Processing

Recent Activity

liked a model 27 days ago

deepseek-ai/DeepSeek-V3

liked a model 2 months ago

nyrahealth/CrisperWhisper

Organizations

None yet

zhisheng01's activity

upvoted a paper 16 days ago

MinMo: A Multimodal Large Language Model for Seamless Voice Interaction

Paper • 2501.06282 • Published 20 days ago • 42

upvoted a paper 3 months ago

Movie Gen: A Cast of Media Foundation Models

Paper • 2410.13720 • Published Oct 17, 2024 • 91

upvoted 8 papers 4 months ago

F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

Paper • 2410.06885 • Published Oct 9, 2024 • 43

VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide

Paper • 2410.04364 • Published Oct 6, 2024 • 28

MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion

Paper • 2410.03825 • Published Oct 4, 2024 • 19

MOSEL: 950,000 Hours of Speech Data for Open-Source Speech Foundation Model Training on EU Languages

Paper • 2410.01036 • Published Oct 1, 2024 • 14

From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging

Paper • 2410.01215 • Published Oct 2, 2024 • 30

Training Language Models to Self-Correct via Reinforcement Learning

Paper • 2409.12917 • Published Sep 19, 2024 • 136

To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning

Paper • 2409.12183 • Published Sep 18, 2024 • 37

Preference Tuning with Human Feedback on Language, Speech, and Vision Tasks: A Survey

Paper • 2409.11564 • Published Sep 17, 2024 • 20

upvoted 2 papers 5 months ago

The VoxCeleb Speaker Recognition Challenge: A Retrospective

Paper • 2408.14886 • Published Aug 27, 2024 • 10

Language Model Can Listen While Speaking

Paper • 2408.02622 • Published Aug 5, 2024 • 39

upvoted 4 papers 7 months ago

Autoregressive Speech Synthesis without Vector Quantization

Paper • 2407.08551 • Published Jul 11, 2024 • 14

Video-to-Audio Generation with Hidden Alignment

Paper • 2407.07464 • Published Jul 10, 2024 • 16

FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs

Paper • 2407.04051 • Published Jul 4, 2024 • 36

PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation

Paper • 2407.02869 • Published Jul 3, 2024 • 18

upvoted a paper 8 months ago

Seed-TTS: A Family of High-Quality Versatile Speech Generation Models

Paper • 2406.02430 • Published Jun 4, 2024 • 33

upvoted 2 papers 11 months ago

EMO: Emote Portrait Alive - Generating Expressive Portrait Videos with Audio2Video Diffusion Model under Weak Conditions

Paper • 2402.17485 • Published Feb 27, 2024 • 191

ChatMusician: Understanding and Generating Music Intrinsically with LLM

Paper • 2402.16153 • Published Feb 25, 2024 • 58