Huck Yang

huckiyang

https://huckiyang.github.io/

AI & ML interests

Speech and Language Modeling

Recent Activity

upvoted a paper 17 days ago

4D-RGPT: Toward Region-level 4D Understanding via Perceptual Distillation

new activity about 1 month ago

nvidia/Speech-IQ-leaderboard: Speech IQ Calculator ACL 25

commented on a paper about 2 months ago

Long Grounded Thoughts: Distilling Compositional Visual Reasoning Chains at Scale

authored 15 papers 3 months ago

Generative Speech Recognition Error Correction with Large Language Models and Task-Activating Prompting

Paper • 2309.15649 • Published Sep 27, 2023 • 1

Conditional Modeling Based Automatic Video Summarization

Paper • 2311.12159 • Published Nov 20, 2023 • 1

Paralinguistics-Enhanced Large Language Modeling of Spoken Dialogue

Paper • 2312.15316 • Published Dec 23, 2023

Multimodal Attention Merging for Improved Speech Recognition and Audio Event Classification

Paper • 2312.14378 • Published Dec 22, 2023

GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators

Paper • 2402.06894 • Published Feb 10, 2024 • 1

Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data

Paper • 2409.20007 • Published Sep 30, 2024 • 1

Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks

Paper • 2411.05361 • Published Nov 8, 2024 • 3

Towards Neural Scaling Laws for Time Series Foundation Models

Paper • 2410.12360 • Published Oct 16, 2024

Plan2Align: Predictive Planning Based Test-Time Preference Alignment in Paragraph-Level Machine Translation

Paper • 2502.20795 • Published Feb 28, 2025

Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition

Paper • 2409.09785 • Published Sep 15, 2024

DeSTA2.5-Audio: Toward General-Purpose Large Audio Language Model with Self-Generated Cross-Modal Alignment

Paper • 2507.02768 • Published Jul 3, 2025 • 18

Audio Flamingo 3: Advancing Audio Intelligence with Fully Open Large Audio Language Models

Paper • 2507.08128 • Published Jul 10, 2025 • 10

Estimating Time Series Foundation Model Transferability via In-Context Learning

Paper • 2509.23695 • Published Sep 28, 2025 • 1

Test-Time Scaling Strategies for Generative Retrieval in Multimodal Conversational Recommendations

Paper • 2508.18132 • Published Aug 25, 2025

OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM

Paper • 2510.15870 • Published Oct 17, 2025 • 89

authored 2 papers about 2 years ago

HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models

Paper • 2309.15701 • Published Sep 27, 2023 • 2

Whispering LLaMA: A Cross-Modal Generative Error Correction Framework for Speech Recognition

Paper • 2310.06434 • Published Oct 10, 2023 • 4

authored 2 papers over 2 years ago

Low-rank Adaptation of Large Language Model Rescoring for Parameter-Efficient Speech Recognition

Paper • 2309.15223 • Published Sep 26, 2023 • 22

Voice2Series: Reprogramming Acoustic Models for Time Series Classification

Paper • 2106.09296 • Published Jun 17, 2021