This repository contains the model for the paper [ViSpeak: Visual Instruction Feedback in Streaming Videos](https://arxiv.org/abs/2503.12769). Code: https://github.com/HumanMLLM/ViSpeak