SoccerChat: Integrating Multimodal Data for Enhanced Soccer Game Understanding
Abstract
SoccerChat, a multimodal conversational AI framework, enhances soccer video comprehension by integrating visual and textual data, improving action classification and referee decision-making.
The integration of artificial intelligence in sports analytics has transformed soccer video understanding, enabling real-time, automated insights into complex game dynamics. Traditional approaches rely on isolated data streams, which limits their ability to capture the full context of a match. To address this, we introduce SoccerChat, a multimodal conversational AI framework that integrates visual and textual data for enhanced soccer video comprehension. Leveraging the extensive SoccerNet dataset, enriched with jersey color annotations and automatic speech recognition (ASR) transcripts, SoccerChat is fine-tuned on a structured video instruction dataset to facilitate accurate game understanding, event classification, and referee decision-making. We benchmark SoccerChat on action classification and referee decision-making tasks, showing that it performs well at general soccer event comprehension while maintaining competitive accuracy in referee decision-making. Our findings highlight the importance of multimodal integration in advancing soccer analytics, paving the way for more interactive and explainable AI-driven sports analysis. Project repository: https://github.com/simula/SoccerChat