haetsal-lee's Collections
Adaptive Hyper-Graph Convolution Network for Skeleton-based Human Action Recognition with Virtual Connections • 2411.14796 • Published
LLaVAction: evaluating and training multi-modal large language models for action recognition • 2503.18712 • Published • 3
FROSTER: Frozen CLIP Is A Strong Teacher for Open-Vocabulary Action Recognition • 2402.03241 • Published
Leveraging Temporal Contextualization for Video Action Recognition • 2404.09490 • Published
Collaboratively Self-supervised Video Representation Learning for Action Recognition • 2401.07584 • Published • 1
TASAR: Transfer-based Attack on Skeletal Action Recognition • 2409.02483 • Published • 4
SkateFormer: Skeletal-Temporal Transformer for Human Action Recognition • 2403.09508 • Published
Advancing Human Action Recognition with Foundation Models trained on Unlabeled Public Videos • 2402.08875 • Published
ActionHub: A Large-scale Action Video Description Dataset for Zero-shot Action Recognition • 2401.11654 • Published
Referring Atomic Video Action Recognition • 2407.01872 • Published
SA-DVAE: Improving Zero-Shot Skeleton-Based Action Recognition by Disentangled Variational Autoencoders • 2407.13460 • Published
Gate-Shift-Pose: Enhancing Action Recognition in Sports with Skeleton Information • 2503.04470 • Published
Revealing Key Details to See Differences: A Novel Prototypical Perspective for Skeleton-based Action Recognition • 2411.18941 • Published
CHASE: Learning Convex Hull Adaptive Shift for Skeleton-based Multi-Entity Action Recognition • 2410.07153 • Published
SkeletonX: Data-Efficient Skeleton-based Action Recognition via Cross-sample Feature Aggregation • 2504.11749 • Published
EPAM-Net: An Efficient Pose-driven Attention-guided Multimodal Network for Video Action Recognition • 2408.05421 • Published • 1
SeFAR: Semi-supervised Fine-grained Action Recognition with Temporal Perturbation and Learning Stabilization • 2501.01245 • Published • 5
SpikMamba: When SNN meets Mamba in Event-based Human Action Recognition • 2410.16746 • Published
Prototypical Calibrating Ambiguous Samples for Micro-Action Recognition • 2412.14719 • Published • 1
AR-Net: Adaptive Frame Resolution for Efficient Action Recognition • 2007.15796 • Published
Exploring Ordinal Bias in Action Recognition for Instructional Videos • 2504.06580 • Published • 1
MotionLLM: Understanding Human Behaviors from Human Motions and Videos • 2405.20340 • Published • 21
ST-LLM: Large Language Models Are Effective Temporal Learners • 2404.00308 • Published • 8
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding • 2404.05726 • Published • 23
M-LLM Based Video Frame Selection for Efficient Video Understanding • 2502.19680 • Published
MotionBank: A Large-scale Video Motion Benchmark with Disentangled Rule-based Annotations • 2410.13790 • Published
LongVLM: Efficient Long Video Understanding via Large Language Models • 2404.03384 • Published
An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM • 2403.18406 • Published • 1
VidComposition: Can MLLMs Analyze Compositions in Compiled Videos? • 2411.10979 • Published
(untitled) • 2503.20680 • Published • 3
GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding • 2406.09781 • Published
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models • 2407.15841 • Published • 41
LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation • 2411.04997 • Published • 40
Latent Action Pretraining from Videos • 2410.11758 • Published • 3
VideoLights: Feature Refinement and Cross-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval • 2412.01558 • Published • 4
Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs • 2504.00072 • Published • 7
Sparse Attention Vectors: Generative Multimodal Model Features Are Discriminative Vision-Language Classifiers • 2412.00142 • Published • 4
LLMs Meet Long Video: Advancing Long Video Comprehension with An Interactive Visual Adapter in LLMs • 2402.13546 • Published • 3
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis • 2405.21075 • Published • 25
HoliTom: Holistic Token Merging for Fast Video Large Language Models • 2505.21334 • Published • 19
AutoMMLab: Automatically Generating Deployable Models from Language Instructions for Computer Vision Tasks • 2402.15351 • Published
Breaking the Encoder Barrier for Seamless Video-Language Understanding • 2503.18422 • Published
Item-Language Model for Conversational Recommendation • 2406.02844 • Published • 12
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding • 2406.19389 • Published • 55
Prompt Learning for Action Recognition • 2305.12437 • Published
HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models • 2502.20811 • Published • 3
Visual Perception by Large Language Model's Weights • 2405.20339 • Published
MotionSight: Boosting Fine-Grained Motion Understanding in Multimodal LLMs • 2506.01674 • Published • 27
VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding • 2406.09418 • Published • 1
Beyond Binary: Towards Fine-Grained LLM-Generated Text Detection via Role Recognition and Involvement Measurement • 2410.14259 • Published
One to rule them all: natural language to bind communication, perception and action • 2411.15033 • Published • 3
TempCompass: Do Video LLMs Really Understand Videos? • 2403.00476 • Published • 1
Token-Efficient Long Video Understanding for Multimodal LLMs • 2503.04130 • Published • 95
ViDAS: Vision-based Danger Assessment and Scoring • 2410.00477 • Published
Dense Connector for MLLMs • 2405.13800 • Published • 25
TC-LLaVA: Rethinking the Transfer from Image to Video Understanding with Temporal Considerations • 2409.03206 • Published
Policy Improvement using Language Feedback Models • 2402.07876 • Published • 9
Interaction-Aware Prompting for Zero-Shot Spatio-Temporal Action Detection • 2304.04688 • Published • 1
LLM4VG: Large Language Models Evaluation for Video Grounding • 2312.14206 • Published • 4