haetsal-lee's Collections
Adaptive Hyper-Graph Convolution Network for Skeleton-based Human Action Recognition with Virtual Connections • 2411.14796 • Published
LLaVAction: evaluating and training multi-modal large language models for action recognition • 2503.18712 • Published • 3
FROSTER: Frozen CLIP Is A Strong Teacher for Open-Vocabulary Action Recognition • 2402.03241 • Published
Leveraging Temporal Contextualization for Video Action Recognition • 2404.09490 • Published
Collaboratively Self-supervised Video Representation Learning for Action Recognition • 2401.07584 • Published • 1
TASAR: Transfer-based Attack on Skeletal Action Recognition • 2409.02483 • Published • 4
SkateFormer: Skeletal-Temporal Transformer for Human Action Recognition • 2403.09508 • Published
Advancing Human Action Recognition with Foundation Models trained on Unlabeled Public Videos • 2402.08875 • Published
ActionHub: A Large-scale Action Video Description Dataset for Zero-shot Action Recognition • 2401.11654 • Published
Referring Atomic Video Action Recognition • 2407.01872 • Published
SA-DVAE: Improving Zero-Shot Skeleton-Based Action Recognition by Disentangled Variational Autoencoders • 2407.13460 • Published
Gate-Shift-Pose: Enhancing Action Recognition in Sports with Skeleton Information • 2503.04470 • Published
Revealing Key Details to See Differences: A Novel Prototypical Perspective for Skeleton-based Action Recognition • 2411.18941 • Published
CHASE: Learning Convex Hull Adaptive Shift for Skeleton-based Multi-Entity Action Recognition • 2410.07153 • Published
SkeletonX: Data-Efficient Skeleton-based Action Recognition via Cross-sample Feature Aggregation • 2504.11749 • Published
EPAM-Net: An Efficient Pose-driven Attention-guided Multimodal Network for Video Action Recognition • 2408.05421 • Published • 1
SeFAR: Semi-supervised Fine-grained Action Recognition with Temporal Perturbation and Learning Stabilization • 2501.01245 • Published • 5
SpikMamba: When SNN meets Mamba in Event-based Human Action Recognition • 2410.16746 • Published
Prototypical Calibrating Ambiguous Samples for Micro-Action Recognition • 2412.14719 • Published • 1
AR-Net: Adaptive Frame Resolution for Efficient Action Recognition • 2007.15796 • Published
Exploring Ordinal Bias in Action Recognition for Instructional Videos • 2504.06580 • Published • 1
MotionLLM: Understanding Human Behaviors from Human Motions and Videos • 2405.20340 • Published • 21
ST-LLM: Large Language Models Are Effective Temporal Learners • 2404.00308 • Published • 8
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding • 2404.05726 • Published • 23
M-LLM Based Video Frame Selection for Efficient Video Understanding • 2502.19680 • Published
MotionBank: A Large-scale Video Motion Benchmark with Disentangled Rule-based Annotations • 2410.13790 • Published
LongVLM: Efficient Long Video Understanding via Large Language Models • 2404.03384 • Published
An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM • 2403.18406 • Published • 1
VidComposition: Can MLLMs Analyze Compositions in Compiled Videos? • 2411.10979 • Published
(untitled) • 2503.20680 • Published • 3
GPT-4o: Visual perception performance of multimodal large language models in piglet activity understanding • 2406.09781 • Published
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models • 2407.15841 • Published • 41
LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation • 2411.04997 • Published • 40
Latent Action Pretraining from Videos • 2410.11758 • Published • 3
VideoLights: Feature Refinement and Cross-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval • 2412.01558 • Published • 4
Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs • 2504.00072 • Published • 7
Sparse Attention Vectors: Generative Multimodal Model Features Are Discriminative Vision-Language Classifiers • 2412.00142 • Published • 4
LLMs Meet Long Video: Advancing Long Video Comprehension with An Interactive Visual Adapter in LLMs • 2402.13546 • Published • 3
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis • 2405.21075 • Published • 25
HoliTom: Holistic Token Merging for Fast Video Large Language Models • 2505.21334 • Published • 19
AutoMMLab: Automatically Generating Deployable Models from Language Instructions for Computer Vision Tasks • 2402.15351 • Published
Breaking the Encoder Barrier for Seamless Video-Language Understanding • 2503.18422 • Published
Item-Language Model for Conversational Recommendation • 2406.02844 • Published • 12
OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding • 2406.19389 • Published • 55
Prompt Learning for Action Recognition • 2305.12437 • Published
HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models • 2502.20811 • Published • 3
Visual Perception by Large Language Model's Weights • 2405.20339 • Published
MotionSight: Boosting Fine-Grained Motion Understanding in Multimodal LLMs • 2506.01674 • Published • 27
VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding • 2406.09418 • Published • 1
Beyond Binary: Towards Fine-Grained LLM-Generated Text Detection via Role Recognition and Involvement Measurement • 2410.14259 • Published
One to rule them all: natural language to bind communication, perception and action • 2411.15033 • Published • 3
TempCompass: Do Video LLMs Really Understand Videos? • 2403.00476 • Published • 1
Token-Efficient Long Video Understanding for Multimodal LLMs • 2503.04130 • Published • 95
ViDAS: Vision-based Danger Assessment and Scoring • 2410.00477 • Published
Dense Connector for MLLMs • 2405.13800 • Published • 25
TC-LLaVA: Rethinking the Transfer from Image to Video Understanding with Temporal Considerations • 2409.03206 • Published
Policy Improvement using Language Feedback Models • 2402.07876 • Published • 9
Interaction-Aware Prompting for Zero-Shot Spatio-Temporal Action Detection • 2304.04688 • Published • 1
LLM4VG: Large Language Models Evaluation for Video Grounding • 2312.14206 • Published • 4