<div align="center">
<h1><a href="https://arxiv.org/abs/2506.03179v1" target="_blank">Vid-SME: Membership Inference Attacks against Large Video Understanding Models</a></h1>
<div>
<a target="_blank" href="https://arxiv.org/abs/2506.03179v1">
<img src="https://img.shields.io/badge/arXiv-2506.03179v1-b31b1b.svg" alt="arXiv Paper"/>
</a>
<a href="https://huggingface.co/LIQIIIII/Vid-SME" target="_blank">
<img src="https://img.shields.io/badge/🤗_HuggingFace-Model-ffbd45.svg" alt="HuggingFace"/>
</a>
<a href="https://huggingface.co/datasets/LIQIIIII/Vid-SME-Eval" target="_blank">
<img
src="https://img.shields.io/badge/%F0%9F%A4%97%20Dataset-Coming%20Soon-ffbd45.svg"
alt="🤗 Dataset — Coming Soon"
/>
</a>
</div>
<div>
Qi Li&emsp;Runpeng Yu&emsp;Xinchao Wang<sup>†</sup>
</div>
<div>
<a href="https://sites.google.com/view/xml-nus/people?authuser=0" target="_blank">xML-Lab</a>, National University of Singapore
<sup>†</sup>corresponding author
</div>
</div>

------------------
TL;DR (1) - We introduce Vid-SME, the first dedicated method for video membership inference attacks (MIAs) against large video understanding models.

TL;DR (2) - We benchmark MIA performance by training three VULLMs, each on a distinct dataset, with different representative training strategies.
## Overview
<div align="center">
<div style="max-width: 100%; text-align: left; margin-bottom: 20px;">
<img src="assets/main_pipeline.jpg" alt="Diagram 2" style="display: block; margin: 0 auto; width: 100%;">
</div>
</div>

<strong>Figure 1.</strong> Vid-SME against Video Understanding Large Language Models (VULLMs). <strong>Left:</strong> An example of the video instruction context used in our experiments. <strong>Middle:</strong> The overall pipeline of Vid-SME. <strong>Right:</strong> The detailed illustration of the membership score calculation of Vid-SME.
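For intuition about the score on the right of Figure 1, the sketch below computes a Sharma-Mittal entropy over the model's next-token distributions and contrasts the natural and temporally reversed frame orders. The fixed `q`/`r` values and the simple mean-and-difference aggregation are illustrative assumptions only; the paper adapts these parameters per video, so refer to it for the exact formulation.

```
import numpy as np

def sharma_mittal_entropy(p, q=1.5, r=1.2, eps=1e-12):
    """Sharma-Mittal entropy of a probability vector p (assumes q != 1, r != 1)."""
    p = np.clip(np.asarray(p, dtype=np.float64), eps, 1.0)
    s = float(np.sum(p ** q))
    return (s ** ((1.0 - r) / (1.0 - q)) - 1.0) / (1.0 - r)

def membership_score(probs_natural, probs_reversed, q=1.5, r=1.2):
    """Illustrative score: gap between the mean per-token SME of the response
    conditioned on natural-order frames vs. temporally reversed frames.
    Each argument is a list of next-token probability distributions."""
    h_nat = np.mean([sharma_mittal_entropy(p, q, r) for p in probs_natural])
    h_rev = np.mean([sharma_mittal_entropy(p, q, r) for p in probs_reversed])
    return h_rev - h_nat  # compared against a threshold to decide membership
```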
## Installation & Preparation
1. Follow the instructions provided in [LongVA](https://github.com/EvolvingLMMs-Lab/LongVA) to build the environment.
2. Download the [models](https://huggingface.co/LIQIIIII/Vid-SME) and move them into `./checkpoints`. For the [datasets](https://huggingface.co/datasets/LIQIIIII/Vid-SME), the JSON files are provided in the `./video_json` folder; download the corresponding videos and move them into `./video_json/videos` (see the sketch below).
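As a convenience, the released checkpoints can also be fetched programmatically with `huggingface_hub`. This is only a sketch assuming the default repository layout; the videos themselves still have to be obtained from the respective source datasets.

```
# Sketch: download the released checkpoints and sanity-check the expected
# local layout. The repo ID matches the model link above; the video files
# still need to be collected separately and placed under ./video_json/videos.
from pathlib import Path
from huggingface_hub import snapshot_download

snapshot_download(repo_id="LIQIIIII/Vid-SME", local_dir="./checkpoints")

for folder in ("./checkpoints", "./video_json", "./video_json/videos"):
    print(folder, "ok" if Path(folder).is_dir() else "missing")
```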
## Evaluation
Run Vid-SME on each model via the corresponding script:
```
python Vid_SME_main_CinePile.py
```
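To evaluate all models in one pass, a small driver like the sketch below can invoke each per-dataset script in turn. It assumes the scripts follow the `Vid_SME_main_<Dataset>.py` naming pattern seen above; only the CinePile script name is confirmed in this README, so check the actual file names in the repository.

```
# Sketch: run every per-dataset evaluation script found in the repo root.
# Assumes all scripts follow the Vid_SME_main_<Dataset>.py naming pattern;
# only Vid_SME_main_CinePile.py is confirmed above.
import glob
import subprocess

for script in sorted(glob.glob("Vid_SME_main_*.py")):
    print(f"Running {script} ...")
    subprocess.run(["python", script], check=True)
```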
## Citation
If you find our work interesting or helpful, please cite it as follows:
```
@misc{li2025vidsmemembershipinferenceattacks,
  title={Vid-SME: Membership Inference Attacks against Large Video Understanding Models},
  author={Qi Li and Runpeng Yu and Xinchao Wang},
  year={2025},
  eprint={2506.03179},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2506.03179},
}
```

---
datasets:
- lmms-lab/NExTQA
- tomg-group-umd/cinepile
- sy1998/MLVU
- sy1998/MLVU_Test
- wchai/lmms_VDC_test
- sy1998/Video_XL_Training
language:
- en
base_model:
- Qwen/Qwen2-7B-Instruct
---