Add pipeline tag, library name and link to paper #1
opened by nielsr (HF Staff)

README.md CHANGED
@@ -1,15 +1,19 @@
 ---
-license: apache-2.0
-datasets:
-- lmms-lab/EgoLife
 base_model:
 - lmms-lab/llava-onevision-qwen2-0.5b-ov
+datasets:
+- lmms-lab/EgoLife
+license: apache-2.0
+pipeline_tag: video-text-to-text
+library_name: transformers
 tags:
 - multimodal
 ---
 
 # EgoGPT-0.5b-Demo
 
+Released model for the paper [EgoLife: Towards Egocentric Life Assistant](https://arxiv.org/abs/2503.03803).
+
 ## Model Summary
 
 `EgoGPT-0.5b-Demo` is an omni-modal model trained on egocentric datasets, achieving state-of-the-art performance on egocentric video understanding. Built on the foundation of `llava-onevision-qwen2-0.5b-ov`, it has been fine-tuned on the `EgoIT-EgoLife-138k` egocentric dataset, which combines [EgoIT-99k](https://huggingface.co/datasets/lmms-lab/EgoIT-99K) and a depersonalized version of [EgoLife-QA (39k)](https://huggingface.co/datasets/lmms-lab/EgoLife).
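For maintainers of similar model cards: the metadata part of this hunk can also be applied programmatically rather than by editing the YAML front matter by hand. A minimal sketch using `huggingface_hub.metadata_update`; the repo id `lmms-lab/EgoGPT-0.5b-Demo` is an assumption inferred from the model name in this README, not confirmed by the diff:

```python
# Sketch: add the same two metadata keys this PR introduces.
# Note: by default this commits directly to the repo (a PR like this
# one achieves the same effect with review).
from huggingface_hub import metadata_update

metadata_update(
    "lmms-lab/EgoGPT-0.5b-Demo",  # assumed repo id
    {
        "pipeline_tag": "video-text-to-text",  # makes the model discoverable under the task filter
        "library_name": "transformers",        # tells the Hub which library loads the model
    },
)
```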
@@ -141,7 +145,10 @@ def main(
     model.eval()
 
     conv_template = "qwen_1_5"
-    question = f"<image>\n<speech>\n\n{query}"
+    question = f"<image>
+<speech>
+
+{query}"
     conv = copy.deepcopy(conv_templates[conv_template])
     conv.append_message(conv.roles[0], question)
     conv.append_message(conv.roles[1], None)
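One note on this hunk: as rendered, the new version spreads a single-quoted f-string across several lines, which is not valid Python syntax; a triple-quoted f-string (or the original `\n`-escaped one-liner, which the removed line appears to have been) produces the identical prompt. A minimal sketch of the equivalence, with a hypothetical `query`:

```python
# Check that the two prompt spellings are the same string.
# `query` is a hypothetical placeholder, not taken from the README.
query = "What is the camera wearer doing?"

one_liner = f"<image>\n<speech>\n\n{query}"  # \n-escaped single line
multi_line = f"""<image>
<speech>

{query}"""  # triple quotes are required for literal newlines

assert one_liner == multi_line
```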
@@ -205,7 +212,7 @@ if __name__ == "__main__":
 ```bibtex
 @inproceedings{yang2025egolife,
   title={EgoLife: Towards Egocentric Life Assistant},
-  author={Yang, Jingkang and Liu, Shuai and Guo, Hongming and Dong, Yuhao and Zhang, Xiamengwei and Zhang, Sicheng and Wang, Pengyun and Zhou, Zitang and Xie
+  author={Yang, Jingkang and Liu, Shuai and Guo, Hongming and Dong, Yuhao and Zhang, Xiamengwei and Zhang, Sicheng and Wang, Pengyun and Zhou, Zitang and Xie, Binzhu and Wang, Ziyue and Ouyang, Bei and Lin, Zhengyu and Cominelli, Marco and Cai, Zhongang and Zhang, Yuanhan and Zhang, Peiyuan and Hong, Fangzhou and Widmer, Joerg and Gringoli, Francesco and Yang, Lei and Li, Bo and Liu, Ziwei},
   booktitle={The IEEE/CVF Conference on Computer Vision and Pattern Recognition},
   year={2025},
 }
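Stepping back to the inference snippet in the second hunk: it may help to see where `question` flows after the shown lines end. A sketch of the typical continuation in the LLaVA-OneVision codebase this model builds on; the `llava.conversation` import and the `get_prompt()` call are assumptions about EgoGPT's fork, not confirmed by this diff:

```python
# Sketch of the conversation-template flow around the edited lines
# (standard LLaVA-OneVision pattern; EgoGPT's repo may differ in detail).
import copy

from llava.conversation import conv_templates  # assumed helper from the LLaVA codebase

query = "What is the camera wearer doing?"  # hypothetical query
question = f"<image>\n<speech>\n\n{query}"

conv = copy.deepcopy(conv_templates["qwen_1_5"])
conv.append_message(conv.roles[0], question)  # user turn with image/speech placeholders
conv.append_message(conv.roles[1], None)      # empty assistant turn to be completed
prompt = conv.get_prompt()                    # chat-formatted string ready for tokenization
```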