egogpt_qwen

Add pipeline tag, library name and link to paper

#1 · opened by nielsr (HF Staff)
Files changed (1)
  1. README.md +12 -5
README.md CHANGED
```diff
@@ -1,15 +1,19 @@
 ---
-license: apache-2.0
-datasets:
-- lmms-lab/EgoLife
 base_model:
 - lmms-lab/llava-onevision-qwen2-0.5b-ov
+datasets:
+- lmms-lab/EgoLife
+license: apache-2.0
+pipeline_tag: video-text-to-text
+library_name: transformers
 tags:
 - multimodal
 ---
 
 # EgoGPT-0.5b-Demo
 
+Release model for paper [EgoLife: Towards Egocentric Life Assistant](https://arxiv.org/abs/2503.03803).
+
 ## Model Summary
 
 `EgoGPT-0.5b-Demo` is an omni-modal model trained on egocentric datasets, achieving state-of-the-art performance on egocentric video understanding. Built on the foundation of `llava-onevision-qwen2-0.5b-ov`, it has been finetuned on `EgoIT-EgoLife-138k` egocentric datasets, which contains [EgoIT-99k](https://huggingface.co/datasets/lmms-lab/EgoIT-99K) and depersonalized version of [EgoLife-QA (39k)](https://huggingface.co/datasets/lmms-lab/EgoLife).
```
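The two new front-matter keys in this hunk are the ones the Hub actually consumes: `pipeline_tag` places the model under the video-text-to-text filter and `library_name` picks the default loading snippet shown on the model page. As a quick sanity check (not part of the PR), the merged metadata can be read back with `huggingface_hub`; the repo id below is an assumption based on the model name and may differ:

```python
# Hypothetical check, not part of this PR: read the card metadata back from the Hub.
# The repo id "lmms-lab/EgoGPT-0.5b-Demo" is assumed; adjust to the actual repository.
from huggingface_hub import model_info

info = model_info("lmms-lab/EgoGPT-0.5b-Demo")
print(info.pipeline_tag)   # expected "video-text-to-text" once this PR is merged
print(info.library_name)   # expected "transformers"
print(info.tags)           # includes "multimodal" plus tags derived from the metadata
```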
```diff
@@ -141,7 +145,10 @@ def main(
 model.eval()
 
 conv_template = "qwen_1_5"
-question = f"<image>\n<speech>\n\n{query}"
+question = f"<image>
+<speech>
+
+{query}"
 conv = copy.deepcopy(conv_templates[conv_template])
 conv.append_message(conv.roles[0], question)
 conv.append_message(conv.roles[1], None)
```
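The only change in this hunk is how the prompt template is written out. In LLaVA-style pipelines such as this one, `<image>` and `<speech>` are placeholder tokens that preprocessing later swaps for the visual and audio features, with the user query appended after a blank line. A standalone sketch of the string being built (the example query is illustrative, not from the card):

```python
# Standalone sketch of the prompt layout used in the README snippet.
# "<image>" and "<speech>" are modality placeholders; the user query follows
# after a blank line. The example query is made up for illustration.
query = "What am I doing in this clip?"
question = f"<image>\n<speech>\n\n{query}"
print(question)
# <image>
# <speech>
#
# What am I doing in this clip?
```

Kept on one line with `\n` escapes, `question` is a single f-string literal; note that a plain f-string can only span multiple source lines inside triple quotes.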
````diff
@@ -205,7 +212,7 @@ if __name__ == "__main__":
 ```bibtex
 @inproceedings{yang2025egolife,
 title={EgoLife: Towards Egocentric Life Assistant},
-author={Yang, Jingkang and Liu, Shuai and Guo, Hongming and Dong, Yuhao and Zhang, Xiamengwei and Zhang, Sicheng and Wang, Pengyun and Zhou, Zitang and Xie, Binzhu and Wang, Ziyue and Ouyang, Bei and Lin, Zhengyu and Cominelli, Marco and Cai, Zhongang and Zhang, Yuanhan and Zhang, Peiyuan and Hong, Fangzhou and Widmer, Joerg and Gringoli, Francesco and Yang, Lei and Li, Bo and Liu, Ziwei},
+author={Yang, Jingkang and Liu, Shuai and Guo, Hongming and Dong, Yuhao and Zhang, Xiamengwei and Zhang, Sicheng and Wang, Pengyun and Zhou, Zitang Zhou and Binzhu Xie and Ziyue Wang and Bei Ouyang and Zhengyu Lin and Cominelli, Marco and Cai, Zhongang and Zhang, Yuanhan and Zhang, Peiyuan and Hong, Fangzhou and Widmer, Joerg and Gringoli, Francesco and Yang, Lei and Li, Bo and Liu, Ziwei},
 booktitle={The IEEE/CVF Conference on Computer Vision and Pattern Recognition},
 year={2025},
 }
````
 
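For completeness, the datasets cited in the Model Summary are regular Hub repositories. A minimal, hypothetical way to pull their raw files locally with `huggingface_hub` (the project's own loaders are still needed to turn the files into training samples, and the repos may require accepting access terms):

```python
# Hypothetical download sketch, not from the model card: fetch the raw files of
# the dataset repos referenced in the Model Summary for local inspection.
from huggingface_hub import snapshot_download

egoit_dir = snapshot_download(repo_id="lmms-lab/EgoIT-99K", repo_type="dataset")
egolife_dir = snapshot_download(repo_id="lmms-lab/EgoLife", repo_type="dataset")
print(egoit_dir, egolife_dir)
```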