egogpt_qwen

Add pipeline tag, library name and link to paper

#1 · opened by nielsr (HF Staff)
Files changed (1)
  1. README.md +12 -5
README.md CHANGED
```diff
@@ -1,15 +1,19 @@
 ---
-license: apache-2.0
-datasets:
-- lmms-lab/EgoLife
 base_model:
 - lmms-lab/llava-onevision-qwen2-0.5b-ov
+datasets:
+- lmms-lab/EgoLife
+license: apache-2.0
+pipeline_tag: video-text-to-text
+library_name: transformers
 tags:
 - multimodal
 ---
 
 # EgoGPT-0.5b-Demo
 
+Release model for paper [EgoLife: Towards Egocentric Life Assistant](https://arxiv.org/abs/2503.03803).
+
 ## Model Summary
 
 `EgoGPT-0.5b-Demo` is an omni-modal model trained on egocentric datasets, achieving state-of-the-art performance on egocentric video understanding. Built on the foundation of `llava-onevision-qwen2-0.5b-ov`, it has been finetuned on `EgoIT-EgoLife-138k` egocentric datasets, which contains [EgoIT-99k](https://huggingface.co/datasets/lmms-lab/EgoIT-99K) and depersonalized version of [EgoLife-QA (39k)](https://huggingface.co/datasets/lmms-lab/EgoLife).
```
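The two new front-matter keys in this hunk are the ones the Hub actually consumes: `pipeline_tag` places the model under the video-text-to-text filter and `library_name` picks the default loading snippet shown on the model page. As a quick sanity check (not part of the PR), the merged metadata can be read back with `huggingface_hub`; the repo id below is an assumption based on the model name and may differ:

```python
# Hypothetical check, not part of this PR: read the card metadata back from the Hub.
# The repo id "lmms-lab/EgoGPT-0.5b-Demo" is assumed; adjust to the actual repository.
from huggingface_hub import model_info

info = model_info("lmms-lab/EgoGPT-0.5b-Demo")
print(info.pipeline_tag)   # expected "video-text-to-text" once this PR is merged
print(info.library_name)   # expected "transformers"
print(info.tags)           # includes "multimodal" plus tags derived from the metadata
```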
```diff
@@ -141,7 +145,10 @@ def main(
 model.eval()
 
 conv_template = "qwen_1_5"
-question = f"<image>\n<speech>\n\n{query}"
+question = f"<image>
+<speech>
+
+{query}"
 conv = copy.deepcopy(conv_templates[conv_template])
 conv.append_message(conv.roles[0], question)
 conv.append_message(conv.roles[1], None)
```
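The only change in this hunk is how the prompt template is written out. In LLaVA-style pipelines such as this one, `<image>` and `<speech>` are placeholder tokens that preprocessing later swaps for the visual and audio features, with the user query appended after a blank line. A standalone sketch of the string being built (the example query is illustrative, not from the card):

```python
# Standalone sketch of the prompt layout used in the README snippet.
# "<image>" and "<speech>" are modality placeholders; the user query follows
# after a blank line. The example query is made up for illustration.
query = "What am I doing in this clip?"
question = f"<image>\n<speech>\n\n{query}"
print(question)
# <image>
# <speech>
#
# What am I doing in this clip?
```

Kept on one line with `\n` escapes, `question` is a single f-string literal; note that a plain f-string can only span multiple source lines inside triple quotes.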
````diff
@@ -205,7 +212,7 @@ if __name__ == "__main__":
 ```bibtex
 @inproceedings{yang2025egolife,
 title={EgoLife: Towards Egocentric Life Assistant},
-author={Yang, Jingkang and Liu, Shuai and Guo, Hongming and Dong, Yuhao and Zhang, Xiamengwei and Zhang, Sicheng and Wang, Pengyun and Zhou, Zitang and Xie, Binzhu and Wang, Ziyue and Ouyang, Bei and Lin, Zhengyu and Cominelli, Marco and Cai, Zhongang and Zhang, Yuanhan and Zhang, Peiyuan and Hong, Fangzhou and Widmer, Joerg and Gringoli, Francesco and Yang, Lei and Li, Bo and Liu, Ziwei},
+author={Yang, Jingkang and Liu, Shuai and Guo, Hongming and Dong, Yuhao and Zhang, Xiamengwei and Zhang, Sicheng and Wang, Pengyun and Zhou, Zitang Zhou and Binzhu Xie and Ziyue Wang and Bei Ouyang and Zhengyu Lin and Cominelli, Marco and Cai, Zhongang and Zhang, Yuanhan and Zhang, Peiyuan and Hong, Fangzhou and Widmer, Joerg and Gringoli, Francesco and Yang, Lei and Li, Bo and Liu, Ziwei},
 booktitle={The IEEE/CVF Conference on Computer Vision and Pattern Recognition},
 year={2025},
 }
````
 
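For completeness, the datasets cited in the Model Summary are regular Hub repositories. A minimal, hypothetical way to pull their raw files locally with `huggingface_hub` (the project's own loaders are still needed to turn the files into training samples, and the repos may require accepting access terms):

```python
# Hypothetical download sketch, not from the model card: fetch the raw files of
# the dataset repos referenced in the Model Summary for local inspection.
from huggingface_hub import snapshot_download

egoit_dir = snapshot_download(repo_id="lmms-lab/EgoIT-99K", repo_type="dataset")
egolife_dir = snapshot_download(repo_id="lmms-lab/EgoLife", repo_type="dataset")
print(egoit_dir, egolife_dir)
```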