Update README.md
README.md (changed)
```diff
@@ -17,9 +17,9 @@ Based on Qwen2.5 language model, it is trained on text, image, video and audio d
 
 Ola offers an on-demand solution to seamlessly and efficiently process visual inputs with arbitrary spatial sizes and temporal lengths.
 
-- **Repository:** https://github.com/
+- **Repository:** https://github.com/Ola-Omni/Ola
 - **Languages:** English, Chinese
-- **Paper:** https://arxiv.org/abs/
+- **Paper:** https://arxiv.org/abs/2502.04328
 
 ## Use
 
@@ -314,3 +314,9 @@ def ola_inference(multimodal, audio_path):
 - **Code:** Pytorch
 
 ## Citation
+@article{liu2025ola,
+  title={Ola: Pushing the Frontiers of Omni-Modal Language Model with Progressive Modality Alignment},
+  author={Liu, Zuyan and Dong, Yuhao and Wang, Jiahui and Liu, Ziwei and Hu, Winston and Lu, Jiwen and Rao, Yongming},
+  journal={arXiv preprint arXiv:2502.04328},
+  year={2025}
+}
```
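The second hunk's context line shows that the README defines an `ola_inference(multimodal, audio_path)` helper. Below is a minimal sketch of how that helper might be invoked once the README's inference script has been loaded: only the function name and signature come from the diff context above, while the input layout (a modality-to-path mapping) and the printed text return value are assumptions for illustration.

```python
# Hypothetical call to the README's `ola_inference` helper.
# Only the name and signature `ola_inference(multimodal, audio_path)`
# appear in the diff context above; the dict-style input and the
# returned text are assumptions, not documented behavior.

multimodal = {
    "text": "Describe what is happening in this clip.",  # assumed prompt slot
    "video": "examples/clip.mp4",                        # assumed video-path slot
}
audio_path = "examples/clip_audio.wav"  # assumed separate audio track for the clip

response = ola_inference(multimodal, audio_path)  # defined earlier in the README
print(response)
```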