Update README.md
README.md CHANGED

@@ -63,16 +63,12 @@ model-index:

# ViCA-7B: Visuospatial Cognitive Assistant

-[](https://arxiv.org/abs/2505.12312)
-
> You may also be interested in our other project, **ViCA2**. Please refer to the following links:

[](https://github.com/nkkbr/ViCA)

[](https://huggingface.co/nkkbr/ViCA2)

-[](https://arxiv.org/abs/2505.12363)
-
## Overview

**ViCA-7B** is a vision-language model specifically fine-tuned for *visuospatial reasoning* in indoor video environments. Built upon the LLaVA-Video-7B-Qwen2 architecture, it is trained using our newly proposed **ViCA-322K dataset**, which emphasizes both structured spatial annotations and complex instruction-based reasoning tasks.
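The Overview describes ViCA-7B as a fine-tune of LLaVA-Video-7B-Qwen2 distributed through the Hugging Face Hub. As a rough illustration of how such a checkpoint is typically loaded, here is a minimal sketch; it assumes the weights are published under the repo id `nkkbr/ViCA` and that the model follows the standard LLaVA-NeXT `load_pretrained_model` interface, neither of which is confirmed by this commit, so treat it as a placeholder rather than the project's documented usage.

```python
# Minimal loading sketch. Assumptions (not confirmed by this diff):
#   * the checkpoint is published as "nkkbr/ViCA" on the Hugging Face Hub
#   * the model is loadable through the LLaVA-NeXT builder, like its base
#     model LLaVA-Video-7B-Qwen2
from llava.model.builder import load_pretrained_model

model_path = "nkkbr/ViCA"  # hypothetical repo id

# Returns the tokenizer, the model, the image processor used for video
# frames, and the maximum context length.
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path,
    None,           # model_base: None, the fine-tuned weights are self-contained
    "llava_qwen",   # model_name: selects the Qwen2-based LLaVA model class
)
model.eval()
```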