nkkbr/ViCA

Video-Text-to-Text

text-generation

vision-language

video understanding

spatial reasoning

visuospatial cognition


nkkbr committed on May 28

Commit 8185a7d · verified · 1 Parent(s): 88a067a

Update README.md

Files changed (1)

README.md +19 -0


@@ -293,6 +293,25 @@ ViCA-7B supports a broad range of spatially grounded multimodal applications:
 - No depth/point cloud: Only RGB video input supported
 - Zero-shot generalization is good, but not task-agnostic
+## Download
+Optionally, download the model weights to a local directory:
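+```python
+from huggingface_hub import snapshot_download
+
+repo_id = "nkkbr/ViCA"
+save_dir = "./ViCA"              # directory the model files are written to
+cache_dir = save_dir + "/cache"  # keep the Hub cache next to the weights
+
+snapshot_download(
+    repo_id=repo_id,
+    local_dir=save_dir,
+    cache_dir=cache_dir,
+    local_dir_use_symlinks=False,  # deprecated in recent huggingface_hub releases
+    resume_download=True,          # deprecated in recent huggingface_hub releases
+)
+```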
 ## Inference
 *Here is a runnable example using ViCA-7B on a VSI-Bench question.*
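
Because the download step added in this diff is optional, one way to use it is to fetch the weights only when they are not already on disk. The sketch below is a minimal illustration, not part of the README: `weights_present` and `ensure_downloaded` are hypothetical helper names, and the check assumes the repository ships a `config.json` plus `*.safetensors` or `*.bin` weight files, which is not confirmed here.

```python
from pathlib import Path


def weights_present(save_dir: str) -> bool:
    """Return True if save_dir holds a config.json and at least one weight file.

    Assumption: the checkpoint uses the usual config.json plus
    *.safetensors / *.bin layout; adjust the patterns if it does not.
    """
    root = Path(save_dir)
    has_config = (root / "config.json").is_file()
    has_weights = any(root.glob("*.safetensors")) or any(root.glob("*.bin"))
    return has_config and has_weights


def ensure_downloaded(repo_id: str = "nkkbr/ViCA", save_dir: str = "./ViCA") -> str:
    """Download the snapshot into save_dir unless the files are already there."""
    if weights_present(save_dir):
        return save_dir
    # Imported lazily so the local check works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=save_dir)
```

Calling `ensure_downloaded()` returns the local directory, which can then be passed as the model path in the inference example instead of the Hub repo id.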