Update README.md
README.md
CHANGED
@@ -1,37 +1,44 @@
 ---
 base_model: llava-hf/llava-onevision-qwen2-0.5b-ov-hf
 library_name: peft
+license: mit
+language:
+- en
+tags:
+- chemistry
 ---
 
 # Model Card for Model ID
 
 <!-- Provide a quick summary of what the model is/does. -->
 
-
+The model is fine-tuned on images of robot manipulation in chemistry labs.
 
 ## Model Details
 
 ### Model Description
 
-
+The model is based on LLaVA-OneVision 0.5B, fine-tuned for visual inspection and reasoning in laboratory automation tasks. It takes image inputs and generates Boolean inspection results (True/False) with detailed reasoning, enabling error detection and recovery in robotic workflows.
+Fine-tuning was performed with LoRA on both the vision encoder and the projector, optimizing efficiency while maintaining accuracy. The model runs on edge devices (tested on an NVIDIA AGX Orin), making it suitable for real-time decision-making in resource-constrained environments.
+Trained on a curated dataset of laboratory environments, the VLM can detect object misalignment and positioning errors. When an error is detected, it provides natural-language reasoning about the issue, supporting automated corrective actions in robotic workflows.
+This model is particularly useful for scientific automation, self-driving labs (SDLs), and robotic inspection systems, enhancing workflow robustness and efficiency in real-world experimental setups.
 
 
 
-- **Developed by:** [More Information Needed]
-- **Funded by [optional]:** [More Information Needed]
-- **Shared by [optional]:** [More Information Needed]
+- **Developed by:** Zhengxue Zhou
+- **Shared by:** Zhengxue Zhou
 - **Model type:** [More Information Needed]
-- **Language(s) (NLP):** [More Information Needed]
-- **License:** [More Information Needed]
-- **Finetuned from model [optional]:** [More Information Needed]
+- **Language(s) (NLP):** English
+- **License:** MIT
+- **Finetuned from model [optional]:** llava-hf/llava-onevision-qwen2-0.5b-ov-hf
 
 ### Model Sources [optional]
 
 <!-- Provide the basic links for the model. -->
 
-- **Repository:** [More Information Needed]
-- **Paper [optional]:** [More Information Needed]
-- **Demo [optional]:** [More Information Needed]
+- **Repository:** https://github.com/cooper-group-uol-robotics/LIRA.git
+- **Paper:** LIRA: Localization, Inspection, and Reasoning Module for Autonomous Workflows in Self-Driving Labs (under review)
+- **Demo:** Follow the instructions in the repository
 
 ## Uses
 
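The updated description says the model takes an image and returns a True/False inspection result with reasoning. A minimal inference sketch along those lines, using the standard `transformers` + `peft` loading pattern for LLaVA-OneVision; the adapter id `lira-inspection-adapter`, the image path, and the prompt are placeholders, not names from the card, so substitute the checkpoint and examples from the LIRA repository:

```python
# Minimal inference sketch for the fine-tuned inspection model.
# "lira-inspection-adapter" is a placeholder for the released LoRA weights.
import torch
from PIL import Image
from peft import PeftModel
from transformers import AutoProcessor, LlavaOnevisionForConditionalGeneration

base_id = "llava-hf/llava-onevision-qwen2-0.5b-ov-hf"
processor = AutoProcessor.from_pretrained(base_id)
model = LlavaOnevisionForConditionalGeneration.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
# Attach the LoRA adapter on top of the frozen base model.
model = PeftModel.from_pretrained(model, "lira-inspection-adapter")  # placeholder

image = Image.open("workcell_snapshot.jpg")  # any lab-scene image
conversation = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Is the vial seated correctly in the rack? "
                                 "Answer True or False and explain."},
    ]},
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
inputs = processor(images=image, text=prompt, return_tensors="pt").to(
    model.device, torch.float16
)
output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```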
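The description also notes that LoRA was applied to the vision encoder and projector. A rough sketch of the kind of `peft` configuration that implies; the rank, alpha, dropout, and target module names are assumptions for illustration, not the exact recipe behind this checkpoint:

```python
# Illustrative LoRA setup in the spirit of the card's description.
from peft import LoraConfig, get_peft_model
from transformers import LlavaOnevisionForConditionalGeneration

model = LlavaOnevisionForConditionalGeneration.from_pretrained(
    "llava-hf/llava-onevision-qwen2-0.5b-ov-hf"
)
config = LoraConfig(
    r=16,            # assumed rank, not stated in the card
    lora_alpha=32,
    lora_dropout=0.05,
    # peft matches modules by name suffix; which submodules end up adapted
    # (vision tower, projector, language model) is controlled by this list.
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```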