vaibhavmeena/Phi-3.5-vision-instruct-amz-lora

Question Answering

vaibhav meena committed on Sep 16, 2024

Commit 1c3ecc3 (verified) · 1 Parent(s): 8f58054

Update README.md

Files changed (1)

README.md +50 -1

@@ -7,4 +7,53 @@ language:
 base_model:
 - microsoft/Phi-3.5-vision-instruct
 pipeline_tag: question-answering
----
+---
+# Model Card for Fine-tuned Phi-3.5-Vision-Instruct
+This model is fine-tuned from Microsoft's Phi-3.5-Vision-Instruct to improve visual question answering, particularly for reading detailed item measurements from images. It was trained on datasets of real-world images, with tasks that require recognizing and accurately reporting measurements, avoiding assumptions or fabrications, and converting units into standard forms.
+## Model Details
+### Model Description
+- **Developed by:** [More Information Needed]
+- **Model type:** Vision-based question answering model
+- **Language(s):** English
+- **License:** MIT
+- **Base model:** [microsoft/Phi-3.5-vision-instruct](https://huggingface.co/microsoft/Phi-3.5-vision-instruct)
+### Model Sources
+- **Repository:** [More Information Needed]
+## Uses
+### Direct Use
+This model can be used directly for tasks where visual analysis and measurement extraction are critical. For example:
+- Extracting measurements from product images.
+- Answering detailed questions based on the visual content of an image.
+### Out-of-Scope Use
+- The model should not be used for tasks that require deep semantic understanding of objects beyond what is visually apparent, or for tasks that involve creative interpretation.
+## Bias, Risks, and Limitations
+The model is limited by its training data and might not generalize well to images that differ significantly from it. Risks include:
+- Misinterpretation of visual data if the image is unclear.
+- Inability to handle text-heavy images that do not align with the training data.
+### Recommendations
+Users should verify the output where exact measurements are crucial. Using the model where a high degree of visual interpretation is required may lead to inaccurate responses.
+### Training Data
+The model was fine-tuned on a dataset of images containing visual elements such as product measurements. The data was formatted with instructions to avoid assumptions and to rely only on visual information.
+### Training Procedure
+- **Training regime:** fp16 mixed precision.
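
Note on the card's "converting units into standard forms" and its recommendation to verify outputs: the card does not say what the standard forms are or how answers are checked. A minimal sketch of one possible post-processing step follows. It assumes the model answers with a single "value unit" string; the `UNIT_ALIASES` table and the `normalize_measurement` helper are hypothetical names introduced here for illustration and are not part of this repository.

```python
import re
from typing import Optional, Tuple

# Hypothetical alias table: maps unit spellings to one canonical name.
# The actual standard forms used in training are not stated in the card.
UNIT_ALIASES = {
    "cm": "centimetre", "centimeter": "centimetre", "centimeters": "centimetre",
    "centimetre": "centimetre", "centimetres": "centimetre",
    "mm": "millimetre", "millimeter": "millimetre", "millimeters": "millimetre",
    "in": "inch", "inch": "inch", "inches": "inch", '"': "inch",
    "g": "gram", "gram": "gram", "grams": "gram",
    "kg": "kilogram", "kilogram": "kilogram", "kilograms": "kilogram",
}

# A number (optionally negative or decimal) followed by a unit token.
_MEASUREMENT = re.compile(r'^\s*(-?\d+(?:\.\d+)?)\s*([a-zA-Z"]+)\s*$')


def normalize_measurement(answer: str) -> Optional[Tuple[float, str]]:
    """Parse an answer such as '12.5 cm' into (12.5, 'centimetre').

    Returns None when the answer is not a plain 'value unit' string or the
    unit is unknown, so the caller can flag it for manual verification.
    """
    match = _MEASUREMENT.match(answer)
    if match is None:
        return None
    unit = UNIT_ALIASES.get(match.group(2).lower())
    if unit is None:
        return None
    return float(match.group(1)), unit
```

Returning `None` instead of guessing keeps this step in line with the card's instruction to avoid assumptions: anything the parser cannot read unambiguously is left for a human to check.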