vaibhav meena commited on
Commit
1c3ecc3
verified
1 Parent(s): 8f58054

Update README.md

Browse files
Files changed (1) hide show
  1. README.md +50 -1
README.md CHANGED
@@ -7,4 +7,53 @@ language:
7
  base_model:
8
  - microsoft/Phi-3.5-vision-instruct
9
  pipeline_tag: question-answering
10
- ---
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
7
  base_model:
8
  - microsoft/Phi-3.5-vision-instruct
9
  pipeline_tag: question-answering
10
+ ---
11
+ # Model Card for Fine-tuned Phi-3.5-Vision-Instruct
12
+
13
+ This model is fine-tuned from Microsoft's Phi-3.5-Vision-Instruct to improve visual question answering tasks, particularly for detailed item measurements in images. It has been trained on specific datasets that provide real-world images with tasks related to recognizing and accurately reporting measurements, avoiding assumptions or fabrications, and converting units into standard forms.
14
+
15
+ ## Model Details
16
+
17
+ ### Model Description
18
+
19
+ - **Developed by:** [More Information Needed]
20
+ - **Model type:** Vision-based question answering model
21
+ - **Language(s):** English
22
+ - **License:** MIT
23
+ - **Base model:** [microsoft/Phi-3.5-vision-instruct](https://huggingface.co/microsoft/Phi-3.5-vision-instruct)
24
+
25
+ ### Model Sources
26
+
27
+ - **Repository:** [More Information Needed]
28
+
29
+ ## Uses
30
+
31
+ ### Direct Use
32
+
33
+ This model can be directly used for tasks where visual analysis and measurement extraction are critical. For example:
34
+ - Extracting measurements from product images.
35
+ - Answering detailed questions based on the visual content of an image.
36
+
37
+ ### Out-of-Scope Use
38
+
39
+ - The model should not be used for tasks requiring deep semantic understanding of objects beyond what is visually apparent or tasks that involve creative interpretation.
40
+
41
+ ## Bias, Risks, and Limitations
42
+
43
+ The model is limited to its training data and might not generalize well to images that differ significantly from the dataset. Risks include:
44
+ - Misinterpretation of visual data if the image is unclear.
45
+ - Inability to handle text-heavy images that do not align with the training data.
46
+
47
+ ### Recommendations
48
+
49
+ Users should verify the output in cases where exact measurements are crucial. Misuse in scenarios where a high degree of visual interpretation is required may lead to inaccurate responses.
50
+
51
+
52
+ ### Training Data
53
+
54
+ The model was fine-tuned on a dataset that included images with visual elements such as product measurements. The data was formatted with instructions to avoid assumptions and to only rely on visual information.
55
+
56
+ ### Training Procedure
57
+
58
+ - **Training regime:** fp16 mixed precision.
59
+