vaibhav meena
		
	commited on
		
		
					Update README.md
Browse files
    	
        README.md
    CHANGED
    
    | @@ -7,4 +7,53 @@ language: | |
| 7 | 
             
            base_model:
         | 
| 8 | 
             
            - microsoft/Phi-3.5-vision-instruct
         | 
| 9 | 
             
            pipeline_tag: question-answering
         | 
| 10 | 
            -
            ---
         | 
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | |
|  | 
|  | |
| 7 | 
             
            base_model:
         | 
| 8 | 
             
            - microsoft/Phi-3.5-vision-instruct
         | 
| 9 | 
             
            pipeline_tag: question-answering
         | 
| 10 | 
            +
            ---
         | 
| 11 | 
            +
            # Model Card for Fine-tuned Phi-3.5-Vision-Instruct
         | 
| 12 | 
            +
             | 
| 13 | 
            +
            This model is fine-tuned from Microsoft's Phi-3.5-Vision-Instruct to improve visual question answering tasks, particularly for detailed item measurements in images. It has been trained on specific datasets that provide real-world images with tasks related to recognizing and accurately reporting measurements, avoiding assumptions or fabrications, and converting units into standard forms.
         | 
| 14 | 
            +
             | 
| 15 | 
            +
            ## Model Details
         | 
| 16 | 
            +
             | 
| 17 | 
            +
            ### Model Description
         | 
| 18 | 
            +
             | 
| 19 | 
            +
            - **Developed by:** [More Information Needed]
         | 
| 20 | 
            +
            - **Model type:** Vision-based question answering model
         | 
| 21 | 
            +
            - **Language(s):** English
         | 
| 22 | 
            +
            - **License:** MIT
         | 
| 23 | 
            +
            - **Base model:** [microsoft/Phi-3.5-vision-instruct](https://huggingface.co/microsoft/Phi-3.5-vision-instruct)
         | 
| 24 | 
            +
             | 
| 25 | 
            +
            ### Model Sources
         | 
| 26 | 
            +
             | 
| 27 | 
            +
            - **Repository:** [More Information Needed]
         | 
| 28 | 
            +
             | 
| 29 | 
            +
            ## Uses
         | 
| 30 | 
            +
             | 
| 31 | 
            +
            ### Direct Use
         | 
| 32 | 
            +
             | 
| 33 | 
            +
            This model can be directly used for tasks where visual analysis and measurement extraction are critical. For example:
         | 
| 34 | 
            +
            - Extracting measurements from product images.
         | 
| 35 | 
            +
            - Answering detailed questions based on the visual content of an image.
         | 
| 36 | 
            +
             | 
| 37 | 
            +
            ### Out-of-Scope Use
         | 
| 38 | 
            +
             | 
| 39 | 
            +
            - The model should not be used for tasks requiring deep semantic understanding of objects beyond what is visually apparent or tasks that involve creative interpretation.
         | 
| 40 | 
            +
             | 
| 41 | 
            +
            ## Bias, Risks, and Limitations
         | 
| 42 | 
            +
             | 
| 43 | 
            +
            The model is limited to its training data and might not generalize well to images that differ significantly from the dataset. Risks include:
         | 
| 44 | 
            +
            - Misinterpretation of visual data if the image is unclear.
         | 
| 45 | 
            +
            - Inability to handle text-heavy images that do not align with the training data.
         | 
| 46 | 
            +
             | 
| 47 | 
            +
            ### Recommendations
         | 
| 48 | 
            +
             | 
| 49 | 
            +
            Users should verify the output in cases where exact measurements are crucial. Misuse in scenarios where a high degree of visual interpretation is required may lead to inaccurate responses.
         | 
| 50 | 
            +
             | 
| 51 | 
            +
             | 
| 52 | 
            +
            ### Training Data
         | 
| 53 | 
            +
             | 
| 54 | 
            +
            The model was fine-tuned on a dataset that included images with visual elements such as product measurements. The data was formatted with instructions to avoid assumptions and to only rely on visual information.
         | 
| 55 | 
            +
             | 
| 56 | 
            +
            ### Training Procedure
         | 
| 57 | 
            +
             | 
| 58 | 
            +
            - **Training regime:** fp16 mixed precision.
         | 
| 59 | 
            +
             |