Update app.py
app.py CHANGED
@@ -684,7 +684,7 @@ def infer_on_image(input_image):
 
 description = f"""Demo based on example [Qwen2.5-VL spatial notebook](https://github.com/QwenLM/Qwen2.5-VL/blob/main/cookbooks/spatial_understanding.ipynb) for detecting foods and drinks in images with bounding boxes. Input an image of food/drink for bounding boxes to be detected. If no food is present in an image the model should return 'no foods found'.\n
 One prediction will use thinking tags, e.g. <think>...</think> to try and describe what's in the image. The other will directly predict a JSON of bounding box coordinates and labels.
-Boxes may not be as accurate as a dedicated object detection model but the benefit here is that they are class agnostic.
+Boxes may not be as accurate as a dedicated object detection model but the benefit here is that they are class agnostic (e.g. the model can detect a wide range of items despite never being explicitly trained on them).
 The foundation knowledge in Qwen2.5-VL (we are using [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) in this demo) means it can detect a wide range of foods and drinks.
 See the app.py file for the different prompts used."""
 
@@ -697,6 +697,13 @@ demo = gr.Interface(fn=infer_on_image,
              gr.Text(label="Raw output w/o thinking tags"),
              gr.Text(label="Inference time w/o thinking tags")],
     title="Qwen2.5-VL Food Detection",
-    description=description)
+    description=description,
+    # Examples come in the form of a list of lists, where each inner list contains elements to prefill the `inputs` parameter with
+    examples=[
+        ["examples/example_1.jpeg"],
+        ["examples/example_2.jpeg"],
+        ["examples/example_3.jpeg"]
+    ],
+    cache_examples=True)
 
 demo.launch(debug=True)
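For reference, here is a minimal, self-contained sketch of how the updated `gr.Interface` call fits together after this change. The diff only shows the tail of the call, so the input component, the extra output components, the shortened `description`, and the body of `infer_on_image` below are placeholders/assumptions rather than the Space's actual implementation:

```python
import gradio as gr

# Placeholder for the Space's real inference function, which prompts
# Qwen2.5-VL-7B-Instruct twice (with and without <think> tags) and draws
# bounding boxes on the input image. The return values here just mirror
# the five output components below.
def infer_on_image(input_image):
    return input_image, "<think>...</think>", "0.0s", '{"items": []}', "0.0s"

description = "Demo for detecting foods and drinks in images with bounding boxes."  # stand-in

demo = gr.Interface(
    fn=infer_on_image,
    inputs=gr.Image(type="pil", label="Input image"),          # assumed input component
    outputs=[gr.Image(label="Image with detected boxes"),      # assumed output components
             gr.Text(label="Raw output w/ thinking tags"),
             gr.Text(label="Inference time w/ thinking tags"),
             gr.Text(label="Raw output w/o thinking tags"),
             gr.Text(label="Inference time w/o thinking tags")],
    title="Qwen2.5-VL Food Detection",
    description=description,
    # Examples are a list of lists: each inner list prefills the `inputs`
    # components in order (here a single image input per example).
    examples=[
        ["examples/example_1.jpeg"],
        ["examples/example_2.jpeg"],
        ["examples/example_3.jpeg"],
    ],
    # With cache_examples=True, Gradio runs `fn` on every example once and
    # serves the cached outputs, so clicking an example does not trigger a
    # fresh model call.
    cache_examples=True,
)

if __name__ == "__main__":
    demo.launch(debug=True)
```

Caching the examples can be useful on a ZeroGPU Space, since example clicks then serve precomputed results instead of spending GPU time, but it does require the files under examples/ to exist when the cache is built.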