Update app.py
app.py CHANGED
@@ -684,7 +684,7 @@ def infer_on_image(input_image):
 
 description = f"""Demo based on example [Qwen2.5-VL spatial notebook](https://github.com/QwenLM/Qwen2.5-VL/blob/main/cookbooks/spatial_understanding.ipynb) for detecting foods and drinks in images with bounding boxes. Input an image of food/drink for bounding boxes to be detected. If no food is present in an image the model should return 'no foods found'.\n
 One prediction will use thinking tags, e.g. <think>...</think> to try and describe what's in the image. The other will directly predict a JSON of bounding box coordinates and labels.
-Boxes may not be as accurate as a dedicated object detection model but the benefit here is that they are class agnostic.
+Boxes may not be as accurate as a dedicated object detection model but the benefit here is that they are class agnostic (e.g. the model can detect a wide range of items despite never being explicitly trained on them).
 The foundation knowledge in Qwen2.5-VL (we are using [Qwen2.5-VL-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct) in this demo) means it can detect a wide range of foods and drinks.
 See the app.py file for the different prompts used."""
 
@@ -697,6 +697,13 @@ demo = gr.Interface(fn=infer_on_image,
              gr.Text(label="Raw output w/o thinking tags"),
              gr.Text(label="Inference time w/o thinking tags")],
     title="Qwen2.5-VL Food Detection",
-    description=description)
+    description=description,
+    # Examples come in the form of a list of lists, where each inner list contains elements to prefill the `inputs` parameter with
+    examples=[
+        ["examples/example_1.jpeg"],
+        ["examples/example_2.jpeg"],
+        ["examples/example_3.jpeg"]
+    ],
+    cache_examples=True)
 
 demo.launch(debug=True)
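For reference, here is a minimal, self-contained sketch of how the updated `gr.Interface` call fits together after this change. The diff only shows the tail of the call, so the input component, the extra output components, the shortened `description`, and the body of `infer_on_image` below are placeholders/assumptions rather than the Space's actual implementation:

```python
import gradio as gr

# Placeholder for the Space's real inference function, which prompts
# Qwen2.5-VL-7B-Instruct twice (with and without <think> tags) and draws
# bounding boxes on the input image. The return values here just mirror
# the five output components below.
def infer_on_image(input_image):
    return input_image, "<think>...</think>", "0.0s", '{"items": []}', "0.0s"

description = "Demo for detecting foods and drinks in images with bounding boxes."  # stand-in

demo = gr.Interface(
    fn=infer_on_image,
    inputs=gr.Image(type="pil", label="Input image"),          # assumed input component
    outputs=[gr.Image(label="Image with detected boxes"),      # assumed output components
             gr.Text(label="Raw output w/ thinking tags"),
             gr.Text(label="Inference time w/ thinking tags"),
             gr.Text(label="Raw output w/o thinking tags"),
             gr.Text(label="Inference time w/o thinking tags")],
    title="Qwen2.5-VL Food Detection",
    description=description,
    # Examples are a list of lists: each inner list prefills the `inputs`
    # components in order (here a single image input per example).
    examples=[
        ["examples/example_1.jpeg"],
        ["examples/example_2.jpeg"],
        ["examples/example_3.jpeg"],
    ],
    # With cache_examples=True, Gradio runs `fn` on every example once and
    # serves the cached outputs, so clicking an example does not trigger a
    # fresh model call.
    cache_examples=True,
)

if __name__ == "__main__":
    demo.launch(debug=True)
```

Caching the examples can be useful on a ZeroGPU Space, since example clicks then serve precomputed results instead of spending GPU time, but it does require the files under examples/ to exist when the cache is built.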