---
title: Gradio Image Code
emoji: π
colorFrom: pink
colorTo: yellow
sdk: gradio
sdk_version: 5.32.1
app_file: app.py
pinned: false
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# 🧠 Qwen + DeepSeek Gradio App
A Gradio web app that demonstrates:
- Image Captioning using Qwen-VL-Chat-Int4
- Code Generation using DeepSeek-R1-Distill-Qwen-1.5B
This app is tested and runs efficiently on Kaggle notebooks with T4 x2 GPU accelerators.
> ⚠️ **Note:** Colab is not recommended for this project because downloading the `Qwen-VL-Chat-Int4` model takes a long time and often fails. Kaggle is faster and more stable.
## 🚀 Features
- 🖼️ **Vision-Language tab**: upload an image + a custom prompt → generate a short description
- 💻 **Code Generator tab**: write a prompt → get streaming code output
- Adjustable decoding parameters: `temperature`, `top_p`, `max_new_tokens`
## 🧩 Installation
```bash
pip install transformers
pip install gradio
pip install transformers_stream_generator optimum auto-gptq
```
Ensure your runtime supports a GPU (e.g., a Kaggle/Colab GPU runtime or a local CUDA environment).
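
A quick sanity check before loading the models, to confirm the runtime actually exposes a CUDA GPU (`torch` is pulled in as a dependency of `transformers`):

```python
import torch

# Both models below expect a GPU, so fail fast if none is visible.
if not torch.cuda.is_available():
    raise RuntimeError("No CUDA GPU detected - switch to a GPU runtime (e.g., Kaggle T4 x2).")

print(f"Found {torch.cuda.device_count()} GPU(s): {torch.cuda.get_device_name(0)}")
```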
## 📦 Model Details
### 1. Qwen-VL-Chat-Int4 (Image-to-Text)
- Used for concise image descriptions.
- Streaming output with `TextIteratorStreamer`.
- Prompt format:
```
<|system|>
You are a helpful assistant that describes images very concisely...
<|end|>
<|user|>
Describe the image...
<|end|>
<|assistant|>
```
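
For reference, this is a minimal sketch of loading `Qwen-VL-Chat-Int4` and querying it with an image via the `chat()` interface from the model card; the actual `app.py` assembles the prompt shown above and streams tokens instead of returning a single string, so treat the details here as illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Qwen/Qwen-VL-Chat-Int4"

# trust_remote_code is required: the Qwen-VL modelling code ships with the checkpoint,
# and the Int4 weights need auto-gptq + optimum (installed above).
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="cuda", trust_remote_code=True
).eval()

# from_list_format interleaves image references and text into a single query string.
query = tokenizer.from_list_format([
    {"image": "example.jpg"},  # path (or URL) of the uploaded image
    {"text": "Describe the image very concisely."},
])
response, _history = model.chat(tokenizer, query=query, history=None)
print(response)
```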
### 🧠 Prompt Engineering Insight
- Without the `<|assistant|>` tag, the model sometimes overwrites the prompt or fails to complete properly.
- Adding `<|assistant|>` clearly indicates the model's turn, reducing hallucinations.
- Temperature is capped at ~1.0 because higher values (e.g., 1.2+) lead to creative but false outputs.
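
The streaming described above follows the standard `transformers` pattern: run `generate()` in a background thread and consume a `TextIteratorStreamer`. A minimal sketch (the helper name and decoding defaults are placeholders, not necessarily what `app.py` uses):

```python
from threading import Thread
from transformers import TextIteratorStreamer

def stream_caption(model, tokenizer, prompt, temperature=0.7, top_p=0.9, max_new_tokens=128):
    """Yield progressively longer partial descriptions as tokens arrive."""
    streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # generate() blocks, so it runs in a worker thread while we iterate the streamer here.
    Thread(target=model.generate, kwargs=dict(
        **inputs,
        streamer=streamer,
        do_sample=True,
        temperature=min(temperature, 1.0),  # capped at ~1.0, as noted above
        top_p=top_p,
        max_new_tokens=max_new_tokens,
    )).start()

    partial = ""
    for new_text in streamer:
        partial += new_text
        yield partial
```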
### 2. DeepSeek-R1-Distill-Qwen-1.5B (Text-to-Code)
- Generates Python or other code from natural language prompts.
- Uses chat-based prompting with a `<think>...</think>` block for reasoning.
- The final answer is separated from the reasoning to improve clarity.
### 🧠 Prompt Engineering Insight
- Initially used no system prompt → vague reasoning.
- Adding a system prompt improved guidance.
- Separating "thinking" and "final answer" boosted relevance (see the sketch after this list).
- Future improvement: split thinking and answer into separate UI tabs.
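
As a rough illustration of that separation, the sketch below builds the chat prompt with a system message via `apply_chat_template` and then splits the generated text on the closing `</think>` tag; the system prompt wording and helper names are assumptions, not the exact ones in `app.py`:

```python
def build_code_prompt(tokenizer, user_task):
    # Hypothetical system prompt; the app's actual wording may differ.
    messages = [
        {"role": "system", "content": "You are a coding assistant. Think step by step, then give only the final code."},
        {"role": "user", "content": user_task},
    ]
    return tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

def split_thinking(generated_text):
    """Separate the <think>...</think> reasoning block from the final answer."""
    if "</think>" in generated_text:
        thinking, answer = generated_text.split("</think>", 1)
        return thinking.replace("<think>", "").strip(), answer.strip()
    return "", generated_text.strip()
```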
## 🖼️ Usage: Image Description Tab
- Upload an image.
- Write a natural prompt (e.g., "What is in this picture?").
- Adjust:
  - `Temperature`: higher = more creativity, but keep it limited for stability.
  - `Top-p`: controls sampling diversity.
  - `Max new tokens`: maximum length of the generated description.
- Click **Generate** → a streaming description appears (a sketch of the tab wiring follows this list).
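
A minimal sketch of how such a tab can be wired in Gradio Blocks, with the three sliders feeding a streaming callback. It reuses the `stream_caption` sketch above and Qwen-VL's `from_list_format`; component labels and defaults are illustrative, and `model`/`tokenizer` are assumed to be loaded as shown earlier:

```python
import gradio as gr

def describe(image_path, user_prompt, temperature, top_p, max_new_tokens):
    # Interleave the uploaded image and the user's question, then stream the answer.
    query = tokenizer.from_list_format([
        {"image": image_path},
        {"text": user_prompt},
    ])
    yield from stream_caption(model, tokenizer, query, temperature, top_p, max_new_tokens)

with gr.Blocks() as demo:
    with gr.Tab("Image Description"):
        image = gr.Image(type="filepath", label="Image")
        prompt = gr.Textbox(label="Prompt", value="What is in this picture?")
        temperature = gr.Slider(0.1, 1.0, value=0.7, label="Temperature")
        top_p = gr.Slider(0.1, 1.0, value=0.9, label="Top-p")
        max_new_tokens = gr.Slider(16, 512, value=128, step=16, label="Max new tokens")
        output = gr.Textbox(label="Description")
        gr.Button("Generate").click(
            describe,
            inputs=[image, prompt, temperature, top_p, max_new_tokens],
            outputs=output,
        )

demo.launch()
```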
## 💻 Usage: Code Generation Tab
- Write a programming task (e.g., "Write Python code to reverse a string.")
- Adjust generation settings as above.
- Streaming output displays generated code.
- Stops early if the prompt is vague → clarify the prompt to improve results.
## 🚧 Future Work
- Add a separate tab for the model's "thinking" (`<think>...</think>`) versus the final code.
- Optional logging of input-output pairs to track hallucinations or failures.
- Add Markdown rendering for image descriptions.