---
title: Gradio Image Code
emoji: π
colorFrom: pink
colorTo: yellow
sdk: gradio
sdk_version: 5.32.1
app_file: app.py
pinned: false
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
# 🧠 Qwen + DeepSeek Gradio App
A Gradio web app that demonstrates:
- Image Captioning using Qwen-VL-Chat-Int4
- Code Generation using DeepSeek-R1-Distill-Qwen-1.5B
This app is tested and runs efficiently on Kaggle notebooks with T4 x2 GPU accelerators.
> ⚠️ **Note:** Colab is not recommended for this project because downloading the `Qwen-VL-Chat-Int4` model takes a long time and often fails. Kaggle is faster and more stable.
## 🚀 Features
- 🖼️ **Vision-Language tab**: upload an image + a custom prompt → generate a short description
- 💻 **Code Generator tab**: write a prompt → get streaming code output
- Adjustable decoding parameters: `temperature`, `top_p`, `max_new_tokens`
## 🧩 Installation
```bash
pip install transformers
pip install gradio
pip install transformers_stream_generator optimum auto-gptq
```
Ensure your runtime supports a GPU (e.g., a Kaggle/Colab GPU runtime or a local CUDA environment).
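
A quick sanity check before loading the models, to confirm the runtime actually exposes a CUDA GPU (`torch` is pulled in as a dependency of `transformers`):

```python
import torch

# Both models below expect a GPU, so fail fast if none is visible.
if not torch.cuda.is_available():
    raise RuntimeError("No CUDA GPU detected - switch to a GPU runtime (e.g., Kaggle T4 x2).")

print(f"Found {torch.cuda.device_count()} GPU(s): {torch.cuda.get_device_name(0)}")
```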
## 📦 Model Details
### 1. Qwen-VL-Chat-Int4 (Image-to-Text)
- Used for concise image descriptions.
- Streaming output with `TextIteratorStreamer`.
- Prompt format:
```
<|system|>
You are a helpful assistant that describes images very concisely...
<|end|>
<|user|>
Describe the image...
<|end|>
<|assistant|>
```
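
For reference, this is a minimal sketch of loading `Qwen-VL-Chat-Int4` and querying it with an image via the `chat()` interface from the model card; the actual `app.py` assembles the prompt shown above and streams tokens instead of returning a single string, so treat the details here as illustrative:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Qwen/Qwen-VL-Chat-Int4"

# trust_remote_code is required: the Qwen-VL modelling code ships with the checkpoint,
# and the Int4 weights need auto-gptq + optimum (installed above).
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, device_map="cuda", trust_remote_code=True
).eval()

# from_list_format interleaves image references and text into a single query string.
query = tokenizer.from_list_format([
    {"image": "example.jpg"},  # path (or URL) of the uploaded image
    {"text": "Describe the image very concisely."},
])
response, _history = model.chat(tokenizer, query=query, history=None)
print(response)
```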
### 🧠 Prompt Engineering Insight
- Without the `<|assistant|>` tag, the model sometimes overwrites the prompt or fails to complete properly.
- Adding `<|assistant|>` clearly indicates the model's turn, reducing hallucinations.
- Temperature is capped at ~1.0 because higher values (e.g., 1.2+) lead to creative but false outputs.
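
The streaming described above follows the standard `transformers` pattern: run `generate()` in a background thread and consume a `TextIteratorStreamer`. A minimal sketch (the helper name and decoding defaults are placeholders, not necessarily what `app.py` uses):

```python
from threading import Thread
from transformers import TextIteratorStreamer

def stream_caption(model, tokenizer, prompt, temperature=0.7, top_p=0.9, max_new_tokens=128):
    """Yield progressively longer partial descriptions as tokens arrive."""
    streamer = TextIteratorStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

    # generate() blocks, so it runs in a worker thread while we iterate the streamer here.
    Thread(target=model.generate, kwargs=dict(
        **inputs,
        streamer=streamer,
        do_sample=True,
        temperature=min(temperature, 1.0),  # capped at ~1.0, as noted above
        top_p=top_p,
        max_new_tokens=max_new_tokens,
    )).start()

    partial = ""
    for new_text in streamer:
        partial += new_text
        yield partial
```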
### 2. DeepSeek-R1-Distill-Qwen-1.5B (Text-to-Code)
- Generates Python or other code from natural language prompts.
- Uses chat-based prompting with a `<think>...</think>` block for reasoning.
- The final answer is separated from the reasoning to improve clarity.
### 🧠 Prompt Engineering Insight
- Initially used no system prompt → vague reasoning.
- Adding a system prompt improved guidance.
- Separating "thinking" and "final answer" boosted relevance (see the sketch after this list).
- Future improvement: split thinking and answer into separate UI tabs.
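
As a rough illustration of that separation, the sketch below builds the chat prompt with a system message via `apply_chat_template` and then splits the generated text on the closing `</think>` tag; the system prompt wording and helper names are assumptions, not the exact ones in `app.py`:

```python
def build_code_prompt(tokenizer, user_task):
    # Hypothetical system prompt; the app's actual wording may differ.
    messages = [
        {"role": "system", "content": "You are a coding assistant. Think step by step, then give only the final code."},
        {"role": "user", "content": user_task},
    ]
    return tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

def split_thinking(generated_text):
    """Separate the <think>...</think> reasoning block from the final answer."""
    if "</think>" in generated_text:
        thinking, answer = generated_text.split("</think>", 1)
        return thinking.replace("<think>", "").strip(), answer.strip()
    return "", generated_text.strip()
```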
## 🖼️ Usage: Image Description Tab
- Upload an image.
- Write a natural prompt (e.g., "What is in this picture?").
- Adjust:
  - `Temperature`: higher = more creativity, but keep it limited for stability.
  - `Top-p`: controls sampling diversity.
  - `Max new tokens`: maximum length of the generated description.
- Click **Generate** → a streaming description appears (a sketch of the tab wiring follows this list).
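
A minimal sketch of how such a tab can be wired in Gradio Blocks, with the three sliders feeding a streaming callback. It reuses the `stream_caption` sketch above and Qwen-VL's `from_list_format`; component labels and defaults are illustrative, and `model`/`tokenizer` are assumed to be loaded as shown earlier:

```python
import gradio as gr

def describe(image_path, user_prompt, temperature, top_p, max_new_tokens):
    # Interleave the uploaded image and the user's question, then stream the answer.
    query = tokenizer.from_list_format([
        {"image": image_path},
        {"text": user_prompt},
    ])
    yield from stream_caption(model, tokenizer, query, temperature, top_p, max_new_tokens)

with gr.Blocks() as demo:
    with gr.Tab("Image Description"):
        image = gr.Image(type="filepath", label="Image")
        prompt = gr.Textbox(label="Prompt", value="What is in this picture?")
        temperature = gr.Slider(0.1, 1.0, value=0.7, label="Temperature")
        top_p = gr.Slider(0.1, 1.0, value=0.9, label="Top-p")
        max_new_tokens = gr.Slider(16, 512, value=128, step=16, label="Max new tokens")
        output = gr.Textbox(label="Description")
        gr.Button("Generate").click(
            describe,
            inputs=[image, prompt, temperature, top_p, max_new_tokens],
            outputs=output,
        )

demo.launch()
```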
## 💻 Usage: Code Generation Tab
- Write a programming task (e.g., "Write Python code to reverse a string.")
- Adjust generation settings as above.
- Streaming output displays generated code.
- Stops early if the prompt is vague → clarify the prompt to improve results.
## 🚧 Future Work
- Add a separate tab for the model's "thinking" (`<think>...</think>`) versus the final code.
- Optional logging of input-output pairs to track hallucinations or failures.
- Add Markdown rendering for image descriptions.