---
title: Gradio Image Code
emoji: 🌖
colorFrom: pink
colorTo: yellow
sdk: gradio
sdk_version: 5.32.1
app_file: app.py
pinned: false
---

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

🧠 Qwen + DeepSeek Gradio App

A Gradio web app that demonstrates vision-language image description with Qwen-VL-Chat-Int4 and streaming code generation with DeepSeek-R1-Distill-Qwen-1.5B.

This app is tested and runs efficiently on Kaggle notebooks with T4 x2 GPU accelerators.

⚠️ Note: Colab is not recommended for this project because downloading the Qwen-VL-Chat-Int4 model takes a long time and often fails. Kaggle is faster and more stable.


## 🚀 Features

- 🖼️ **Vision-Language tab**: upload an image + custom prompt → generate a short description
- 💻 **Code Generator tab**: write a prompt → get streaming code output
- Adjustable decoding parameters: `temperature`, `top_p`, `max_new_tokens`
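These sliders map directly onto Hugging Face `generate()` keyword arguments. A minimal sketch of the mapping (the helper name `build_generation_kwargs` is an assumption, not the app's actual code):

```python
def build_generation_kwargs(temperature: float, top_p: float, max_new_tokens: int) -> dict:
    """Collect UI slider values into kwargs for model.generate()."""
    return {
        "do_sample": temperature > 0,           # fall back to greedy decoding at 0
        "temperature": max(temperature, 1e-5),  # generate() rejects temperature == 0 when sampling
        "top_p": top_p,
        "max_new_tokens": max_new_tokens,
    }

kwargs = build_generation_kwargs(temperature=0.7, top_p=0.9, max_new_tokens=128)
```

The resulting dict is then unpacked into the generation call, e.g. `model.generate(**inputs, **kwargs)`.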

🧩 Installation

```bash
pip install transformers
pip install gradio
pip install transformers_stream_generator optimum auto-gptq
```

Ensure your runtime has a CUDA-capable GPU (e.g., a Kaggle notebook or a local CUDA environment).


## 📦 Model Details

### 1. Qwen-VL-Chat-Int4 (Image-to-Text)

- Used for concise image descriptions.
- Streaming output with `TextIteratorStreamer`.
- Prompt format:

```text
<|system|>
You are a helpful assistant that describes images very concisely...
<|end|>
<|user|>
Describe the image...
<|end|>
<|assistant|>
```
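A small helper can assemble this template so the `<|assistant|>` turn marker is never forgotten; a sketch (the function name and default system message are illustrative, not the app's actual code):

```python
DEFAULT_SYSTEM = "You are a helpful assistant that describes images very concisely."

def build_vl_prompt(user_prompt: str, system: str = DEFAULT_SYSTEM) -> str:
    """Wrap the user's request in the chat template, ending with <|assistant|>."""
    return (
        f"<|system|>\n{system}\n<|end|>\n"
        f"<|user|>\n{user_prompt}\n<|end|>\n"
        f"<|assistant|>\n"
    )

prompt = build_vl_prompt("Describe the image.")
```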

#### 🔧 Prompt Engineering Insight

- Without the `<|assistant|>` tag, the model sometimes overwrites the prompt or fails to complete properly.
- Adding `<|assistant|>` clearly marks the model's turn, reducing hallucinations.
- Temperature is capped at ~1.0 because higher values (e.g., 1.2+) lead to creative but false outputs.
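That cap is simple to enforce at the UI boundary; a one-line sketch (the helper name is hypothetical):

```python
def clamp_temperature(temperature: float, cap: float = 1.0) -> float:
    """Keep temperature in [0, cap] to avoid the hallucination-prone range above ~1.0."""
    return min(max(temperature, 0.0), cap)

value = clamp_temperature(1.2)  # clamped to 1.0
```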

### 2. DeepSeek-R1-Distill-Qwen-1.5B (Text-to-Code)

- Generates Python or other code from natural-language prompts.
- Uses chat-based prompting with:
  - a `<think>...</think>` block for reasoning;
  - the final answer separated out to improve clarity.
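Separating the two pieces comes down to a small parser over the generated text; a sketch assuming the model closes its `<think>` block (the helper name is an assumption):

```python
import re

def split_think_answer(text: str) -> tuple[str, str]:
    """Split generated text into the <think>...</think> reasoning and the final answer."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()  # no reasoning block emitted
    return match.group(1).strip(), text[match.end():].strip()

thinking, answer = split_think_answer(
    "<think>Use slicing to reverse.</think>\ndef rev(s):\n    return s[::-1]"
)
```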

#### 🔧 Prompt Engineering Insight

- Initially used no system prompt → reasoning was vague.
- Adding a system prompt improved guidance.
- Separating "thinking" from the final answer boosted relevance.
- Future improvement: split thinking and answer into separate UI tabs.

## 🖼️ Usage: Image Description Tab

- Upload an image.
- Write a natural prompt (e.g., "What is in this picture?").
- Adjust:
  - **Temperature**: higher = more creative, but capped for stability.
  - **Top-p**: controls sampling diversity.
  - **Max new tokens**: maximum length of the generated description.
- Click **Generate** → a streaming description appears.

## 💻 Usage: Code Generation Tab

- Write a programming task (e.g., "Write Python code to reverse a string.").
- Adjust generation settings as above.
- Streaming output displays the generated code.
- Generation may stop early on vague prompts → clarify the prompt to improve results.
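Under the hood, streaming in both tabs follows the producer/consumer pattern that `TextIteratorStreamer` implements: generation runs in a background thread and pushes tokens onto a queue that the UI loop drains. A dependency-free sketch of that pattern (the `fake_generate` producer stands in for `model.generate()`):

```python
import queue
import threading

def fake_generate(tokens, out):
    """Producer: stands in for model.generate() writing into a streamer."""
    for tok in tokens:
        out.put(tok)
    out.put(None)  # sentinel marking end of generation

def stream(tokens):
    """Consumer: yield tokens as they arrive, like iterating a TextIteratorStreamer."""
    out = queue.Queue()
    threading.Thread(target=fake_generate, args=(tokens, out), daemon=True).start()
    while (tok := out.get()) is not None:
        yield tok

code = "".join(stream(["def ", "rev(s): ", "return s[::-1]"]))
```

In the real app, the Gradio callback yields the accumulating string so the textbox updates token by token.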

🧠 Future Work

- Add a separate tab for the model's "thinking" (`<think>...</think>`) versus the final code.
- Optional logging of input-output pairs to track hallucinations or failures.
- Add Markdown rendering for image descriptions.