import gradio as gr
import os
with gr.Blocks(title="Technical Documentation", css="footer {visibility: hidden}") as docs_demo:
    with gr.Column():
        gr.Markdown("""
# Technical Documentation
## Overview
This page provides details about the architecture, API, and usage of the MedGemma Agent application.
## Features
- Multimodal (text + image)
- Wikipedia tool integration
- Real-time streaming
- Medical knowledge base
---
## Architecture
- **Frontend:** Gradio Blocks, custom CSS
- **Backend:** Modal, FastAPI, vLLM, MedGemma-4B
- **Security:** API key authentication
### Technical Stack
- Streaming responses for real-time interaction
- Secure API key authentication
- Base64 image processing for multimodal inputs
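A client would combine the last two items: encode the image as base64 so it fits in a JSON body, and send it with the API key. The sketch below is illustrative only — the header scheme, field names, and helper are assumptions, not the deployed API's actual contract:

```python
import base64
import json

# Hypothetical payload builder; field names and the Authorization scheme
# are assumptions for illustration, not the backend's real schema.
def build_request(prompt: str, image_bytes: bytes, api_key: str):
    headers = {"Authorization": f"Bearer {api_key}"}  # assumed header
    payload = {
        "prompt": prompt,
        # Raw image bytes become base64 text so they survive JSON transport.
        "image": base64.b64encode(image_bytes).decode("ascii"),
    }
    return headers, json.dumps(payload)

headers, body = build_request("Describe this scan.", b"\x89PNG...", "my-key")
```

Decoding `payload["image"]` on the server side recovers the original bytes exactly.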
### Frontend Interface
- Built with Gradio for seamless user interaction
- Custom CSS theming for professional appearance
- Example queries for common medical scenarios
```mermaid
graph TD
A[MedGemma Agent] --> B[Backend]
A --> C[Frontend]
A --> D[Model]
B --> B1[Modal]
B --> B2[FastAPI]
B --> B3[vLLM]
C --> C1[Gradio]
C --> C2[Custom CSS]
D --> D1[MedGemma-4B]
D --> D2[4-bit Quantization]
```
""")
        gr.Markdown("""
## Backend Architecture
### Performance Features
- Optimized for low latency responses
- GPU-accelerated inference
- Efficient memory utilization with 4-bit quantization
- Maximum context length of 8192 tokens
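The figures above map directly onto vLLM engine options. The snippet below is an illustrative configuration only — the actual engine setup inside the Modal container may differ:

```python
# Illustrative vLLM engine configuration matching the stated limits;
# not the deployment's actual launch code.
from vllm import LLM

llm = LLM(
    model="unsloth/medgemma-4b-it-unsloth-bnb-4bit",
    max_model_len=8192,           # maximum context length
    dtype="bfloat16",             # compute precision
    quantization="bitsandbytes",  # 4-bit bnb checkpoint
)
```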
### Security Measures
- API key authentication for all requests
- Secure image processing
- Protected model endpoints
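The API-key check can be sketched as a constant-time comparison; the function name and key source here are illustrative, not the app's actual code:

```python
import secrets

# Hypothetical helper: compares a client-supplied key against the expected
# one in constant time, so timing differences cannot leak key prefixes.
def verify_api_key(provided: str, expected: str) -> bool:
    return secrets.compare_digest(provided.encode(), expected.encode())
```

In a FastAPI backend this would typically run inside a dependency that reads the key from a request header and rejects the request with a 401 on mismatch.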
```mermaid
flowchart LR
A[Client] --> B[FastAPI]
B --> C[Modal Container]
C --> D[vLLM]
D --> E[MedGemma-4B]
B --> F[Wikipedia API]
```
""")
        with gr.Row():
            with gr.Column():
                gr.Markdown("""
## Model Deployment
### Model
- **Model:** unsloth/medgemma-4b-it-unsloth-bnb-4bit
- **Context Length:** 8192 tokens
- **Quantization:** 4-bit, bfloat16
- Utilizes Modal's GPU-accelerated containers
- Implements efficient model loading with vLLM
- Supports bfloat16 precision for optimal performance
""")
            with gr.Column():
                gr.Markdown("""
```mermaid
graph TD
A[Model Loading] --> B[GPU Acceleration]
B --> C[4-bit Quantization]
C --> D[8192 Token Context]
D --> E[Streaming Response]
```
""")
        with gr.Column():
            gr.Markdown("""
## System Architecture
```mermaid
flowchart TD
A[User Interface] --> B[API Gateway]
B --> C[Authentication]
C --> D[Model Service]
D --> E[Wikipedia Service]
D --> F[Image Processing]
F --> G[Model Inference]
E --> H[Response Generation]
G --> H
H --> I[Stream Response]
I --> A
```
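The final "Stream Response" step means the client reads the reply incrementally rather than waiting for the full answer. A minimal parser for server-sent-event style chunks might look like this — the `data:` framing and `[DONE]` sentinel are assumptions about the wire format, not the backend's documented protocol:

```python
# Hypothetical SSE-style chunk parser: yields the text of each "data:" line
# and stops at the conventional "[DONE]" sentinel.
def iter_stream(lines):
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip keep-alives and blank lines
        chunk = line[len("data:"):].strip()
        if chunk == "[DONE]":
            break
        yield chunk

tokens = list(iter_stream(["data: Hel", "", "data: lo", "data: [DONE]"]))
# tokens == ["Hel", "lo"]
```

The UI would append each yielded chunk to the displayed answer as it arrives.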
""")
        gr.Markdown("""
[Back to Main Application](https://huggingface.co/spaces/Agents-MCP-Hackathon/agentic-coach-advisor-medgemma)
""")

if __name__ == "__main__":
    docs_demo.launch()