import gradio as gr


with gr.Blocks(title="Technical Documentation", css="footer {visibility: hidden}") as docs_demo:
    
    with gr.Column():
        gr.Markdown("""
        # Technical Documentation

        ## Overview
        This page provides details about the architecture, API, and usage of the MedGemma Agent application.

        ## Features
        - Multimodal (text + image)
        - Wikipedia tool integration
        - Real-time streaming
        - Medical knowledge base

        ---

        ## Architecture
        - **Frontend:** Gradio Blocks, custom CSS
        - **Backend:** Modal, FastAPI, VLLM, MedGemma-4B
        - **Security:** API key authentication

        ### 🏗️ Technical Stack
        - Streaming responses for real-time interaction
        - Secure API key authentication
        - Base64 image processing for multimodal inputs (client call sketched below)
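
        A minimal client-side sketch of how these pieces fit together; the endpoint URL, header name, and JSON field names are illustrative assumptions, not the backend's actual API:

        ```python
        # Hypothetical client call: base64-encode an image, authenticate with an
        # API key, and consume the streamed reply. Endpoint and schema are assumed.
        import base64
        import requests

        with open("xray.png", "rb") as f:
            image_b64 = base64.b64encode(f.read()).decode("utf-8")

        resp = requests.post(
            "https://<modal-backend-url>/chat",             # placeholder endpoint
            headers={"Authorization": "Bearer <API_KEY>"},  # API key authentication
            json={"prompt": "Describe this chest X-ray.", "image": image_b64},
            stream=True,                                    # keep the connection open for streaming
            timeout=120,
        )
        for chunk in resp.iter_content(chunk_size=None, decode_unicode=True):
            print(chunk, end="", flush=True)
        ```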

        ### Frontend Interface
        - Built with Gradio for seamless user interaction (streaming handler sketched below)
        - Custom CSS theming for a professional appearance
        - Example queries for common medical scenarios
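
        A toy sketch of generator-based streaming in a Gradio chat UI, for illustration only; the Space's actual chat interface is defined elsewhere and may differ:

        ```python
        # Minimal Gradio streaming pattern: a generator function yields the
        # growing chat history so the UI updates token by token.
        import gradio as gr

        def respond(message, history):
            history = history + [[message, ""]]
            # Placeholder tokens; a real handler would stream from the backend.
            for token in ["Streaming", " tokens", " appear", " incrementally."]:
                history[-1][1] += token
                yield history, ""  # update the chatbot, clear the textbox

        with gr.Blocks() as chat_demo:
            chatbot = gr.Chatbot()
            msg = gr.Textbox(placeholder="Ask a medical question...")
            msg.submit(respond, inputs=[msg, chatbot], outputs=[chatbot, msg])

        chat_demo.launch()
        ```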

        ```mermaid
        graph TD
            A[MedGemma Agent] --> B[Backend]
            A --> C[Frontend]
            A --> D[Model]
            
            B --> B1[Modal]
            B --> B2[FastAPI]
            B --> B3[VLLM]
            
            C --> C1[Gradio]
            C --> C2[Custom CSS]
            
            D --> D1[MedGemma-4B]
            D --> D2[4-bit Quantization]
        ```
        """)
        
        gr.Markdown("""
        ## Backend Architecture
        
        ### 🎯 Performance Features
        
        - Optimized for low latency responses
        - GPU-accelerated inference
        - Efficient memory utilization with 4-bit quantization
        - Maximum context length of 8192 tokens
        
        ### 🔒 Security Measures
        
        - API key authentication for all requests (verification sketched below)
        - Secure image processing
        - Protected model endpoints
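
        A minimal sketch, assuming a FastAPI backend with a bearer-style API key check; the header name, environment variable, and route are assumptions, not the Space's actual code:

        ```python
        # Illustrative API key gate for a FastAPI route; a real deployment would
        # forward the validated request to the model service and stream the reply.
        import os
        from fastapi import FastAPI, Header, HTTPException

        app = FastAPI()

        @app.post("/chat")
        async def chat(payload: dict, authorization: str = Header(default="")):
            expected = f"Bearer {os.environ.get('API_KEY', '')}"
            if not authorization or authorization != expected:
                raise HTTPException(status_code=401, detail="Invalid API key")
            return {"status": "accepted"}  # placeholder response
        ```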
                    
        ```mermaid
        flowchart LR
            A[Client] --> B[FastAPI]
            B --> C[Modal Container]
            C --> D[VLLM]
            D --> E[MedGemma-4B]
            B --> F[Wikipedia API]
        ```
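
        How the Wikipedia tool is called is not shown in this file; as a rough illustration, a lookup against the public MediaWiki REST summary endpoint could look like this:

        ```python
        # Fetch the lead summary of an English Wikipedia article as tool output.
        import requests

        def wikipedia_summary(topic: str) -> str:
            url = "https://en.wikipedia.org/api/rest_v1/page/summary/" + topic.replace(" ", "_")
            resp = requests.get(url, timeout=10)
            resp.raise_for_status()
            return resp.json().get("extract", "")

        print(wikipedia_summary("Myocardial infarction"))
        ```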
        """)
        with gr.Row():
            with gr.Column():
                gr.Markdown("""
                ## 💾 Model Deployment
                
                ### Model
                - **Model:** unsloth/medgemma-4b-it-unsloth-bnb-4bit
                - **Context Length:** 8192 tokens
                - **Quantization:** 4-bit (bitsandbytes) weights with bfloat16 compute
                - Utilizes Modal's GPU-accelerated containers
                - Implements efficient model loading with VLLM (see the sketch below)
                - Supports bfloat16 precision for optimal performance
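
                A minimal sketch of loading this checkpoint with vLLM; the exact engine arguments used by the Space are not shown here, so treat these as assumptions:

                ```python
                # Load the 4-bit MedGemma checkpoint with vLLM and run one prompt.
                from vllm import LLM, SamplingParams

                llm = LLM(
                    model="unsloth/medgemma-4b-it-unsloth-bnb-4bit",
                    max_model_len=8192,             # context length quoted above
                    dtype="bfloat16",               # compute precision
                    quantization="bitsandbytes",    # 4-bit weights; needs a vLLM build with bitsandbytes support
                )

                params = SamplingParams(max_tokens=256, temperature=0.2)
                outputs = llm.generate(["List common causes of chest pain."], params)
                print(outputs[0].outputs[0].text)
                ```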
                """)
            with gr.Column():
                gr.Markdown("""
                ```mermaid
                graph TD
                    A[Model Loading] --> B[GPU Acceleration]
                    B --> C[4-bit Quantization]
                    C --> D[8192 Token Context]
                    D --> E[Streaming Response]
                ```
                """)
    with gr.Column():
        gr.Markdown("""
        ## 📊 System Architecture
        
        ```mermaid
        flowchart TD
            A[User Interface] --> B[API Gateway]
            B --> C[Authentication]
            C --> D[Model Service]
            D --> E[Wikipedia Service]
            D --> F[Image Processing]
            F --> G[Model Inference]
            E --> H[Response Generation]
            G --> H
            H --> I[Stream Response]
            I --> A
        ```
        """)
        
        gr.Markdown("""
        [Back to Main Application](https://huggingface.co/spaces/Agents-MCP-Hackathon/agentic-coach-advisor-medgemma)
        """)

if __name__ == "__main__":
    docs_demo.launch()