JatsTheAIGen committed on
Commit
c279015
·
1 Parent(s): 8603d72

docs: Add deployment checklist for ZeroGPU integration

- Comprehensive deployment verification steps
- Environment variable configuration guide
- Post-deployment verification checklist
- Troubleshooting guide
- Resource usage expectations

Files changed (1)
  1. DEPLOYMENT_CHECKLIST.md +244 -0
DEPLOYMENT_CHECKLIST.md ADDED
# Deployment Checklist - ZeroGPU Integration

## ✅ Pre-Deployment Verification

### Code Status
- ✅ All code changes committed and pushed
- ✅ FAISS-GPU implementation complete
- ✅ Lazy-loaded local model fallback implemented
- ✅ ZeroGPU API integration complete
- ✅ Dockerfile configured correctly
- ✅ `requirements.txt` updated with faiss-gpu

### Files Ready
- ✅ `Dockerfile` - Configured for HF Spaces
- ✅ `main.py` - Entry point for HF Spaces
- ✅ `requirements.txt` - All dependencies, including faiss-gpu
- ✅ `README.md` - Contains HF Spaces configuration

---
## 🚀 Deployment Steps

### 1. Verify Repository Status
```bash
git status            # should show a clean tree, or only documentation changes
git log --oneline -5  # verify that recent commits are pushed
```
### 2. Hugging Face Spaces Configuration

#### Space Settings
1. Go to: https://huggingface.co/spaces/JatinAutonomousLabs/Research_AI_Assistant
2. Navigate to **Settings** → **Repository secrets**

#### Required Environment Variables

**Basic Configuration:**
```bash
HF_TOKEN=your_huggingface_token_here
```
**ZeroGPU API Configuration (Optional - for Runpod integration):**

**Option A: Service Account Mode**
```bash
USE_ZERO_GPU=true
ZERO_GPU_API_URL=http://your-pod-ip:8000
ZERO_GPU_PASSWORD=your-password
```

**Option B: Per-User Mode (Multi-tenant)**
```bash
USE_ZERO_GPU=true
ZERO_GPU_PER_USER_MODE=true
ZERO_GPU_API_URL=http://your-pod-ip:8000
ZERO_GPU_ADMIN_PASSWORD=admin-password
```

**Additional Optional Variables:**
```bash
DB_PATH=sessions.db
LOG_LEVEL=INFO
MAX_WORKERS=4
```
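Before restarting the Space, it can be worth confirming that the variables above are actually present in the environment. A minimal POSIX-shell sketch (the `check_env` helper is illustrative, not part of the application; the variable names come from the options above):

```shell
# Verify that the configuration variables above are set before restarting.
# check_env is an illustrative helper, not part of the application code.
check_env() {
  missing=""
  for var in "$@"; do
    # Look up the value of the variable whose *name* is in $var
    eval "val=\${$var:-}"
    if [ -z "$val" ]; then
      missing="$missing $var"
    fi
  done
  if [ -n "$missing" ]; then
    echo "Missing:$missing"
    return 1
  fi
  echo "ok"
}

# Example: check_env HF_TOKEN USE_ZERO_GPU ZERO_GPU_API_URL
```

Run it with whichever subset of variables matches the mode you configured (Option A or B).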
### 3. Hardware Selection

In HF Spaces Settings:
- **GPU**: NVIDIA T4 Medium (recommended)
  - 24GB VRAM (sufficient for the local model fallback)
  - 30GB RAM
  - 8 vCPU

**Note:** With the ZeroGPU API enabled, the GPU is only needed for:
- FAISS-GPU vector search (automatic CPU fallback if no GPU is available)
- The local model fallback (loads only if ZeroGPU fails)
### 4. Deployment Process

**Automatic Deployment:**
1. Code is already pushed to the `main` branch
2. HF Spaces will automatically:
   - Detect `sdk: docker` in README.md
   - Build the Docker image from the Dockerfile
   - Install dependencies from requirements.txt
   - Start the application using `main.py`

**Manual Trigger (if needed):**
- Go to Space → Settings → Restart this Space
### 5. Monitor Deployment

**Check Build Logs:**
- Navigate to Space → Logs
- Watch for:
  - ✅ Docker build success
  - ✅ Dependencies installed (including faiss-gpu)
  - ✅ Application startup
  - ✅ ZeroGPU client initialization (if configured)
  - ✅ Local model loader initialized (as fallback)

**Expected Startup Messages:**
```
✓ Local model loader initialized (models will load on-demand as fallback)
✓ ZeroGPU API client initialized (service account mode)
✓ FAISS GPU resources initialized
✓ Application ready for launch
```
### 6. Verify Deployment

**Health Check:**
- The application should be accessible at: `https://huggingface.co/spaces/JatinAutonomousLabs/Research_AI_Assistant`
- The health endpoint `/health` should return `{"status": "healthy"}`
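The health check can be scripted. A small sketch that inspects a response body for the expected status (the `check_health` helper is illustrative; in practice the body would come from `curl -s "$SPACE_URL/health"`):

```shell
# Return "healthy" if the /health response body reports the expected status.
# Illustrative helper; usage: check_health "$(curl -s "$SPACE_URL/health")"
check_health() {
  case "$1" in
    # Accept both spaced and compact JSON renderings of the field
    *'"status": "healthy"'* | *'"status":"healthy"'*) echo "healthy" ;;
    *) echo "unhealthy" ;;
  esac
}
```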
**Test ZeroGPU Integration:**
1. Send a test message through the UI
2. Check the logs for: `"Inference complete for {task_type} (ZeroGPU API)"`
3. Verify that no local models are loaded (if ZeroGPU is working)

**Test Fallback:**
1. Temporarily disable ZeroGPU (set `USE_ZERO_GPU=false`)
2. Send a test message
3. Check the logs for: `"Lazy loading local model {model_id} as fallback"`
4. Verify that the local model loads and works

---
## 🔍 Post-Deployment Verification

### 1. Check Application Status
- [ ] Application loads without errors
- [ ] UI is accessible
- [ ] Health check endpoint responds

### 2. Verify ZeroGPU Integration
- [ ] ZeroGPU client initializes (if configured)
- [ ] API calls succeed
- [ ] No local models loaded (if ZeroGPU is working)
- [ ] Usage statistics accessible (if per-user mode)

### 3. Verify FAISS-GPU
- [ ] FAISS GPU resources initialize
- [ ] Vector search works
- [ ] Falls back to CPU if GPU is unavailable

### 4. Verify Fallback Chain
- [ ] ZeroGPU API tried first
- [ ] Local models load only if ZeroGPU fails
- [ ] HF Inference API used as final fallback
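The intended order can be summarized as a small selection sketch (function and flag names here are illustrative and mirror the chain described above; this is not the application's actual code):

```shell
# Fallback chain: ZeroGPU API -> local model -> HF Inference API (illustrative).
# Each argument is "true" if that backend is currently usable.
pick_backend() {
  zero_ok="$1"
  local_ok="$2"
  if [ "$zero_ok" = "true" ]; then
    echo "zerogpu_api"
  elif [ "$local_ok" = "true" ]; then
    echo "local_model"
  else
    echo "hf_inference_api"
  fi
}
```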
### 5. Monitor Resource Usage
- [ ] GPU memory usage is low (if ZeroGPU is working)
- [ ] CPU usage is reasonable
- [ ] No memory leaks

---
## 🐛 Troubleshooting

### Issue: Build Fails
**Check:**
- Dockerfile syntax is correct
- requirements.txt has all dependencies
- Python 3.10 is available

**Solution:**
- Review the build logs in HF Spaces
- Test the Docker build locally: `docker build -t test .`

### Issue: ZeroGPU Not Working
**Check:**
- Environment variables are set correctly
- The ZeroGPU API is reachable from HF Spaces
- Network connectivity to Runpod

**Solution:**
- Verify the API URL is correct
- Check that the credentials are valid
- Review the ZeroGPU API logs

### Issue: FAISS-GPU Not Available
**Check:**
- A GPU is available in HF Spaces
- The faiss-gpu package installed correctly

**Solution:**
- The system will automatically fall back to CPU
- Check the logs for: `"FAISS GPU not available, using CPU"`

### Issue: Local Models Not Loading
**Check:**
- `use_local_models=True` in code
- transformers/torch are available
- GPU memory is sufficient

**Solution:**
- Check the logs for initialization errors
- Verify GPU availability
- Models will only load if ZeroGPU fails
## 📊 Expected Resource Usage

### With ZeroGPU API Enabled (Optimal)
- **GPU Memory**: ~0-500MB (FAISS-GPU only, no local models)
- **CPU**: Low (API calls only)
- **RAM**: ~2-4GB (application + caching)

### With ZeroGPU Failing (Fallback Active)
- **GPU Memory**: ~15GB (local models loaded)
- **CPU**: Medium (model inference)
- **RAM**: ~4-6GB (models + application)

### FAISS-GPU Usage
- **GPU Memory**: ~100-500MB (depending on index size)
- **CPU Fallback**: Automatic if no GPU is available

---
## ✅ Deployment Complete

Once all checks pass:
- ✅ Application is live
- ✅ ZeroGPU integration working
- ✅ FAISS-GPU accelerated
- ✅ Fallback chain operational
- ✅ Monitoring in place

**Next Steps:**
- Monitor usage statistics
- Review ZeroGPU API logs
- Optimize based on usage patterns
- Scale as needed

---

**Last Updated:** 2025-01-07
**Deployment Status:** Ready
**Version:** With ZeroGPU Integration + FAISS-GPU + Lazy Loading