# Deployment Checklist - ZeroGPU Integration

## ✅ Pre-Deployment Verification

### Code Status
- ✅ All code changes committed and pushed
- ✅ FAISS-GPU implementation complete
- ✅ Lazy-loaded local model fallback implemented
- ✅ ZeroGPU API integration complete
- ✅ Dockerfile configured correctly
- ✅ `requirements.txt` updated with faiss-gpu

### Files Ready
- ✅ `Dockerfile` - Configured for HF Spaces
- ✅ `main.py` - Entry point for HF Spaces
- ✅ `requirements.txt` - All dependencies including faiss-gpu
- ✅ `README.md` - Contains HF Spaces configuration

---

## 🚀 Deployment Steps

### 1. Verify Repository Status
```bash
git status  # Should show clean or only documentation changes
git log --oneline -5  # Verify recent commits are pushed
```

### 2. Hugging Face Spaces Configuration

#### Space Settings
1. Go to: https://huggingface.co/spaces/JatinAutonomousLabs/Research_AI_Assistant
2. Navigate to **Settings** → **Repository secrets**

#### Required Environment Variables

**Basic Configuration:**
```bash
HF_TOKEN=your_huggingface_token_here
```

**ZeroGPU API Configuration (Optional - for Runpod integration):**

**Option A: Service Account Mode**
```bash
USE_ZERO_GPU=true
ZERO_GPU_API_URL=https://bm9njt1ypzvuqw-8000.proxy.runpod.net
[email protected]
ZERO_GPU_PASSWORD=your-password
```

**Option B: Per-User Mode (Multi-tenant)**
```bash
USE_ZERO_GPU=true
ZERO_GPU_PER_USER_MODE=true
ZERO_GPU_API_URL=https://bm9njt1ypzvuqw-8000.proxy.runpod.net
[email protected]
ZERO_GPU_ADMIN_PASSWORD=admin-password
```

**Note:** Runpod proxy URLs follow the format: `https://<pod-id>-8000.proxy.runpod.net`

**Additional Optional Variables:**
```bash
DB_PATH=sessions.db
LOG_LEVEL=INFO
MAX_WORKERS=4
```
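
The application is expected to read these variables at startup. A minimal sketch of how that selection might look (the variable names match the lists above, but the `build_zero_gpu_config` helper itself is hypothetical, not the project's actual code):

```python
import os

def build_zero_gpu_config():
    """Assemble ZeroGPU settings from the environment; returns None when disabled."""
    if os.getenv("USE_ZERO_GPU", "false").lower() != "true":
        return None

    per_user = os.getenv("ZERO_GPU_PER_USER_MODE", "false").lower() == "true"
    if per_user:
        # Option B: admin credentials for the multi-tenant mode
        email = os.environ["ZERO_GPU_ADMIN_EMAIL"]
        password = os.environ["ZERO_GPU_ADMIN_PASSWORD"]
    else:
        # Option A: a single shared service account
        email = os.environ["ZERO_GPU_EMAIL"]
        password = os.environ["ZERO_GPU_PASSWORD"]

    return {
        "api_url": os.environ["ZERO_GPU_API_URL"],
        "per_user_mode": per_user,
        "email": email,
        "password": password,
    }

# Optional variables fall back to sensible defaults
DB_PATH = os.getenv("DB_PATH", "sessions.db")
LOG_LEVEL = os.getenv("LOG_LEVEL", "INFO")
MAX_WORKERS = int(os.getenv("MAX_WORKERS", "4"))
```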

### 3. Hardware Selection

In HF Spaces Settings:
- **GPU**: NVIDIA T4 Medium (recommended)
  - 16GB VRAM (sufficient for the local model fallback)
  - 30GB RAM
  - 8 vCPU

**Note:** With ZeroGPU API enabled, GPU is only needed for:
- FAISS-GPU vector search (automatic CPU fallback if GPU unavailable)
- Local model fallback (only loads if ZeroGPU fails)
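
The CPU fallback mentioned above follows FAISS's standard pattern: build the index on CPU, then move it to the GPU only when a GPU build and a visible device are present. A minimal sketch; the index type and dimension are illustrative, not the project's actual values:

```python
import numpy as np
import faiss

dim = 384  # illustrative embedding dimension
index = faiss.IndexFlatIP(dim)  # CPU index; always available

# Move to GPU only if a GPU build of FAISS is installed and a device is visible
if hasattr(faiss, "StandardGpuResources") and faiss.get_num_gpus() > 0:
    res = faiss.StandardGpuResources()
    index = faiss.index_cpu_to_gpu(res, 0, index)
else:
    print("FAISS GPU not available, using CPU")

index.add(np.random.rand(1000, dim).astype("float32"))
```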

### 4. Deployment Process

**Automatic Deployment:**
1. Code is already pushed to `main` branch
2. HF Spaces will automatically:
   - Detect `sdk: docker` in README.md
   - Build Docker image from Dockerfile
   - Install dependencies from requirements.txt
   - Start application using `main.py`

**Manual Trigger (if needed):**
- Go to Space → Settings → Restart this Space
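
The restart can also be triggered from a script with `huggingface_hub` (the token in `HF_TOKEN` needs write access to the Space):

```python
from huggingface_hub import HfApi

api = HfApi()  # picks up HF_TOKEN from the environment
api.restart_space("JatinAutonomousLabs/Research_AI_Assistant")
```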

### 5. Monitor Deployment

**Check Build Logs:**
- Navigate to Space β†’ Logs
- Watch for:
  - ✅ Docker build success
  - ✅ Dependencies installed (including faiss-gpu)
  - ✅ Application startup
  - ✅ ZeroGPU client initialization (if configured)
  - ✅ Local model loader initialized (as fallback)

**Expected Startup Messages:**
```
✓ Local model loader initialized (models will load on-demand as fallback)
✓ ZeroGPU API client initialized (service account mode)
✓ FAISS GPU resources initialized
✓ Application ready for launch
```
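
Build progress can also be polled programmatically; on a successful deploy the stage moves through `BUILDING` to `RUNNING`:

```python
from huggingface_hub import HfApi

runtime = HfApi().get_space_runtime("JatinAutonomousLabs/Research_AI_Assistant")
print(runtime.stage)  # e.g. BUILDING, RUNNING, or an error stage
```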

### 6. Verify Deployment

**Health Check:**
- Application should be accessible at: `https://huggingface.co/spaces/JatinAutonomousLabs/Research_AI_Assistant`
- Health endpoint: `/health` should return `{"status": "healthy"}`
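
The health check can be scripted; the direct app URL below is assumed from the usual `<owner>-<space-name>.hf.space` pattern, so adjust it if the Space serves from a different host:

```python
import requests

BASE_URL = "https://jatinautonomouslabs-research-ai-assistant.hf.space"  # assumed direct URL

resp = requests.get(f"{BASE_URL}/health", timeout=30)
resp.raise_for_status()
assert resp.json() == {"status": "healthy"}
print("Health check passed")
```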

**Test ZeroGPU Integration:**
1. Send a test message through the UI
2. Check logs for: `"Inference complete for {task_type} (ZeroGPU API)"`
3. Verify no local models are loaded (if ZeroGPU is working)

**Test Fallback:**
1. Temporarily disable ZeroGPU (set `USE_ZERO_GPU=false`)
2. Send a test message
3. Check logs for: `"Lazy loading local model {model_id} as fallback"`
4. Verify local model loads and works
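
The order being exercised in these two tests is: ZeroGPU API first, lazily loaded local model second, HF Inference API last. A runnable sketch of that chain; all three backend functions are stubs standing in for the project's real clients:

```python
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Hypothetical stand-ins for the project's real clients
def zero_gpu_infer(prompt: str, task_type: str) -> str:
    raise ConnectionError("ZeroGPU API unreachable")  # simulate an outage

def generate_with_local_model(prompt: str) -> str:
    return f"[local model] {prompt}"

def hf_inference_api(prompt: str) -> str:
    return f"[HF Inference API] {prompt}"

def run_inference(prompt: str, task_type: str) -> str:
    """Try each backend in order, falling through on failure."""
    try:
        result = zero_gpu_infer(prompt, task_type)
        logger.info("Inference complete for %s (ZeroGPU API)", task_type)
        return result
    except Exception:
        logger.warning("ZeroGPU failed, lazy loading local model as fallback")
    try:
        return generate_with_local_model(prompt)
    except Exception:
        logger.warning("Local model failed, using HF Inference API")
    return hf_inference_api(prompt)

print(run_inference("hello", "chat"))  # -> "[local model] hello"
```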

---

## 🔍 Post-Deployment Verification

### 1. Check Application Status
- [ ] Application loads without errors
- [ ] UI is accessible
- [ ] Health check endpoint responds

### 2. Verify ZeroGPU Integration
- [ ] ZeroGPU client initializes (if configured)
- [ ] API calls succeed
- [ ] No local models loaded (if ZeroGPU working)
- [ ] Usage statistics accessible (if per-user mode)

### 3. Verify FAISS-GPU
- [ ] FAISS GPU resources initialize
- [ ] Vector search works
- [ ] Falls back to CPU if GPU unavailable

### 4. Verify Fallback Chain
- [ ] ZeroGPU API tried first
- [ ] Local models load only if ZeroGPU fails
- [ ] HF Inference API used as final fallback

### 5. Monitor Resource Usage
- [ ] GPU memory usage is low (if ZeroGPU working)
- [ ] CPU usage is reasonable
- [ ] No memory leaks

---

## πŸ› Troubleshooting

### Issue: Build Fails
**Check:**
- Dockerfile syntax is correct
- `requirements.txt` has all dependencies
- Python 3.10 is available

**Solution:**
- Review build logs in HF Spaces
- Test Docker build locally: `docker build -t test .`

### Issue: ZeroGPU Not Working
**Check:**
- Environment variables are set correctly
- ZeroGPU API is accessible from HF Spaces
- Network connectivity to Runpod

**Solution:**
- Verify API URL is correct
- Check credentials are valid
- Review ZeroGPU API logs
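
A quick connectivity probe from a Python shell inside the Space can separate network problems from credential problems. This assumes the ZeroGPU API serves a lightweight `/health` route; substitute whatever path the API actually exposes:

```python
import os
import requests

url = os.environ["ZERO_GPU_API_URL"]
try:
    resp = requests.get(f"{url}/health", timeout=10)  # "/health" is an assumed path
    print(f"Reachable: HTTP {resp.status_code}")
except requests.RequestException as exc:
    print(f"Cannot reach ZeroGPU API: {exc}")
```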

### Issue: FAISS-GPU Not Available
**Check:**
- GPU is available in HF Spaces
- faiss-gpu package installed correctly

**Solution:**
- System will automatically fall back to CPU
- Check logs for: `"FAISS GPU not available, using CPU"`

### Issue: Local Models Not Loading
**Check:**
- `use_local_models=True` in code
- Transformers/torch available
- GPU memory sufficient

**Solution:**
- Check logs for initialization errors
- Verify GPU availability
- Models will only load if ZeroGPU fails
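
For reference, lazy loading means nothing is downloaded or moved to the GPU until the first fallback call actually happens. A minimal sketch of the pattern with `transformers`; the model ID in the usage comment is illustrative:

```python
from functools import lru_cache

@lru_cache(maxsize=2)
def load_local_model(model_id: str):
    """Load a model on first use only; cached for subsequent calls."""
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    print(f"Lazy loading local model {model_id} as fallback")
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        device_map="auto",  # GPU if available, else CPU
    )
    return tokenizer, model

# Nothing loads at import time; only this call triggers the download/load:
# tokenizer, model = load_local_model("mistralai/Mistral-7B-Instruct-v0.2")
```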

---

## 📊 Expected Resource Usage

### With ZeroGPU API Enabled (Optimal)
- **GPU Memory**: ~0-500MB (FAISS-GPU only, no local models)
- **CPU**: Low (API calls only)
- **RAM**: ~2-4GB (application + caching)

### With ZeroGPU Failing (Fallback Active)
- **GPU Memory**: ~15GB (local models loaded)
- **CPU**: Medium (model inference)
- **RAM**: ~4-6GB (models + application)

### FAISS-GPU Usage
- **GPU Memory**: ~100-500MB (depending on index size)
- **CPU Fallback**: Automatic if GPU unavailable
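
These figures can be spot-checked from a Python shell inside the container (`torch` is already a dependency):

```python
import torch

if torch.cuda.is_available():
    allocated = torch.cuda.memory_allocated() / 1024**2
    reserved = torch.cuda.memory_reserved() / 1024**2
    print(f"GPU memory allocated: {allocated:.0f} MB, reserved: {reserved:.0f} MB")
else:
    print("No GPU visible; FAISS and models are running on CPU")
```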

---

## ✅ Deployment Complete

Once all checks pass:
- ✅ Application is live
- ✅ ZeroGPU integration working
- ✅ FAISS-GPU acceleration active
- ✅ Fallback chain operational
- ✅ Monitoring in place

**Next Steps:**
- Monitor usage statistics
- Review ZeroGPU API logs
- Optimize based on usage patterns
- Scale as needed

---

**Last Updated:** 2025-01-07  
**Deployment Status:** Ready  
**Version:** With ZeroGPU Integration + FAISS-GPU + Lazy Loading