JatsTheAIGen committed on
Commit
c279015
·
1 Parent(s): 8603d72

docs: Add deployment checklist for ZeroGPU integration

- Comprehensive deployment verification steps
- Environment variable configuration guide
- Post-deployment verification checklist
- Troubleshooting guide
- Resource usage expectations

Files changed (1)
  1. DEPLOYMENT_CHECKLIST.md +244 -0
DEPLOYMENT_CHECKLIST.md ADDED
# Deployment Checklist - ZeroGPU Integration

## ✅ Pre-Deployment Verification

### Code Status
- ✅ All code changes committed and pushed
- ✅ FAISS-GPU implementation complete
- ✅ Lazy-loaded local model fallback implemented
- ✅ ZeroGPU API integration complete
- ✅ Dockerfile configured correctly
- ✅ `requirements.txt` updated with faiss-gpu

### Files Ready
- ✅ `Dockerfile` - Configured for HF Spaces
- ✅ `main.py` - Entry point for HF Spaces
- ✅ `requirements.txt` - All dependencies, including faiss-gpu
- ✅ `README.md` - Contains HF Spaces configuration

---
## 🚀 Deployment Steps

### 1. Verify Repository Status
```bash
git status            # should show a clean tree, or only documentation changes
git log --oneline -5  # verify that recent commits are pushed
```
### 2. Hugging Face Spaces Configuration

#### Space Settings
1. Go to: https://huggingface.co/spaces/JatinAutonomousLabs/Research_AI_Assistant
2. Navigate to **Settings** → **Repository secrets**

#### Required Environment Variables

**Basic Configuration:**
```bash
HF_TOKEN=your_huggingface_token_here
```
**ZeroGPU API Configuration (Optional - for Runpod integration):**

**Option A: Service Account Mode**
```bash
USE_ZERO_GPU=true
ZERO_GPU_API_URL=http://your-pod-ip:8000
ZERO_GPU_PASSWORD=your-password
```

**Option B: Per-User Mode (Multi-tenant)**
```bash
USE_ZERO_GPU=true
ZERO_GPU_PER_USER_MODE=true
ZERO_GPU_API_URL=http://your-pod-ip:8000
ZERO_GPU_ADMIN_PASSWORD=admin-password
```

**Additional Optional Variables:**
```bash
DB_PATH=sessions.db
LOG_LEVEL=INFO
MAX_WORKERS=4
```
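Before restarting the Space, it can be worth confirming that the variables above are actually present in the environment. A minimal POSIX-shell sketch (the `check_env` helper is illustrative, not part of the application; the variable names come from the options above):

```shell
# Verify that the configuration variables above are set before restarting.
# check_env is an illustrative helper, not part of the application code.
check_env() {
  missing=""
  for var in "$@"; do
    # Look up the value of the variable whose *name* is in $var
    eval "val=\${$var:-}"
    if [ -z "$val" ]; then
      missing="$missing $var"
    fi
  done
  if [ -n "$missing" ]; then
    echo "Missing:$missing"
    return 1
  fi
  echo "ok"
}

# Example: check_env HF_TOKEN USE_ZERO_GPU ZERO_GPU_API_URL
```

Run it with whichever subset of variables matches the mode you configured (Option A or B).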
### 3. Hardware Selection

In HF Spaces Settings:
- **GPU**: NVIDIA T4 Medium (recommended)
  - 24GB VRAM (sufficient for the local model fallback)
  - 30GB RAM
  - 8 vCPU

**Note:** With the ZeroGPU API enabled, the GPU is only needed for:
- FAISS-GPU vector search (automatic CPU fallback if no GPU is available)
- The local model fallback (loads only if ZeroGPU fails)
### 4. Deployment Process

**Automatic Deployment:**
1. Code is already pushed to the `main` branch
2. HF Spaces will automatically:
   - Detect `sdk: docker` in README.md
   - Build the Docker image from the Dockerfile
   - Install dependencies from requirements.txt
   - Start the application using `main.py`

**Manual Trigger (if needed):**
- Go to Space → Settings → Restart this Space
### 5. Monitor Deployment

**Check Build Logs:**
- Navigate to Space → Logs
- Watch for:
  - ✅ Docker build success
  - ✅ Dependencies installed (including faiss-gpu)
  - ✅ Application startup
  - ✅ ZeroGPU client initialization (if configured)
  - ✅ Local model loader initialized (as fallback)

**Expected Startup Messages:**
```
✓ Local model loader initialized (models will load on-demand as fallback)
✓ ZeroGPU API client initialized (service account mode)
✓ FAISS GPU resources initialized
✓ Application ready for launch
```
### 6. Verify Deployment

**Health Check:**
- The application should be accessible at: `https://huggingface.co/spaces/JatinAutonomousLabs/Research_AI_Assistant`
- The health endpoint `/health` should return `{"status": "healthy"}`
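The health check can be scripted. A small sketch that inspects a response body for the expected status (the `check_health` helper is illustrative; in practice the body would come from `curl -s "$SPACE_URL/health"`):

```shell
# Return "healthy" if the /health response body reports the expected status.
# Illustrative helper; usage: check_health "$(curl -s "$SPACE_URL/health")"
check_health() {
  case "$1" in
    # Accept both spaced and compact JSON renderings of the field
    *'"status": "healthy"'* | *'"status":"healthy"'*) echo "healthy" ;;
    *) echo "unhealthy" ;;
  esac
}
```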
**Test ZeroGPU Integration:**
1. Send a test message through the UI
2. Check the logs for: `"Inference complete for {task_type} (ZeroGPU API)"`
3. Verify that no local models are loaded (if ZeroGPU is working)

**Test Fallback:**
1. Temporarily disable ZeroGPU (set `USE_ZERO_GPU=false`)
2. Send a test message
3. Check the logs for: `"Lazy loading local model {model_id} as fallback"`
4. Verify that the local model loads and works

---
## 🔍 Post-Deployment Verification

### 1. Check Application Status
- [ ] Application loads without errors
- [ ] UI is accessible
- [ ] Health check endpoint responds

### 2. Verify ZeroGPU Integration
- [ ] ZeroGPU client initializes (if configured)
- [ ] API calls succeed
- [ ] No local models loaded (if ZeroGPU is working)
- [ ] Usage statistics accessible (if per-user mode)

### 3. Verify FAISS-GPU
- [ ] FAISS GPU resources initialize
- [ ] Vector search works
- [ ] Falls back to CPU if GPU is unavailable

### 4. Verify Fallback Chain
- [ ] ZeroGPU API tried first
- [ ] Local models load only if ZeroGPU fails
- [ ] HF Inference API used as final fallback
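The intended order can be summarized as a small selection sketch (function and flag names here are illustrative and mirror the chain described above; this is not the application's actual code):

```shell
# Fallback chain: ZeroGPU API -> local model -> HF Inference API (illustrative).
# Each argument is "true" if that backend is currently usable.
pick_backend() {
  zero_ok="$1"
  local_ok="$2"
  if [ "$zero_ok" = "true" ]; then
    echo "zerogpu_api"
  elif [ "$local_ok" = "true" ]; then
    echo "local_model"
  else
    echo "hf_inference_api"
  fi
}
```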
### 5. Monitor Resource Usage
- [ ] GPU memory usage is low (if ZeroGPU is working)
- [ ] CPU usage is reasonable
- [ ] No memory leaks

---
## 🐛 Troubleshooting

### Issue: Build Fails
**Check:**
- Dockerfile syntax is correct
- requirements.txt has all dependencies
- Python 3.10 is available

**Solution:**
- Review the build logs in HF Spaces
- Test the Docker build locally: `docker build -t test .`

### Issue: ZeroGPU Not Working
**Check:**
- Environment variables are set correctly
- The ZeroGPU API is reachable from HF Spaces
- Network connectivity to Runpod

**Solution:**
- Verify the API URL is correct
- Check that the credentials are valid
- Review the ZeroGPU API logs

### Issue: FAISS-GPU Not Available
**Check:**
- A GPU is available in HF Spaces
- The faiss-gpu package installed correctly

**Solution:**
- The system will automatically fall back to CPU
- Check the logs for: `"FAISS GPU not available, using CPU"`

### Issue: Local Models Not Loading
**Check:**
- `use_local_models=True` in code
- transformers/torch are available
- GPU memory is sufficient

**Solution:**
- Check the logs for initialization errors
- Verify GPU availability
- Models will only load if ZeroGPU fails
## 📊 Expected Resource Usage

### With ZeroGPU API Enabled (Optimal)
- **GPU Memory**: ~0-500MB (FAISS-GPU only, no local models)
- **CPU**: Low (API calls only)
- **RAM**: ~2-4GB (application + caching)

### With ZeroGPU Failing (Fallback Active)
- **GPU Memory**: ~15GB (local models loaded)
- **CPU**: Medium (model inference)
- **RAM**: ~4-6GB (models + application)

### FAISS-GPU Usage
- **GPU Memory**: ~100-500MB (depending on index size)
- **CPU Fallback**: Automatic if no GPU is available

---
## ✅ Deployment Complete

Once all checks pass:
- ✅ Application is live
- ✅ ZeroGPU integration working
- ✅ FAISS-GPU accelerated
- ✅ Fallback chain operational
- ✅ Monitoring in place

**Next Steps:**
- Monitor usage statistics
- Review ZeroGPU API logs
- Optimize based on usage patterns
- Scale as needed

---

**Last Updated:** 2025-01-07
**Deployment Status:** Ready
**Version:** With ZeroGPU Integration + FAISS-GPU + Lazy Loading