Prashant26am committed
Commit 9b998c4 · 2 parents: 2e2a7bf 6afe326

Resolve merge conflicts and optimize for Hugging Face deployment

.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
.gitignore CHANGED
@@ -4,7 +4,6 @@ __pycache__/
  *$py.class
  *.so
  .Python
- env/
  build/
  develop-eggs/
  dist/
@@ -23,6 +22,7 @@ wheels/
 
  # Virtual Environment
  venv/
+ env/
  ENV/
 
  # IDE
@@ -31,33 +31,24 @@ ENV/
  *.swp
  *.swo
 
- # OS
- .DS_Store
- Thumbs.db
-
- # Model files
- *.bin
+ # Project specific
+ llava-chat/
+ *.log
  *.pt
  *.pth
  *.ckpt
- *.safetensors
+ *.bin
 
- # Logs
- *.log
- logs/
+ # Hugging Face specific
+ .huggingface/
+ wandb/
+ runs/
 
- # Frontend
- frontend/
- node_modules/
- npm-debug.log*
- yarn-debug.log*
- yarn-error.log*
+ # OS
+ .DS_Store
+ Thumbs.db
 
  # Temporary files
  *.tmp
  *.temp
  temp/
- tmp/
-
- # Hugging Face Space
- llava-chat/
README.md CHANGED
@@ -1,160 +1,23 @@
- # LLaVA Implementation
-
- [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
- [![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
- [![Gradio](https://img.shields.io/badge/Gradio-4.44.1-orange.svg)](https://gradio.app/)
- [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/Prashant26am/llava-chat)
-
- ## 📝 About
-
- This project is an implementation of LLaVA (Large Language and Vision Assistant), a powerful multimodal AI model that combines vision and language understanding. Here's what makes this implementation special:
-
- ### 🎯 Key Features
-
- - **Multimodal Understanding**
-   - Seamless integration of vision and language models
-   - Real-time image analysis and description
-   - Natural language interaction about visual content
-   - Support for various image types and formats
-
- - **Model Architecture**
-   - CLIP ViT vision encoder for robust image understanding
-   - TinyLlama language model for efficient text generation
-   - Custom projection layer for vision-language alignment
-   - Memory-optimized for deployment on various platforms
-
- - **User Interface**
-   - Modern Gradio-based web interface
-   - Real-time image processing
-   - Interactive chat experience
-   - Customizable generation parameters
-   - Responsive design for all devices
-
- - **Technical Highlights**
-   - CPU-optimized implementation
-   - Memory-efficient model loading
-   - Fast inference with optimized settings
-   - Robust error handling and logging
-   - Easy deployment on Hugging Face Spaces
-
- ### 🛠️ Technology Stack
-
- - **Core Technologies**
-   - PyTorch for deep learning
-   - Transformers for model architecture
-   - Gradio for web interface
-   - FastAPI for backend services
-   - Hugging Face for model hosting
-
- - **Development Tools**
-   - Pre-commit hooks for code quality
-   - GitHub Actions for CI/CD
-   - Comprehensive testing suite
-   - Detailed documentation
-   - Development guidelines
-
- ### 🌟 Use Cases
-
- - **Image Understanding**
-   - Scene description and analysis
-   - Object detection and recognition
-   - Visual question answering
-   - Image-based conversations
-
- - **Applications**
-   - Educational tools
-   - Content moderation
-   - Visual assistance
-   - Research and development
-   - Creative content generation
-
- ### 🔄 Project Status
-
- - **Current Version**: 1.0.0
- - **Active Development**: Yes
- - **Production Ready**: Yes
- - **Community Support**: Open for contributions
-
- ### 📊 Performance
-
- - **Model Size**: Optimized for CPU deployment
- - **Response Time**: Real-time processing
- - **Memory Usage**: Efficient resource utilization
- - **Scalability**: Ready for production deployment
-
- ### 🤝 Community
-
- - **Contributions**: Open for pull requests
- - **Issues**: Active issue tracking
- - **Documentation**: Comprehensive guides
- - **Support**: Community-driven help
-
- ### 🔮 Future Roadmap
-
- - [ ] Support for video processing
- - [ ] Additional model variants
- - [ ] Enhanced memory optimization
- - [ ] Extended API capabilities
- - [ ] More interactive features
-
- ### 📚 Resources
-
- - [Paper](https://arxiv.org/abs/2304.08485)
- - [Documentation](docs/)
- - [API Reference](docs/api/)
- - [Examples](examples/)
- - [Contributing Guide](CONTRIBUTING.md)
-
- ## 🌟 Features
-
- - **Modern Web Interface**
-   - Beautiful Gradio-based UI
-   - Real-time image analysis
-   - Interactive chat experience
-   - Responsive design
-
- - **Advanced AI Capabilities**
-   - CLIP ViT-L/14 vision encoder
-   - Vicuna-7B language model
-   - Multimodal understanding
-   - Natural conversation flow
-
- - **Developer Friendly**
-   - Clean, modular codebase
-   - Comprehensive documentation
-   - Easy deployment options
-   - Extensible architecture
-
- ## 📋 Project Structure
-
- ```
- llava_implementation/
- ├── src/                  # Source code
- │   ├── api/              # API endpoints and FastAPI app
- │   ├── models/           # Model implementations
- │   ├── utils/            # Utility functions
- │   └── configs/          # Configuration files
- ├── tests/                # Test suite
- ├── docs/                 # Documentation
- │   ├── api/              # API documentation
- │   ├── examples/         # Usage examples
- │   └── guides/           # User and developer guides
- ├── assets/               # Static assets
- │   ├── images/           # Example images
- │   └── icons/            # UI icons
- ├── scripts/              # Utility scripts
- └── examples/             # Example images for the web interface
- ```
-
- ## 🚀 Quick Start
-
- ### Prerequisites
-
- - Python 3.8+
- - CUDA-capable GPU (recommended)
- - Git
-
- ### Installation
-
+ # LLaVA Chat
+
+ A lightweight implementation of LLaVA (Large Language and Vision Assistant) optimized for Hugging Face Spaces deployment.
+
+ ## Features
+
+ - Efficient model loading with 8-bit quantization
+ - Memory-optimized inference
+ - FastAPI backend with Gradio interface
+ - Support for image understanding and visual conversations
+ - Optimized for deployment on Hugging Face Spaces
+
+ ## Quick Start
+
+ 1. Visit the [Hugging Face Space](https://huggingface.co/spaces/Prashant26am/llava-chat)
+ 2. Upload an image
+ 3. Ask questions about the image
+ 4. Get AI-powered responses
+
+ ## Local Development
+
  1. Clone the repository:
  ```bash
@@ -162,96 +25,42 @@ git clone https://github.com/Prashant-ambati/llava-implementation.git
  cd llava-implementation
  ```
 
- 2. Create and activate a virtual environment:
- ```bash
- python -m venv venv
- source venv/bin/activate  # On Windows: venv\Scripts\activate
- ```
-
- 3. Install dependencies:
+ 2. Install dependencies:
  ```bash
  pip install -r requirements.txt
  ```
 
- ### Running Locally
-
- 1. Start the development server:
+ 3. Run the application:
  ```bash
- python src/api/app.py
- ```
-
- 2. Open your browser and navigate to:
- ```
- http://localhost:7860
+ python llava-chat/app.py
  ```
 
- ## 🌐 Web Deployment
-
- ### Hugging Face Spaces
-
- The application is deployed on Hugging Face Spaces:
- - [Live Demo](https://huggingface.co/spaces/Prashant26am/llava-chat)
- - Automatic deployment from main branch
- - Free GPU resources
- - Public API access
-
- ### Local Deployment
-
- For local deployment:
- ```bash
- # Build the application
- python -m build
-
- # Run with production settings
- python src/api/app.py --production
- ```
+ ## Model Architecture
+
+ - Vision Model: CLIP ViT-Base
+ - Language Model: TinyLlama-1.1B-Chat
+ - Projection Layer: MLP with configurable hidden dimensions
 
- ## 📚 Documentation
-
- - [API Documentation](docs/api/README.md)
- - [User Guide](docs/guides/user_guide.md)
- - [Developer Guide](docs/guides/developer_guide.md)
- - [Examples](docs/examples/README.md)
-
- ## 🛠️ Development
-
- ### Running Tests
-
- ```bash
- pytest tests/
- ```
-
- ### Code Style
-
- This project follows PEP 8 guidelines. To check your code:
-
- ```bash
- flake8 src/
- black src/
- ```
-
- ### Contributing
-
- 1. Fork the repository
- 2. Create a feature branch
- 3. Commit your changes
- 4. Push to the branch
- 5. Create a Pull Request
-
- ## 📝 License
+ ## Memory Optimization
+
+ The implementation includes several memory optimization techniques:
+ - 8-bit quantization for language model
+ - Efficient image processing
+ - Gradient checkpointing
+ - Memory-efficient attention
+ - Automatic mixed precision
+
+ ## API Endpoints
+
+ - `POST /process_image`: Process an image with a prompt
+ - `GET /status`: Check model and application status
+
+ ## License
 
  This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
 
- ## 🙏 Acknowledgments
-
- - [LLaVA Paper](https://arxiv.org/abs/2304.08485) by Microsoft Research
- - [Gradio](https://gradio.app/) for the web interface
- - [Hugging Face](https://huggingface.co/) for model hosting
- - [Vicuna](https://lmsys.org/blog/2023-03-30-vicuna/) for the language model
- - [CLIP](https://openai.com/research/clip) for the vision model
-
- ## 📞 Contact
+ ## Acknowledgments
 
- - GitHub Issues: [Report a bug](https://github.com/Prashant-ambati/llava-implementation/issues)
- - Email: [Your Email]
- - Twitter: [@YourTwitter]
+ - Based on the paper "Visual Instruction Tuning" (NeurIPS 2023)
+ - Uses models from Hugging Face Transformers
+ - Built with FastAPI and Gradio
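For reference, the "8-bit quantization for language model" bullet in the new README corresponds to the standard transformers + bitsandbytes loading path. A minimal sketch, not this repo's own `LLaVA` wrapper, and assuming a CUDA GPU (the bitsandbytes 8-bit kernels do not run on the CPU-only Space, which is why app.py below loads in full precision on `device="cpu"`):

```python
# Sketch of the 8-bit language-model loading the README describes.
# Assumes a CUDA device; falls back to this repo's full-precision path on CPU.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # 8-bit weights via bitsandbytes
    device_map="auto",  # place quantized weights on the available GPU
)
```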
app.py ADDED
@@ -0,0 +1,376 @@
+ from fastapi import FastAPI, HTTPException, Request
+ from fastapi.middleware.cors import CORSMiddleware
+ from fastapi.responses import JSONResponse
+ import os
+ import tempfile
+ import torch
+ import gradio as gr
+ import traceback
+ import sys
+ import logging
+ from PIL import Image
+ from models.llava import LLaVA
+ from typing import Dict, Any, Optional, Union
+
+ # Set up logging
+ logging.basicConfig(
+     level=logging.INFO,
+     format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
+     handlers=[
+         logging.StreamHandler(sys.stdout),
+         logging.FileHandler('app.log')
+     ]
+ )
+ logger = logging.getLogger(__name__)
+
+ # Initialize FastAPI app
+ app = FastAPI(title="LLaVA Web Interface")
+
+ # Configure CORS
+ app.add_middleware(
+     CORSMiddleware,
+     allow_origins=["*"],
+     allow_credentials=True,
+     allow_methods=["*"],
+     allow_headers=["*"],
+ )
+
+ # Global state
+ model = None
+ model_status: Dict[str, Any] = {
+     "initialized": False,
+     "device": None,
+     "error": None,
+     "last_error": None
+ }
+
+ @app.exception_handler(Exception)
+ async def global_exception_handler(request: Request, exc: Exception):
+     """Global exception handler to catch and log all unhandled exceptions."""
+     error_msg = f"Unhandled error: {str(exc)}\n{traceback.format_exc()}"
+     logger.error(error_msg)
+     model_status["last_error"] = error_msg
+     return JSONResponse(
+         status_code=500,
+         content={"error": "Internal Server Error", "details": str(exc)}
+     )
+
+ @app.get("/status")
+ async def get_status():
+     """Endpoint to check model and application status."""
+     return {
+         "model_initialized": model is not None,
+         "model_status": model_status,
+         "memory_usage": {
+             "cuda_available": torch.cuda.is_available(),
+             "cuda_memory_allocated": torch.cuda.memory_allocated() if torch.cuda.is_available() else 0,
+             "cuda_memory_reserved": torch.cuda.memory_reserved() if torch.cuda.is_available() else 0
+         }
+     }
+
+ def initialize_model():
+     """Initialize the LLaVA model with proper error handling."""
+     global model, model_status
+     try:
+         logger.info("Starting model initialization...")
+         model_status["initialized"] = False
+         model_status["error"] = None
+
+         # Clear any existing model and memory
+         if model is not None:
+             del model
+             torch.cuda.empty_cache()
+
+         # Initialize new model
+         model = LLaVA(
+             vision_model_path="openai/clip-vit-base-patch32",
+             language_model_path="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
+             device="cpu",
+             projection_hidden_dim=2048
+         )
+
+         # Configure model for inference
+         if hasattr(model, 'language_model'):
+             model.language_model.config.use_cache = False
+             model.language_model.eval()
+
+         model_status.update({
+             "initialized": True,
+             "device": str(model.device),
+             "error": None
+         })
+         logger.info(f"Model successfully initialized on {model.device}")
+         return True
+
+     except Exception as e:
+         error_msg = f"Model initialization failed: {str(e)}"
+         logger.error(error_msg)
+         logger.error(traceback.format_exc())
+         model = None
+         model_status.update({
+             "initialized": False,
+             "error": error_msg,
+             "last_error": traceback.format_exc()
+         })
+         return False
+
+ def process_image(
+     image: Optional[Image.Image],
+     prompt: str,
+     max_new_tokens: int = 256,
+     temperature: float = 0.7,
+     top_p: float = 0.9
+ ) -> str:
+     """Process an image with the LLaVA model with comprehensive error handling."""
+     global model_status
+     logger.info("Starting image processing...")
+
+     # Validate model state
+     if model is None:
+         logger.error("Model not initialized")
+         if not initialize_model():
+             model_status["last_error"] = "Model initialization failed during processing"
+             return "Error: Model initialization failed. Please try again later."
+
+     # Validate inputs
+     if image is None:
+         logger.error("No image provided")
+         return "Error: Please upload an image first."
+
+     if not isinstance(image, Image.Image):
+         logger.error(f"Invalid image type: {type(image)}")
+         return "Error: Invalid image format. Please upload a valid image."
+
+     if not prompt or not isinstance(prompt, str) or not prompt.strip():
+         logger.error("Invalid prompt")
+         return "Error: Please enter a valid prompt."
+
+     # Validate parameters
+     try:
+         max_new_tokens = int(max_new_tokens)
+         temperature = float(temperature)
+         top_p = float(top_p)
+     except (ValueError, TypeError) as e:
+         logger.error(f"Invalid parameters: {str(e)}")
+         return "Error: Invalid generation parameters."
+
+     temp_path = None
+     try:
+         logger.info(f"Processing image with prompt: {prompt[:100]}...")
+
+         # Save image with explicit format
+         with tempfile.NamedTemporaryFile(delete=False, suffix='.png') as temp_file:
+             image.save(temp_file.name, format='PNG')
+             temp_path = temp_file.name
+             logger.info(f"Saved temporary image to {temp_path}")
+
+         # Clear memory
+         torch.cuda.empty_cache()
+
+         # Process image
+         with torch.inference_mode():
+             try:
+                 logger.info("Generating response...")
+                 response = model.generate_from_image(
+                     image_path=temp_path,
+                     prompt=prompt,
+                     max_new_tokens=max_new_tokens,
+                     temperature=temperature,
+                     top_p=top_p
+                 )
+
+                 if not response:
+                     raise ValueError("Empty response from model")
+
+                 if not isinstance(response, str):
+                     raise ValueError(f"Invalid response type: {type(response)}")
+
+                 logger.info("Successfully generated response")
+                 model_status["last_error"] = None
+                 return response
+
+             except Exception as model_error:
+                 error_msg = f"Model inference error: {str(model_error)}"
+                 logger.error(error_msg)
+                 logger.error(traceback.format_exc())
+                 model_status["last_error"] = error_msg
+                 return f"Error during model inference: {str(model_error)}"
+
+     except Exception as e:
+         error_msg = f"Processing error: {str(e)}"
+         logger.error(error_msg)
+         logger.error(traceback.format_exc())
+         model_status["last_error"] = error_msg
+         return f"Error processing image: {str(e)}"
+
+     finally:
+         # Cleanup
+         if temp_path and os.path.exists(temp_path):
+             try:
+                 os.unlink(temp_path)
+                 logger.info("Cleaned up temporary file")
+             except Exception as e:
+                 logger.warning(f"Failed to clean up temporary file: {str(e)}")
+
+         try:
+             torch.cuda.empty_cache()
+         except Exception as e:
+             logger.warning(f"Failed to clear CUDA cache: {str(e)}")
+
+ def get_status_text() -> str:
+     """Get a formatted status text for display."""
+     try:
+         status = {
+             "Model Initialized": "Yes" if model is not None else "No",
+             "Device": str(model.device) if model is not None else "None",
+             "Last Error": model_status.get("last_error", "None"),
+             "Memory Usage": {
+                 "CUDA Available": "Yes" if torch.cuda.is_available() else "No",
+                 "Memory Allocated": f"{torch.cuda.memory_allocated() / 1024**2:.2f} MB" if torch.cuda.is_available() else "N/A",
+                 "Memory Reserved": f"{torch.cuda.memory_reserved() / 1024**2:.2f} MB" if torch.cuda.is_available() else "N/A"
+             }
+         }
+         return "\n".join(f"{k}: {v}" for k, v in status.items())
+     except Exception as e:
+         return f"Error getting status: {str(e)}"
+
+ def create_interface():
+     """Create the Gradio interface with proper error handling."""
+     try:
+         with gr.Blocks(title="LLaVA Chat", theme=gr.themes.Soft()) as demo:
+             gr.Markdown("""
+             # LLaVA Chat
+             Upload an image and chat with LLaVA about it. This model can understand and describe images, answer questions about them, and engage in visual conversations.
+
+             ## Example Prompts
+             Try these prompts to get started:
+             - "What can you see in this image?"
+             - "Describe this scene in detail"
+             - "What emotions does this image convey?"
+             - "What's happening in this picture?"
+             - "Can you identify any objects or people in this image?"
+             """)
+
+             with gr.Row():
+                 with gr.Column(scale=1):
+                     # Input components with explicit types and validation
+                     image_input = gr.Image(
+                         type="pil",
+                         label="Upload Image",
+                         image_mode="RGB",
+                         format="PNG"
+                     )
+                     prompt_input = gr.Textbox(
+                         label="Ask about the image",
+                         placeholder="What can you see in this image?",
+                         lines=3,
+                         max_lines=5
+                     )
+
+                     with gr.Accordion("Advanced Settings", open=False):
+                         max_tokens = gr.Slider(
+                             minimum=32,
+                             maximum=512,
+                             value=256,
+                             step=32,
+                             label="Max New Tokens"
+                         )
+                         temperature = gr.Slider(
+                             minimum=0.1,
+                             maximum=1.0,
+                             value=0.7,
+                             step=0.1,
+                             label="Temperature"
+                         )
+                         top_p = gr.Slider(
+                             minimum=0.1,
+                             maximum=1.0,
+                             value=0.9,
+                             step=0.1,
+                             label="Top P"
+                         )
+
+                     submit_btn = gr.Button("Generate Response", variant="primary")
+                     status_btn = gr.Button("Check Status", variant="secondary")
+
+                 with gr.Column(scale=1):
+                     output = gr.Textbox(
+                         label="Model Response",
+                         lines=10,
+                         show_copy_button=True
+                     )
+                     status_output = gr.Textbox(
+                         label="System Status",
+                         lines=5,
+                         show_copy_button=True
+                     )
+
+             # Set up event handlers with proper error handling
+             def safe_process_image(*args):
+                 try:
+                     return process_image(*args)
+                 except Exception as e:
+                     logger.error(f"Interface error: {str(e)}")
+                     logger.error(traceback.format_exc())
+                     return f"Error: {str(e)}"
+
+             submit_btn.click(
+                 fn=safe_process_image,
+                 inputs=[
+                     image_input,
+                     prompt_input,
+                     max_tokens,
+                     temperature,
+                     top_p
+                 ],
+                 outputs=output,
+                 api_name="process_image"
+             )
+
+             status_btn.click(
+                 fn=get_status_text,
+                 inputs=[],
+                 outputs=status_output,
+                 api_name="check_status"
+             )
+
+         logger.info("Successfully created Gradio interface")
+         return demo
+
+     except Exception as e:
+         logger.error(f"Failed to create interface: {str(e)}")
+         logger.error(traceback.format_exc())
+         raise
+
+ # Create and mount Gradio app
+ try:
+     logger.info("Creating Gradio interface...")
+     demo = create_interface()
+     app = gr.mount_gradio_app(app, demo, path="/")
+     logger.info("Successfully mounted Gradio app")
+ except Exception as e:
+     logger.error(f"Failed to mount Gradio app: {str(e)}")
+     logger.error(traceback.format_exc())
+     raise
+
+ if __name__ == "__main__":
+     try:
+         # Initialize model
+         logger.info("Starting application...")
+         if not initialize_model():
+             logger.error("Model initialization failed. Exiting...")
+             sys.exit(1)
+
+         # Start server
+         import uvicorn
+         logger.info("Starting server...")
+         uvicorn.run(
+             app,
+             host="0.0.0.0",
+             port=7860,
+             log_level="info"
+         )
+     except Exception as e:
+         logger.error(f"Application startup failed: {str(e)}")
+         logger.error(traceback.format_exc())
+         sys.exit(1)
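With the server above running on port 7860, the plain FastAPI `/status` route can be smoke-tested without the Gradio UI. A small sketch using `httpx` (already in requirements.txt), assuming the app is serving locally on the default port:

```python
# Query the /status endpoint defined in app.py; assumes the app is
# running locally on port 7860 as configured in uvicorn.run above.
import httpx

resp = httpx.get("http://localhost:7860/status")
resp.raise_for_status()
status = resp.json()
print(status["model_initialized"])  # True once initialize_model() has succeeded
print(status["memory_usage"])       # CUDA availability and memory counters
```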
frontend/.gitignore ADDED
@@ -0,0 +1,23 @@
+ # See https://help.github.com/articles/ignoring-files/ for more about ignoring files.
+
+ # dependencies
+ /node_modules
+ /.pnp
+ .pnp.js
+
+ # testing
+ /coverage
+
+ # production
+ /build
+
+ # misc
+ .DS_Store
+ .env.local
+ .env.development.local
+ .env.test.local
+ .env.production.local
+
+ npm-debug.log*
+ yarn-debug.log*
+ yarn-error.log*
frontend/README.md ADDED
@@ -0,0 +1,46 @@
+ # Getting Started with Create React App
+
+ This project was bootstrapped with [Create React App](https://github.com/facebook/create-react-app).
+
+ ## Available Scripts
+
+ In the project directory, you can run:
+
+ ### `npm start`
+
+ Runs the app in the development mode.\
+ Open [http://localhost:3000](http://localhost:3000) to view it in the browser.
+
+ The page will reload if you make edits.\
+ You will also see any lint errors in the console.
+
+ ### `npm test`
+
+ Launches the test runner in the interactive watch mode.\
+ See the section about [running tests](https://facebook.github.io/create-react-app/docs/running-tests) for more information.
+
+ ### `npm run build`
+
+ Builds the app for production to the `build` folder.\
+ It correctly bundles React in production mode and optimizes the build for the best performance.
+
+ The build is minified and the filenames include the hashes.\
+ Your app is ready to be deployed!
+
+ See the section about [deployment](https://facebook.github.io/create-react-app/docs/deployment) for more information.
+
+ ### `npm run eject`
+
+ **Note: this is a one-way operation. Once you `eject`, you can’t go back!**
+
+ If you aren’t satisfied with the build tool and configuration choices, you can `eject` at any time. This command will remove the single build dependency from your project.
+
+ Instead, it will copy all the configuration files and the transitive dependencies (webpack, Babel, ESLint, etc) right into your project so you have full control over them. All of the commands except `eject` will still work, but they will point to the copied scripts so you can tweak them. At this point you’re on your own.
+
+ You don’t have to ever use `eject`. The curated feature set is suitable for small and middle deployments, and you shouldn’t feel obligated to use this feature. However we understand that this tool wouldn’t be useful if you couldn’t customize it when you are ready for it.
+
+ ## Learn More
+
+ You can learn more in the [Create React App documentation](https://facebook.github.io/create-react-app/docs/getting-started).
+
+ To learn React, check out the [React documentation](https://reactjs.org/).
frontend/package-lock.json ADDED
The diff for this file is too large to render. See raw diff
 
frontend/package.json ADDED
@@ -0,0 +1,52 @@
+ {
+   "name": "frontend",
+   "version": "0.1.0",
+   "private": true,
+   "dependencies": {
+     "@headlessui/react": "^2.2.4",
+     "@heroicons/react": "^2.2.0",
+     "@tailwindcss/forms": "^0.5.10",
+     "@testing-library/dom": "^10.4.0",
+     "@testing-library/jest-dom": "^6.6.3",
+     "@testing-library/react": "^16.3.0",
+     "@testing-library/user-event": "^13.5.0",
+     "@types/jest": "^27.5.2",
+     "@types/node": "^16.18.126",
+     "@types/react": "^19.1.5",
+     "@types/react-dom": "^19.1.5",
+     "autoprefixer": "^10.4.21",
+     "axios": "^1.9.0",
+     "postcss": "^8.5.3",
+     "react": "^19.1.0",
+     "react-dom": "^19.1.0",
+     "react-dropzone": "^14.3.8",
+     "react-scripts": "5.0.1",
+     "tailwindcss": "^4.1.7",
+     "typescript": "^4.9.5",
+     "web-vitals": "^2.1.4"
+   },
+   "scripts": {
+     "start": "react-scripts start",
+     "build": "react-scripts build",
+     "test": "react-scripts test",
+     "eject": "react-scripts eject"
+   },
+   "eslintConfig": {
+     "extends": [
+       "react-app",
+       "react-app/jest"
+     ]
+   },
+   "browserslist": {
+     "production": [
+       ">0.2%",
+       "not dead",
+       "not op_mini all"
+     ],
+     "development": [
+       "last 1 chrome version",
+       "last 1 firefox version",
+       "last 1 safari version"
+     ]
+   }
+ }
frontend/public/favicon.ico ADDED
frontend/public/index.html ADDED
@@ -0,0 +1,43 @@
+ <!DOCTYPE html>
+ <html lang="en">
+   <head>
+     <meta charset="utf-8" />
+     <link rel="icon" href="%PUBLIC_URL%/favicon.ico" />
+     <meta name="viewport" content="width=device-width, initial-scale=1" />
+     <meta name="theme-color" content="#000000" />
+     <meta
+       name="description"
+       content="Web site created using create-react-app"
+     />
+     <link rel="apple-touch-icon" href="%PUBLIC_URL%/logo192.png" />
+     <!--
+       manifest.json provides metadata used when your web app is installed on a
+       user's mobile device or desktop. See https://developers.google.com/web/fundamentals/web-app-manifest/
+     -->
+     <link rel="manifest" href="%PUBLIC_URL%/manifest.json" />
+     <!--
+       Notice the use of %PUBLIC_URL% in the tags above.
+       It will be replaced with the URL of the `public` folder during the build.
+       Only files inside the `public` folder can be referenced from the HTML.
+
+       Unlike "/favicon.ico" or "favicon.ico", "%PUBLIC_URL%/favicon.ico" will
+       work correctly both with client-side routing and a non-root public URL.
+       Learn how to configure a non-root public URL by running `npm run build`.
+     -->
+     <title>React App</title>
+   </head>
+   <body>
+     <noscript>You need to enable JavaScript to run this app.</noscript>
+     <div id="root"></div>
+     <!--
+       This HTML file is a template.
+       If you open it directly in the browser, you will see an empty page.
+
+       You can add webfonts, meta tags, or analytics to this file.
+       The build step will place the bundled scripts into the <body> tag.
+
+       To begin the development, run `npm start` or `yarn start`.
+       To create a production bundle, use `npm run build` or `yarn build`.
+     -->
+   </body>
+ </html>
frontend/public/logo192.png ADDED
frontend/public/logo512.png ADDED
frontend/public/manifest.json ADDED
@@ -0,0 +1,25 @@
+ {
+   "short_name": "React App",
+   "name": "Create React App Sample",
+   "icons": [
+     {
+       "src": "favicon.ico",
+       "sizes": "64x64 32x32 24x24 16x16",
+       "type": "image/x-icon"
+     },
+     {
+       "src": "logo192.png",
+       "type": "image/png",
+       "sizes": "192x192"
+     },
+     {
+       "src": "logo512.png",
+       "type": "image/png",
+       "sizes": "512x512"
+     }
+   ],
+   "start_url": ".",
+   "display": "standalone",
+   "theme_color": "#000000",
+   "background_color": "#ffffff"
+ }
frontend/public/robots.txt ADDED
@@ -0,0 +1,3 @@
+ # https://www.robotstxt.org/robotstxt.html
+ User-agent: *
+ Disallow:
frontend/src/App.css ADDED
@@ -0,0 +1,38 @@
+ .App {
+   text-align: center;
+ }
+
+ .App-logo {
+   height: 40vmin;
+   pointer-events: none;
+ }
+
+ @media (prefers-reduced-motion: no-preference) {
+   .App-logo {
+     animation: App-logo-spin infinite 20s linear;
+   }
+ }
+
+ .App-header {
+   background-color: #282c34;
+   min-height: 100vh;
+   display: flex;
+   flex-direction: column;
+   align-items: center;
+   justify-content: center;
+   font-size: calc(10px + 2vmin);
+   color: white;
+ }
+
+ .App-link {
+   color: #61dafb;
+ }
+
+ @keyframes App-logo-spin {
+   from {
+     transform: rotate(0deg);
+   }
+   to {
+     transform: rotate(360deg);
+   }
+ }
frontend/src/App.test.tsx ADDED
@@ -0,0 +1,9 @@
+ import React from 'react';
+ import { render, screen } from '@testing-library/react';
+ import App from './App';
+
+ test('renders learn react link', () => {
+   render(<App />);
+   const linkElement = screen.getByText(/learn react/i);
+   expect(linkElement).toBeInTheDocument();
+ });
frontend/src/App.tsx ADDED
@@ -0,0 +1,177 @@
+ import React, { useState, useCallback } from 'react';
+ import { useDropzone } from 'react-dropzone';
+ import axios from 'axios';
+ import { ChatBubbleLeftIcon, PhotoIcon, ArrowUpTrayIcon } from '@heroicons/react/24/outline';
+
+ interface Message {
+   type: 'user' | 'assistant';
+   content: string;
+   imageUrl?: string;
+ }
+
+ function App() {
+   const [messages, setMessages] = useState<Message[]>([]);
+   const [prompt, setPrompt] = useState('');
+   const [isLoading, setIsLoading] = useState(false);
+   const [selectedImage, setSelectedImage] = useState<File | null>(null);
+   const [previewUrl, setPreviewUrl] = useState<string | null>(null);
+
+   const onDrop = useCallback((acceptedFiles: File[]) => {
+     const file = acceptedFiles[0];
+     if (file) {
+       setSelectedImage(file);
+       const url = URL.createObjectURL(file);
+       setPreviewUrl(url);
+     }
+   }, []);
+
+   const { getRootProps, getInputProps, isDragActive } = useDropzone({
+     onDrop,
+     accept: {
+       'image/*': ['.png', '.jpg', '.jpeg', '.gif']
+     },
+     maxFiles: 1
+   });
+
+   const handleSubmit = async (e: React.FormEvent) => {
+     e.preventDefault();
+     if (!selectedImage || !prompt.trim()) return;
+
+     setIsLoading(true);
+     const formData = new FormData();
+     formData.append('file', selectedImage);
+     formData.append('prompt', prompt);
+
+     // Add user message
+     setMessages(prev => [...prev, {
+       type: 'user',
+       content: prompt,
+       imageUrl: previewUrl || undefined
+     }]);
+
+     try {
+       const response = await axios.post('http://localhost:8000/api/chat', formData, {
+         headers: {
+           'Content-Type': 'multipart/form-data',
+         },
+       });
+
+       // Add assistant message
+       setMessages(prev => [...prev, {
+         type: 'assistant',
+         content: response.data.response
+       }]);
+
+       // Clear input
+       setPrompt('');
+       setSelectedImage(null);
+       setPreviewUrl(null);
+     } catch (error) {
+       console.error('Error:', error);
+       // Add error message
+       setMessages(prev => [...prev, {
+         type: 'assistant',
+         content: 'Sorry, there was an error processing your request.'
+       }]);
+     } finally {
+       setIsLoading(false);
+     }
+   };
+
+   return (
+     <div className="min-h-screen bg-gray-100">
+       <div className="max-w-4xl mx-auto p-4">
+         <header className="text-center py-8">
+           <h1 className="text-4xl font-bold text-primary-600">LLaVA Chat</h1>
+           <p className="text-gray-600 mt-2">Upload an image and chat with LLaVA about it</p>
+         </header>
+
+         <div className="bg-white rounded-lg shadow-lg p-4 mb-4">
+           <div className="space-y-4">
+             {messages.map((message, index) => (
+               <div
+                 key={index}
+                 className={`flex ${message.type === 'user' ? 'justify-end' : 'justify-start'}`}
+               >
+                 <div
+                   className={`max-w-[80%] rounded-lg p-4 ${
+                     message.type === 'user'
+                       ? 'bg-primary-600 text-white'
+                       : 'bg-gray-100 text-gray-800'
+                   }`}
+                 >
+                   {message.imageUrl && (
+                     <img
+                       src={message.imageUrl}
+                       alt="Uploaded"
+                       className="w-48 h-48 object-cover rounded-lg mb-2"
+                     />
+                   )}
+                   <p className="whitespace-pre-wrap">{message.content}</p>
+                 </div>
+               </div>
+             ))}
+           </div>
+         </div>
+
+         <form onSubmit={handleSubmit} className="bg-white rounded-lg shadow-lg p-4">
+           {!selectedImage ? (
+             <div
+               {...getRootProps()}
+               className={`border-2 border-dashed rounded-lg p-8 text-center cursor-pointer transition-colors
+                 ${isDragActive ? 'border-primary-500 bg-primary-50' : 'border-gray-300 hover:border-primary-500'}`}
+             >
+               <input {...getInputProps()} />
+               <PhotoIcon className="mx-auto h-12 w-12 text-gray-400" />
+               <p className="mt-2 text-sm text-gray-600">
+                 Drag and drop an image here, or click to select
+               </p>
+             </div>
+           ) : (
+             <div className="relative">
+               <img
+                 src={previewUrl || ''}
+                 alt="Preview"
+                 className="w-full h-48 object-cover rounded-lg"
+               />
+               <button
+                 type="button"
+                 onClick={() => {
+                   setSelectedImage(null);
+                   setPreviewUrl(null);
+                 }}
+                 className="absolute top-2 right-2 bg-red-500 text-white p-1 rounded-full hover:bg-red-600"
+               >
+                 ×
+               </button>
+             </div>
+           )}
+
+           <div className="mt-4 flex space-x-4">
+             <input
+               type="text"
+               value={prompt}
+               onChange={(e) => setPrompt(e.target.value)}
+               placeholder="Ask about the image..."
+               className="input-primary flex-1"
+               disabled={!selectedImage || isLoading}
+             />
+             <button
+               type="submit"
+               disabled={!selectedImage || !prompt.trim() || isLoading}
+               className="btn-primary disabled:opacity-50 disabled:cursor-not-allowed"
+             >
+               {isLoading ? (
+                 <div className="w-6 h-6 border-2 border-white border-t-transparent rounded-full animate-spin" />
+               ) : (
+                 <ArrowUpTrayIcon className="h-6 w-6" />
+               )}
+             </button>
+           </div>
+         </form>
+       </div>
+     </div>
+   );
+ }
+
+ export default App;
frontend/src/index.css ADDED
@@ -0,0 +1,27 @@
+ @tailwind base;
+ @tailwind components;
+ @tailwind utilities;
+
+ @layer components {
+   .btn-primary {
+     @apply px-4 py-2 bg-primary-600 text-white rounded-md hover:bg-primary-700 focus:outline-none focus:ring-2 focus:ring-primary-500 focus:ring-offset-2;
+   }
+
+   .input-primary {
+     @apply block w-full rounded-md border-gray-300 shadow-sm focus:border-primary-500 focus:ring-primary-500;
+   }
+ }
+
+ body {
+   margin: 0;
+   font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', 'Roboto', 'Oxygen',
+     'Ubuntu', 'Cantarell', 'Fira Sans', 'Droid Sans', 'Helvetica Neue',
+     sans-serif;
+   -webkit-font-smoothing: antialiased;
+   -moz-osx-font-smoothing: grayscale;
+ }
+
+ code {
+   font-family: source-code-pro, Menlo, Monaco, Consolas, 'Courier New',
+     monospace;
+ }
frontend/src/index.tsx ADDED
@@ -0,0 +1,19 @@
+ import React from 'react';
+ import ReactDOM from 'react-dom/client';
+ import './index.css';
+ import App from './App';
+ import reportWebVitals from './reportWebVitals';
+
+ const root = ReactDOM.createRoot(
+   document.getElementById('root') as HTMLElement
+ );
+ root.render(
+   <React.StrictMode>
+     <App />
+   </React.StrictMode>
+ );
+
+ // If you want to start measuring performance in your app, pass a function
+ // to log results (for example: reportWebVitals(console.log))
+ // or send to an analytics endpoint. Learn more: https://bit.ly/CRA-vitals
+ reportWebVitals();
frontend/src/logo.svg ADDED
frontend/src/react-app-env.d.ts ADDED
@@ -0,0 +1 @@
+ /// <reference types="react-scripts" />
frontend/src/reportWebVitals.ts ADDED
@@ -0,0 +1,15 @@
+ import { ReportHandler } from 'web-vitals';
+
+ const reportWebVitals = (onPerfEntry?: ReportHandler) => {
+   if (onPerfEntry && onPerfEntry instanceof Function) {
+     import('web-vitals').then(({ getCLS, getFID, getFCP, getLCP, getTTFB }) => {
+       getCLS(onPerfEntry);
+       getFID(onPerfEntry);
+       getFCP(onPerfEntry);
+       getLCP(onPerfEntry);
+       getTTFB(onPerfEntry);
+     });
+   }
+ };
+
+ export default reportWebVitals;
frontend/src/setupTests.ts ADDED
@@ -0,0 +1,5 @@
+ // jest-dom adds custom jest matchers for asserting on DOM nodes.
+ // allows you to do things like:
+ // expect(element).toHaveTextContent(/react/i)
+ // learn more: https://github.com/testing-library/jest-dom
+ import '@testing-library/jest-dom';
frontend/tailwind.config.js ADDED
@@ -0,0 +1,27 @@
+ /** @type {import('tailwindcss').Config} */
+ module.exports = {
+   content: [
+     "./src/**/*.{js,jsx,ts,tsx}",
+   ],
+   theme: {
+     extend: {
+       colors: {
+         primary: {
+           50: '#f0f9ff',
+           100: '#e0f2fe',
+           200: '#bae6fd',
+           300: '#7dd3fc',
+           400: '#38bdf8',
+           500: '#0ea5e9',
+           600: '#0284c7',
+           700: '#0369a1',
+           800: '#075985',
+           900: '#0c4a6e',
+         },
+       },
+     },
+   },
+   plugins: [
+     require('@tailwindcss/forms'),
+   ],
+ }
frontend/tsconfig.json ADDED
@@ -0,0 +1,26 @@
+ {
+   "compilerOptions": {
+     "target": "es5",
+     "lib": [
+       "dom",
+       "dom.iterable",
+       "esnext"
+     ],
+     "allowJs": true,
+     "skipLibCheck": true,
+     "esModuleInterop": true,
+     "allowSyntheticDefaultImports": true,
+     "strict": true,
+     "forceConsistentCasingInFileNames": true,
+     "noFallthroughCasesInSwitch": true,
+     "module": "esnext",
+     "moduleResolution": "node",
+     "resolveJsonModule": true,
+     "isolatedModules": true,
+     "noEmit": true,
+     "jsx": "react-jsx"
+   },
+   "include": [
+     "src"
+   ]
+ }
requirements.txt CHANGED
@@ -1,8 +1,13 @@
- torch>=2.0.0
- torchvision>=0.15.0
  transformers>=4.36.0
- accelerate>=0.25.0
+ torch>=2.1.0
  pillow>=10.0.0
+ gradio>=4.0.0
+ fastapi>=0.100.0
+ uvicorn>=0.23.0
+ accelerate>=0.25.0
+ bitsandbytes>=0.41.0  # For 8-bit quantization
+ safetensors>=0.4.0  # For safe model loading
+ torchvision>=0.15.0
  numpy>=1.24.0
  tqdm>=4.65.0
  matplotlib>=3.7.0
@@ -11,14 +16,12 @@ einops>=0.7.0
  timm>=0.9.0
  sentencepiece>=0.1.99
  peft>=0.7.0
- safetensors>=0.4.0
- gradio==4.44.1
- fastapi>=0.109.0
- uvicorn>=0.27.0
  python-multipart>=0.0.6
  pydantic>=2.5.0
  python-jose>=3.3.0
  passlib>=1.7.4
  bcrypt>=4.0.1
  aiofiles>=23.2.0
- httpx>=0.26.0
+ httpx>=0.26.0
+ # Memory optimization
+ optimum>=1.16.0
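The new `optimum` pin is annotated only as "Memory optimization"; the commit does not show how it is wired in. One plausible use, shown as a hypothetical sketch rather than this repo's confirmed approach, is BetterTransformer's fused, memory-efficient attention:

```python
# Hypothetical sketch: optimum's BetterTransformer swaps supported modules
# for fused, memory-efficient implementations. Not confirmed as what this
# commit actually does with optimum.
from optimum.bettertransformer import BetterTransformer
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
model = BetterTransformer.transform(model)  # returns the converted model
```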
src/models/main.py CHANGED
@@ -53,9 +53,7 @@ def main():
      model = LLaVA(
          vision_model_path=args.vision_model,
          language_model_path=args.language_model,
-         device=args.device,
-         load_in_8bit=args.load_8bit,
-         load_in_4bit=args.load_4bit
+         device=args.device
      )
 
      print(f"Model initialized on {model.device}")