Prashant26am committed
Commit 9b998c4 · 2 parents: 2e2a7bf 6afe326

Resolve merge conflicts and optimize for Hugging Face deployment

.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
.gitignore CHANGED
@@ -4,7 +4,6 @@ __pycache__/
  *$py.class
  *.so
  .Python
- env/
  build/
  develop-eggs/
  dist/
@@ -23,6 +22,7 @@ wheels/
 
  # Virtual Environment
  venv/
+ env/
  ENV/
 
  # IDE
@@ -31,33 +31,24 @@ ENV/
  *.swp
  *.swo
 
- # OS
- .DS_Store
- Thumbs.db
-
- # Model files
- *.bin
+ # Project specific
+ llava-chat/
+ *.log
  *.pt
  *.pth
  *.ckpt
- *.safetensors
+ *.bin
 
- # Logs
- *.log
- logs/
+ # Hugging Face specific
+ .huggingface/
+ wandb/
+ runs/
 
- # Frontend
- frontend/
- node_modules/
- npm-debug.log*
- yarn-debug.log*
- yarn-error.log*
+ # OS
+ .DS_Store
+ Thumbs.db
 
  # Temporary files
  *.tmp
  *.temp
  temp/
- tmp/
-
- # Hugging Face Space
- llava-chat/
README.md CHANGED
@@ -1,160 +1,23 @@
- # LLaVA Implementation
-
- [![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
- [![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
- [![Gradio](https://img.shields.io/badge/Gradio-4.44.1-orange.svg)](https://gradio.app/)
- [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Spaces-blue)](https://huggingface.co/spaces/Prashant26am/llava-chat)
-
- ## 📝 About
-
- This project is an implementation of LLaVA (Large Language and Vision Assistant), a powerful multimodal AI model that combines vision and language understanding. Here's what makes this implementation special:
-
- ### 🎯 Key Features
-
- - **Multimodal Understanding**
-   - Seamless integration of vision and language models
-   - Real-time image analysis and description
-   - Natural language interaction about visual content
-   - Support for various image types and formats
-
- - **Model Architecture**
-   - CLIP ViT vision encoder for robust image understanding
-   - TinyLlama language model for efficient text generation
-   - Custom projection layer for vision-language alignment
-   - Memory-optimized for deployment on various platforms
-
- - **User Interface**
-   - Modern Gradio-based web interface
-   - Real-time image processing
-   - Interactive chat experience
-   - Customizable generation parameters
-   - Responsive design for all devices
-
- - **Technical Highlights**
-   - CPU-optimized implementation
-   - Memory-efficient model loading
-   - Fast inference with optimized settings
-   - Robust error handling and logging
-   - Easy deployment on Hugging Face Spaces
-
- ### 🛠️ Technology Stack
-
- - **Core Technologies**
-   - PyTorch for deep learning
-   - Transformers for model architecture
-   - Gradio for web interface
-   - FastAPI for backend services
-   - Hugging Face for model hosting
-
- - **Development Tools**
-   - Pre-commit hooks for code quality
-   - GitHub Actions for CI/CD
-   - Comprehensive testing suite
-   - Detailed documentation
-   - Development guidelines
-
- ### 🌟 Use Cases
-
- - **Image Understanding**
-   - Scene description and analysis
-   - Object detection and recognition
-   - Visual question answering
-   - Image-based conversations
-
- - **Applications**
-   - Educational tools
-   - Content moderation
-   - Visual assistance
-   - Research and development
-   - Creative content generation
-
- ### 🔄 Project Status
-
- - **Current Version**: 1.0.0
- - **Active Development**: Yes
- - **Production Ready**: Yes
- - **Community Support**: Open for contributions
-
- ### 📊 Performance
-
- - **Model Size**: Optimized for CPU deployment
- - **Response Time**: Real-time processing
- - **Memory Usage**: Efficient resource utilization
- - **Scalability**: Ready for production deployment
-
- ### 🤝 Community
-
- - **Contributions**: Open for pull requests
- - **Issues**: Active issue tracking
- - **Documentation**: Comprehensive guides
- - **Support**: Community-driven help
-
- ### 🔮 Future Roadmap
-
- - [ ] Support for video processing
- - [ ] Additional model variants
- - [ ] Enhanced memory optimization
- - [ ] Extended API capabilities
- - [ ] More interactive features
-
- ### 📚 Resources
-
- - [Paper](https://arxiv.org/abs/2304.08485)
- - [Documentation](docs/)
- - [API Reference](docs/api/)
- - [Examples](examples/)
- - [Contributing Guide](CONTRIBUTING.md)
-
- ## 🌟 Features
-
- - **Modern Web Interface**
-   - Beautiful Gradio-based UI
-   - Real-time image analysis
-   - Interactive chat experience
-   - Responsive design
-
- - **Advanced AI Capabilities**
-   - CLIP ViT-L/14 vision encoder
-   - Vicuna-7B language model
-   - Multimodal understanding
-   - Natural conversation flow
-
- - **Developer Friendly**
-   - Clean, modular codebase
-   - Comprehensive documentation
-   - Easy deployment options
-   - Extensible architecture
-
- ## 📋 Project Structure
-
- ```
- llava_implementation/
- ├── src/                  # Source code
- │   ├── api/              # API endpoints and FastAPI app
- │   ├── models/           # Model implementations
- │   ├── utils/            # Utility functions
- │   └── configs/          # Configuration files
- ├── tests/                # Test suite
- ├── docs/                 # Documentation
- │   ├── api/              # API documentation
- │   ├── examples/         # Usage examples
- │   └── guides/           # User and developer guides
- ├── assets/               # Static assets
- │   ├── images/           # Example images
- │   └── icons/            # UI icons
- ├── scripts/              # Utility scripts
- └── examples/             # Example images for the web interface
- ```
-
- ## 🚀 Quick Start
-
- ### Prerequisites
-
- - Python 3.8+
- - CUDA-capable GPU (recommended)
- - Git
-
- ### Installation
-
+ # LLaVA Chat
+
+ A lightweight implementation of LLaVA (Large Language and Vision Assistant) optimized for Hugging Face Spaces deployment.
+
+ ## Features
+
+ - Efficient model loading with 8-bit quantization
+ - Memory-optimized inference
+ - FastAPI backend with Gradio interface
+ - Support for image understanding and visual conversations
+ - Optimized for deployment on Hugging Face Spaces
+
+ ## Quick Start
+
+ 1. Visit the [Hugging Face Space](https://huggingface.co/spaces/Prashant26am/llava-chat)
+ 2. Upload an image
+ 3. Ask questions about the image
+ 4. Get AI-powered responses
+
+ ## Local Development
+
  1. Clone the repository:
  ```bash
@@ -162,96 +25,42 @@ git clone https://github.com/Prashant-ambati/llava-implementation.git
  cd llava-implementation
  ```
 
- 2. Create and activate a virtual environment:
- ```bash
- python -m venv venv
- source venv/bin/activate  # On Windows: venv\Scripts\activate
- ```
-
- 3. Install dependencies:
+ 2. Install dependencies:
  ```bash
  pip install -r requirements.txt
  ```
 
- ### Running Locally
-
- 1. Start the development server:
+ 3. Run the application:
  ```bash
- python src/api/app.py
- ```
-
- 2. Open your browser and navigate to:
- ```
- http://localhost:7860
+ python llava-chat/app.py
  ```
 
- ## 🌐 Web Deployment
-
- ### Hugging Face Spaces
-
- The application is deployed on Hugging Face Spaces:
- - [Live Demo](https://huggingface.co/spaces/Prashant26am/llava-chat)
- - Automatic deployment from main branch
- - Free GPU resources
- - Public API access
-
- ### Local Deployment
-
- For local deployment:
- ```bash
- # Build the application
- python -m build
-
- # Run with production settings
- python src/api/app.py --production
- ```
+ ## Model Architecture
+
+ - Vision Model: CLIP ViT-Base
+ - Language Model: TinyLlama-1.1B-Chat
+ - Projection Layer: MLP with configurable hidden dimensions
 
- ## 📚 Documentation
-
- - [API Documentation](docs/api/README.md)
- - [User Guide](docs/guides/user_guide.md)
- - [Developer Guide](docs/guides/developer_guide.md)
- - [Examples](docs/examples/README.md)
-
- ## 🛠️ Development
-
- ### Running Tests
-
- ```bash
- pytest tests/
- ```
-
- ### Code Style
-
- This project follows PEP 8 guidelines. To check your code:
-
- ```bash
- flake8 src/
- black src/
- ```
-
- ### Contributing
-
- 1. Fork the repository
- 2. Create a feature branch
- 3. Commit your changes
- 4. Push to the branch
- 5. Create a Pull Request
-
- ## 📝 License
+ ## Memory Optimization
+
+ The implementation includes several memory optimization techniques:
+ - 8-bit quantization for language model
+ - Efficient image processing
+ - Gradient checkpointing
+ - Memory-efficient attention
+ - Automatic mixed precision
+
+ ## API Endpoints
+
+ - `POST /process_image`: Process an image with a prompt
+ - `GET /status`: Check model and application status
+
+ ## License
 
  This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.
 
- ## 🙏 Acknowledgments
-
- - [LLaVA Paper](https://arxiv.org/abs/2304.08485) by Microsoft Research
- - [Gradio](https://gradio.app/) for the web interface
- - [Hugging Face](https://huggingface.co/) for model hosting
- - [Vicuna](https://lmsys.org/blog/2023-03-30-vicuna/) for the language model
- - [CLIP](https://openai.com/research/clip) for the vision model
-
- ## 📞 Contact
+ ## Acknowledgments
 
- - GitHub Issues: [Report a bug](https://github.com/Prashant-ambati/llava-implementation/issues)
- - Email: [Your Email]
- - Twitter: [@YourTwitter]
+ - Based on the paper "Visual Instruction Tuning" (NeurIPS 2023)
+ - Uses models from Hugging Face Transformers
+ - Built with FastAPI and Gradio
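For reference, the "8-bit quantization for language model" bullet in the new README corresponds to the standard transformers + bitsandbytes loading path. A minimal sketch, not this repo's own `LLaVA` wrapper, and assuming a CUDA GPU (the bitsandbytes 8-bit kernels do not run on the CPU-only Space, which is why app.py below loads in full precision on `device="cpu"`):

```python
# Sketch of the 8-bit language-model loading the README describes.
# Assumes a CUDA device; falls back to this repo's full-precision path on CPU.
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # 8-bit weights via bitsandbytes
    device_map="auto",  # place quantized weights on the available GPU
)
```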
app.py ADDED
@@ -0,0 +1,376 @@
+ from fastapi import FastAPI, HTTPException, Request
+ from fastapi.middleware.cors import CORSMiddleware
+ from fastapi.responses import JSONResponse
+ import os
+ import tempfile
+ import torch
+ import gradio as gr
+ import traceback
+ import sys
+ import logging
+ from PIL import Image
+ from models.llava import LLaVA
+ from typing import Dict, Any, Optional, Union
+
+ # Set up logging
+ logging.basicConfig(
+     level=logging.INFO,
+     format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
+     handlers=[
+         logging.StreamHandler(sys.stdout),
+         logging.FileHandler('app.log')
+     ]
+ )
+ logger = logging.getLogger(__name__)
+
+ # Initialize FastAPI app
+ app = FastAPI(title="LLaVA Web Interface")
+
+ # Configure CORS
+ app.add_middleware(
+     CORSMiddleware,
+     allow_origins=["*"],
+     allow_credentials=True,
+     allow_methods=["*"],
+     allow_headers=["*"],
+ )
+
+ # Global state
+ model = None
+ model_status: Dict[str, Any] = {
+     "initialized": False,
+     "device": None,
+     "error": None,
+     "last_error": None
+ }
+
+ @app.exception_handler(Exception)
+ async def global_exception_handler(request: Request, exc: Exception):
+     """Global exception handler to catch and log all unhandled exceptions."""
+     error_msg = f"Unhandled error: {str(exc)}\n{traceback.format_exc()}"
+     logger.error(error_msg)
+     model_status["last_error"] = error_msg
+     return JSONResponse(
+         status_code=500,
+         content={"error": "Internal Server Error", "details": str(exc)}
+     )
+
+ @app.get("/status")
+ async def get_status():
+     """Endpoint to check model and application status."""
+     return {
+         "model_initialized": model is not None,
+         "model_status": model_status,
+         "memory_usage": {
+             "cuda_available": torch.cuda.is_available(),
+             "cuda_memory_allocated": torch.cuda.memory_allocated() if torch.cuda.is_available() else 0,
+             "cuda_memory_reserved": torch.cuda.memory_reserved() if torch.cuda.is_available() else 0
+         }
+     }
+
+ def initialize_model():
+     """Initialize the LLaVA model with proper error handling."""
+     global model, model_status
+     try:
+         logger.info("Starting model initialization...")
+         model_status["initialized"] = False
+         model_status["error"] = None
+
+         # Clear any existing model and memory
+         if model is not None:
+             del model
+             torch.cuda.empty_cache()
+
+         # Initialize new model
+         model = LLaVA(
+             vision_model_path="openai/clip-vit-base-patch32",
+             language_model_path="TinyLlama/TinyLlama-1.1B-Chat-v1.0",
+             device="cpu",
+             projection_hidden_dim=2048
+         )
+
+         # Configure model for inference
+         if hasattr(model, 'language_model'):
+             model.language_model.config.use_cache = False
+             model.language_model.eval()
+
+         model_status.update({
+             "initialized": True,
+             "device": str(model.device),
+             "error": None
+         })
+         logger.info(f"Model successfully initialized on {model.device}")
+         return True
+
+     except Exception as e:
+         error_msg = f"Model initialization failed: {str(e)}"
+         logger.error(error_msg)
+         logger.error(traceback.format_exc())
+         model = None
+         model_status.update({
+             "initialized": False,
+             "error": error_msg,
+             "last_error": traceback.format_exc()
+         })
+         return False
+
+ def process_image(
+     image: Optional[Image.Image],
+     prompt: str,
+     max_new_tokens: int = 256,
+     temperature: float = 0.7,
+     top_p: float = 0.9
+ ) -> str:
+     """Process an image with the LLaVA model with comprehensive error handling."""
+     global model_status
+     logger.info("Starting image processing...")
+
+     # Validate model state
+     if model is None:
+         logger.error("Model not initialized")
+         if not initialize_model():
+             model_status["last_error"] = "Model initialization failed during processing"
+             return "Error: Model initialization failed. Please try again later."
+
+     # Validate inputs
+     if image is None:
+         logger.error("No image provided")
+         return "Error: Please upload an image first."
+
+     if not isinstance(image, Image.Image):
+         logger.error(f"Invalid image type: {type(image)}")
+         return "Error: Invalid image format. Please upload a valid image."
+
+     if not prompt or not isinstance(prompt, str) or not prompt.strip():
+         logger.error("Invalid prompt")
+         return "Error: Please enter a valid prompt."
+
+     # Validate parameters
+     try:
+         max_new_tokens = int(max_new_tokens)
+         temperature = float(temperature)
+         top_p = float(top_p)
+     except (ValueError, TypeError) as e:
+         logger.error(f"Invalid parameters: {str(e)}")
+         return "Error: Invalid generation parameters."
+
+     temp_path = None
+     try:
+         logger.info(f"Processing image with prompt: {prompt[:100]}...")
+
+         # Save image with explicit format
+         with tempfile.NamedTemporaryFile(delete=False, suffix='.png') as temp_file:
+             image.save(temp_file.name, format='PNG')
+             temp_path = temp_file.name
+             logger.info(f"Saved temporary image to {temp_path}")
+
+         # Clear memory
+         torch.cuda.empty_cache()
+
+         # Process image
+         with torch.inference_mode():
+             try:
+                 logger.info("Generating response...")
+                 response = model.generate_from_image(
+                     image_path=temp_path,
+                     prompt=prompt,
+                     max_new_tokens=max_new_tokens,
+                     temperature=temperature,
+                     top_p=top_p
+                 )
+
+                 if not response:
+                     raise ValueError("Empty response from model")
+
+                 if not isinstance(response, str):
+                     raise ValueError(f"Invalid response type: {type(response)}")
+
+                 logger.info("Successfully generated response")
+                 model_status["last_error"] = None
+                 return response
+
+             except Exception as model_error:
+                 error_msg = f"Model inference error: {str(model_error)}"
+                 logger.error(error_msg)
+                 logger.error(traceback.format_exc())
+                 model_status["last_error"] = error_msg
+                 return f"Error during model inference: {str(model_error)}"
+
+     except Exception as e:
+         error_msg = f"Processing error: {str(e)}"
+         logger.error(error_msg)
+         logger.error(traceback.format_exc())
+         model_status["last_error"] = error_msg
+         return f"Error processing image: {str(e)}"
+
+     finally:
+         # Cleanup
+         if temp_path and os.path.exists(temp_path):
+             try:
+                 os.unlink(temp_path)
+                 logger.info("Cleaned up temporary file")
+             except Exception as e:
+                 logger.warning(f"Failed to clean up temporary file: {str(e)}")
+
+         try:
+             torch.cuda.empty_cache()
+         except Exception as e:
+             logger.warning(f"Failed to clear CUDA cache: {str(e)}")
+
+ def get_status_text() -> str:
+     """Get a formatted status text for display."""
+     try:
+         status = {
+             "Model Initialized": "Yes" if model is not None else "No",
+             "Device": str(model.device) if model is not None else "None",
+             "Last Error": model_status.get("last_error", "None"),
+             "Memory Usage": {
+                 "CUDA Available": "Yes" if torch.cuda.is_available() else "No",
+                 "Memory Allocated": f"{torch.cuda.memory_allocated() / 1024**2:.2f} MB" if torch.cuda.is_available() else "N/A",
+                 "Memory Reserved": f"{torch.cuda.memory_reserved() / 1024**2:.2f} MB" if torch.cuda.is_available() else "N/A"
+             }
+         }
+         return "\n".join(f"{k}: {v}" for k, v in status.items())
+     except Exception as e:
+         return f"Error getting status: {str(e)}"
+
+ def create_interface():
+     """Create the Gradio interface with proper error handling."""
+     try:
+         with gr.Blocks(title="LLaVA Chat", theme=gr.themes.Soft()) as demo:
+             gr.Markdown("""
+             # LLaVA Chat
+             Upload an image and chat with LLaVA about it. This model can understand and describe images, answer questions about them, and engage in visual conversations.
+
+             ## Example Prompts
+             Try these prompts to get started:
+             - "What can you see in this image?"
+             - "Describe this scene in detail"
+             - "What emotions does this image convey?"
+             - "What's happening in this picture?"
+             - "Can you identify any objects or people in this image?"
+             """)
+
+             with gr.Row():
+                 with gr.Column(scale=1):
+                     # Input components with explicit types and validation
+                     image_input = gr.Image(
+                         type="pil",
+                         label="Upload Image",
+                         image_mode="RGB",
+                         format="PNG"
+                     )
+                     prompt_input = gr.Textbox(
+                         label="Ask about the image",
+                         placeholder="What can you see in this image?",
+                         lines=3,
+                         max_lines=5
+                     )
+
+                     with gr.Accordion("Advanced Settings", open=False):
+                         max_tokens = gr.Slider(
+                             minimum=32,
+                             maximum=512,
+                             value=256,
+                             step=32,
+                             label="Max New Tokens"
+                         )
+                         temperature = gr.Slider(
+                             minimum=0.1,
+                             maximum=1.0,
+                             value=0.7,
+                             step=0.1,
+                             label="Temperature"
+                         )
+                         top_p = gr.Slider(
+                             minimum=0.1,
+                             maximum=1.0,
+                             value=0.9,
+                             step=0.1,
+                             label="Top P"
+                         )
+
+                     submit_btn = gr.Button("Generate Response", variant="primary")
+                     status_btn = gr.Button("Check Status", variant="secondary")
+
+                 with gr.Column(scale=1):
+                     output = gr.Textbox(
+                         label="Model Response",
+                         lines=10,
+                         show_copy_button=True
+                     )
+                     status_output = gr.Textbox(
+                         label="System Status",
+                         lines=5,
+                         show_copy_button=True
+                     )
+
+             # Set up event handlers with proper error handling
+             def safe_process_image(*args):
+                 try:
+                     return process_image(*args)
+                 except Exception as e:
+                     logger.error(f"Interface error: {str(e)}")
+                     logger.error(traceback.format_exc())
+                     return f"Error: {str(e)}"
+
+             submit_btn.click(
+                 fn=safe_process_image,
+                 inputs=[
+                     image_input,
+                     prompt_input,
+                     max_tokens,
+                     temperature,
+                     top_p
+                 ],
+                 outputs=output,
+                 api_name="process_image"
+             )
+
+             status_btn.click(
+                 fn=get_status_text,
+                 inputs=[],
+                 outputs=status_output,
+                 api_name="check_status"
+             )
+
+         logger.info("Successfully created Gradio interface")
+         return demo
+
+     except Exception as e:
+         logger.error(f"Failed to create interface: {str(e)}")
+         logger.error(traceback.format_exc())
+         raise
+
+ # Create and mount Gradio app
+ try:
+     logger.info("Creating Gradio interface...")
+     demo = create_interface()
+     app = gr.mount_gradio_app(app, demo, path="/")
+     logger.info("Successfully mounted Gradio app")
+ except Exception as e:
+     logger.error(f"Failed to mount Gradio app: {str(e)}")
+     logger.error(traceback.format_exc())
+     raise
+
+ if __name__ == "__main__":
+     try:
+         # Initialize model
+         logger.info("Starting application...")
+         if not initialize_model():
+             logger.error("Model initialization failed. Exiting...")
+             sys.exit(1)
+
+         # Start server
+         import uvicorn
+         logger.info("Starting server...")
+         uvicorn.run(
+             app,
+             host="0.0.0.0",
+             port=7860,
+             log_level="info"
+         )
+     except Exception as e:
+         logger.error(f"Application startup failed: {str(e)}")
+         logger.error(traceback.format_exc())
+         sys.exit(1)
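With the server above running on port 7860, the plain FastAPI `/status` route can be smoke-tested without the Gradio UI. A small sketch using `httpx` (already in requirements.txt), assuming the app is serving locally on the default port:

```python
# Query the /status endpoint defined in app.py; assumes the app is
# running locally on port 7860 as configured in uvicorn.run above.
import httpx

resp = httpx.get("http://localhost:7860/status")
resp.raise_for_status()
status = resp.json()
print(status["model_initialized"])  # True once initialize_model() has succeeded
print(status["memory_usage"])       # CUDA availability and memory counters
```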
frontend/.gitignore ADDED
@@ -0,0 +1,23 @@
+ # See https://help.github.com/articles/ignoring-files/ for more about ignoring files.
+
+ # dependencies
+ /node_modules
+ /.pnp
+ .pnp.js
+
+ # testing
+ /coverage
+
+ # production
+ /build
+
+ # misc
+ .DS_Store
+ .env.local
+ .env.development.local
+ .env.test.local
+ .env.production.local
+
+ npm-debug.log*
+ yarn-debug.log*
+ yarn-error.log*
frontend/README.md ADDED
@@ -0,0 +1,46 @@
+ # Getting Started with Create React App
+
+ This project was bootstrapped with [Create React App](https://github.com/facebook/create-react-app).
+
+ ## Available Scripts
+
+ In the project directory, you can run:
+
+ ### `npm start`
+
+ Runs the app in the development mode.\
+ Open [http://localhost:3000](http://localhost:3000) to view it in the browser.
+
+ The page will reload if you make edits.\
+ You will also see any lint errors in the console.
+
+ ### `npm test`
+
+ Launches the test runner in the interactive watch mode.\
+ See the section about [running tests](https://facebook.github.io/create-react-app/docs/running-tests) for more information.
+
+ ### `npm run build`
+
+ Builds the app for production to the `build` folder.\
+ It correctly bundles React in production mode and optimizes the build for the best performance.
+
+ The build is minified and the filenames include the hashes.\
+ Your app is ready to be deployed!
+
+ See the section about [deployment](https://facebook.github.io/create-react-app/docs/deployment) for more information.
+
+ ### `npm run eject`
+
+ **Note: this is a one-way operation. Once you `eject`, you can’t go back!**
+
+ If you aren’t satisfied with the build tool and configuration choices, you can `eject` at any time. This command will remove the single build dependency from your project.
+
+ Instead, it will copy all the configuration files and the transitive dependencies (webpack, Babel, ESLint, etc) right into your project so you have full control over them. All of the commands except `eject` will still work, but they will point to the copied scripts so you can tweak them. At this point you’re on your own.
+
+ You don’t have to ever use `eject`. The curated feature set is suitable for small and middle deployments, and you shouldn’t feel obligated to use this feature. However we understand that this tool wouldn’t be useful if you couldn’t customize it when you are ready for it.
+
+ ## Learn More
+
+ You can learn more in the [Create React App documentation](https://facebook.github.io/create-react-app/docs/getting-started).
+
+ To learn React, check out the [React documentation](https://reactjs.org/).
frontend/package-lock.json ADDED
The diff for this file is too large to render. See raw diff
 
frontend/package.json ADDED
@@ -0,0 +1,52 @@
+ {
+   "name": "frontend",
+   "version": "0.1.0",
+   "private": true,
+   "dependencies": {
+     "@headlessui/react": "^2.2.4",
+     "@heroicons/react": "^2.2.0",
+     "@tailwindcss/forms": "^0.5.10",
+     "@testing-library/dom": "^10.4.0",
+     "@testing-library/jest-dom": "^6.6.3",
+     "@testing-library/react": "^16.3.0",
+     "@testing-library/user-event": "^13.5.0",
+     "@types/jest": "^27.5.2",
+     "@types/node": "^16.18.126",
+     "@types/react": "^19.1.5",
+     "@types/react-dom": "^19.1.5",
+     "autoprefixer": "^10.4.21",
+     "axios": "^1.9.0",
+     "postcss": "^8.5.3",
+     "react": "^19.1.0",
+     "react-dom": "^19.1.0",
+     "react-dropzone": "^14.3.8",
+     "react-scripts": "5.0.1",
+     "tailwindcss": "^4.1.7",
+     "typescript": "^4.9.5",
+     "web-vitals": "^2.1.4"
+   },
+   "scripts": {
+     "start": "react-scripts start",
+     "build": "react-scripts build",
+     "test": "react-scripts test",
+     "eject": "react-scripts eject"
+   },
+   "eslintConfig": {
+     "extends": [
+       "react-app",
+       "react-app/jest"
+     ]
+   },
+   "browserslist": {
+     "production": [
+       ">0.2%",
+       "not dead",
+       "not op_mini all"
+     ],
+     "development": [
+       "last 1 chrome version",
+       "last 1 firefox version",
+       "last 1 safari version"
+     ]
+   }
+ }
frontend/public/favicon.ico ADDED
frontend/public/index.html ADDED
@@ -0,0 +1,43 @@
+ <!DOCTYPE html>
+ <html lang="en">
+   <head>
+     <meta charset="utf-8" />
+     <link rel="icon" href="%PUBLIC_URL%/favicon.ico" />
+     <meta name="viewport" content="width=device-width, initial-scale=1" />
+     <meta name="theme-color" content="#000000" />
+     <meta
+       name="description"
+       content="Web site created using create-react-app"
+     />
+     <link rel="apple-touch-icon" href="%PUBLIC_URL%/logo192.png" />
+     <!--
+       manifest.json provides metadata used when your web app is installed on a
+       user's mobile device or desktop. See https://developers.google.com/web/fundamentals/web-app-manifest/
+     -->
+     <link rel="manifest" href="%PUBLIC_URL%/manifest.json" />
+     <!--
+       Notice the use of %PUBLIC_URL% in the tags above.
+       It will be replaced with the URL of the `public` folder during the build.
+       Only files inside the `public` folder can be referenced from the HTML.
+
+       Unlike "/favicon.ico" or "favicon.ico", "%PUBLIC_URL%/favicon.ico" will
+       work correctly both with client-side routing and a non-root public URL.
+       Learn how to configure a non-root public URL by running `npm run build`.
+     -->
+     <title>React App</title>
+   </head>
+   <body>
+     <noscript>You need to enable JavaScript to run this app.</noscript>
+     <div id="root"></div>
+     <!--
+       This HTML file is a template.
+       If you open it directly in the browser, you will see an empty page.
+
+       You can add webfonts, meta tags, or analytics to this file.
+       The build step will place the bundled scripts into the <body> tag.
+
+       To begin the development, run `npm start` or `yarn start`.
+       To create a production bundle, use `npm run build` or `yarn build`.
+     -->
+   </body>
+ </html>
frontend/public/logo192.png ADDED
frontend/public/logo512.png ADDED
frontend/public/manifest.json ADDED
@@ -0,0 +1,25 @@
+ {
+   "short_name": "React App",
+   "name": "Create React App Sample",
+   "icons": [
+     {
+       "src": "favicon.ico",
+       "sizes": "64x64 32x32 24x24 16x16",
+       "type": "image/x-icon"
+     },
+     {
+       "src": "logo192.png",
+       "type": "image/png",
+       "sizes": "192x192"
+     },
+     {
+       "src": "logo512.png",
+       "type": "image/png",
+       "sizes": "512x512"
+     }
+   ],
+   "start_url": ".",
+   "display": "standalone",
+   "theme_color": "#000000",
+   "background_color": "#ffffff"
+ }
frontend/public/robots.txt ADDED
@@ -0,0 +1,3 @@
+ # https://www.robotstxt.org/robotstxt.html
+ User-agent: *
+ Disallow:
frontend/src/App.css ADDED
@@ -0,0 +1,38 @@
+ .App {
+   text-align: center;
+ }
+
+ .App-logo {
+   height: 40vmin;
+   pointer-events: none;
+ }
+
+ @media (prefers-reduced-motion: no-preference) {
+   .App-logo {
+     animation: App-logo-spin infinite 20s linear;
+   }
+ }
+
+ .App-header {
+   background-color: #282c34;
+   min-height: 100vh;
+   display: flex;
+   flex-direction: column;
+   align-items: center;
+   justify-content: center;
+   font-size: calc(10px + 2vmin);
+   color: white;
+ }
+
+ .App-link {
+   color: #61dafb;
+ }
+
+ @keyframes App-logo-spin {
+   from {
+     transform: rotate(0deg);
+   }
+   to {
+     transform: rotate(360deg);
+   }
+ }
frontend/src/App.test.tsx ADDED
@@ -0,0 +1,9 @@
+ import React from 'react';
+ import { render, screen } from '@testing-library/react';
+ import App from './App';
+
+ test('renders learn react link', () => {
+   render(<App />);
+   const linkElement = screen.getByText(/learn react/i);
+   expect(linkElement).toBeInTheDocument();
+ });
frontend/src/App.tsx ADDED
@@ -0,0 +1,177 @@
+ import React, { useState, useCallback } from 'react';
+ import { useDropzone } from 'react-dropzone';
+ import axios from 'axios';
+ import { ChatBubbleLeftIcon, PhotoIcon, ArrowUpTrayIcon } from '@heroicons/react/24/outline';
+
+ interface Message {
+   type: 'user' | 'assistant';
+   content: string;
+   imageUrl?: string;
+ }
+
+ function App() {
+   const [messages, setMessages] = useState<Message[]>([]);
+   const [prompt, setPrompt] = useState('');
+   const [isLoading, setIsLoading] = useState(false);
+   const [selectedImage, setSelectedImage] = useState<File | null>(null);
+   const [previewUrl, setPreviewUrl] = useState<string | null>(null);
+
+   const onDrop = useCallback((acceptedFiles: File[]) => {
+     const file = acceptedFiles[0];
+     if (file) {
+       setSelectedImage(file);
+       const url = URL.createObjectURL(file);
+       setPreviewUrl(url);
+     }
+   }, []);
+
+   const { getRootProps, getInputProps, isDragActive } = useDropzone({
+     onDrop,
+     accept: {
+       'image/*': ['.png', '.jpg', '.jpeg', '.gif']
+     },
+     maxFiles: 1
+   });
+
+   const handleSubmit = async (e: React.FormEvent) => {
+     e.preventDefault();
+     if (!selectedImage || !prompt.trim()) return;
+
+     setIsLoading(true);
+     const formData = new FormData();
+     formData.append('file', selectedImage);
+     formData.append('prompt', prompt);
+
+     // Add user message
+     setMessages(prev => [...prev, {
+       type: 'user',
+       content: prompt,
+       imageUrl: previewUrl || undefined
+     }]);
+
+     try {
+       const response = await axios.post('http://localhost:8000/api/chat', formData, {
+         headers: {
+           'Content-Type': 'multipart/form-data',
+         },
+       });
+
+       // Add assistant message
+       setMessages(prev => [...prev, {
+         type: 'assistant',
+         content: response.data.response
+       }]);
+
+       // Clear input
+       setPrompt('');
+       setSelectedImage(null);
+       setPreviewUrl(null);
+     } catch (error) {
+       console.error('Error:', error);
+       // Add error message
+       setMessages(prev => [...prev, {
+         type: 'assistant',
+         content: 'Sorry, there was an error processing your request.'
+       }]);
+     } finally {
+       setIsLoading(false);
+     }
+   };
+
+   return (
+     <div className="min-h-screen bg-gray-100">
+       <div className="max-w-4xl mx-auto p-4">
+         <header className="text-center py-8">
+           <h1 className="text-4xl font-bold text-primary-600">LLaVA Chat</h1>
+           <p className="text-gray-600 mt-2">Upload an image and chat with LLaVA about it</p>
+         </header>
+
+         <div className="bg-white rounded-lg shadow-lg p-4 mb-4">
+           <div className="space-y-4">
+             {messages.map((message, index) => (
+               <div
+                 key={index}
+                 className={`flex ${message.type === 'user' ? 'justify-end' : 'justify-start'}`}
+               >
+                 <div
+                   className={`max-w-[80%] rounded-lg p-4 ${
+                     message.type === 'user'
+                       ? 'bg-primary-600 text-white'
+                       : 'bg-gray-100 text-gray-800'
+                   }`}
+                 >
+                   {message.imageUrl && (
+                     <img
+                       src={message.imageUrl}
+                       alt="Uploaded"
+                       className="w-48 h-48 object-cover rounded-lg mb-2"
+                     />
+                   )}
+                   <p className="whitespace-pre-wrap">{message.content}</p>
+                 </div>
+               </div>
+             ))}
+           </div>
+         </div>
+
+         <form onSubmit={handleSubmit} className="bg-white rounded-lg shadow-lg p-4">
+           {!selectedImage ? (
+             <div
+               {...getRootProps()}
+               className={`border-2 border-dashed rounded-lg p-8 text-center cursor-pointer transition-colors
+                 ${isDragActive ? 'border-primary-500 bg-primary-50' : 'border-gray-300 hover:border-primary-500'}`}
+             >
+               <input {...getInputProps()} />
+               <PhotoIcon className="mx-auto h-12 w-12 text-gray-400" />
+               <p className="mt-2 text-sm text-gray-600">
+                 Drag and drop an image here, or click to select
+               </p>
+             </div>
+           ) : (
+             <div className="relative">
+               <img
+                 src={previewUrl || ''}
+                 alt="Preview"
+                 className="w-full h-48 object-cover rounded-lg"
+               />
+               <button
+                 type="button"
+                 onClick={() => {
+                   setSelectedImage(null);
+                   setPreviewUrl(null);
+                 }}
+                 className="absolute top-2 right-2 bg-red-500 text-white p-1 rounded-full hover:bg-red-600"
+               >
+                 ×
+               </button>
+             </div>
+           )}
+
+           <div className="mt-4 flex space-x-4">
+             <input
+               type="text"
+               value={prompt}
+               onChange={(e) => setPrompt(e.target.value)}
+               placeholder="Ask about the image..."
+               className="input-primary flex-1"
+               disabled={!selectedImage || isLoading}
+             />
+             <button
+               type="submit"
+               disabled={!selectedImage || !prompt.trim() || isLoading}
+               className="btn-primary disabled:opacity-50 disabled:cursor-not-allowed"
+             >
+               {isLoading ? (
+                 <div className="w-6 h-6 border-2 border-white border-t-transparent rounded-full animate-spin" />
+               ) : (
+                 <ArrowUpTrayIcon className="h-6 w-6" />
+               )}
+             </button>
+           </div>
+         </form>
+       </div>
+     </div>
+   );
+ }
+
+ export default App;
frontend/src/index.css ADDED
@@ -0,0 +1,27 @@
+ @tailwind base;
+ @tailwind components;
+ @tailwind utilities;
+
+ @layer components {
+   .btn-primary {
+     @apply px-4 py-2 bg-primary-600 text-white rounded-md hover:bg-primary-700 focus:outline-none focus:ring-2 focus:ring-primary-500 focus:ring-offset-2;
+   }
+
+   .input-primary {
+     @apply block w-full rounded-md border-gray-300 shadow-sm focus:border-primary-500 focus:ring-primary-500;
+   }
+ }
+
+ body {
+   margin: 0;
+   font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', 'Roboto', 'Oxygen',
+     'Ubuntu', 'Cantarell', 'Fira Sans', 'Droid Sans', 'Helvetica Neue',
+     sans-serif;
+   -webkit-font-smoothing: antialiased;
+   -moz-osx-font-smoothing: grayscale;
+ }
+
+ code {
+   font-family: source-code-pro, Menlo, Monaco, Consolas, 'Courier New',
+     monospace;
+ }
frontend/src/index.tsx ADDED
@@ -0,0 +1,19 @@
+ import React from 'react';
+ import ReactDOM from 'react-dom/client';
+ import './index.css';
+ import App from './App';
+ import reportWebVitals from './reportWebVitals';
+
+ const root = ReactDOM.createRoot(
+   document.getElementById('root') as HTMLElement
+ );
+ root.render(
+   <React.StrictMode>
+     <App />
+   </React.StrictMode>
+ );
+
+ // If you want to start measuring performance in your app, pass a function
+ // to log results (for example: reportWebVitals(console.log))
+ // or send to an analytics endpoint. Learn more: https://bit.ly/CRA-vitals
+ reportWebVitals();
frontend/src/logo.svg ADDED
frontend/src/react-app-env.d.ts ADDED
@@ -0,0 +1 @@
+ /// <reference types="react-scripts" />
frontend/src/reportWebVitals.ts ADDED
@@ -0,0 +1,15 @@
+ import { ReportHandler } from 'web-vitals';
+
+ const reportWebVitals = (onPerfEntry?: ReportHandler) => {
+   if (onPerfEntry && onPerfEntry instanceof Function) {
+     import('web-vitals').then(({ getCLS, getFID, getFCP, getLCP, getTTFB }) => {
+       getCLS(onPerfEntry);
+       getFID(onPerfEntry);
+       getFCP(onPerfEntry);
+       getLCP(onPerfEntry);
+       getTTFB(onPerfEntry);
+     });
+   }
+ };
+
+ export default reportWebVitals;
frontend/src/setupTests.ts ADDED
@@ -0,0 +1,5 @@
+ // jest-dom adds custom jest matchers for asserting on DOM nodes.
+ // allows you to do things like:
+ // expect(element).toHaveTextContent(/react/i)
+ // learn more: https://github.com/testing-library/jest-dom
+ import '@testing-library/jest-dom';
frontend/tailwind.config.js ADDED
@@ -0,0 +1,27 @@
+ /** @type {import('tailwindcss').Config} */
+ module.exports = {
+   content: [
+     "./src/**/*.{js,jsx,ts,tsx}",
+   ],
+   theme: {
+     extend: {
+       colors: {
+         primary: {
+           50: '#f0f9ff',
+           100: '#e0f2fe',
+           200: '#bae6fd',
+           300: '#7dd3fc',
+           400: '#38bdf8',
+           500: '#0ea5e9',
+           600: '#0284c7',
+           700: '#0369a1',
+           800: '#075985',
+           900: '#0c4a6e',
+         },
+       },
+     },
+   },
+   plugins: [
+     require('@tailwindcss/forms'),
+   ],
+ }
frontend/tsconfig.json ADDED
@@ -0,0 +1,26 @@
+ {
+   "compilerOptions": {
+     "target": "es5",
+     "lib": [
+       "dom",
+       "dom.iterable",
+       "esnext"
+     ],
+     "allowJs": true,
+     "skipLibCheck": true,
+     "esModuleInterop": true,
+     "allowSyntheticDefaultImports": true,
+     "strict": true,
+     "forceConsistentCasingInFileNames": true,
+     "noFallthroughCasesInSwitch": true,
+     "module": "esnext",
+     "moduleResolution": "node",
+     "resolveJsonModule": true,
+     "isolatedModules": true,
+     "noEmit": true,
+     "jsx": "react-jsx"
+   },
+   "include": [
+     "src"
+   ]
+ }
requirements.txt CHANGED
@@ -1,8 +1,13 @@
- torch>=2.0.0
- torchvision>=0.15.0
  transformers>=4.36.0
- accelerate>=0.25.0
+ torch>=2.1.0
  pillow>=10.0.0
+ gradio>=4.0.0
+ fastapi>=0.100.0
+ uvicorn>=0.23.0
+ accelerate>=0.25.0
+ bitsandbytes>=0.41.0  # For 8-bit quantization
+ safetensors>=0.4.0  # For safe model loading
+ torchvision>=0.15.0
  numpy>=1.24.0
  tqdm>=4.65.0
  matplotlib>=3.7.0
@@ -11,14 +16,12 @@ einops>=0.7.0
  timm>=0.9.0
  sentencepiece>=0.1.99
  peft>=0.7.0
- safetensors>=0.4.0
- gradio==4.44.1
- fastapi>=0.109.0
- uvicorn>=0.27.0
  python-multipart>=0.0.6
  pydantic>=2.5.0
  python-jose>=3.3.0
  passlib>=1.7.4
  bcrypt>=4.0.1
  aiofiles>=23.2.0
- httpx>=0.26.0
+ httpx>=0.26.0
+ # Memory optimization
+ optimum>=1.16.0
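The new `optimum` pin is annotated only as "Memory optimization"; the commit does not show how it is wired in. One plausible use, shown as a hypothetical sketch rather than this repo's confirmed approach, is BetterTransformer's fused, memory-efficient attention:

```python
# Hypothetical sketch: optimum's BetterTransformer swaps supported modules
# for fused, memory-efficient implementations. Not confirmed as what this
# commit actually does with optimum.
from optimum.bettertransformer import BetterTransformer
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("TinyLlama/TinyLlama-1.1B-Chat-v1.0")
model = BetterTransformer.transform(model)  # returns the converted model
```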
src/models/main.py CHANGED
@@ -53,9 +53,7 @@ def main():
      model = LLaVA(
          vision_model_path=args.vision_model,
          language_model_path=args.language_model,
-         device=args.device,
-         load_in_8bit=args.load_8bit,
-         load_in_4bit=args.load_4bit
+         device=args.device
      )
 
      print(f"Model initialized on {model.device}")