doc: Updated DEVELOPER.md to reflect current codebase structure
docs/DEVELOPER.md (+127 -49)
@@ -4,86 +4,164 @@

```
.
├── app.py                      # Main Chainlit application (multi-agent RAG)
├── app_simple_rag.py           # Simplified single-agent RAG application
├── Dockerfile                  # Docker container configuration
├── pyproject.toml              # Project configuration and dependencies
├── requirements.txt            # Basic requirements (for legacy compatibility)
├── uv.lock                     # Lock file for uv package manager
├── pstuts_rag/                 # Package directory
│   ├── pstuts_rag/             # Source code
│   │   ├── __init__.py         # Package initialization
│   │   ├── configuration.py    # Application configuration settings
│   │   ├── datastore.py        # Vector database and document management
│   │   ├── rag.py              # RAG chain implementation and factories
│   │   ├── graph.py            # LangGraph multi-agent implementation
│   │   ├── state.py            # Team state management for agents
│   │   ├── prompts.py          # System prompts for different agents
│   │   ├── evaluator_utils.py  # RAG evaluation utilities
│   │   └── utils.py            # General utilities
│   ├── setup.py                # Package setup (legacy)
│   └── CERT_SUBMISSION.md      # Certification submission documentation
├── data/                       # Dataset files (JSON format)
│   ├── train.json              # Training dataset
│   ├── dev.json                # Development dataset
│   ├── test.json               # Test dataset
│   ├── kg_*.json               # Knowledge graph datasets
│   ├── LICENSE.txt             # Dataset license
│   └── README.md               # Dataset documentation
├── notebooks/                  # Jupyter notebooks for development
│   ├── evaluate_rag.ipynb      # RAG evaluation notebook
│   ├── transcript_rag.ipynb    # Basic RAG experiments
│   ├── transcript_agents.ipynb # Multi-agent experiments
│   ├── Fine_Tuning_Embedding_for_PSTuts.ipynb  # Embedding fine-tuning
│   └── */                      # Fine-tuned model checkpoints
├── docs/                       # Documentation
│   ├── DEVELOPER.md            # This file - developer documentation
│   ├── ANSWER.md               # Technical answer documentation
│   ├── BLOGPOST*.md            # Blog post drafts
│   ├── dataset_card.md         # Dataset card documentation
│   ├── TODO.md                 # Development TODO list
│   └── chainlit.md             # Chainlit welcome message
├── scripts/                    # Utility scripts (currently empty)
└── README.md                   # User-facing documentation
```

## Dependency Structure

Dependencies are organized into logical groups in `pyproject.toml`:

### Core Dependencies
All required dependencies for the RAG system including:
- **LangChain ecosystem**: `langchain`, `langchain-core`, `langchain-community`, `langchain-openai`, `langgraph`
- **Vector database**: `qdrant-client`, `langchain-qdrant`
- **ML/AI libraries**: `sentence-transformers`, `transformers`, `torch`
- **Web interface**: `chainlit==2.0.4`
- **Data processing**: `pandas`, `datasets`, `pyarrow`
- **Evaluation**: `ragas==0.2.15`
- **Jupyter support**: `ipykernel`, `jupyter`, `ipywidgets`
- **API integration**: `tavily-python` (web search), `requests`, `python-dotenv`

### Optional Dependencies
- **dev**: Development tools (`pytest`, `black`, `mypy`, `deptry`, `ipdb`)
- **web**: Web server components (`fastapi`, `uvicorn`, `python-multipart`)

Installation examples:
```bash
pip install -e .              # Core only
pip install -e ".[dev]"       # Core + development tools
pip install -e ".[dev,web]"   # Core + dev + web server
```

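To confirm which optional group is actually available in an environment, a small check along these lines may help. It is a sketch rather than part of the repository: `OPTIONAL_GROUPS` and `missing_modules` are hypothetical names, the module lists are assumed from the package names above, and `python-multipart` is left out because its import name differs between releases.

```python
from importlib.util import find_spec

# Assumed mapping from optional group to importable top-level modules.
OPTIONAL_GROUPS = {
    "dev": ["pytest", "black", "mypy", "deptry", "ipdb"],
    "web": ["fastapi", "uvicorn"],
}


def missing_modules(group: str) -> list[str]:
    """Return the modules of an optional group that cannot be imported."""
    return [name for name in OPTIONAL_GROUPS[group] if find_spec(name) is None]


if __name__ == "__main__":
    for group in OPTIONAL_GROUPS:
        missing = missing_modules(group)
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"[{group}] {status}")
```

An empty list for a group suggests the matching `pip install -e ".[group]"` command has been run in the active environment.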
## Technical Architecture

### Key Components

#### Core Classes and Factories
- **`Configuration`** (`configuration.py`): Application settings including model names, file paths, and parameters
- **`DatastoreManager`** (`datastore.py`): Manages Qdrant vector store, document loading, and semantic chunking
- **`RAGChainFactory`** (`rag.py`): Creates retrieval-augmented generation chains with reference compilation
- **`RAGChainInstance`** (`rag.py`): Encapsulates complete RAG instances with embeddings and vector stores

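The role of a settings class like `Configuration` can be illustrated with a frozen dataclass. This is a hypothetical stand-in, not the code in `configuration.py`: the class name, field names, and environment variable names are assumptions, and the default values are taken from the Architecture Notes in this document.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchConfiguration:
    """Hypothetical stand-in for `Configuration`; all field names are illustrative."""

    embedding_model: str = "snowflake-arctic-embed-s-ft-pstuts"
    llm_model: str = "gpt-4.1-mini"
    search_domain: str = "helpx.adobe.com"
    data_dir: str = "data"

    @classmethod
    def from_env(cls) -> "SketchConfiguration":
        """Build settings, letting (assumed) environment variables override defaults."""
        defaults = cls()
        return cls(
            embedding_model=os.environ.get("EMBEDDING_MODEL", defaults.embedding_model),
            llm_model=os.environ.get("LLM_MODEL", defaults.llm_model),
            search_domain=os.environ.get("SEARCH_DOMAIN", defaults.search_domain),
            data_dir=os.environ.get("DATA_DIR", defaults.data_dir),
        )
```

Keeping the object immutable means every component that receives it sees the same model names and paths for the lifetime of the process.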
#### Multi-Agent System
- **`PsTutsTeamState`** (`state.py`): TypedDict managing multi-agent conversation state
- **Agent creation functions** (`graph.py`): Factory functions for different agent types:
  - `create_rag_node()`: Video search agent using RAG
  - `create_tavily_node()`: Adobe Help web search agent
  - `create_team_supervisor()`: LLM-based routing supervisor
- **LangGraph implementation**: Multi-agent coordination with state management

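The supervisor pattern described above can be sketched in plain Python, without LangGraph. Everything here is an assumption made for illustration: `TeamState` is a simplified stand-in for `PsTutsTeamState`, the worker functions return placeholder strings, and the supervisor uses a keyword rule where the real one uses an LLM.

```python
from typing import Callable, TypedDict


class TeamState(TypedDict):
    """Hypothetical, simplified stand-in for `PsTutsTeamState`."""

    messages: list[str]
    next: str


def video_search(state: TeamState) -> TeamState:
    # Placeholder for the RAG-backed video search agent.
    return {"messages": state["messages"] + ["[video] transcript answer"], "next": "supervisor"}


def web_search(state: TeamState) -> TeamState:
    # Placeholder for the Adobe Help web search agent.
    return {"messages": state["messages"] + ["[web] help page answer"], "next": "supervisor"}


def supervisor(state: TeamState) -> TeamState:
    """Keyword-based stand-in for the LLM router."""
    if len(state["messages"]) > 1:
        # A worker has already replied, so the team is done.
        return {"messages": state["messages"], "next": "FINISH"}
    query = state["messages"][0].lower()
    worker = "web_search" if "adobe help" in query else "video_search"
    return {"messages": state["messages"], "next": worker}


WORKERS: dict[str, Callable[[TeamState], TeamState]] = {
    "video_search": video_search,
    "web_search": web_search,
}


def run_team(query: str) -> list[str]:
    """Alternate between supervisor and workers until the supervisor finishes."""
    state: TeamState = {"messages": [query], "next": "supervisor"}
    while True:
        state = supervisor(state)
        if state["next"] == "FINISH":
            return state["messages"]
        state = WORKERS[state["next"]](state)
```

The shared state dictionary is the only channel between nodes, which is the property a graph framework formalizes with typed state and explicit edges.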
#### Document Processing
- **`VideoTranscriptBulkLoader`**: Loads entire video transcripts as single documents
- **`VideoTranscriptChunkLoader`**: Loads individual transcript segments with timestamps
- **`chunk_transcripts()`**: Async semantic chunking with timestamp preservation
- **Custom embedding models**: Fine-tuned embeddings for PsTuts domain

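The difference between bulk and chunk loading, and what timestamp preservation means when segments are merged, can be shown with a small sketch. The segment shape, the sample data, and both helper functions are assumptions, and the grouping is by fixed size rather than the semantic chunking the project uses.

```python
from typing import TypedDict


class Segment(TypedDict):
    text: str
    start: float  # seconds from the beginning of the video
    end: float


# Hypothetical transcript segments for one video.
SEGMENTS: list[Segment] = [
    {"text": "Open the Layers panel.", "start": 0.0, "end": 3.5},
    {"text": "Click the new layer icon.", "start": 3.5, "end": 7.0},
    {"text": "Rename the layer.", "start": 7.0, "end": 9.2},
]


def load_bulk(segments: list[Segment]) -> Segment:
    """Join every segment into one document spanning the whole video."""
    return {
        "text": " ".join(s["text"] for s in segments),
        "start": segments[0]["start"],
        "end": segments[-1]["end"],
    }


def group_segments(segments: list[Segment], size: int) -> list[Segment]:
    """Merge consecutive segments in groups of `size`, keeping the time span of each group."""
    chunks: list[Segment] = []
    for i in range(0, len(segments), size):
        group = segments[i : i + size]
        chunks.append(
            {
                "text": " ".join(s["text"] for s in group),
                "start": group[0]["start"],  # earliest start in the group
                "end": group[-1]["end"],  # latest end in the group
            }
        )
    return chunks
```

Because each merged chunk carries the start of its first segment and the end of its last, a retrieved chunk can still be linked back to a position in the source video.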
#### Evaluation System
- **`evaluator_utils.py`**: RAG evaluation utilities using RAGAS framework
- **Notebook-based evaluation**: `evaluate_rag.ipynb` for systematic testing

## Running the Applications

### Multi-Agent RAG (Recommended)
```bash
chainlit run app.py
```
Features a team of agents, including video search and web search capabilities.

### Simple RAG (Basic)
```bash
chainlit run app_simple_rag.py
```
Single-agent RAG system for straightforward queries.

## Development Workflow

1. **Environment Setup**:
   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   pip install -e ".[dev]"
   ```

2. **Environment Variables**:
   ```bash
   export OPENAI_API_KEY="your-openai-key"
   export TAVILY_API_KEY="your-tavily-key"  # Optional, for web search
   ```

3. **Code Quality Tools**:
   ```bash
   # Dependency analysis
   deptry .

   # Code formatting and linting
   black .
   ruff check .
   mypy .

   # Development debugging: ipdb is installed with the dev extras.
   # Add `import ipdb; ipdb.set_trace()` at the line where execution should pause.
   ```

4. **Notebook Development**:
   - Use `notebooks/` for experimentation
   - `evaluate_rag.ipynb` for systematic evaluation
   - Fine-tuning experiments in `Fine_Tuning_Embedding_for_PSTuts.ipynb`

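Before starting either application, it can be useful to verify that the environment variables from step 2 are set. The helper below is a hypothetical sketch, not a function from the repository; it treats `OPENAI_API_KEY` as required and `TAVILY_API_KEY` as optional, matching the comments in that step.

```python
import os
from typing import Mapping, Optional

REQUIRED_VARS = ("OPENAI_API_KEY",)
OPTIONAL_VARS = ("TAVILY_API_KEY",)  # only needed for web search


def check_environment(env: Optional[Mapping[str, str]] = None) -> dict[str, list[str]]:
    """Report which required and optional variables are unset or empty."""
    env = os.environ if env is None else env
    return {
        "missing_required": [name for name in REQUIRED_VARS if not env.get(name)],
        "missing_optional": [name for name in OPTIONAL_VARS if not env.get(name)],
    }


if __name__ == "__main__":
    report = check_environment()
    if report["missing_required"]:
        raise SystemExit("Missing required variables: " + ", ".join(report["missing_required"]))
    if report["missing_optional"]:
        print("Optional variables not set: " + ", ".join(report["missing_optional"]))
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise in tests without touching the real environment.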
## Architecture Notes

- **Embedding models**: Uses custom fine-tuned `snowflake-arctic-embed-s-ft-pstuts` by default
- **Vector store**: Qdrant with semantic chunking for optimal retrieval
- **LLM**: GPT-4.1-mini for generation and routing
- **Web search**: Tavily integration targeting `helpx.adobe.com`
- **State management**: LangGraph for multi-agent coordination
- **Evaluation**: RAGAS framework for retrieval and generation metrics

## Resources

- [Chainlit Documentation](https://docs.chainlit.io)
- [LangChain Documentation](https://python.langchain.com/docs/get_started/introduction)
- [LangGraph Documentation](https://langchain-ai.github.io/langgraph/)
- [Qdrant Documentation](https://qdrant.tech/documentation/)
- [RAGAS Documentation](https://docs.ragas.io/)