
Final Project | Marbella Tourism Q&A Chatbot

Project Overview

This project builds a Q&A chatbot over YouTube video content using Gradio, LangChain, Pinecone, and OpenAI. The chatbot answers questions about tourism in the city of Marbella, drawing on both a vector store built from the video transcripts and web search. Users can extend the knowledge base by adding new YouTube video URLs to the urls.txt file.

Please visit the following link to try the agent: https://huggingface.co/spaces/longobardomartin/proyectofinal

Table of Contents

  • Folder Structure
  • Environment Setup
  • Project Architecture
  • Usage

Folder Structure

  • app.py - Main script to run the chatbot.
  • knowledgebase.py - Script that fetches transcripts from YouTube videos and stores them in Pinecone.
  • agent.py - Script that defines the agent's behaviour.
  • utils.py - Script for auxiliary functions.
  • urls.txt - Text file containing YouTube video links used in the project.
  • requirements.txt - Text file containing libraries and dependencies to be installed locally.
  • README.md - Project documentation (this file).
  • Marbella turism.pdf - Presentation slides for the Final Project.
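knowledgebase.py is described only at a high level. Before transcripts are embedded and stored, they are often split into overlapping chunks so that each vector covers a focused passage. The sketch below illustrates that step under stated assumptions: the function name `chunk_transcript`, the word-based window and overlap sizes, and the record layout (an `id` plus `metadata` holding the text) are hypothetical and not taken from the project's code; the embedding call and the Pinecone upsert are left out.

```python
# Hypothetical helper: split a transcript into overlapping word windows so
# each piece can be embedded separately. Window and overlap sizes are
# illustrative assumptions, not the project's settings.
from typing import Dict, List


def chunk_transcript(video_id: str, text: str,
                     size: int = 200, overlap: int = 50) -> List[Dict]:
    """Return records shaped like vector-store upserts, minus the vectors."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    words = text.split()
    step = size - overlap
    records = []
    for i, start in enumerate(range(0, max(len(words), 1), step)):
        window = words[start:start + size]
        if not window:
            break  # empty transcript: nothing to store
        records.append({
            "id": f"{video_id}-{i}",  # unique id per chunk
            "metadata": {"video_id": video_id, "text": " ".join(window)},
        })
        if start + size >= len(words):
            break  # this window already reaches the end of the transcript
    return records
```

For example, a 10-word transcript with `size=4, overlap=2` yields four records, the last one holding words 7 to 10. The overlap keeps a sentence that straddles a boundary retrievable from either neighbouring chunk.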

Environment Setup

  1. Set the following environment variables, either in a .env file or by entering them when the script prompts for them:
    • OPENAI_API_KEY: API key for OpenAI.
    • LANGCHAIN_API_KEY: API key for LangChain.
    • PINECONE_API_KEY: API key for Pinecone.
    • SERPAPI_API_KEY: API key for SerpAPI.
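A minimal sketch of this setup step, assuming the keys are read from the process environment: keys already present (for instance loaded from a .env file) are kept, and any missing one is requested interactively. `REQUIRED_KEYS`, `missing_keys`, and `ensure_keys` are hypothetical names, not part of the project.

```python
# Sketch: make sure the four API keys are available before the chatbot starts.
# Hypothetical helpers; the project's own scripts may handle this differently.
import getpass
import os
from typing import Callable, List, MutableMapping

REQUIRED_KEYS = [
    "OPENAI_API_KEY",
    "LANGCHAIN_API_KEY",
    "PINECONE_API_KEY",
    "SERPAPI_API_KEY",
]


def missing_keys(env: MutableMapping[str, str]) -> List[str]:
    """Return the required keys that are unset or empty."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


def ensure_keys(env: MutableMapping[str, str] = os.environ,
                ask: Callable[[str], str] = getpass.getpass) -> None:
    """Prompt (without echoing) for every missing key and store it in `env`."""
    for key in missing_keys(env):
        env[key] = ask(f"{key}: ")
```

In app.py this could be called once at startup, optionally after `load_dotenv()` from the python-dotenv package has copied the .env file into the environment. Passing `ask` as a parameter keeps the prompt replaceable, for example in tests.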

Project Architecture

The chatbot uses the following architecture:

Solution Architecture

  • Data Retrieval: Combines a vector database (Pinecone), queried for transcript passages similar to the question, with SerpAPI for web search.
  • Routing: Uses LangChain's ReAct agent to dynamically route user questions to the appropriate source (either vector store or web search) using Tools.
  • Memory: ConversationBufferMemory maintains chat history for contextual, multi-turn conversations.
  • LLM Integration: GPT-4 interprets user queries, generates responses, and summarizes search results, using the chat history kept by ConversationBufferMemory and the Chathistory function.
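The routing described above follows the ReAct pattern: the LLM writes a Thought, then an Action naming a tool plus an Action Input, and the executor runs that tool and feeds the result back as an Observation until the LLM emits a Final Answer. The toy sketch below shows only the executor's side of a single step and is not LangChain's implementation. Its assumptions: the tool names `transcripts` and `web_search`, the snippets, and the tiny vocabulary are hypothetical; a word-count "embedding" with brute-force cosine similarity stands in for the embedding model and Pinecone; a stub stands in for SerpAPI; memory is not modeled.

```python
# Toy sketch of one ReAct step: parse the LLM's text and dispatch to a tool.
import math
import re
from typing import Callable, Dict, List

# Toy embedding: word counts over a tiny fixed vocabulary (assumption).
VOCAB = ["marina", "boats", "old", "town", "beach"]


def embed(text: str) -> List[float]:
    words = re.findall(r"[a-z]+", text.lower())
    return [float(words.count(v)) for v in VOCAB]


def cosine(a: List[float], b: List[float]) -> float:
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


# Hypothetical transcript snippets standing in for the Pinecone index.
SNIPPETS = [
    "Puerto Banus marina is full of boats",
    "The old town has narrow streets",
]
INDEX = [(s, embed(s)) for s in SNIPPETS]


def search_transcripts(query: str) -> str:
    """Top-1 retrieval: the snippet whose vector is closest to the query."""
    q = embed(query)
    return max(INDEX, key=lambda item: cosine(q, item[1]))[0]


def web_search(query: str) -> str:
    """Stub standing in for a SerpAPI call."""
    return f"[web results for: {query}]"


TOOLS: Dict[str, Callable[[str], str]] = {
    "transcripts": search_transcripts,
    "web_search": web_search,
}

ACTION_RE = re.compile(r"Action:\s*(.+?)\s*\nAction Input:\s*(.+)", re.S)


def run_step(llm_output: str) -> str:
    """Return the final answer, or the tool's observation for this step."""
    if "Final Answer:" in llm_output:
        return llm_output.split("Final Answer:", 1)[1].strip()
    match = ACTION_RE.search(llm_output)
    if not match:
        raise ValueError("could not parse an Action from the LLM output")
    tool, tool_input = match.group(1), match.group(2).strip()
    return TOOLS[tool](tool_input)
```

Because the tool choice is plain text written by the LLM, each tool's description in the prompt is what steers routing; in the real agent the observation would be appended to the prompt and the loop repeated.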

Usage

  • Install the necessary packages with pip install -r requirements.txt.
  • Add video links to urls.txt (one link per line).
  • Run the script to generate transcriptions, create embeddings, and set up the chatbot interface.
  • Interact with the chatbot by typing questions relevant to the video content.
  • The script also runs the Gradio interface locally.
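Since urls.txt holds one link per line, the links usually have to be reduced to video IDs, because transcript tools generally identify a video by its ID rather than its full URL. The sketch below is a hypothetical helper, not code from the project; it assumes the common watch?v=, youtu.be/ and /shorts/ link forms and skips blank lines and duplicates.

```python
# Hypothetical helper: read urls.txt and extract YouTube video IDs.
from pathlib import Path
from typing import List, Optional
from urllib.parse import parse_qs, urlparse


def video_id(url: str) -> Optional[str]:
    """Return the video ID for a YouTube link, or None if it is not one."""
    parsed = urlparse(url.strip())
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    if host == "youtu.be":
        return parsed.path.lstrip("/") or None
    if host in ("youtube.com", "m.youtube.com"):
        if parsed.path == "/watch":
            return parse_qs(parsed.query).get("v", [None])[0]
        if parsed.path.startswith("/shorts/"):
            return parsed.path.split("/")[2] or None
    return None


def read_video_ids(path: str = "urls.txt") -> List[str]:
    """Read one link per line, keeping order and dropping duplicates."""
    ids: List[str] = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        vid = video_id(line)
        if vid and vid not in ids:
            ids.append(vid)
    return ids
```

Deduplicating at this stage avoids transcribing and embedding the same video twice when a link is pasted into urls.txt more than once.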