metadata
title: PDF Q&A Dataset Generator
emoji: 📚
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.29.0
app_file: app.py
pinned: false

PDF Q&A Dataset Generator

A Gradio application that generates Q&A datasets from PDF documents using instruction-tuned language models.

Features

  • PDF Processing: Automatically extract and chunk text from uploaded PDFs
  • Q&A Generation: Create questions, answers, tags, and difficulty levels
  • Multiple Models: Choose from various instruction-tuned models
  • Customization: Configure number of questions, tags, and difficulty settings
  • Multiple Output Formats: Export datasets as JSON, CSV, or Excel
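
As a rough illustration of what one exported row might look like, the sketch below models a Q&A record and serializes it to JSON and CSV using only the standard library. The `QARecord` class, its field names, and the `;` separator for tags are assumptions made for this example, not the schema used by app.py. Excel export is left out because it needs a third-party package.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass


@dataclass
class QARecord:
    """Hypothetical record shape: one generated question with its metadata."""
    question: str
    answer: str
    tags: list
    difficulty: str


def to_json(records):
    """Serialize records as a JSON array of objects."""
    return json.dumps([asdict(r) for r in records], indent=2, ensure_ascii=False)


def to_csv(records):
    """Serialize records as CSV text; tags are joined with ';' (an assumption)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["question", "answer", "tags", "difficulty"]
    )
    writer.writeheader()
    for record in records:
        row = asdict(record)
        row["tags"] = ";".join(record.tags)
        writer.writerow(row)
    return buffer.getvalue()


records = [QARecord("What is X?", "Y", ["a", "b"], "easy")]
print(to_json(records))
print(to_csv(records))
```

Joining tags into one cell keeps the CSV flat; the JSON output keeps them as a list.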

How It Works

This application:

  1. Extracts text from uploaded PDFs
  2. Splits the content into manageable chunks to maintain context
  3. Uses instruction-tuned language models to generate Q&A pairs with tags
  4. Combines the generated pairs into a single dataset that can be exported as JSON, CSV, or Excel
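
The chunking step (step 2) could be sketched as below. This is a minimal word-based splitter with overlap, assuming the PDF text has already been extracted into a string. The function name `chunk_text`, its parameters, and the default sizes are hypothetical and may differ from what app.py does.

```python
def chunk_text(text, chunk_size=300, overlap=30):
    """Split text into chunks of at most `chunk_size` words.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    chunk boundary still appears with some surrounding context.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(words):
            break
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
for chunk in chunk_text(sample, chunk_size=4, overlap=1):
    print(chunk)
```

Each chunk would then be passed to the language model in its own prompt, and the resulting Q&A pairs collected into one list.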

Use Cases

  • Creating educational resources and assessment materials
  • Generating training data for Q&A systems
  • Building flashcard datasets for studying
  • Developing content for educational applications
  • Preparing comprehension testing materials

Getting Started

Local Installation

git clone https://github.com/your-username/pdf-qa-generator.git
cd pdf-qa-generator
pip install -r requirements.txt
python app.py

Using on Hugging Face Spaces

  1. Duplicate this Space to your account
  2. Upload your PDFs
  3. Configure your settings
  4. Generate your Q&A dataset

Enabling GPU on Hugging Face Spaces

To enable GPU acceleration on Hugging Face Spaces:

  1. Uncomment the # import spaces line at the top of app.py
  2. Uncomment the # @spaces.GPU decorator above the process_pdf_generate_qa function
  3. Save and redeploy your Space with GPU hardware selected
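
After these steps, the top of app.py applies `spaces.GPU` to the processing function. The sketch below shows a variation on that idea, not the code in app.py: it falls back to a no-op decorator when the `spaces` package is not installed, so the same file also runs locally. The signature and body of `process_pdf_generate_qa` here are placeholders invented for illustration.

```python
try:
    import spaces  # provided on Hugging Face Spaces
    gpu = spaces.GPU
except ImportError:
    # Local fallback (an assumption of this sketch): leave the function unchanged.
    def gpu(fn):
        return fn


@gpu
def process_pdf_generate_qa(pdf_path, num_questions=3):
    # Placeholder body; the real function extracts text and calls the model.
    return [f"question {i + 1} for {pdf_path}" for i in range(num_questions)]


print(process_pdf_generate_qa("doc.pdf", num_questions=2))
```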

Models

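The app includes a selection of language models. The Dolly and Falcon entries are instruction-tuned; the EleutherAI/gpt-neo entries are base models without instruction tuning: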

  • databricks/dolly-v2-3b (default)
  • databricks/dolly-v2-7b
  • EleutherAI/gpt-neo-1.3B
  • EleutherAI/gpt-neo-2.7B
  • tiiuae/falcon-7b-instruct
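
One way to wire this list into code is a small registry with a default and a lazy loader, sketched below. `resolve_model` and `load_generator` are hypothetical helpers, not functions from app.py. The loader assumes the `transformers` package is installed and uses its `pipeline("text-generation", ...)` entry point; calling it downloads the model weights, so it is not invoked here.

```python
AVAILABLE_MODELS = [
    "databricks/dolly-v2-3b",
    "databricks/dolly-v2-7b",
    "EleutherAI/gpt-neo-1.3B",
    "EleutherAI/gpt-neo-2.7B",
    "tiiuae/falcon-7b-instruct",
]
DEFAULT_MODEL = "databricks/dolly-v2-3b"


def resolve_model(name=None):
    """Return a valid model id, falling back to the default when none is given."""
    if name is None:
        return DEFAULT_MODEL
    if name not in AVAILABLE_MODELS:
        raise ValueError(f"Unknown model: {name!r}")
    return name


def load_generator(name=None):
    """Build a text-generation pipeline for the chosen model (downloads weights)."""
    from transformers import pipeline  # imported lazily; heavy dependency

    return pipeline("text-generation", model=resolve_model(name))


print(resolve_model())
print(resolve_model("tiiuae/falcon-7b-instruct"))
```

Validating the name before loading avoids starting a multi-gigabyte download for a mistyped model id.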

License

MIT