---
title: PDF Q&A Dataset Generator
emoji: 📚
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.29.0
app_file: app.py
pinned: false
---
# PDF Q&A Dataset Generator
A Gradio application that generates Q&A datasets from PDF documents using instruction-tuned language models.
## Features
- **PDF Processing**: Automatically extract and chunk text from uploaded PDFs
- **Q&A Generation**: Create questions, answers, tags, and difficulty levels
- **Multiple Models**: Choose from various instruction-tuned models
- **Customization**: Configure number of questions, tags, and difficulty settings
- **Multiple Output Formats**: Export datasets as JSON, CSV, or Excel
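As a rough illustration of the export step, the sketch below serializes Q&A records to JSON and CSV using only the standard library. The record fields (`question`, `answer`, `tags`, `difficulty`) follow the feature list above, but the exact schema, the function names `to_json` and `to_csv`, and the `;` separator for tags are assumptions, not taken from `app.py`. Excel export is left out because it needs a third-party package.

```python
import csv
import io
import json

# Assumed record schema, based on the feature list (hypothetical field names).
FIELDS = ["question", "answer", "tags", "difficulty"]


def to_json(records):
    """Serialize a list of Q&A records to a JSON string."""
    return json.dumps(records, indent=2, ensure_ascii=False)


def to_csv(records):
    """Serialize a list of Q&A records to CSV text, joining tags with ';'."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        row = dict(record)
        # CSV cells are flat, so the tag list is collapsed into one string.
        row["tags"] = ";".join(record["tags"])
        writer.writerow(row)
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [
        {
            "question": "What is X?",
            "answer": "Y",
            "tags": ["a", "b"],
            "difficulty": "easy",
        }
    ]
    print(to_json(sample))
    print(to_csv(sample))
```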
## How It Works
This application:
1. Extracts text from uploaded PDFs
2. Splits the content into manageable chunks to maintain context
3. Uses instruction-tuned language models to generate Q&A pairs with tags
4. Combines these into a comprehensive dataset ready for use
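Step 2 above can be sketched as a sliding character window with overlap, so that a sentence cut at a chunk boundary still appears whole in the next chunk. This is a minimal sketch, not the chunking used in `app.py`: the function name `chunk_text`, the default sizes, and the character-based (rather than token-based) splitting are all assumptions, and the text is assumed to be already extracted from the PDF.

```python
def chunk_text(text, chunk_size=1000, overlap=100):
    """Split text into chunks of at most chunk_size characters.

    Consecutive chunks share `overlap` characters so that context
    spanning a boundary is not lost. Hypothetical helper for illustration.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window already reaches the end of the text.
        if start + chunk_size >= len(text):
            break
        start += step
    return chunks


if __name__ == "__main__":
    for chunk in chunk_text("abcdefghij", chunk_size=4, overlap=1):
        print(chunk)
```

Each chunk would then be passed to the selected model in step 3. A larger overlap preserves more context per chunk at the cost of more model calls.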
## Use Cases
- Creating educational resources and assessment materials
- Generating training data for Q&A systems
- Building flashcard datasets for studying
- Developing content for educational applications
- Preparing comprehension testing materials
## Getting Started
### Local Installation
```bash
git clone https://github.com/your-username/pdf-qa-generator.git
cd pdf-qa-generator
pip install -r requirements.txt
python app.py
```
### Using on Hugging Face Spaces
1. Duplicate this Space to your account
2. Upload your PDFs
3. Configure your settings
4. Generate your Q&A dataset
### Enabling GPU on Hugging Face Spaces
To enable GPU acceleration on Hugging Face Spaces:
1. Uncomment the `# import spaces` line at the top of `app.py`
2. Uncomment the `# @spaces.GPU` decorator above the `process_pdf_generate_qa` function
3. Save and redeploy your Space with GPU hardware selected
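After those steps, the relevant part of `app.py` might look roughly like the sketch below. This is a variation rather than the file's actual contents: the `try`/`except` fallback is an added assumption so the same file still runs locally where the `spaces` package is not installed, and the function signature and body are placeholders.

```python
try:
    import spaces  # available on Hugging Face Spaces

    gpu = spaces.GPU
except ImportError:
    # Assumed local fallback: a no-op decorator when `spaces` is missing.
    def gpu(func):
        return func


@gpu
def process_pdf_generate_qa(pdf_path):
    """Placeholder body; the real function lives in app.py."""
    # The actual implementation would extract text, chunk it,
    # and call the selected model here.
    return []
```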
## Models
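The app includes a selection of language models. The Dolly and Falcon entries are instruction-tuned; the GPT-Neo entries are base models without instruction tuning, so their output may follow the Q&A prompt format less reliably.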
- `databricks/dolly-v2-3b` (default)
- `databricks/dolly-v2-7b`
- `EleutherAI/gpt-neo-1.3B`
- `EleutherAI/gpt-neo-2.7B`
- `tiiuae/falcon-7b-instruct`
## License
MIT