---
title: PDF Q&A Dataset Generator
emoji: π
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.29.0
app_file: app.py
pinned: false
---
# PDF Q&A Dataset Generator
A Gradio application that generates Q&A datasets from PDF documents using instruction-tuned language models.
## Features
- **PDF Processing**: Automatically extract and chunk text from uploaded PDFs
- **Q&A Generation**: Create questions, answers, tags, and difficulty levels
- **Multiple Models**: Choose from several Hugging Face language models, most of them instruction-tuned
- **Customization**: Configure number of questions, tags, and difficulty settings
- **Multiple Output Formats**: Export datasets as JSON, CSV, or Excel
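The export step can be pictured with a short standard-library sketch. It assumes a hypothetical record layout (one dict per pair with `question`, `answer`, `tags`, `difficulty`) and a hypothetical `export_dataset` helper; the app's actual field names and code may differ. Excel output is left out here. With pandas installed, `pd.DataFrame(records).to_excel(path, index=False)` is one option, and it needs an engine such as openpyxl.

```python
import csv
import json
from pathlib import Path

# Hypothetical record layout: one dict per generated Q&A pair.
FIELDS = ["question", "answer", "tags", "difficulty"]


def export_dataset(records, out_dir, stem="qa_dataset"):
    """Write records as JSON and CSV (hypothetical helper); return both paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # JSON keeps tags as a real list.
    json_path = out / f"{stem}.json"
    json_path.write_text(
        json.dumps(records, indent=2, ensure_ascii=False), encoding="utf-8"
    )

    # CSV cells are flat strings, so the tag list is joined with commas.
    csv_path = out / f"{stem}.csv"
    with csv_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        for rec in records:
            row = {key: rec.get(key, "") for key in FIELDS}
            row["tags"] = ", ".join(rec.get("tags", []))
            writer.writerow(row)

    return json_path, csv_path
```

Keeping JSON as the lossless format and flattening only for CSV means tags survive a round trip through the JSON file unchanged.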
## How It Works
This application:
1. Extracts text from uploaded PDFs
2. Splits the text into chunks that fit within the model's input length while keeping related passages together
3. Uses instruction-tuned language models to generate Q&A pairs with tags
4. Combines the pairs from all chunks into a single dataset ready for export
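The chunking step (step 2) can be sketched as a sliding window over words. This is an illustration under assumptions, not the app's implementation: `chunk_text`, `max_words`, and `overlap` are hypothetical names, and word counts stand in for model tokens. The overlap repeats a few words between neighbouring chunks so a sentence cut at a boundary still appears whole in one of them.

```python
def chunk_text(text, max_words=300, overlap=50):
    """Split text into word-based chunks (hypothetical helper).

    Consecutive chunks share `overlap` words so context is not lost at
    chunk boundaries. Word counts are only a rough proxy for model tokens.
    """
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once this window already reaches the end of the text.
        if start + max_words >= len(words):
            break
    return chunks


sample = " ".join(f"w{i}" for i in range(10))
print(chunk_text(sample, max_words=4, overlap=1))
# ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```

A token-based splitter using the selected model's tokenizer would match the model's real input limit more closely, at the cost of loading the tokenizer first.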
## Use Cases
- Creating educational resources and assessment materials
- Generating training data for Q&A systems
- Building flashcard datasets for studying
- Developing content for educational applications
- Preparing comprehension testing materials
## Getting Started
### Local Installation
```bash
git clone https://github.com/your-username/pdf-qa-generator.git
cd pdf-qa-generator
pip install -r requirements.txt
python app.py
```
### Using on Hugging Face Spaces
1. Duplicate this Space to your account
2. Upload your PDFs
3. Configure your settings
4. Generate your Q&A dataset
### Enabling GPU on Hugging Face Spaces
To enable GPU acceleration on Hugging Face Spaces:
1. Uncomment the `# import spaces` line at the top of `app.py`
2. Uncomment the `# @spaces.GPU` decorator above the `process_pdf_generate_qa` function
3. Save and redeploy your Space with GPU hardware selected
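Instead of toggling comments by hand, `app.py` could fall back to a no-op decorator when the `spaces` package is not importable, so the same file runs both locally and on Spaces. This is a sketch, not the app's current code: the `gpu` alias is hypothetical, and the parameters and return value of `process_pdf_generate_qa` are placeholders, since the real signature is not shown here.

```python
try:
    import spaces  # available on Hugging Face Spaces
    gpu = spaces.GPU
except ImportError:
    # Local run without the `spaces` package: return the function unchanged.
    def gpu(fn):
        return fn


@gpu
def process_pdf_generate_qa(pdf_path, model_name, num_questions):
    # Hypothetical signature. PDF extraction, chunking and Q&A generation
    # would happen here; this placeholder only echoes its inputs.
    return {"pdf": pdf_path, "model": model_name, "num_questions": num_questions}
```

The decorator is applied once at definition time, so nothing else in the app needs to know which environment it is running in.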
## Models
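The app offers the following models. Dolly v2 and Falcon-7B-Instruct are instruction-tuned; the GPT-Neo checkpoints are base language models without instruction tuning: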
- `databricks/dolly-v2-3b` (default)
- `databricks/dolly-v2-7b`
- `EleutherAI/gpt-neo-1.3B`
- `EleutherAI/gpt-neo-2.7B`
- `tiiuae/falcon-7b-instruct`
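Instruction-tuned models respond best to an explicit instruction and a fixed output format that is easy to parse. The sketch below is hypothetical: the prompt wording, the `Q:`/`A:`/`Tags:` layout, and the helpers `build_prompt` and `parse_pairs` are assumptions, not the app's actual template. The prompt string could be passed to any of the models above, for example through a transformers `pipeline("text-generation", model=...)`, and the generated text parsed back into records.

```python
import re


def build_prompt(chunk, num_questions=3, difficulty="medium", num_tags=2):
    """Build an instruction prompt requesting Q&A pairs in a fixed line format."""
    return (
        f"Read the passage below and write {num_questions} question-answer pairs "
        f"at {difficulty} difficulty. Give each pair {num_tags} topic tags.\n"
        "Use exactly this format for each pair:\n"
        "Q: <question>\nA: <answer>\nTags: <tag1>, <tag2>\n\n"
        f"Passage:\n{chunk}\n"
    )


# One pair = a "Q:" line, an "A:" line, then a "Tags:" line.
PAIR_RE = re.compile(r"Q:\s*(.+?)\s*\nA:\s*(.+?)\s*\nTags:\s*(.*)")


def parse_pairs(generated, difficulty="medium"):
    """Extract Q/A/Tags triples from model output; malformed blocks are skipped."""
    records = []
    for question, answer, tags in PAIR_RE.findall(generated):
        records.append({
            "question": question,
            "answer": answer,
            "tags": [t.strip() for t in tags.split(",") if t.strip()],
            "difficulty": difficulty,
        })
    return records
```

Skipping malformed blocks rather than raising matters most for the base GPT-Neo models, which are less likely to follow the requested format exactly.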
## License
MIT