Spaces:
Sleeping
Sleeping
title: Mehfil-e-Sukhan | |
emoji: π | |
colorFrom: "red" | |
colorTo: "gray" | |
sdk: streamlit | |
sdk_version: "1.43.0" | |
app_file: app.py | |
pinned: false | |
# Mehfil-e-Sukhan: Har Lafz Ek Mehfil | |
An AI-powered Roman Urdu poetry generation application using BiLSTM neural networks. | |
## Overview | |
Mehfil-e-Sukhan ("Poetry Gathering" in Urdu) is an interactive application that generates Roman Urdu poetry based on a starting word or phrase provided by the user. The application uses a Bidirectional LSTM neural network trained on a curated dataset of Roman Urdu poetry. | |
## Features | |
- **Custom Poetry Generation**: Generate Roman Urdu poetry from any starting word or phrase. | |
- **Adjustable Parameters**: | |
- **Number of Words**: Control the length of generated poetry (12-48 words). | |
- **Creativity (Temperature)**: Adjust the randomness in word selection (0.5-2.0). | |
- **Focus (Top-p)**: Fine-tune how closely the model adheres to probable word sequences (0.5-1.0). | |
- **Elegant Interface**: Dark-themed UI designed specifically for poetry presentation. | |
- **Automatic Formatting**: Output is automatically formatted into poetic lines. | |
## How to Use | |
1. Enter a starting word or phrase in Roman Urdu (e.g., "ishq", "zindagi", "mohabbat"). | |
2. Adjust the generation parameters: | |
- Number of Words: Select how many words you want in your poem. | |
- Creativity: Higher values (>1.0) produce more unique but potentially less coherent poetry. Lower values (<1.0) create more predictable output. | |
- Focus: Higher values make the AI stick to more probable word combinations. | |
3. Click "Generate Poetry" and wait for your custom poem to appear. | |
## Technical Details | |
- **Model**: Bidirectional LSTM with 3 layers | |
- **Tokenization**: SentencePiece with BPE encoding | |
- **Vocabulary Size**: 12,000 tokens | |
- **Text Generation**: Nucleus (top-p) sampling for balanced creativity and coherence | |
## Installation for Local Development | |
If you want to run the application locally: | |
```bash | |
# Clone the repository | |
git clone https://github.com/yourusername/Mehfil-e-Sukhan.git | |
cd Mehfil-e-Sukhan | |
# Create and activate a virtual environment (optional but recommended) | |
python -m venv venv | |
source venv/bin/activate # On Linux/Mac | |
# or | |
venv\Scripts\activate # On Windows | |
# Install dependencies | |
pip install -r requirements.txt | |
# Run the application | |
streamlit run app.py | |
``` | |
## Requirements | |
- Python 3.8+ | |
- torch==2.6.0 | |
- sentencepiece==0.2.0 | |
- huggingface-hub==0.29.3 | |
- streamlit==1.43.0 | |
## Project Structure | |
``` | |
Mehfil-e-Sukhan/ | |
βββ app.py # Main application file | |
βββ requirements.txt # Python dependencies | |
βββ README.md # This documentation | |
``` | |
The model weights and SentencePiece model are stored on Hugging Face Hub and are downloaded automatically when the application runs. | |
## How It Works | |
1. **Data Processing**: The model was trained on a curated dataset of Roman Urdu poetry lines. | |
2. **Tokenization**: Text was tokenized using SentencePiece's BPE algorithm. | |
3. **Model Training**: A Bidirectional LSTM architecture was trained to predict the next token in a sequence. | |
4. **Text Generation**: At inference time, nucleus sampling is used to select the next word with a balance of creativity and coherence. | |
5. **Formatting**: Generated text is automatically formatted into lines with alternating indentation for aesthetic presentation. | |
## Model and Dataset | |
- **Model**: You can find the complete model, weights, and training notebooks on Hugging Face: | |
[Mehfil-e-Sukhan on Hugging Face](https://huggingface.co/zaiffi/Mehfil-e-Sukhan) | |
- **Dataset**: The model was trained on the Roman Urdu Poetry dataset available on Kaggle: | |
[Roman Urdu Poetry Dataset](https://www.kaggle.com/datasets/mianahmadhasan/roman-urdu-poetry-csv) | |
## Limitations | |
- The current model was trained on a relatively small dataset (~1300 lines), which may occasionally result in repetitive patterns. | |
- Roman Urdu is not standardized, so the model may struggle with unusual spellings or transliterations. | |
- Generation speed depends on available computational resources. | |
## License | |
This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details. | |
## Contact | |
- LinkedIn: [Muhammad Huzaifa Saqib](https://www.linkedin.com/in/muhammad-huzaifa-saqib-90a1a9324/) | |
- GitHub: [zaiffishiekh01](https://github.com/zaiffishiekh01) | |
- Email: [[email protected]](mailto:[email protected]) | |
## Acknowledgements | |
- Poetry is the rhythmical creation of beauty in words - Edgar Allan Poe | |