Add comprehensive README with app configuration and usage instructions
README.md
CHANGED
@@ -1 +1,114 @@
---
title: Mehfil-e-Sukhan
emoji: π
colorFrom: "#E64A4A"
colorTo: "#1C1C1C"
sdk: streamlit
sdk_version: "1.43.0"
app_file: app.py
pinned: false
---

# Mehfil-e-Sukhan: Har Lafz Ek Mehfil

An AI-powered Roman Urdu poetry generation application using BiLSTM neural networks.

## Overview

Mehfil-e-Sukhan ("Poetry Gathering" in Urdu) is an interactive application that generates Roman Urdu poetry based on a starting word or phrase provided by the user. The application uses a Bidirectional LSTM neural network trained on a curated dataset of Roman Urdu poetry.

## Features

- **Custom Poetry Generation**: Generate Roman Urdu poetry from any starting word or phrase.
- **Adjustable Parameters**:
  - **Number of Words**: Control the length of generated poetry (12-48 words).
  - **Creativity (Temperature)**: Adjust the randomness in word selection (0.5-2.0).
  - **Focus (Top-p)**: Fine-tune how closely the model adheres to probable word sequences (0.5-1.0).
- **Elegant Interface**: Dark-themed UI designed specifically for poetry presentation.
- **Automatic Formatting**: Output is automatically formatted into poetic lines.

## How to Use

1. Enter a starting word or phrase in Roman Urdu (e.g., "ishq", "zindagi", "mohabbat").
2. Adjust the generation parameters:
   - Number of Words: Select how many words you want in your poem.
   - Creativity: Higher values (>1.0) produce more unusual but potentially less coherent poetry. Lower values (<1.0) produce more predictable output.
   - Focus: Under standard top-p sampling, lower values restrict the model to the most probable word combinations, while a value of 1.0 lets it draw from the full vocabulary.
3. Click "Generate Poetry" and wait for your custom poem to appear.

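
The effect of the Creativity setting can be illustrated with a small, dependency-free sketch. It divides a set of raw scores by the temperature before turning them into probabilities. The function name `softmax_with_temperature` and the example scores are assumptions made for illustration; they are not taken from `app.py`.

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [value / total for value in exps]


# Hypothetical scores for three candidate next tokens.
scores = [2.0, 1.0, 0.1]
cool = softmax_with_temperature(scores, temperature=0.5)  # sharper distribution
warm = softmax_with_temperature(scores, temperature=2.0)  # flatter distribution
```

A temperature below 1.0 concentrates probability on the top-scoring token, which is why low Creativity values read as more predictable. A temperature above 1.0 spreads probability toward less likely tokens.
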
## Technical Details

- **Model**: Bidirectional LSTM with 3 layers
- **Tokenization**: SentencePiece with BPE encoding
- **Vocabulary Size**: 12,000 tokens
- **Text Generation**: Nucleus (top-p) sampling for balanced creativity and coherence

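
Nucleus (top-p) sampling, as named above, can be sketched in plain Python. This is a minimal illustration of the general technique, not the code in `app.py`: the function names `top_p_filter` and `sample_top_p` and the example probabilities are assumptions.

```python
import random


def top_p_filter(probs, top_p=0.9):
    """Keep the smallest set of tokens whose cumulative probability reaches
    top_p, and return {token_index: renormalized_probability}."""
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    nucleus, cumulative = [], 0.0
    for index, p in ranked:
        nucleus.append((index, p))
        cumulative += p
        if cumulative >= top_p:
            break
    return {index: p / cumulative for index, p in nucleus}


def sample_top_p(probs, top_p=0.9, rng=random):
    """Draw one token index from the renormalized nucleus."""
    nucleus = top_p_filter(probs, top_p)
    indices = list(nucleus)
    weights = [nucleus[i] for i in indices]
    return rng.choices(indices, weights=weights, k=1)[0]


# Hypothetical next-token distribution over a four-token vocabulary.
probs = [0.5, 0.3, 0.15, 0.05]
kept = top_p_filter(probs, top_p=0.75)
token = sample_top_p(probs, top_p=0.75, rng=random.Random(0))
```

With `top_p=0.75`, only the two most probable tokens survive the filter, so the two unlikely tokens can never be drawn. Raising `top_p` toward 1.0 lets more of the tail back in.
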
## Installation for Local Development

If you want to run the application locally:

```bash
# Clone the repository
git clone https://github.com/yourusername/Mehfil-e-Sukhan.git
cd Mehfil-e-Sukhan

# Create and activate a virtual environment (optional but recommended)
python -m venv venv
source venv/bin/activate  # On Linux/Mac
# On Windows, run this instead:
# venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Run the application
streamlit run app.py
```

## Requirements

- Python 3.9+ (the pinned torch 2.6.0 release does not support Python 3.8)
- torch==2.6.0
- sentencepiece==0.2.0
- huggingface-hub==0.29.3
- streamlit==1.43.0

## Project Structure

```
Mehfil-e-Sukhan/
├── app.py # Main application file
├── requirements.txt # Python dependencies
└── README.md # This documentation
```

The model weights and SentencePiece model are stored on Hugging Face Hub and are downloaded automatically when the application runs.

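
As a rough sketch of how such an automatic download could work, the helper below fetches each file through `hf_hub_download` from the `huggingface_hub` package, which caches files locally after the first download. The repository ID, the filenames, and the `fetch_artifacts` helper are all hypothetical; this README does not state the real values.

```python
from pathlib import Path

# Hypothetical placeholders: the real repository ID and filenames are not
# given in this README.
REPO_ID = "your-username/mehfil-e-sukhan"
ARTIFACTS = ("model.pth", "tokenizer.model")


def fetch_artifacts(repo_id=REPO_ID, filenames=ARTIFACTS, download=None):
    """Return {filename: local Path} for every requested artifact."""
    if download is None:
        # Imported lazily so this module loads even without huggingface_hub.
        from huggingface_hub import hf_hub_download
        download = hf_hub_download
    return {
        name: Path(download(repo_id=repo_id, filename=name))
        for name in filenames
    }
```

The `download` parameter exists only so the helper can be exercised offline with a stand-in function. In a Streamlit app, a loader like this is typically wrapped in `st.cache_resource` so the files are resolved once per session rather than on every rerun.
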
## How It Works

1. **Data Processing**: The model was trained on a curated dataset of Roman Urdu poetry lines.
2. **Tokenization**: Text was tokenized using SentencePiece's BPE algorithm.
3. **Model Training**: A Bidirectional LSTM architecture was trained to predict the next token in a sequence.
4. **Text Generation**: At inference time, nucleus sampling is used to select the next token with a balance of creativity and coherence.
5. **Formatting**: Generated text is automatically formatted into lines with alternating indentation for aesthetic presentation.

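
The formatting step can be sketched as follows. The function name `format_poem`, the six-words-per-line default, and the four-space indent are assumptions for illustration; the README does not specify the values the app uses.

```python
def format_poem(text, words_per_line=6, indent="    "):
    """Split text into fixed-size lines and indent every second line."""
    if words_per_line < 1:
        raise ValueError("words_per_line must be at least 1")
    words = text.split()
    lines = []
    for start in range(0, len(words), words_per_line):
        line = " ".join(words[start:start + words_per_line])
        if (start // words_per_line) % 2 == 1:  # second, fourth, ... line
            line = indent + line
        lines.append(line)
    return "\n".join(lines)


# Twelve placeholder words standing in for generated output.
sample = " ".join(f"lafz{i}" for i in range(1, 13))
poem = format_poem(sample)
```

Here the twelve placeholder words become two lines of six, with the second line indented.
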
## Limitations

- The current model was trained on a relatively small dataset (~1,300 lines), which may occasionally result in repetitive patterns.
- Roman Urdu is not standardized, so the model may struggle with unusual spellings or transliterations.
- Generation speed depends on available computational resources.

## License

This project is licensed under the Apache License 2.0; see the [LICENSE](LICENSE) file for details.

## Contact

- LinkedIn: [Muhammad Huzaifa Saqib](https://www.linkedin.com/in/muhammad-huzaifa-saqib-90a1a9324/)
- GitHub: [zaiffishiekh01](https://github.com/zaiffishiekh01)
- Email: [[email protected]](mailto:[email protected])

## Acknowledgements

- "Poetry is the rhythmical creation of beauty in words" - Edgar Allan Poe