---
title: Mehfil-e-Sukhan
emoji: 📜
colorFrom: "red"
colorTo: "gray"
sdk: streamlit
sdk_version: "1.43.0"
app_file: app.py
pinned: false
---

# Mehfil-e-Sukhan: Har Lafz Ek Mehfil

An AI-powered Roman Urdu poetry generation application using BiLSTM neural networks.

## Overview

Mehfil-e-Sukhan ("Poetry Gathering" in Urdu) is an interactive application that generates Roman Urdu poetry based on a starting word or phrase provided by the user. The application uses a Bidirectional LSTM neural network trained on a curated dataset of Roman Urdu poetry.

## Features

- **Custom Poetry Generation**: Generate Roman Urdu poetry from any starting word or phrase.
- **Adjustable Parameters**:
  - **Number of Words**: Control the length of generated poetry (12-48 words).
  - **Creativity (Temperature)**: Adjust the randomness in word selection (0.5-2.0).
  - **Focus (Top-p)**: Set the cumulative-probability cutoff for candidate words; lower values keep only the most probable words (0.5-1.0).
- **Elegant Interface**: Dark-themed UI designed specifically for poetry presentation.
- **Automatic Formatting**: Output is automatically formatted into poetic lines.
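The three adjustable parameters can be treated as a single settings object. The following is a minimal sketch, assuming a hypothetical `GenerationSettings` class (it is not taken from `app.py`), that clamps user input to the ranges listed above:

```python
from dataclasses import dataclass


def _clamp(value: float, low: float, high: float) -> float:
    """Restrict value to the closed interval [low, high]."""
    return max(low, min(high, value))


@dataclass(frozen=True)
class GenerationSettings:
    """Hypothetical container for the UI parameters; field names are illustrative."""

    num_words: int      # documented range: 12-48
    temperature: float  # documented range: 0.5-2.0
    top_p: float        # documented range: 0.5-1.0

    @classmethod
    def from_user_input(cls, num_words: int, temperature: float, top_p: float) -> "GenerationSettings":
        # Out-of-range values are clamped to the nearest bound rather than rejected.
        return cls(
            num_words=int(_clamp(num_words, 12, 48)),
            temperature=_clamp(temperature, 0.5, 2.0),
            top_p=_clamp(top_p, 0.5, 1.0),
        )


settings = GenerationSettings.from_user_input(num_words=100, temperature=0.1, top_p=0.95)
print(settings)  # GenerationSettings(num_words=48, temperature=0.5, top_p=0.95)
```

In a Streamlit app the sliders themselves may already enforce these bounds; clamping here is simply a defensive choice for values that arrive from elsewhere.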

## How to Use

1. Enter a starting word or phrase in Roman Urdu (e.g., "ishq", "zindagi", "mohabbat").
2. Adjust the generation parameters:
   - Number of Words: Select how many words you want in your poem.
   - Creativity: Higher values (>1.0) produce more varied but potentially less coherent poetry; lower values (<1.0) produce more predictable output.
   - Focus: Lower values restrict the AI to the most probable word combinations; higher values (up to 1.0) let it consider a wider range of words.
3. Click "Generate Poetry" and wait for your custom poem to appear.
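To see why the Creativity slider behaves this way, here is a small sketch of the usual way temperature is applied: the model's raw scores (logits) are divided by the temperature before the softmax. This assumes the app follows that common convention; the three logits below are made-up scores for illustration only.

```python
import math
from typing import List


def softmax_with_temperature(logits: List[float], temperature: float) -> List[float]:
    """Convert raw scores into probabilities, dividing by temperature first."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max before exp() for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up scores for three candidate next words.
logits = [2.0, 1.0, 0.1]
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature rises, the top word's share of the probability shrinks and the less likely words gain, which is what makes high-temperature output more surprising and less coherent.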

## Technical Details

- **Model**: Bidirectional LSTM with 3 layers
- **Tokenization**: SentencePiece with BPE encoding
- **Vocabulary Size**: 12,000 tokens
- **Text Generation**: Nucleus (top-p) sampling for balanced creativity and coherence
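Nucleus sampling keeps only the smallest set of most-probable candidates whose combined probability reaches `top_p`, then samples from that set. Below is a minimal pure-Python sketch of the technique. It works on a hypothetical word-to-probability dictionary, whereas the real model scores its 12,000 SentencePiece tokens and presumably operates on tensors.

```python
import random
from typing import Dict, Optional


def nucleus_sample(probs: Dict[str, float], top_p: float, rng: Optional[random.Random] = None) -> str:
    """Sample one token from the smallest top-ranked set with cumulative probability >= top_p."""
    rng = rng or random.Random()
    # Rank candidates from most to least probable.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)

    nucleus = []
    cumulative = 0.0
    for token, p in ranked:
        nucleus.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break  # the nucleus now covers at least top_p of the probability mass

    tokens = [t for t, _ in nucleus]
    weights = [p for _, p in nucleus]
    # random.choices treats weights as relative, so no explicit renormalisation is needed.
    return rng.choices(tokens, weights=weights, k=1)[0]


rng = random.Random(0)
probs = {"dil": 0.5, "ishq": 0.3, "raat": 0.15, "chand": 0.05}  # made-up probabilities
print(nucleus_sample(probs, top_p=0.75, rng=rng))  # always "dil" or "ishq"; the tail is cut off
```

With `top_p=0.75` only "dil" and "ishq" survive the cutoff; with `top_p=1.0` every candidate stays eligible. This is why lowering Focus makes the output more conservative.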

## Installation for Local Development

If you want to run the application locally:

```bash
# Clone the repository
git clone https://github.com/yourusername/Mehfil-e-Sukhan.git
cd Mehfil-e-Sukhan

# Create and activate a virtual environment (optional but recommended)
python -m venv venv
source venv/bin/activate  # On Linux/Mac
# or
venv\Scripts\activate  # On Windows

# Install dependencies
pip install -r requirements.txt

# Run the application
streamlit run app.py
```

## Requirements

- Python 3.8+
- torch==2.6.0
- sentencepiece==0.2.0
- huggingface-hub==0.29.3
- streamlit==1.43.0
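If the app misbehaves locally, a version mismatch is a common culprit. Here is a small sketch, using only the standard library's `importlib.metadata` (available from Python 3.8), that compares installed packages against the pins above. The script and its `find_mismatches` helper are hypothetical additions, not part of the repository.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

# Pins copied from the Requirements list above.
PINS = {
    "torch": "2.6.0",
    "sentencepiece": "0.2.0",
    "huggingface-hub": "0.29.3",
    "streamlit": "1.43.0",
}


def find_mismatches(pins: Dict[str, str]) -> Dict[str, Tuple[Optional[str], str]]:
    """Return {package: (installed_version_or_None, pinned_version)} for each mismatch."""
    mismatches = {}
    for package, pinned in pins.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # not installed in this environment
        # Plain string comparison: a local build tag such as "2.6.0+cpu" counts as a mismatch.
        if installed != pinned:
            mismatches[package] = (installed, pinned)
    return mismatches


if __name__ == "__main__":
    for package, (installed, pinned) in find_mismatches(PINS).items():
        print(f"{package}: installed {installed or 'nothing'}, README pins {pinned}")
```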

## Project Structure

```
Mehfil-e-Sukhan/
├── app.py              # Main application file
├── requirements.txt    # Python dependencies
└── README.md           # This documentation
```

The model weights and SentencePiece model are stored on Hugging Face Hub and are downloaded automatically when the application runs.
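One way to perform this kind of automatic download is `huggingface_hub.hf_hub_download`, which fetches a single file from a Hub repository and caches it locally. The sketch below assumes the repository ID from the Hugging Face link in "Model and Dataset". The two filenames are placeholders; check the repository's file listing for the real names. The downloader is injectable so the logic can be exercised offline.

```python
from typing import Callable, Dict, Optional, Sequence

REPO_ID = "zaiffi/Mehfil-e-Sukhan"  # from the Hugging Face link in "Model and Dataset"
# Hypothetical filenames: replace with the actual weight and SentencePiece model files.
ASSET_FILES = ("model_weights.pt", "spm_model.model")


def download_assets(
    filenames: Sequence[str] = ASSET_FILES,
    downloader: Optional[Callable[..., str]] = None,
) -> Dict[str, str]:
    """Fetch each file from the Hub (cached after the first call); return {filename: local_path}."""
    if downloader is None:
        # Imported lazily so the function can be tested without huggingface-hub or network access.
        from huggingface_hub import hf_hub_download
        downloader = hf_hub_download
    return {name: downloader(repo_id=REPO_ID, filename=name) for name in filenames}
```

Calling `download_assets()` with no arguments requires `huggingface-hub` and network access on the first run. Later calls are served from the local Hugging Face cache.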

## How It Works

1. **Data Processing**: The model was trained on a curated dataset of Roman Urdu poetry lines.
2. **Tokenization**: Text was tokenized using SentencePiece's BPE algorithm.
3. **Model Training**: A Bidirectional LSTM architecture was trained to predict the next token in a sequence.
4. **Text Generation**: At inference time, nucleus sampling is used to select each next token, balancing creativity and coherence.
5. **Formatting**: Generated text is automatically formatted into lines with alternating indentation for aesthetic presentation.
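Steps 3 and 5 above can be illustrated with two small pure-Python helpers. The first shows a common way to turn a tokenized line into next-token training pairs, using a sliding context window. The second breaks generated words into lines and indents every second one. Both are hypothetical sketches: the window size, `words_per_line`, and the four-space indent are assumptions, not values taken from the project's notebooks or `app.py`.

```python
from typing import List, Tuple


def make_training_pairs(token_ids: List[int], context: int) -> List[Tuple[List[int], int]]:
    """Slide a window over one tokenized line; each window is paired with the token after it."""
    return [
        (token_ids[i : i + context], token_ids[i + context])
        for i in range(len(token_ids) - context)
    ]


def format_poem(words: List[str], words_per_line: int = 6, indent: str = "    ") -> str:
    """Split words into fixed-length lines, indenting every second line."""
    lines = []
    for n, start in enumerate(range(0, len(words), words_per_line)):
        line = " ".join(words[start : start + words_per_line])
        lines.append(indent + line if n % 2 == 1 else line)
    return "\n".join(lines)


print(make_training_pairs([5, 8, 13, 21], context=2))  # [([5, 8], 13), ([8, 13], 21)]

words = "dil ki baat lab pe aayi to kya kya hua".split()  # made-up sample text
print(format_poem(words, words_per_line=5))
```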

## Model and Dataset

- **Model**: You can find the complete model, weights, and training notebooks on Hugging Face:
  [Mehfil-e-Sukhan on Hugging Face](https://huggingface.co/zaiffi/Mehfil-e-Sukhan)
- **Dataset**: The model was trained on the Roman Urdu Poetry dataset available on Kaggle:
  [Roman Urdu Poetry Dataset](https://www.kaggle.com/datasets/mianahmadhasan/roman-urdu-poetry-csv)

## Limitations

- The current model was trained on a relatively small dataset (~1300 lines), which may occasionally result in repetitive patterns.
- Roman Urdu is not standardized, so the model may struggle with unusual spellings or transliterations.
- Generation speed depends on available computational resources.

## License

This project is licensed under the Apache License 2.0. See the [LICENSE](LICENSE) file for details.

## Contact

- LinkedIn: [Muhammad Huzaifa Saqib](https://www.linkedin.com/in/muhammad-huzaifa-saqib-90a1a9324/)
- GitHub: [zaiffishiekh01](https://github.com/zaiffishiekh01)
- Email: [[email protected]](mailto:[email protected])

## Acknowledgements

- "Poetry is the rhythmical creation of beauty in words." (Edgar Allan Poe)