zaiffi committed on
Commit ce55859 · 1 parent: ffbd65c

Add comprehensive README with app configuration and usage instructions

Files changed (1): `README.md` (+114 -1)

The previous single-line README (`# Mehfil-e-Sukhan`) is replaced with the following content:

---
title: Mehfil-e-Sukhan
emoji: 📜
colorFrom: red
colorTo: gray
sdk: streamlit
sdk_version: "1.43.0"
app_file: app.py
pinned: false
---

# Mehfil-e-Sukhan: Har Lafz Ek Mehfil

An AI-powered Roman Urdu poetry generation application using BiLSTM neural networks.

## Overview

Mehfil-e-Sukhan ("Poetry Gathering" in Urdu) is an interactive application that generates Roman Urdu poetry based on a starting word or phrase provided by the user. The application uses a Bidirectional LSTM neural network trained on a curated dataset of Roman Urdu poetry.

## Features

- **Custom Poetry Generation**: Generate Roman Urdu poetry from any starting word or phrase.
- **Adjustable Parameters**:
  - **Number of Words**: Control the length of the generated poem (12-48 words).
  - **Creativity (Temperature)**: Adjust the randomness of word selection (0.5-2.0).
  - **Focus (Top-p)**: Control how much of the probability mass is considered when choosing each word (0.5-1.0); lower values keep only the most probable words.
- **Elegant Interface**: Dark-themed UI designed specifically for poetry presentation.
- **Automatic Formatting**: Output is automatically formatted into poetic lines.
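
The three adjustable parameters are bounded by the ranges listed above. The following rough sketch shows how user input could be held and clamped to those ranges. The class name, method name and default values are hypothetical and are not taken from `app.py`:

```python
from dataclasses import dataclass

# Slider ranges as documented in the Features list.
WORDS_RANGE = (12, 48)
TEMPERATURE_RANGE = (0.5, 2.0)
TOP_P_RANGE = (0.5, 1.0)


def _clamp(value, low, high):
    """Return value limited to the closed interval [low, high]."""
    return max(low, min(high, value))


@dataclass(frozen=True)
class GenerationSettings:
    """Hypothetical container for the three user-facing generation parameters."""
    num_words: int = 24       # assumed default, not from the app
    temperature: float = 1.0  # assumed default, not from the app
    top_p: float = 0.9        # assumed default, not from the app

    @classmethod
    def from_user_input(cls, num_words, temperature, top_p):
        """Clamp raw input values into the documented ranges."""
        return cls(
            num_words=int(_clamp(num_words, *WORDS_RANGE)),
            temperature=float(_clamp(temperature, *TEMPERATURE_RANGE)),
            top_p=float(_clamp(top_p, *TOP_P_RANGE)),
        )
```

For example, `GenerationSettings.from_user_input(100, 0.1, 0.95)` yields 48 words, temperature 0.5 and top-p 0.95. Streamlit sliders already enforce their own bounds, so clamping like this is only a safeguard for values that arrive by other routes.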

## How to Use

1. Enter a starting word or phrase in Roman Urdu (e.g., "ishq", "zindagi", "mohabbat").
2. Adjust the generation parameters:
   - Number of Words: Select how many words you want in your poem.
   - Creativity: Higher values (>1.0) produce more unusual but potentially less coherent poetry; lower values (<1.0) produce more predictable output.
   - Focus: Lower values restrict the AI to the most probable word combinations; higher values let it consider a wider range of words.
3. Click "Generate Poetry" and wait for your poem to appear.
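
Creativity corresponds to the sampling temperature. The usual mechanism divides the model's raw scores (logits) by the temperature before applying the softmax. The sketch below assumes the app does the same and uses made-up scores for three candidate words:

```python
import math


def softmax_with_temperature(logits, temperature=1.0):
    """Convert raw scores to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next words.
logits = [2.0, 1.0, 0.1]
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

As the temperature rises, the top word's probability shrinks and the unlikely words gain probability. This is why values above 1.0 feel more "creative" and values below 1.0 feel more predictable.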

## Technical Details

- **Model**: Bidirectional LSTM with 3 layers
- **Tokenization**: SentencePiece with BPE encoding
- **Vocabulary Size**: 12,000 tokens
- **Text Generation**: Nucleus (top-p) sampling for balanced creativity and coherence
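
Nucleus (top-p) sampling keeps the smallest set of most probable tokens whose probabilities add up to at least `top_p`, then samples from that reduced set. Below is a minimal, framework-free sketch. The function names are hypothetical, and the app itself presumably performs this step on PyTorch tensors:

```python
import random


def nucleus_filter(probs, top_p):
    """Return indices of the smallest set of tokens whose cumulative probability reaches top_p."""
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in (0, 1]")
    # Rank token indices from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept


def nucleus_sample(probs, top_p, rng=random):
    """Sample one token index from the nucleus, weighted by the original probabilities."""
    kept = nucleus_filter(probs, top_p)
    weights = [probs[i] for i in kept]
    # random.choices normalises the weights, so no explicit renormalisation is needed.
    return rng.choices(kept, weights=weights, k=1)[0]
```

With `top_p = 1.0`, every token remains eligible. Lowering `top_p` toward 0.5 narrows the choice to the likeliest tokens, which gives more conservative output.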

## Installation for Local Development

If you want to run the application locally:

```bash
# Clone the repository
git clone https://github.com/yourusername/Mehfil-e-Sukhan.git
cd Mehfil-e-Sukhan

# Create and activate a virtual environment (optional but recommended)
python -m venv venv
source venv/bin/activate  # On Linux/macOS
# or
venv\Scripts\activate     # On Windows

# Install dependencies
pip install -r requirements.txt

# Run the application
streamlit run app.py
```

## Requirements

- Python 3.9+
- torch==2.6.0
- sentencepiece==0.2.0
- huggingface-hub==0.29.3
- streamlit==1.43.0

## Project Structure

```
Mehfil-e-Sukhan/
├── app.py             # Main application file
├── requirements.txt   # Python dependencies
└── README.md          # This documentation
```

The model weights and SentencePiece model are stored on the Hugging Face Hub and are downloaded automatically when the application runs.
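
The automatic download can be sketched with `huggingface_hub.hf_hub_download`. The README does not name the Hub repository or the file names, so `repo_id`, `weights_file` and `spm_file` below are placeholders, and `download_artifacts` is a hypothetical helper, not the app's actual code:

```python
def download_artifacts(repo_id, weights_file, spm_file):
    """Fetch model weights and the SentencePiece model from the Hugging Face Hub.

    repo_id, weights_file and spm_file are placeholders: the actual
    repository and file names are not given in the README.
    """
    # Imported lazily so this module can be imported without the dependencies installed.
    from huggingface_hub import hf_hub_download
    import sentencepiece as spm

    # hf_hub_download stores files in the local Hugging Face cache and returns the cached path.
    weights_path = hf_hub_download(repo_id=repo_id, filename=weights_file)
    spm_path = hf_hub_download(repo_id=repo_id, filename=spm_file)

    # Load the tokenizer from the downloaded .model file.
    tokenizer = spm.SentencePieceProcessor(model_file=spm_path)
    return weights_path, tokenizer
```

Because downloads are cached on disk, repeated runs should not fetch the files again. Wrapping the loader in Streamlit's `st.cache_resource` would also avoid reloading it on every script rerun. If the weights are a PyTorch state dict (an assumption), they could then be loaded with `torch.load(weights_path, map_location="cpu")`.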

## How It Works

1. **Data Processing**: The model was trained on a curated dataset of Roman Urdu poetry lines.
2. **Tokenization**: Text was tokenized using SentencePiece's BPE algorithm.
3. **Model Training**: A Bidirectional LSTM architecture was trained to predict the next token in a sequence.
4. **Text Generation**: At inference time, nucleus sampling selects each next token, balancing creativity and coherence.
5. **Formatting**: Generated text is automatically formatted into lines with alternating indentation for aesthetic presentation.
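
Step 5 can be approximated as follows. The README does not say how many words go on each line, so `words_per_line`, the four-space indent and the sample text are all assumptions:

```python
def format_poem(text, words_per_line=6, indent="    "):
    """Split generated text into lines of equal word count, indenting every second line."""
    if words_per_line < 1:
        raise ValueError("words_per_line must be at least 1")
    words = text.split()
    lines = []
    for line_no, start in enumerate(range(0, len(words), words_per_line)):
        line = " ".join(words[start:start + words_per_line])
        # Lines 0, 2, 4, ... start flush left; lines 1, 3, 5, ... are indented.
        prefix = indent if line_no % 2 else ""
        lines.append(prefix + line)
    return "\n".join(lines)


# Hypothetical model output used only for illustration.
sample = "dil ki baat zuban par aayi to mehfil mein khamoshi chha gayi"
print(format_poem(sample, words_per_line=4))
```

Splitting on word count is simple but ignores metre and rhyme. Line breaks that follow the poem's meaning would require extra heuristics.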

## Limitations

- The current model was trained on a relatively small dataset (~1,300 lines), which may occasionally result in repetitive patterns.
- Roman Urdu spelling is not standardized, so the model may struggle with unusual spellings or transliterations.
- Generation speed depends on the available computational resources.

## License

This project is licensed under the Apache License 2.0. See the [LICENSE](LICENSE) file for details.

## Contact

- LinkedIn: [Muhammad Huzaifa Saqib](https://www.linkedin.com/in/muhammad-huzaifa-saqib-90a1a9324/)
- GitHub: [zaiffishiekh01](https://github.com/zaiffishiekh01)
- Email: [[email protected]](mailto:[email protected])
111
+ ## Acknowledgements
112
+
113
+ - Poetry is the rhythmical creation of beauty in words - Edgar Allan Poe
114
+