	---
	title: Mehfil-e-Sukhan
	emoji: 📜
	colorFrom: "red"
	colorTo: "gray"
	sdk: streamlit
	sdk_version: "1.43.0"
	app_file: app.py
	pinned: false
	---

	# Mehfil-e-Sukhan: Har Lafz Ek Mehfil

	An AI-powered application that generates Roman Urdu poetry with a bidirectional LSTM (BiLSTM) neural network.

	## Overview

	Mehfil-e-Sukhan ("Poetry Gathering" in Urdu) is an interactive application that generates Roman Urdu poetry based on a starting word or phrase provided by the user. The application uses a Bidirectional LSTM neural network trained on a curated dataset of Roman Urdu poetry.

	## Features

	- Custom Poetry Generation: Generate Roman Urdu poetry from any starting word or phrase.
	- Adjustable Parameters:
	  - Number of Words: Control the length of generated poetry (12-48 words).
	  - Creativity (Temperature): Adjust the randomness in word selection (0.5-2.0).
	  - Focus (Top-p): Fine-tune how closely the model adheres to probable word sequences (0.5-1.0).
	- Elegant Interface: Dark-themed UI designed specifically for poetry presentation.
	- Automatic Formatting: Output is automatically formatted into poetic lines.

	## How to Use

	1. Enter a starting word or phrase in Roman Urdu (e.g., "ishq", "zindagi", "mohabbat").
	2. Adjust the generation parameters:
	   - Number of Words: Select how many words you want in your poem.
	   - Creativity: Higher values (>1.0) produce more unique but potentially less coherent poetry. Lower values (<1.0) create more predictable output.
	   - Focus: Sets the top-p cutoff. In standard nucleus sampling, lower values keep only the most probable words, while values near 1.0 also admit less likely ones.
	3. Click "Generate Poetry" and wait for your custom poem to appear.
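To illustrate how the Creativity setting described above changes word selection, here is a minimal standard-library sketch of temperature scaling. The function name `softmax_with_temperature` and the three example scores are assumptions made for illustration and are not taken from the application's code.

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    """Turn raw scores into probabilities, sharpened or flattened by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three candidate words
logits = [2.0, 1.0, 0.1]
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

Lower temperatures concentrate probability on the top-scoring word, while higher temperatures spread it more evenly across the candidates.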

	## Technical Details

	- Model: Bidirectional LSTM with 3 layers
	- Tokenization: SentencePiece with BPE encoding
	- Vocabulary Size: 12,000 tokens
	- Text Generation: Nucleus (top-p) sampling for balanced creativity and coherence
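As a rough illustration of the nucleus (top-p) sampling named above, the sketch below uses only the standard library. The function name `nucleus_sample` and the example probabilities are assumptions made for illustration; the application's own sampling code may differ.

```python
import random

def nucleus_sample(probs, top_p=0.9, rng=random):
    """Sample an index from the smallest set of top-ranked tokens
    whose cumulative probability reaches top_p."""
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")
    # Rank token indices from most to least probable
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, cumulative = [], 0.0
    for i in ranked:
        nucleus.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # Draw one index, weighting by probability within the nucleus
    weights = [probs[i] for i in nucleus]
    return rng.choices(nucleus, weights=weights, k=1)[0]

# Hypothetical next-token distribution over four tokens
probs = [0.5, 0.3, 0.15, 0.05]
print(nucleus_sample(probs, top_p=0.7))
```

With `top_p=0.7` only the two most probable tokens can be drawn; raising `top_p` toward 1.0 lets the less likely tokens back in.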

	## Installation for Local Development

	If you want to run the application locally:

	```bash
	# Clone the repository
	git clone https://github.com/yourusername/Mehfil-e-Sukhan.git
	cd Mehfil-e-Sukhan

	# Create and activate a virtual environment (optional but recommended)
	python -m venv venv
	source venv/bin/activate # On Linux/Mac
	# On Windows, run this instead:
	# venv\Scripts\activate

	# Install dependencies
	pip install -r requirements.txt

	# Run the application
	streamlit run app.py
	```

	## Requirements

	- Python 3.8+
	- torch==2.6.0
	- sentencepiece==0.2.0
	- huggingface-hub==0.29.3
	- streamlit==1.43.0

	## Project Structure

	```
	Mehfil-e-Sukhan/
	├── app.py # Main application file
	├── requirements.txt # Python dependencies
	└── README.md # This documentation
	```

	The model weights and SentencePiece model are stored on Hugging Face Hub and are downloaded automatically when the application runs.
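The automatic download could look something like the sketch below, which uses `hf_hub_download` from the `huggingface-hub` package listed in the requirements. The repository ID is the one linked under Model and Dataset; the two file names and the helper `fetch_artifacts` are hypothetical, so check the repository for the actual names.

```python
REPO_ID = "zaiffi/Mehfil-e-Sukhan"  # model repository linked in this README

# Hypothetical file names (assumptions); the real ones may differ
WEIGHTS_FILE = "model_weights.pth"
TOKENIZER_FILE = "tokenizer.model"

def fetch_artifacts(repo_id=REPO_ID, filenames=(WEIGHTS_FILE, TOKENIZER_FILE)):
    """Download each file (or reuse the local cache) and return {name: local path}."""
    # Imported lazily so the module loads even where the package is absent
    from huggingface_hub import hf_hub_download
    return {name: hf_hub_download(repo_id=repo_id, filename=name)
            for name in filenames}
```

Calling `fetch_artifacts()` needs network access on the first run; later runs reuse the files cached by `huggingface-hub`.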

	## How It Works

	1. Data Processing: The model was trained on a curated dataset of Roman Urdu poetry lines.
	2. Tokenization: Text was tokenized using SentencePiece's BPE algorithm.
	3. Model Training: A Bidirectional LSTM architecture was trained to predict the next token in a sequence.
	4. Text Generation: At inference time, nucleus sampling selects the next token, balancing creativity and coherence.
	5. Formatting: Generated text is automatically formatted into lines with alternating indentation for aesthetic presentation.
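The formatting step can be sketched as follows. This is only an illustration: the helper name `format_poem`, the six-words-per-line split, and the four-space indent are assumptions, since the README does not state the values the application uses.

```python
def format_poem(text, words_per_line=6, indent="    "):
    """Break text into fixed-size lines and indent every second line."""
    words = text.split()
    lines = []
    for n, start in enumerate(range(0, len(words), words_per_line)):
        line = " ".join(words[start:start + words_per_line])
        # Lines 2, 4, 6, ... get the indent; lines 1, 3, 5, ... do not
        lines.append((indent if n % 2 else "") + line)
    return "\n".join(lines)

# Placeholder words taken from the examples in "How to Use"
sample = " ".join(["ishq", "zindagi", "mohabbat"] * 4)
print(format_poem(sample))
```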

	## Model and Dataset

	- Model: You can find the complete model, weights, and training notebooks on Hugging Face:
	  [Mehfil-e-Sukhan on Hugging Face](https://huggingface.co/zaiffi/Mehfil-e-Sukhan)
	- Dataset: The model was trained on the Roman Urdu Poetry dataset available on Kaggle:
	  [Roman Urdu Poetry Dataset](https://www.kaggle.com/datasets/mianahmadhasan/roman-urdu-poetry-csv)

	## Limitations

	- The current model was trained on a relatively small dataset (about 1,300 lines), so its output can fall into repetitive patterns.
	- Roman Urdu is not standardized, so the model may struggle with unusual spellings or transliterations.
	- Generation speed depends on available computational resources.

	## License

	This project is licensed under the Apache License 2.0 - see the [LICENSE](LICENSE) file for details.

	## Contact

	- LinkedIn: [Muhammad Huzaifa Saqib](https://www.linkedin.com/in/muhammad-huzaifa-saqib-90a1a9324/)
	- GitHub: [zaiffishiekh01](https://github.com/zaiffishiekh01)
	- Email: [email protected]

	## Acknowledgements

	- Poetry is the rhythmical creation of beauty in words - Edgar Allan Poe