# Turkish Tiktokenizer Web App

A Streamlit-based web interface for the Turkish Morphological Tokenizer. This app provides an interactive way to tokenize Turkish text with real-time visualization and color-coded token display.

## Features

- 🔤 Turkish text tokenization with morphological analysis
- 🎨 Color-coded token visualization
- 🔢 Token count and ID display
- 📊 Special token highlighting (uppercase, space, newline, etc.)
- 🔄 Version selection from GitHub commit history
- 🌐 Direct integration with GitHub repository

## Demo

You can try the live demo at [Hugging Face Spaces](https://huggingface.co/spaces/YOUR_USERNAME/turkish-tiktokenizer) (Replace with your actual Spaces URL)

## Installation

1. Clone the repository:

   ```bash
   git clone https://github.com/malibayram/tokenizer.git
   cd tokenizer/streamlit_app
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

## Usage

1. Run the Streamlit app:

   ```bash
   streamlit run app.py
   ```

2. Open your browser and navigate to http://localhost:8501
3. Enter Turkish text in the input area and click "Tokenize"

## How It Works

1. **Text Input**: Enter Turkish text in the left panel
2. **Tokenization**: Click the "Tokenize" button to process the text
3. **Visualization**:
   - Token count is displayed at the top
   - Tokens are shown with color-coding:
     - Special tokens (uppercase, space, etc.) have predefined colors
     - Regular tokens get unique colors for easy identification
   - Token IDs are displayed below the visualization

## Code Structure

- `app.py`: Main Streamlit application
  - UI components and layout
  - GitHub integration
  - Tokenization logic
  - Color generation and visualization
- `requirements.txt`: Python dependencies

## Technical Details

- **Tokenizer Source**: Fetched directly from GitHub repository
- **Caching**: Uses Streamlit's caching for better performance
- **Color Generation**: HSV-based algorithm for visually distinct colors
- **Session State**: Maintains text and results between interactions
- **Error Handling**: Graceful handling of GitHub API and tokenization errors

## Deployment to Hugging Face Spaces

1. Create a new Space:
   - Go to https://huggingface.co/spaces
   - Click "Create new Space"
   - Select "Streamlit" as the SDK
   - Choose a name for your Space
2. Upload files:
   - `app.py`
   - `requirements.txt`
3. The app will automatically deploy and be available at your Space's URL

## Contributing

1. Fork the repository
2. Create your feature branch
3. Commit your changes
4. Push to the branch
5. Create a Pull Request

## License

MIT License - see the [LICENSE](../LICENSE) file for details

## Acknowledgments

- Built by dqbd
- Created with the generous help from Diagram
- Based on the [Turkish Morphological Tokenizer](https://github.com/malibayram/tokenizer)
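
## Appendix: Color Generation Sketch

The Technical Details section mentions an HSV-based algorithm for visually distinct colors, but this README does not show the code in `app.py`. The snippet below is only a minimal sketch of one common way to do this, not the app's actual implementation. The helper name `token_color`, the golden-ratio hue step, and the default saturation and value are all assumptions made for illustration.

```python
import colorsys

# Assumption: stepping the hue by the golden ratio conjugate spreads
# successive colors evenly around the color wheel.
GOLDEN_RATIO_CONJUGATE = 0.618033988749895


def token_color(index: int, saturation: float = 0.5, value: float = 0.95) -> str:
    """Return a hex color for the index-th distinct token (hypothetical helper)."""
    hue = (index * GOLDEN_RATIO_CONJUGATE) % 1.0
    # colorsys works with floats in the range [0, 1]
    r, g, b = colorsys.hsv_to_rgb(hue, saturation, value)
    return "#{:02x}{:02x}{:02x}".format(round(r * 255), round(g * 255), round(b * 255))


if __name__ == "__main__":
    # Print the first few colors to check that neighbors look distinct
    for i in range(5):
        print(i, token_color(i))
```

Keeping saturation and value fixed while varying only the hue gives colors of similar brightness, which keeps token text readable on every background.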