gradio
python-docx
transformers
torch
flax
tiktoken
sentencepiece
pdfminer.six
datasets
nltk
scikit-learn