Models / Datasets
Language | Dataset | Source | URL |
---|---|---|---|
All | Punctuation | | https://huggingface.co/onelevelstudio/dataset/raw/main/nlp/punctuation.txt |
vi | Vocab | source | https://huggingface.co/onelevelstudio/dataset/raw/main/nlp/words_vi.txt |
vi | Stopwords | source | https://huggingface.co/onelevelstudio/dataset/raw/main/nlp/stopwords_vi.txt |
vi | Diacritics | | https://huggingface.co/onelevelstudio/dataset/raw/main/nlp/diacritics_vi.txt |
en | Stopwords | nltk | https://huggingface.co/onelevelstudio/dataset/raw/main/nlp/stopwords_en.txt |
Usage:
# Short-term usage: fetch the file over HTTP on every run
import requests
punctuation = requests.get("https://huggingface.co/onelevelstudio/dataset/raw/main/nlp/punctuation.txt").text.splitlines()
# Long-term usage: hf_hub_download caches the file locally and returns its path
from huggingface_hub import hf_hub_download as HF_Download
with open(HF_Download(repo_id="onelevelstudio/dataset", filename="nlp/punctuation.txt"), mode="r", encoding="utf-8") as f:
    punctuation = f.read().splitlines()
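
Once loaded, the lists can be used to clean tokenized text. The following is a minimal sketch, assuming each file holds one entry per line (as the `splitlines()` calls above suggest). The helper names `parse_word_list` and `filter_tokens` are hypothetical and not part of this repository, and the inline sample strings stand in for the real file contents.

```python
def parse_word_list(text):
    """Turn the raw text of a one-entry-per-line file into a set, skipping blank lines."""
    return {line.strip() for line in text.splitlines() if line.strip()}


def filter_tokens(tokens, stopwords, punctuation):
    """Drop tokens that are stopwords (compared in lowercase) or punctuation marks."""
    return [t for t in tokens if t.lower() not in stopwords and t not in punctuation]


# Stand-in data (assumption): in practice, pass the text of
# stopwords_en.txt and punctuation.txt fetched as shown above.
stopwords = parse_word_list("the\nis\na\n")
punctuation = parse_word_list(".\n,\n!\n")

tokens = ["The", "cat", "is", "a", "mammal", "."]
filtered = filter_tokens(tokens, stopwords, punctuation)
print(filtered)  # ['cat', 'mammal']
```

Storing the entries in a set keeps each membership check constant-time, which matters for the larger vocabulary and stopword files.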