roberta-base-thai-spm-upos
Model Description
This is a RoBERTa model pre-trained on Thai Wikipedia texts for POS-tagging and dependency-parsing, derived from roberta-base-thai-spm. Every word is tagged with its UPOS (Universal Part-Of-Speech) label.
How to Use
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification
tokenizer = AutoTokenizer.from_pretrained("KoichiYasuoka/roberta-base-thai-spm-upos")
model = AutoModelForTokenClassification.from_pretrained("KoichiYasuoka/roberta-base-thai-spm-upos")
s = "หลายหัวดีกว่าหัวเดียว"
t = tokenizer.tokenize(s)  # subword tokens, without special tokens
# Score every position, then pick the highest-scoring label id for each one
logits = model(tokenizer.encode(s, return_tensors="pt"))["logits"]
ids = torch.argmax(logits, dim=2)[0].tolist()[1:-1]  # drop the leading and trailing special tokens
p = [model.config.id2label[q] for q in ids]
print(list(zip(t, p)))
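The decoding step in the snippet above (argmax over each row of logits, drop the first and last positions, map ids to labels) can be sketched without downloading the model. This is only an illustration in plain Python: the label map, token names, and scores below are invented stand-ins, not output of roberta-base-thai-spm-upos, and the helper names `argmax` and `decode` are hypothetical.

```python
# Toy stand-ins: all values here are invented for illustration,
# they are NOT produced by the actual model.
id2label = {0: "NOUN", 1: "VERB", 2: "ADJ"}   # hypothetical label map
tokens = ["tok_a", "tok_b", "tok_c"]           # hypothetical subword tokens

# One row of scores per position. The first and last rows stand for the
# special tokens that the encoder adds around the sentence, which is why
# the original snippet slices with [1:-1].
logits = [
    [0.1, 0.2, 0.3],  # leading special token
    [2.0, 0.5, 0.1],  # tok_a
    [0.3, 1.7, 0.2],  # tok_b
    [0.4, 0.1, 2.2],  # tok_c
    [0.0, 0.0, 0.0],  # trailing special token
]


def argmax(row):
    """Return the index of the largest score in a row."""
    return max(range(len(row)), key=row.__getitem__)


def decode(tokens, logits, id2label):
    """Pair each token with the label whose score is highest."""
    inner = logits[1:-1]  # strip the special-token positions
    if len(inner) != len(tokens):
        raise ValueError("token count and logit rows do not line up")
    return [(tok, id2label[argmax(row)]) for tok, row in zip(tokens, inner)]


pairs = decode(tokens, logits, id2label)
print(pairs)
```

The length check is a small safeguard: if the tokenizer and the encoder ever disagree on the number of positions, `zip` would otherwise silently truncate the result.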
Or, using esupar, which also provides dependency parsing:
import esupar
nlp=esupar.load("KoichiYasuoka/roberta-base-thai-spm-upos")
print(nlp("หลายหัวดีกว่าหัวเดียว"))
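If the printed esupar result is CoNLL-U text (an assumption here; check the esupar documentation for the exact return type), the word forms and UPOS tags can be pulled out of it with a few lines of plain Python. The sample below is hand-written for illustration and is not output of this model; `form_upos_pairs` is a hypothetical helper name.

```python
# Hand-written CoNLL-U sample for illustration only; NOT model output.
# CoNLL-U rows have 10 tab-separated columns:
# ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC
sample = "\n".join([
    "# text = tok_a tok_b",
    "1\ttok_a\t_\tNOUN\t_\t_\t2\tnsubj\t_\t_",
    "2\ttok_b\t_\tVERB\t_\t_\t0\troot\t_\t_",
    "",
])


def form_upos_pairs(conllu_text):
    """Return (FORM, UPOS) pairs from CoNLL-U text, skipping comments."""
    pairs = []
    for line in conllu_text.splitlines():
        if not line or line.startswith("#"):
            continue  # blank separator or comment line
        cols = line.split("\t")
        if len(cols) != 10:
            continue  # not a well-formed token row
        pairs.append((cols[1], cols[3]))
    return pairs


pairs = form_upos_pairs(sample)
print(pairs)
```

Under the same assumption, something like `form_upos_pairs(str(nlp("หลายหัวดีกว่าหัวเดียว")))` would give the tagged words for the example sentence.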
See Also
esupar: Tokenizer POS-tagger and Dependency-parser with BERT/RoBERTa/DeBERTa models