PhoBERT Fine-tuned for Vietnamese Smart Home NER/Slot Filling
This model is a fine-tuned version of vinai/phobert-base for Named Entity Recognition (NER) in Vietnamese smart home commands. It extracts slot values such as devices, locations, times, and numeric values from user commands.
Model Description
- Base Model: vinai/phobert-base
- Task: Token Classification / Slot Filling for Smart Home Commands
- Language: Vietnamese
- Number of Entity Types: 13
Intended Uses & Limitations
Intended Uses
- Extracting entities from Vietnamese smart home voice commands
- Slot filling for voice assistant systems
- Integration with intent classification for complete NLU pipeline
- Research in Vietnamese NLP for IoT applications
Limitations
- Optimized specifically for the smart home domain
- May not generalize well to other domains
- Trained on Vietnamese only
- Performs best when paired with the corresponding intent classifier
Entity Types (Slot Labels)
The model recognizes 13 slot labels (six entity types in the BIO scheme, plus the O tag); an example tagging is sketched after the list:
- B-device / I-device: Device names (e.g., "đèn", "quạt", "điều hòa")
- B-living_space / I-living_space: Room/location names (e.g., "phòng khách", "phòng ngủ")
- B-time_at / I-time_at: Specific times (e.g., "10 giờ tối", "7 giờ sáng")
- B-duration / I-duration: Time durations (e.g., "5 phút", "2 giờ")
- B-target_number / I-target_number: Target values (e.g., "25 độ", "50%")
- B-changing_value / I-changing_value: Change amounts (e.g., "tăng 10%")
- O: Outside/No entity
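As an illustration of how these labels apply to a command, here is a hand-written BIO tagging of a typical utterance (an example for clarity, not model output):
# Hand-labeled illustration of the BIO slot labels (not produced by the model)
example = [
    ("bật",   "O"),
    ("đèn",   "B-device"),
    ("phòng", "B-living_space"),
    ("khách", "I-living_space"),
    ("lúc",   "O"),
    ("7",     "B-time_at"),
    ("giờ",   "I-time_at"),
    ("tối",   "I-time_at"),
]
for word, tag in example:
    print(f"{word}\t{tag}")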
How to Use
Using Transformers Library
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch
import json

# Load model and tokenizer
model_name = "ntgiaky/phobert-ner-smart-home"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Load label mappings
with open('label_mappings.json', 'r') as f:
    label_mappings = json.load(f)
id2label = {int(k): v for k, v in label_mappings['id2label'].items()}

def extract_entities(text):
    # Tokenize
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

    # Predict
    with torch.no_grad():
        outputs = model(**inputs)
        predictions = torch.argmax(outputs.logits, dim=2)

    # Extract entities
    entities = []
    current_entity = None
    current_tokens = []

    for token, pred_id in zip(tokens, predictions[0]):
        label = id2label[pred_id.item()]
        if label.startswith('B-'):
            # Save the previous entity if one exists
            if current_entity:
                entities.append({
                    'type': current_entity,
                    'text': tokenizer.convert_tokens_to_string(current_tokens)
                })
            # Start a new entity
            current_entity = label[2:]
            current_tokens = [token]
        elif label.startswith('I-') and current_entity == label[2:]:
            # Continue the current entity
            current_tokens.append(token)
        else:
            # End the current entity
            if current_entity:
                entities.append({
                    'type': current_entity,
                    'text': tokenizer.convert_tokens_to_string(current_tokens)
                })
            current_entity = None
            current_tokens = []

    # Don't forget the last entity
    if current_entity:
        entities.append({
            'type': current_entity,
            'text': tokenizer.convert_tokens_to_string(current_tokens)
        })

    return entities

# Example usage
text = "bật đèn phòng khách lúc 7 giờ tối"
entities = extract_entities(text)
print(f"Input: {text}")
print(f"Entities: {entities}")
Using Pipeline
from transformers import pipeline

# Load NER pipeline
ner = pipeline(
    "token-classification",
    model="ntgiaky/phobert-ner-smart-home",
    aggregation_strategy="simple"
)

# Extract entities
result = ner("tắt quạt phòng ngủ sau 10 phút")
print(result)
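Downstream code usually wants plain slot/value pairs rather than the raw pipeline dicts. A minimal sketch of that conversion, assuming the standard entity_group and word fields returned by the token-classification pipeline with aggregation_strategy="simple":
def to_slots(pipeline_output):
    # Collect predicted words per entity group; a slot may receive several values
    slots = {}
    for ent in pipeline_output:
        slots.setdefault(ent["entity_group"], []).append(ent["word"])
    return slots

print(to_slots(result))
# e.g., {'device': ['quạt'], 'living_space': ['phòng ngủ'], 'duration': ['10 phút']},
# depending on the model's actual predictions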
Integration with Intent Classification
For a complete NLU pipeline:
from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load with the PhoBERT tokenizer explicitly
tokenizer = AutoTokenizer.from_pretrained("vinai/phobert-base")
model = AutoModelForTokenClassification.from_pretrained(
    "ntgiaky/phobert-ner-smart-home",
    ignore_mismatched_sizes=True  # Add this if needed
)

# Create pipeline with the explicit tokenizer
ner = pipeline(
    "token-classification",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple"
)

# Test
result = ner("bật đèn phòng khách")
print(result)
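To combine this model with an intent classifier into the full NLU pipeline mentioned above, one straightforward pattern is to run both models on the same utterance and merge the results. The intent model id below is a placeholder, not a published checkpoint; substitute whichever intent classifier you trained alongside this NER model:
# Placeholder id for the companion intent classifier; replace with your own model
intent_clf = pipeline("text-classification", model="your-username/phobert-intent-smart-home")

def parse_command(text):
    # Run intent classification and slot filling on the same utterance
    intent = intent_clf(text)[0]   # {'label': ..., 'score': ...}
    entities = ner(text)           # reuses the NER pipeline defined above
    return {
        "text": text,
        "intent": intent["label"],
        "intent_score": intent["score"],
        "slots": [{"type": e["entity_group"], "value": e["word"]} for e in entities],
    }

print(parse_command("bật đèn phòng khách lúc 7 giờ tối"))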
Example Outputs
# Input: "bật đèn phòng khách"
# [{'entity_group': 'living_space', 'score': np.float32(0.97212785), 'word': 'đèn', 'start': None, 'end': None},
# {'entity_group': 'duration', 'score': np.float32(0.9332844), 'word': 'phòng khách', 'start': None, 'end': None}]
Citation
If you use this model, please cite:
@misc{phobert-ner-smart-home-2025,
  author       = {Trần Quang Huy and Nguyễn Trần Gia Kỳ},
  title        = {PhoBERT Fine-tuned for Vietnamese Smart Home NER},
  year         = {2025},
  publisher    = {Hugging Face},
  journal      = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/ntgiaky/ner-smart-home}}
}
Authors
- Trần Quang Huy
- Nguyễn Trần Gia Kỳ
License
This model is released under the MIT License.
Evaluation results
- Accuracy on VN-SLU Augmented Dataset (self-reported): 96.64
- F1 Score (Weighted) on VN-SLU Augmented Dataset (self-reported): 86.55
- F1 Score (Macro) on VN-SLU Augmented Dataset (self-reported): 67.04