PhoBERT Fine-tuned for Vietnamese Smart Home NER/Slot Filling

This model is a fine-tuned version of vinai/phobert-base for Named Entity Recognition (NER) in Vietnamese smart home commands. It extracts slot values such as devices, locations, times, and numeric values from user commands.

Model Description

  • Base Model: vinai/phobert-base
  • Task: Token Classification / Slot Filling for Smart Home Commands
  • Language: Vietnamese
  • Number of Slot Labels: 13 (six entity types in BIO format, plus O)
  • Model Size: ~134M parameters

Intended Uses & Limitations

Intended Uses

  • Extracting entities from Vietnamese smart home voice commands
  • Slot filling for voice assistant systems
  • Integration with intent classification for complete NLU pipeline
  • Research in Vietnamese NLP for IoT applications

Limitations

  • Optimized specifically for the smart home domain
  • May not generalize well to other domains
  • Trained on Vietnamese text only
  • Performs best when paired with its corresponding intent classifier

Entity Types (Slot Labels)

The model predicts 13 slot labels (B/I tags for six entity types, plus O); a hand-tagged example follows the list:

  1. B-device / I-device - Device names (e.g., "đèn", "quạt", "điều hòa")
  2. B-living_space / I-living_space - Room/location names (e.g., "phòng khách", "phòng ngủ")
  3. B-time_at / I-time_at - Specific times (e.g., "10 giờ tối", "7 giờ sáng")
  4. B-duration / I-duration - Time durations (e.g., "5 phút", "2 giờ")
  5. B-target_number / I-target_number - Target values (e.g., "25 độ", "50%")
  6. B-changing_value / I-changing_value - Change amounts (e.g., "tăng 10%")
  7. O - Outside/No entity
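
For intuition, here is the BIO tagging of the sample command "bật đèn phòng khách lúc 7 giờ tối" (turn on the living room light at 7 p.m.). This is an illustrative hand annotation at the word level, not model output:

tagged = [
    ("bật", "O"),                 # turn on (the action belongs to the intent, not a slot)
    ("đèn", "B-device"),          # light
    ("phòng", "B-living_space"),
    ("khách", "I-living_space"),  # "phòng khách" = living room
    ("lúc", "O"),                 # at
    ("7", "B-time_at"),
    ("giờ", "I-time_at"),
    ("tối", "I-time_at"),         # "7 giờ tối" = 7 p.m.
]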

How to Use

Using Transformers Library

from transformers import AutoTokenizer, AutoModelForTokenClassification
from huggingface_hub import hf_hub_download
import torch
import json

# Load model and tokenizer
model_name = "ntgiaky/phobert-ner-smart-home"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Load label mappings (label_mappings.json ships with the model repo;
# fetching it from the Hub lets the snippet run without a manual download)
mappings_path = hf_hub_download(repo_id=model_name, filename="label_mappings.json")
with open(mappings_path, 'r') as f:
    label_mappings = json.load(f)
id2label = {int(k): v for k, v in label_mappings['id2label'].items()}

def extract_entities(text):
    # Tokenize
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    
    # Predict
    with torch.no_grad():
        outputs = model(**inputs)
        predictions = torch.argmax(outputs.logits, dim=2)
    
    # Extract entities
    entities = []
    current_entity = None
    current_tokens = []
    
    for token, pred_id in zip(tokens, predictions[0]):
        # Skip special tokens (<s>, </s>, <pad>) so they don't break entity spans
        if token in tokenizer.all_special_tokens:
            continue
        label = id2label[pred_id.item()]
        
        if label.startswith('B-'):
            # Save previous entity if exists
            if current_entity:
                entities.append({
                    'type': current_entity,
                    'text': tokenizer.convert_tokens_to_string(current_tokens)
                })
            # Start new entity
            current_entity = label[2:]
            current_tokens = [token]
        elif label.startswith('I-') and current_entity == label[2:]:
            # Continue current entity
            current_tokens.append(token)
        else:
            # End current entity
            if current_entity:
                entities.append({
                    'type': current_entity,
                    'text': tokenizer.convert_tokens_to_string(current_tokens)
                })
            current_entity = None
            current_tokens = []
    
    # Don't forget last entity
    if current_entity:
        entities.append({
            'type': current_entity,
            'text': tokenizer.convert_tokens_to_string(current_tokens)
        })
    
    return entities

# Example usage
text = "bật đèn phòng khách lúc 7 giờ tối"
entities = extract_entities(text)
print(f"Input: {text}")
print(f"Entities: {entities}")

Using Pipeline

from transformers import pipeline

# Load NER pipeline
ner = pipeline(
    "token-classification",
    model="ntgiaky/phobert-ner-smart-home",
    aggregation_strategy="simple"
)

# Extract entities
result = ner("tắt quạt phòng ngủ sau 10 phút")
print(result)
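
For downstream device control, the aggregated pipeline output can be collapsed into a slot dictionary. A minimal sketch, assuming the entity_group/word fields that aggregation_strategy="simple" produces:

def to_slots(ner_results):
    """Group extracted words by slot type, e.g. {'device': ['quạt'], 'duration': ['10 phút']}."""
    slots = {}
    for ent in ner_results:
        slots.setdefault(ent["entity_group"], []).append(ent["word"])
    return slots

print(to_slots(result))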

Integration with Intent Classification

For a complete NLU pipeline, pair this model with an intent classifier. Loading the base PhoBERT tokenizer explicitly avoids tokenizer-configuration mismatches in the fine-tuned repo; a combined intent + slot sketch follows the snippet:

from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load with PhoBERT tokenizer explicitly
tokenizer = AutoTokenizer.from_pretrained("vinai/phobert-base")
model = AutoModelForTokenClassification.from_pretrained(
    "ntgiaky/phobert-ner-smart-home",
    ignore_mismatched_sizes=True  # only needed if the saved head and config disagree on label count
)

# Create pipeline with explicit tokenizer
ner = pipeline(
    "token-classification",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple"
)

# Test
result = ner("bật đèn phòng khách")
print(result)
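
A combined intent + slot call could then look like the sketch below. The intent-classifier checkpoint name is a placeholder (this card does not name one); substitute your own model:

from transformers import pipeline

# Placeholder checkpoint -- replace with your actual intent classifier
intent_clf = pipeline("text-classification", model="your-org/phobert-intent-smart-home")

def understand(text):
    # One intent label plus the slot entities for the same utterance
    return {
        "intent": intent_clf(text)[0]["label"],
        "slots": ner(text),
    }

print(understand("bật đèn phòng khách"))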

Example Outputs

# Input: "bật đèn phòng khách"
# [{'entity_group': 'device', 'score': 0.97212785, 'word': 'đèn', 'start': None, 'end': None},
#  {'entity_group': 'living_space', 'score': 0.9332844, 'word': 'phòng khách', 'start': None, 'end': None}]

Citation

If you use this model, please cite:

@misc{phobert-ner-smart-home-2025,
  author = {Trần Quang Huy and Nguyễn Trần Gia Kỳ},
  title = {PhoBERT Fine-tuned for Vietnamese Smart Home NER},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/ntgiaky/ner-smart-home}}
}

Authors

  • Trần Quang Huy
  • Nguyễn Trần Gia Kỳ

License

This model is released under the MIT License.
