PhoBERT Fine-tuned for Vietnamese Smart Home NER/Slot Filling

This model is a fine-tuned version of vinai/phobert-base for Named Entity Recognition (NER) in Vietnamese smart home commands. It extracts slot values such as devices, locations, times, and numeric values from user commands.

Model Description

  • Base Model: vinai/phobert-base
  • Task: Token Classification / Slot Filling for Smart Home Commands
  • Language: Vietnamese
  • Number of Slot Labels: 13 (six entity types in BIO format, plus O)
  • Model Size: ~134M parameters

Intended Uses & Limitations

Intended Uses

  • Extracting entities from Vietnamese smart home voice commands
  • Slot filling for voice assistant systems
  • Integration with intent classification for complete NLU pipeline
  • Research in Vietnamese NLP for IoT applications

Limitations

  • Optimized specifically for the smart home domain
  • May not generalize well to other domains
  • Trained on Vietnamese text only
  • Performs best when paired with its corresponding intent classifier

Entity Types (Slot Labels)

The model predicts 13 slot labels (B/I tags for six entity types, plus O); a hand-tagged example follows the list:

  1. B-device / I-device - Device names (e.g., "đèn", "quạt", "điều hòa")
  2. B-living_space / I-living_space - Room/location names (e.g., "phòng khách", "phòng ngủ")
  3. B-time_at / I-time_at - Specific times (e.g., "10 giờ tối", "7 giờ sáng")
  4. B-duration / I-duration - Time durations (e.g., "5 phút", "2 giờ")
  5. B-target_number / I-target_number - Target values (e.g., "25 độ", "50%")
  6. B-changing_value / I-changing_value - Change amounts (e.g., "tăng 10%")
  7. O - Outside/No entity
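
For intuition, here is the BIO tagging of the sample command "bật đèn phòng khách lúc 7 giờ tối" (turn on the living room light at 7 p.m.). This is an illustrative hand annotation at the word level, not model output:

tagged = [
    ("bật", "O"),                 # turn on (the action belongs to the intent, not a slot)
    ("đèn", "B-device"),          # light
    ("phòng", "B-living_space"),
    ("khách", "I-living_space"),  # "phòng khách" = living room
    ("lúc", "O"),                 # at
    ("7", "B-time_at"),
    ("giờ", "I-time_at"),
    ("tối", "I-time_at"),         # "7 giờ tối" = 7 p.m.
]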

How to Use

Using Transformers Library

from transformers import AutoTokenizer, AutoModelForTokenClassification
from huggingface_hub import hf_hub_download
import torch
import json

# Load model and tokenizer
model_name = "ntgiaky/phobert-ner-smart-home"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

# Load label mappings (label_mappings.json ships with the model repo;
# fetching it from the Hub lets the snippet run without a manual download)
mappings_path = hf_hub_download(repo_id=model_name, filename="label_mappings.json")
with open(mappings_path, 'r') as f:
    label_mappings = json.load(f)
id2label = {int(k): v for k, v in label_mappings['id2label'].items()}

def extract_entities(text):
    # Tokenize
    inputs = tokenizer(text, return_tensors="pt", padding=True, truncation=True)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    
    # Predict
    with torch.no_grad():
        outputs = model(**inputs)
        predictions = torch.argmax(outputs.logits, dim=2)
    
    # Extract entities
    entities = []
    current_entity = None
    current_tokens = []
    
    for token, pred_id in zip(tokens, predictions[0]):
        # Skip special tokens (<s>, </s>, <pad>) so they don't break entity spans
        if token in tokenizer.all_special_tokens:
            continue
        label = id2label[pred_id.item()]
        
        if label.startswith('B-'):
            # Save previous entity if exists
            if current_entity:
                entities.append({
                    'type': current_entity,
                    'text': tokenizer.convert_tokens_to_string(current_tokens)
                })
            # Start new entity
            current_entity = label[2:]
            current_tokens = [token]
        elif label.startswith('I-') and current_entity == label[2:]:
            # Continue current entity
            current_tokens.append(token)
        else:
            # End current entity
            if current_entity:
                entities.append({
                    'type': current_entity,
                    'text': tokenizer.convert_tokens_to_string(current_tokens)
                })
            current_entity = None
            current_tokens = []
    
    # Don't forget last entity
    if current_entity:
        entities.append({
            'type': current_entity,
            'text': tokenizer.convert_tokens_to_string(current_tokens)
        })
    
    return entities

# Example usage
text = "bật đèn phòng khách lúc 7 giờ tối"
entities = extract_entities(text)
print(f"Input: {text}")
print(f"Entities: {entities}")

Using Pipeline

from transformers import pipeline

# Load NER pipeline
ner = pipeline(
    "token-classification",
    model="ntgiaky/phobert-ner-smart-home",
    aggregation_strategy="simple"
)

# Extract entities
result = ner("tắt quạt phòng ngủ sau 10 phút")
print(result)
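
For downstream device control, the aggregated pipeline output can be collapsed into a slot dictionary. A minimal sketch, assuming the entity_group/word fields that aggregation_strategy="simple" produces:

def to_slots(ner_results):
    """Group extracted words by slot type, e.g. {'device': ['quạt'], 'duration': ['10 phút']}."""
    slots = {}
    for ent in ner_results:
        slots.setdefault(ent["entity_group"], []).append(ent["word"])
    return slots

print(to_slots(result))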

Integration with Intent Classification

For a complete NLU pipeline, pair this model with an intent classifier. Loading the base PhoBERT tokenizer explicitly avoids tokenizer-configuration mismatches in the fine-tuned repo; a combined intent + slot sketch follows the snippet:

from transformers import AutoTokenizer, AutoModelForTokenClassification, pipeline

# Load with PhoBERT tokenizer explicitly
tokenizer = AutoTokenizer.from_pretrained("vinai/phobert-base")
model = AutoModelForTokenClassification.from_pretrained(
    "ntgiaky/phobert-ner-smart-home",
    ignore_mismatched_sizes=True  # only needed if the saved head and config disagree on label count
)

# Create pipeline with explicit tokenizer
ner = pipeline(
    "token-classification",
    model=model,
    tokenizer=tokenizer,
    aggregation_strategy="simple"
)

# Test
result = ner("bật đèn phòng khách")
print(result)
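
A combined intent + slot call could then look like the sketch below. The intent-classifier checkpoint name is a placeholder (this card does not name one); substitute your own model:

from transformers import pipeline

# Placeholder checkpoint -- replace with your actual intent classifier
intent_clf = pipeline("text-classification", model="your-org/phobert-intent-smart-home")

def understand(text):
    # One intent label plus the slot entities for the same utterance
    return {
        "intent": intent_clf(text)[0]["label"],
        "slots": ner(text),
    }

print(understand("bật đèn phòng khách"))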

Example Outputs

# Input: "bật đèn phòng khách"
# [{'entity_group': 'device', 'score': 0.97212785, 'word': 'đèn', 'start': None, 'end': None},
#  {'entity_group': 'living_space', 'score': 0.9332844, 'word': 'phòng khách', 'start': None, 'end': None}]

Citation

If you use this model, please cite:

@misc{phobert-ner-smart-home-2025,
  author = {Trần Quang Huy and Nguyễn Trần Gia Kỳ},
  title = {PhoBERT Fine-tuned for Vietnamese Smart Home NER},
  year = {2025},
  publisher = {Hugging Face},
  journal = {Hugging Face Model Hub},
  howpublished = {\url{https://huggingface.co/ntgiaky/ner-smart-home}}
}

Authors

  • Trần Quang Huy
  • Nguyễn Trần Gia Kỳ

License

This model is released under the MIT License.
