Transactor AIBA - Banking Transaction NER Model

Model Description

Transactor AIBA is a multilingual named entity recognition (NER) model fine-tuned from google-bert/bert-base-multilingual-cased to extract entities from banking and financial transaction texts. The model supports English and Russian.

Intended Use

This model is designed to extract key entities from banking transaction requests, including:

Transaction amounts and currencies
Account numbers and bank codes
Tax identification numbers (INN)
Recipient/sender information
Transaction purposes
Dates and time periods

Entity Types

The model recognizes the following entity types:

amount
bank_code
currency
date
description
end_date
receiver_hr
receiver_inn
receiver_name
start_date
status
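The decoding code in Usage reads labels with `B-` and `I-` prefixes, which points to a BIO tagging scheme over these eleven types. The sketch below shows how such a label set could be laid out. It assumes one `O` label plus a `B-`/`I-` pair per entity type. The real ordering and ids are stored in the model's `config.id2label` and may differ.

```python
# Entity types as listed in this model card.
ENTITY_TYPES = [
    "amount", "bank_code", "currency", "date", "description", "end_date",
    "receiver_hr", "receiver_inn", "receiver_name", "start_date", "status",
]

def build_bio_labels(entity_types):
    """Build id2label/label2id maps for a BIO scheme (assumed layout)."""
    labels = ["O"]  # tokens outside any entity
    for entity in entity_types:
        labels.append(f"B-{entity}")  # first token of an entity span
        labels.append(f"I-{entity}")  # continuation tokens of the same span
    id2label = dict(enumerate(labels))
    label2id = {label: i for i, label in id2label.items()}
    return id2label, label2id

id2label, label2id = build_bio_labels(ENTITY_TYPES)
print(len(id2label))  # 23
```

Under this assumption the classification head has 1 + 2 × 11 = 23 outputs. Comparing that number with `model.config.num_labels` is a quick way to check whether the layout holds.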

Training Data

Base Model: google-bert/bert-base-multilingual-cased
Training Samples: 200,015
Validation Samples: 35,297
Dataset: Custom banking transaction dataset with multilingual support

Training Details

Epochs: 5
Batch Size: 16
Learning Rate: 2e-5
Optimizer: AdamW
LR Scheduler: Linear with warmup
Framework: Transformers + PyTorch
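The hyperparameters above imply a concrete step budget. The sketch below computes it under three assumptions: a single device, no gradient accumulation, and the last partial batch kept. The card does not state the warmup length, so `WARMUP_STEPS` is a hypothetical value. The schedule has the same shape as the linear-with-warmup scheduler in Transformers (`get_linear_schedule_with_warmup`): the learning rate ramps linearly from zero to its peak, then decays linearly back to zero.

```python
import math

# Figures from this model card.
TRAIN_SAMPLES = 200_015
BATCH_SIZE = 16
EPOCHS = 5
PEAK_LR = 2e-5

# Assumptions: single device, no gradient accumulation, last partial batch kept.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)  # 12,501
total_steps = steps_per_epoch * EPOCHS                    # 62,505

# Hypothetical: the card does not state the number of warmup steps.
WARMUP_STEPS = 500

def linear_warmup_lr(step, peak_lr=PEAK_LR, warmup=WARMUP_STEPS, total=total_steps):
    """Linear warmup from 0 to peak_lr, then linear decay back to 0."""
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    return peak_lr * max(0.0, (total - step) / max(1, total - warmup))

for step in (0, 250, 500, total_steps // 2, total_steps):
    print(step, f"{linear_warmup_lr(step):.2e}")
```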

Performance

Validation F1 Score: 0.9999
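The card does not say how this F1 is computed. NER models are commonly scored at the entity level, as the `seqeval` library does. Under that scoring, a prediction counts only if its type and both span boundaries match a gold entity exactly. The sketch below implements that metric, assuming each entity is a `(type, start, end)` tuple. It is an illustration only, not the evaluation script behind the reported score.

```python
def entity_f1(gold, pred):
    """Micro-averaged entity-level F1 over per-example lists of entities.

    Each entity is a (type, start, end) tuple. A predicted entity is a true
    positive only if its type and both boundaries match a gold entity exactly.
    """
    tp = fp = fn = 0
    for gold_entities, pred_entities in zip(gold, pred):
        gold_set, pred_set = set(gold_entities), set(pred_entities)
        tp += len(gold_set & pred_set)
        fp += len(pred_set - gold_set)
        fn += len(gold_set - pred_set)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Character spans in "Transfer 12.5mln USD"
gold = [[("amount", 9, 16), ("currency", 17, 20)]]
pred = [[("amount", 9, 16), ("currency", 17, 21)]]  # currency end is off by one
print(entity_f1(gold, pred))  # 0.5
```

One boundary error costs both a false positive and a false negative. This makes entity-level F1 stricter than token-level accuracy.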

Usage

from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

# Load model and tokenizer (a fast tokenizer is required for offset mappings)
model_name = "primel/transactor-aiba"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)
model.eval()  # inference mode (disables dropout)

# Example prediction
def extract_entities(text):
    # Character offsets let us slice entity text straight from the input;
    # rejoining WordPiece tokens would turn "12.5mln" into "12 . 5mln".
    inputs = tokenizer(text, return_tensors="pt", truncation=True,
                       max_length=128, return_offsets_mapping=True)
    offsets = inputs.pop("offset_mapping")[0].tolist()  # the model does not accept this key

    with torch.no_grad():
        outputs = model(**inputs)
    predictions = outputs.logits.argmax(dim=-1)[0].tolist()
    predicted_labels = [model.config.id2label[pred] for pred in predictions]

    entities = {}
    current_entity, start, end = None, None, None

    for (tok_start, tok_end), label in zip(offsets, predicted_labels):
        if tok_start == tok_end:  # special tokens such as [CLS] and [SEP] have empty spans
            continue

        if label.startswith('I-') and current_entity == label[2:]:
            end = tok_end  # extend the open entity
            continue

        # Any other label closes the open entity (a repeated type overwrites the earlier value)
        if current_entity is not None:
            entities[current_entity] = text[start:end]
            current_entity = None

        if label.startswith('B-'):
            current_entity, start, end = label[2:], tok_start, tok_end

    if current_entity is not None:
        entities[current_entity] = text[start:end]

    return entities

# Example
text = "Transfer 12.5mln USD to Apex Industries account 27109477752047116719"
print(extract_entities(text))

Example Outputs

Input: "Transfer 12.5mln USD to Apex Industries account 27109477752047116719 INN 123456789 bank code 01234 for consulting"

Output:

{
    "amount": "12.5mln",
    "currency": "USD",
    "receiver_name": "Apex Industries",
    "receiver_hr": "27109477752047116719",
    "receiver_inn": "123456789",
    "bank_code": "01234",
    "description": "consulting"
}
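Because the usage code returns a dict, only the last value survives when a request mentions the same entity type twice (for example, two dates). The sketch below is a model-free BIO decoder that keeps every span in a list. It assumes you already have per-token labels and character offsets, for example from a fast tokenizer called with `return_offsets_mapping=True`. `decode_bio` is a hypothetical helper and is not part of the model repository. The offsets and labels in the example are toy values, not real model output.

```python
def decode_bio(text, offsets, labels):
    """Group BIO labels into (entity_type, surface_text) spans, keeping duplicates."""
    spans = []
    current, start, end = None, None, None
    for (tok_start, tok_end), label in zip(offsets, labels):
        if tok_start == tok_end:  # special tokens such as [CLS]/[SEP] have empty offsets
            continue
        if label.startswith("I-") and current == label[2:]:
            end = tok_end  # extend the open span
            continue
        if current is not None:  # any other label closes the open span
            spans.append((current, text[start:end]))
            current = None
        if label.startswith("B-"):
            current, start, end = label[2:], tok_start, tok_end
    if current is not None:
        spans.append((current, text[start:end]))
    return spans

# Toy input: hand-written offsets and labels for illustration only.
text = "from 01.01.2025 to 31.01.2025"
offsets = [(0, 0), (0, 4), (5, 15), (16, 18), (19, 29), (0, 0)]
labels = ["O", "O", "B-date", "O", "B-date", "O"]
print(decode_bio(text, offsets, labels))
# [('date', '01.01.2025'), ('date', '31.01.2025')]
```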

Limitations

The model was trained on synthetic and curated banking transaction data
Performance may be lower on real-world data with different formatting
Best results are achieved on transaction texts similar to the training distribution
Fine-tuning may be required for specific banking systems or regional variations

License

Apache 2.0

Citation

@misc{transactor-aiba,
  author = {Primel},
  title = {Transactor AIBA: Multilingual Banking Transaction NER},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/primel/transactor-aiba}}
}


Model Files

Model size: 0.2B params
Tensor type: F32
Weights format: Safetensors