Transactor AIBA - Banking Transaction NER Model

Model Description

Transactor AIBA is a multilingual Named Entity Recognition (NER) model, fine-tuned from google-bert/bert-base-multilingual-cased, that extracts entities from banking and financial transaction texts. It supports English and Russian.

Intended Use

This model is designed to extract key entities from banking transaction requests, including:

  • Transaction amounts and currencies
  • Account numbers and bank codes
  • Tax identification numbers (INN)
  • Recipient/sender information
  • Transaction purposes
  • Dates and time periods

Entity Types

The model recognizes the following entity types:

  • amount
  • bank_code
  • currency
  • date
  • description
  • end_date
  • receiver_hr
  • receiver_inn
  • receiver_name
  • start_date
  • status
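The usage code in this card strips `B-` and `I-` prefixes from predicted labels, which suggests a BIO tagging scheme over the types listed above. A minimal sketch of what such a label set could look like, assuming an `O` label for non-entity tokens and an arbitrary id ordering (both are assumptions; the authoritative mapping is `model.config.id2label`):

```python
# Entity types as listed in this card.
ENTITY_TYPES = [
    "amount", "bank_code", "currency", "date", "description", "end_date",
    "receiver_hr", "receiver_inn", "receiver_name", "start_date", "status",
]

def build_bio_labels(entity_types):
    """Return ["O", "B-x", "I-x", ...] for the given entity types."""
    labels = ["O"]  # assumed label for tokens outside any entity
    for name in entity_types:
        labels.append(f"B-{name}")  # first token of an entity
        labels.append(f"I-{name}")  # continuation token of the same entity
    return labels

labels = build_bio_labels(ENTITY_TYPES)
# Hypothetical ids; the real model may order its labels differently.
label2id = {label: i for i, label in enumerate(labels)}
id2label = {i: label for label, i in label2id.items()}

print(len(labels))  # 11 types * 2 prefixes + "O"
```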

Training Data

  • Base Model: google-bert/bert-base-multilingual-cased
  • Training Samples: 200,015
  • Validation Samples: 35,297
  • Dataset: Custom multilingual (English and Russian) banking transaction dataset

Training Details

  • Epochs: 5
  • Batch Size: 16
  • Learning Rate: 2e-5
  • Optimizer: AdamW
  • LR Scheduler: Linear with warmup
  • Framework: Transformers + PyTorch
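To make the schedule concrete, the sketch below derives the step counts implied by the figures above and evaluates a linear warmup and decay curve, the same shape produced by `get_linear_schedule_with_warmup` in Transformers. The warmup ratio is an assumption, since the card does not state it, and the step count assumes one optimizer step per batch (single device, no gradient accumulation).

```python
import math

TRAIN_SAMPLES = 200_015  # from this card
BATCH_SIZE = 16          # from this card
EPOCHS = 5               # from this card
PEAK_LR = 2e-5           # from this card
WARMUP_RATIO = 0.1       # assumption: the card does not state the warmup length

# Assumes one optimizer step per batch (no gradient accumulation, one device).
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = int(total_steps * WARMUP_RATIO)

def linear_warmup_lr(step):
    """Learning rate at an optimizer step: linear ramp up, then linear decay to zero."""
    if step < warmup_steps:
        return PEAK_LR * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return PEAK_LR * remaining / max(1, total_steps - warmup_steps)

print(steps_per_epoch, total_steps, warmup_steps)
```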

Performance

  • Validation F1 Score: 0.9999
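The card does not state how this score was computed. For readers who want to evaluate the model on their own data, here is a sketch of span-level micro F1 over BIO sequences, a common convention for NER. It is not necessarily the metric used to produce the figure above, and the helper names are hypothetical.

```python
def bio_spans(labels):
    """Extract (type, start, end) spans from a BIO sequence; end is exclusive."""
    spans, start, kind = set(), None, None
    for i, label in enumerate(list(labels) + ["O"]):  # sentinel closes the last span
        continues = label.startswith("I-") and kind == label[2:]
        if kind is not None and not continues:
            spans.add((kind, start, i))
            start, kind = None, None
        if label.startswith("B-"):
            start, kind = i, label[2:]
    return spans

def entity_f1(gold_sequences, pred_sequences):
    """Micro-averaged F1 where a prediction counts only if type and boundaries match."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_sequences, pred_sequences):
        g, p = bio_spans(gold), bio_spans(pred)
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

gold = [["B-amount", "B-currency", "O", "B-receiver_name", "I-receiver_name"]]
pred = [["B-amount", "B-currency", "O", "B-receiver_name", "O"]]
print(entity_f1(gold, pred))  # two of three spans match exactly
```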

Usage

from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

# Load model and tokenizer
model_name = "primel/transactor-aiba"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)

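# Run the model on one text and group BIO-tagged tokens into entities
def extract_entities(text):
    """Return {entity_type: text}; a later span of the same type overwrites an earlier one."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)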
    
    with torch.no_grad():
        outputs = model(**inputs)
        predictions = torch.argmax(outputs.logits, dim=2)
    
    tokens = tokenizer.convert_ids_to_tokens(inputs['input_ids'][0])
    predicted_labels = [model.config.id2label[pred.item()] for pred in predictions[0]]
    
    entities = {}
    current_entity = None
    current_tokens = []
    
    for token, label in zip(tokens, predicted_labels):
        if token in ['[CLS]', '[SEP]', '[PAD]']:
            continue
            
        if label.startswith('B-'):
            if current_entity and current_tokens:
                entity_text = tokenizer.convert_tokens_to_string(current_tokens)
                entities[current_entity] = entity_text.strip()
            current_entity = label[2:]
            current_tokens = [token]
        elif label.startswith('I-') and current_entity == label[2:]:
            current_tokens.append(token)
        else:
            if current_entity and current_tokens:
                entity_text = tokenizer.convert_tokens_to_string(current_tokens)
                entities[current_entity] = entity_text.strip()
            current_entity = None
            current_tokens = []
    
    if current_entity and current_tokens:
        entity_text = tokenizer.convert_tokens_to_string(current_tokens)
        entities[current_entity] = entity_text.strip()
    
    return entities

# Example
text = "Transfer 12.5mln USD to Apex Industries account 27109477752047116719"
print(extract_entities(text))
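Because `extract_entities` stores one string per entity type, a text that mentions two amounts or two dates keeps only the last one. The sketch below shows a variant of the grouping step that keeps every span as a list. It works on plain token and label lists so it can run without downloading the model; `group_entities` and `join_wordpieces` are hypothetical helpers, and `join_wordpieces` is a simplified stand-in for `tokenizer.convert_tokens_to_string`.

```python
from collections import defaultdict

SPECIAL_TOKENS = {"[CLS]", "[SEP]", "[PAD]"}

def join_wordpieces(tokens):
    """Merge WordPiece tokens: pieces starting with '##' attach to the previous token."""
    words = []
    for token in tokens:
        if token.startswith("##") and words:
            words[-1] += token[2:]
        else:
            words.append(token)
    return " ".join(words)

def group_entities(tokens, labels):
    """Group BIO-labelled tokens into {entity_type: [text, ...]}, keeping repeats."""
    entities = defaultdict(list)
    current_type, current_tokens = None, []

    def flush():
        # Store the span collected so far, if any.
        if current_type and current_tokens:
            entities[current_type].append(join_wordpieces(current_tokens))

    for token, label in zip(tokens, labels):
        if token in SPECIAL_TOKENS:
            continue
        if label.startswith("B-"):
            flush()
            current_type, current_tokens = label[2:], [token]
        elif label.startswith("I-") and current_type == label[2:]:
            current_tokens.append(token)
        else:
            flush()
            current_type, current_tokens = None, []
    flush()
    return dict(entities)

# Hand-written tokens and labels for illustration, not real model output.
tokens = ["[CLS]", "Send", "100", "USD", "and", "50", "EUR", "[SEP]"]
labels = ["O", "O", "B-amount", "B-currency", "O", "B-amount", "B-currency", "O"]
print(group_entities(tokens, labels))
```

To use it with the model, pass the `tokens` and `predicted_labels` lists that `extract_entities` already computes.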

Example Outputs

Input: "Transfer 12.5mln USD to Apex Industries account 27109477752047116719 INN 123456789 bank code 01234 for consulting"

Output:

{
    "amount": "12.5mln",
    "currency": "USD",
    "receiver_name": "Apex Industries",
    "receiver_hr": "27109477752047116719",
    "receiver_inn": "123456789",
    "bank_code": "01234",
    "description": "consulting"
}
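The `amount` value is returned as raw text such as "12.5mln", so downstream code usually has to normalize it before use. A minimal sketch of one way to do that follows; the `parse_amount` helper and its suffix table are hypothetical, not part of the model, and the comma is treated as a decimal separator, which is an assumption that may not fit your data.

```python
from decimal import Decimal
import re

# Hypothetical suffix table; extend it to match the abbreviations in your data.
MULTIPLIERS = {"": 1, "k": 10**3, "m": 10**6, "mln": 10**6, "bln": 10**9}

def parse_amount(text):
    """Convert strings such as '12.5mln' to a Decimal; return None if unrecognized."""
    match = re.fullmatch(r"\s*(\d+(?:[.,]\d+)?)\s*([a-zA-Z]*)\s*", text)
    if not match:
        return None
    number, suffix = match.group(1), match.group(2).lower()
    if suffix not in MULTIPLIERS:
        return None
    # Assumption: a comma is a decimal separator, not a thousands separator.
    return Decimal(number.replace(",", ".")) * MULTIPLIERS[suffix]

print(parse_amount("12.5mln"))
```

Using `Decimal` rather than `float` avoids binary rounding errors on monetary values.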

Limitations

  • The model is trained on synthetic and curated banking transaction data
  • Performance may vary on real-world data with different formatting
  • Best results are achieved with transaction texts similar to the training distribution
  • May require fine-tuning for specific banking systems or regional variations

License

Apache 2.0

Citation

@misc{transactor-aiba,
  author = {Primel},
  title = {Transactor AIBA: Multilingual Banking Transaction NER},
  year = {2025},
  publisher = {Hugging Face},
  howpublished = {\url{https://huggingface.co/primel/transactor-aiba}}
}