Transactor AIBA - Banking Transaction NER Model
Model Description
Transactor AIBA is a multilingual named entity recognition (NER) model, fine-tuned from google-bert/bert-base-multilingual-cased, that extracts entities from banking and financial transaction texts. It supports English and Russian.
Intended Use
This model is designed to extract key entities from banking transaction requests, including:
- Transaction amounts and currencies
- Account numbers and bank codes
- Tax identification numbers (INN)
- Recipient/sender information
- Transaction purposes
- Dates and time periods
Entity Types
The model recognizes the following entity types:
- amount
- bank_code
- currency
- date
- description
- end_date
- receiver_hr
- receiver_inn
- receiver_name
- start_date
- status
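The usage example in this card decodes predictions by their `B-` and `I-` label prefixes, which suggests a BIO tagging scheme over these entity types. The sketch below shows what such a label set could look like. The ordering and ids are hypothetical; the authoritative mapping is whatever `model.config.id2label` contains.

```python
# Entity types as listed in this model card.
ENTITY_TYPES = [
    "amount", "bank_code", "currency", "date", "description", "end_date",
    "receiver_hr", "receiver_inn", "receiver_name", "start_date", "status",
]


def build_bio_labels(entity_types):
    """Return a hypothetical BIO label list: "O" plus a B-/I- pair per entity type."""
    labels = ["O"]
    for entity in entity_types:
        labels.append(f"B-{entity}")
        labels.append(f"I-{entity}")
    return labels


labels = build_bio_labels(ENTITY_TYPES)
id2label = dict(enumerate(labels))  # hypothetical ids, not the model's real config
label2id = {label: idx for idx, label in id2label.items()}

print(len(labels))  # 23
```

Eleven entity types yield 23 labels: one `O` plus a begin/inside pair for each type.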
Training Data
- Base Model: google-bert/bert-base-multilingual-cased
- Training Samples: 200,015
- Validation Samples: 35,297
- Dataset: Custom banking transaction dataset with multilingual support
Training Details
- Epochs: 5
- Batch Size: 16
- Learning Rate: 2e-5
- Optimizer: AdamW
- LR Scheduler: Linear with warmup
- Framework: Transformers + PyTorch
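As a rough sanity check on these settings, the sketch below computes the number of optimizer steps and a linear warmup-then-decay learning rate, the same schedule shape as `get_linear_schedule_with_warmup` in Transformers. It assumes a single device, no gradient accumulation, and a kept final partial batch per epoch. The 10% warmup ratio is a hypothetical value because the card does not state the warmup length.

```python
import math

TRAIN_SAMPLES = 200_015
BATCH_SIZE = 16
EPOCHS = 5
BASE_LR = 2e-5
WARMUP_RATIO = 0.1  # hypothetical: the card does not state the warmup length

# Assumes one device, no gradient accumulation, last partial batch kept.
steps_per_epoch = math.ceil(TRAIN_SAMPLES / BATCH_SIZE)
total_steps = steps_per_epoch * EPOCHS
warmup_steps = int(WARMUP_RATIO * total_steps)


def linear_warmup_lr(step, base_lr=BASE_LR, warmup=warmup_steps, total=total_steps):
    """LR rises linearly from 0 to base_lr during warmup, then decays linearly to 0."""
    if step < warmup:
        return base_lr * step / max(1, warmup)
    return base_lr * max(0.0, (total - step) / max(1, total - warmup))


print(steps_per_epoch, total_steps, warmup_steps)  # 12501 62505 6250
```

Under these assumptions the peak learning rate of 2e-5 is reached at step 6,250 and falls to zero at step 62,505.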
Performance
- Validation F1 Score: 0.9999
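The card does not say whether this F1 score is computed per token or per entity span. For reference, here is a minimal sketch of strict entity-level F1 over BIO tag sequences. It is similar in spirit to the `seqeval` library but is not its implementation. An `I-` tag that does not continue a span of the same type is treated as `O`, which matches the decoding logic in the usage example.

```python
def bio_spans(tags):
    """Collect (entity_type, start, end) spans from a BIO tag list; end is exclusive."""
    spans = []
    current_type, start = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" closes the last span
        if tag.startswith("I-") and current_type == tag[2:]:
            continue  # span continues
        if current_type is not None:
            spans.append((current_type, start, i))
            current_type = None
        if tag.startswith("B-"):
            current_type, start = tag[2:], i
    return spans


def entity_f1(gold_sequences, pred_sequences):
    """Micro-averaged F1 where a predicted span counts only if type and boundaries match exactly."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_sequences, pred_sequences):
        gold_spans, pred_spans = set(bio_spans(gold)), set(bio_spans(pred))
        tp += len(gold_spans & pred_spans)
        fp += len(pred_spans - gold_spans)
        fn += len(gold_spans - pred_spans)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


gold = [["B-amount", "I-amount", "B-currency", "O"]]
pred = [["B-amount", "O", "B-currency", "O"]]
print(entity_f1(gold, pred))  # 0.5
```

In the example, the truncated `amount` span counts as both a false positive and a false negative, so precision and recall are both 0.5.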
Usage
```python
from transformers import AutoTokenizer, AutoModelForTokenClassification
import torch

# Load model and tokenizer from the Hugging Face Hub
model_name = "primel/transactor-aiba"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForTokenClassification.from_pretrained(model_name)


# Example prediction
def extract_entities(text):
    """Tag `text` and merge BIO-labelled tokens into {entity_type: entity_text}.

    If an entity type occurs more than once, only its last span is kept.
    """
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=128)
    with torch.no_grad():
        outputs = model(**inputs)

    # Highest-scoring label id for every token, mapped to its label string
    predictions = torch.argmax(outputs.logits, dim=2)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    predicted_labels = [model.config.id2label[pred.item()] for pred in predictions[0]]

    entities = {}
    current_entity = None
    current_tokens = []
    for token, label in zip(tokens, predicted_labels):
        # Special tokens carry no entity information
        if token in ["[CLS]", "[SEP]", "[PAD]"]:
            continue
        if label.startswith("B-"):
            # A new entity begins: save the previous one, if any
            if current_entity and current_tokens:
                entity_text = tokenizer.convert_tokens_to_string(current_tokens)
                entities[current_entity] = entity_text.strip()
            current_entity = label[2:]
            current_tokens = [token]
        elif label.startswith("I-") and current_entity == label[2:]:
            # Continuation of the current entity
            current_tokens.append(token)
        else:
            # "O" or a mismatched "I-" tag ends the current entity
            if current_entity and current_tokens:
                entity_text = tokenizer.convert_tokens_to_string(current_tokens)
                entities[current_entity] = entity_text.strip()
            current_entity = None
            current_tokens = []

    # Save an entity that runs to the end of the sequence
    if current_entity and current_tokens:
        entity_text = tokenizer.convert_tokens_to_string(current_tokens)
        entities[current_entity] = entity_text.strip()
    return entities


# Example
text = "Transfer 12.5mln USD to Apex Industries account 27109477752047116719"
print(extract_entities(text))
```
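`extract_entities` stores results in a dict keyed by entity type, so a second span of the same type overwrites the first. The variant below keeps every span in order. It is written as a pure-Python function over the `tokens` and `predicted_labels` lists. `join_wordpieces` is a hypothetical, simplified stand-in for `tokenizer.convert_tokens_to_string` that only glues `##` continuation pieces back together.

```python
SPECIAL_TOKENS = {"[CLS]", "[SEP]", "[PAD]"}


def join_wordpieces(tokens):
    """Hypothetical simplified detokenizer: attach '##' pieces to the preceding piece."""
    return " ".join(tokens).replace(" ##", "").strip()


def decode_all_entities(tokens, labels):
    """Return every (entity_type, text) span in order, keeping repeated entity types."""
    entities = []
    current_type, current_tokens = None, []
    pairs = [(t, l) for t, l in zip(tokens, labels) if t not in SPECIAL_TOKENS]
    for token, label in pairs + [("", "O")]:  # sentinel "O" closes the final span
        if label.startswith("I-") and label[2:] == current_type:
            current_tokens.append(token)
            continue
        if current_type is not None:
            entities.append((current_type, join_wordpieces(current_tokens)))
        if label.startswith("B-"):
            current_type, current_tokens = label[2:], [token]
        else:
            current_type, current_tokens = None, []
    return entities


# Hypothetical tokens and labels, for illustration only
tokens = ["[CLS]", "Pay", "Apex", "Indus", "##tries", "and", "Nova", "Ltd", "[SEP]"]
labels = ["O", "O", "B-receiver_name", "I-receiver_name", "I-receiver_name",
          "O", "B-receiver_name", "I-receiver_name", "O"]
print(decode_all_entities(tokens, labels))
# [('receiver_name', 'Apex Industries'), ('receiver_name', 'Nova Ltd')]
```

Returning a list rather than a dict leaves the choice of how to resolve duplicates to the caller.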
Example Outputs
Input: "Transfer 12.5mln USD to Apex Industries account 27109477752047116719 INN 123456789 bank code 01234 for consulting"
Output:
```json
{
  "amount": "12.5mln",
  "currency": "USD",
  "receiver_name": "Apex Industries",
  "receiver_hr": "27109477752047116719",
  "receiver_inn": "123456789",
  "bank_code": "01234",
  "description": "consulting"
}
```
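Extracted identifiers can be sanity-checked before they are used downstream. The rules below are hypothetical and inferred only from the digit counts in the example above: a 20-digit account, a 9-digit INN, and a 5-digit bank code. They are keyed by the entity types listed in this card. Real formats differ between countries and banks (for example, Russian INNs have 10 or 12 digits), so treat this as a sketch to adapt.

```python
import re

# Hypothetical format rules inferred from one example; adjust per banking system.
FORMAT_RULES = {
    "receiver_hr": re.compile(r"\d{20}"),
    "receiver_inn": re.compile(r"\d{9}"),
    "bank_code": re.compile(r"\d{5}"),
}


def check_formats(entities):
    """Return entity types whose value (spaces removed) does not fully match its pattern."""
    return [
        entity_type
        for entity_type, pattern in FORMAT_RULES.items()
        if entity_type in entities
        and not pattern.fullmatch(entities[entity_type].replace(" ", ""))
    ]


extracted = {"receiver_hr": "27109477752047116719", "receiver_inn": "123456789", "bank_code": "0123"}
print(check_formats(extracted))  # ['bank_code']
```

Spaces are stripped before matching because detokenized digit strings can contain stray spaces.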
Limitations
- The model was trained on synthetic and curated banking transaction data
- Performance may vary on real-world data with different formatting
- Best results are achieved on transaction texts similar to the training distribution
- May require fine-tuning for specific banking systems or regional variations
License
Apache 2.0
Citation
```bibtex
@misc{transactor-aiba,
  author       = {Primel},
  title        = {Transactor AIBA: Multilingual Banking Transaction NER},
  year         = {2025},
  publisher    = {Hugging Face},
  howpublished = {\url{https://huggingface.co/primel/transactor-aiba}}
}
```