metadata

library_name: sklearn
tags:
  - sklearn
  - skops
  - tabular-classification
  - finance
model_format: pickle
model_file: skops-ise057qg.pkl
widget:
  - structuredData:
      Bene_Country:
        - COMOROS
        - CANADA
        - MOROCCO
      Sender_Country:
        - SRI-LANKA
        - USA
        - USA
      Transaction_Type:
        - MOVE-FUNDS
        - PAY-CHECK
        - MAKE-PAYMENT
      USD_amount:
        - 598.31
        - 398.72
        - 87.03

Model description

This is a Gaussian Naive Bayes model trained on a synthetic dataset generated by J.P. Morgan AI Research. The dataset contains a large variety of transaction types representing normal activities as well as abnormal/fraudulent activities. The model predicts whether a transaction is normal or fraudulent.

Intended uses & limitations

This model is intended for educational purposes.

Training Procedure

The data preprocessing steps applied include the following:

Dropping high-cardinality features: Transaction ID, Sender ID, Sender Account, Beneficiary ID, Beneficiary Account, and Sender Sector.
Dropping zero-variance features: Sender LOB.
Dropping the time and date feature, since the model is not time-series based.
Encoding the categorical features (Sender Country, Beneficiary Country, and Transaction Type) and the target variable, Label.
Applying feature scaling to all features.
Splitting the dataset into training and test sets using an 85/15 split ratio.
Handling the imbalanced dataset with the imblearn framework, applying the RandomUnderSampler method to eliminate noise, which led to a 2.5% improvement in accuracy.
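
The steps above can be sketched roughly as follows. This is a minimal illustration on made-up data, not the author's training script. The column names Transaction_Id, Sender_lob and Time_step, the 0/1 label encoding, the stratified split, and resampling only the training set are all assumptions. The undersampling here is hand-rolled with NumPy to keep the example dependency-light; imblearn's RandomUnderSampler provides the same idea through fit_resample.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 200

# Hypothetical toy data; only the four model input names come from the card.
df = pd.DataFrame({
    "Transaction_Id": [f"T{i}" for i in range(n)],            # high cardinality (assumed name)
    "Sender_lob": ["CCB"] * n,                                 # zero variance (assumed name)
    "Time_step": pd.date_range("2022-01-01", periods=n, freq="D"),  # assumed name
    "Sender_Country": rng.choice(["USA", "SRI-LANKA", "CANADA"], n),
    "Bene_Country": rng.choice(["COMOROS", "CANADA", "MOROCCO"], n),
    "Transaction_Type": rng.choice(["MOVE-FUNDS", "PAY-CHECK", "MAKE-PAYMENT"], n),
    "USD_amount": rng.uniform(10, 1000, n).round(2),
    "Label": [0] * 180 + [1] * 20,                             # imbalanced target, 1 = fraud (assumed)
})

# Drop identifier-like, constant, and time columns.
high_cardinality = ["Transaction_Id"]
no_variance = [c for c in df.columns if df[c].nunique() == 1]
df = df.drop(columns=high_cardinality + no_variance + ["Time_step"])

# 85/15 train/test split; stratification is an assumption.
X = df.drop(columns="Label")
y = df["Label"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.15, stratify=y, random_state=0
)

# Random undersampling: keep as many rows of each class as the minority class has.
minority_size = int(y_train.value_counts().min())
keep_idx = np.concatenate([
    rng.choice(y_train.index[y_train == label].to_numpy(),
               size=minority_size, replace=False)
    for label in y_train.unique()
])
X_bal, y_bal = X_train.loc[keep_idx], y_train.loc[keep_idx]

print(y_train.value_counts().to_dict(), "->", y_bal.value_counts().to_dict())
```

Encoding and scaling are left out here because, according to the hyperparameters listed in this card, they are handled inside the fitted pipeline.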

Hyperparameters

Hyperparameter	Value
memory
steps	[('preprocessorAll', ColumnTransformer(remainder='passthrough', transformers=[('cat', Pipeline(steps=[('onehot', OneHotEncoder(handle_unknown='ignore', sparse_output=False))]), ['Sender_Country', 'Bene_Country', 'Transaction_Type']), ('num', Pipeline(steps=[('scale', StandardScaler())]), Index(['USD_amount'], dtype='object'))])), ('classifier', GaussianNB())]
verbose	False
preprocessorAll	ColumnTransformer(remainder='passthrough', transformers=[('cat', Pipeline(steps=[('onehot', OneHotEncoder(handle_unknown='ignore', sparse_output=False))]), ['Sender_Country', 'Bene_Country', 'Transaction_Type']), ('num', Pipeline(steps=[('scale', StandardScaler())]), Index(['USD_amount'], dtype='object'))])
classifier	GaussianNB()
preprocessorAll__n_jobs
preprocessorAll__remainder	passthrough
preprocessorAll__sparse_threshold	0.3
preprocessorAll__transformer_weights
preprocessorAll__transformers	[('cat', Pipeline(steps=[('onehot', OneHotEncoder(handle_unknown='ignore', sparse_output=False))]), ['Sender_Country', 'Bene_Country', 'Transaction_Type']), ('num', Pipeline(steps=[('scale', StandardScaler())]), Index(['USD_amount'], dtype='object'))]
preprocessorAll__verbose	False
preprocessorAll__verbose_feature_names_out	True
preprocessorAll__cat	Pipeline(steps=[('onehot', OneHotEncoder(handle_unknown='ignore', sparse_output=False))])
preprocessorAll__num	Pipeline(steps=[('scale', StandardScaler())])
preprocessorAll__cat__memory
preprocessorAll__cat__steps	[('onehot', OneHotEncoder(handle_unknown='ignore', sparse_output=False))]
preprocessorAll__cat__verbose	False
preprocessorAll__cat__onehot	OneHotEncoder(handle_unknown='ignore', sparse_output=False)
preprocessorAll__cat__onehot__categories	auto
preprocessorAll__cat__onehot__drop
preprocessorAll__cat__onehot__dtype	<class 'numpy.float64'>
preprocessorAll__cat__onehot__handle_unknown	ignore
preprocessorAll__cat__onehot__max_categories
preprocessorAll__cat__onehot__min_frequency
preprocessorAll__cat__onehot__sparse	deprecated
preprocessorAll__cat__onehot__sparse_output	False
preprocessorAll__num__memory
preprocessorAll__num__steps	[('scale', StandardScaler())]
preprocessorAll__num__verbose	False
preprocessorAll__num__scale	StandardScaler()
preprocessorAll__num__scale__copy	True
preprocessorAll__num__scale__with_mean	True
preprocessorAll__num__scale__with_std	True
classifier__priors
classifier__var_smoothing	1e-09

Model Plot

Pipeline(steps=[('preprocessorAll',ColumnTransformer(remainder='passthrough',transformers=[('cat',Pipeline(steps=[('onehot',OneHotEncoder(handle_unknown='ignore',sparse_output=False))]),['Sender_Country','Bene_Country','Transaction_Type']),('num',Pipeline(steps=[('scale',StandardScaler())]),Index(['USD_amount'], dtype='object'))])),('classifier', GaussianNB())])

Evaluation Results

Metric	Value
accuracy	0.794582
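
As a reminder of what this metric measures, accuracy is the fraction of test rows whose predicted label matches the true label. The sketch below uses scikit-learn's accuracy_score on hypothetical labels; it does not reproduce the value reported above.

```python
from sklearn.metrics import accuracy_score

# Hypothetical labels for illustration (0 = normal, 1 = fraudulent is an assumption).
y_true = [0, 0, 1, 1, 0, 1, 0, 0]
y_pred = [0, 1, 1, 0, 0, 1, 0, 0]

acc = accuracy_score(y_true, y_pred)  # 6 of 8 predictions match
print(f"accuracy: {acc:.6f}")
```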

Model Explainability

SHAP was used to determine the important features that help the model make decisions.

Confusion Matrix
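
A confusion matrix for a binary classifier such as this one can be computed as in the sketch below. The labels are hypothetical and the 0 = normal, 1 = fraudulent encoding is an assumption; the numbers are not the model's actual results.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels for illustration only.
y_true = [0, 0, 1, 1, 0, 1, 0, 0]
y_pred = [0, 1, 1, 0, 0, 1, 0, 0]

# Rows are true classes, columns are predicted classes, in the order given by `labels`.
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()
print(cm)
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
```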

Model Card Authors

This model card was written by the following author: Seifullah Bello