---
library_name: sklearn
tags:
- sklearn
- skops
- tabular-classification
- finance
model_format: pickle
model_file: skops-ise057qg.pkl
widget:
- structuredData:
    Bene_Country:
    - COMOROS
    - CANADA
    - MOROCCO
    Sender_Country:
    - SRI-LANKA
    - USA
    - USA
    Transaction_Type:
    - MOVE-FUNDS
    - PAY-CHECK
    - MAKE-PAYMENT
    USD_amount:
    - 598.31
    - 398.72
    - 87.03
---

# Model description

This is a Gaussian Naive Bayes model trained on a synthetic dataset generated by J.P. Morgan AI Research. The dataset contains a large variety of transaction types representing normal activities as well as abnormal/fraudulent activities. The model predicts whether a transaction is normal or fraudulent.

## Intended uses & limitations

For educational purposes.

## Training Procedure

The following data preprocessing steps were applied:

- Dropping high-cardinality features: Transaction ID, Sender ID, Sender Account, Beneficiary ID, Beneficiary Account, and Sender Sector
- Dropping zero-variance features: Sender LOB
- Dropping the time and date feature, since the model is not time-series based
- Transforming and encoding the categorical features (Sender Country, Beneficiary Country, and Transaction Type) and the target variable, Label
- Applying feature scaling on all features
- Splitting the dataset into training and test sets using an 85/15 split ratio
- Handling the imbalanced dataset with the imblearn framework, applying the RandomUnderSampler method to eliminate noise, which led to a 2.5% improvement in accuracy

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6662300a0ad8c45a1ce59190/BEi0CfOfJ2ytxD5VoN4IM.png)

### Hyperparameters

| Hyperparameter | Value |
|----------------|-------|
| memory | |
| steps | [('preprocessorAll', ColumnTransformer(remainder='passthrough', transformers=[('cat', Pipeline(steps=[('onehot', OneHotEncoder(handle_unknown='ignore', sparse_output=False))]), ['Sender_Country', 'Bene_Country', 'Transaction_Type']), ('num', Pipeline(steps=[('scale', StandardScaler())]), Index(['USD_amount'], dtype='object'))])), ('classifier', GaussianNB())] |
| verbose | False |
| preprocessorAll | ColumnTransformer(remainder='passthrough', transformers=[('cat', Pipeline(steps=[('onehot', OneHotEncoder(handle_unknown='ignore', sparse_output=False))]), ['Sender_Country', 'Bene_Country', 'Transaction_Type']), ('num', Pipeline(steps=[('scale', StandardScaler())]), Index(['USD_amount'], dtype='object'))]) |
| classifier | GaussianNB() |
| preprocessorAll__n_jobs | |
| preprocessorAll__remainder | passthrough |
| preprocessorAll__sparse_threshold | 0.3 |
| preprocessorAll__transformer_weights | |
| preprocessorAll__transformers | [('cat', Pipeline(steps=[('onehot', OneHotEncoder(handle_unknown='ignore', sparse_output=False))]), ['Sender_Country', 'Bene_Country', 'Transaction_Type']), ('num', Pipeline(steps=[('scale', StandardScaler())]), Index(['USD_amount'], dtype='object'))] |
| preprocessorAll__verbose | False |
| preprocessorAll__verbose_feature_names_out | True |
| preprocessorAll__cat | Pipeline(steps=[('onehot', OneHotEncoder(handle_unknown='ignore', sparse_output=False))]) |
| preprocessorAll__num | Pipeline(steps=[('scale', StandardScaler())]) |
| preprocessorAll__cat__memory | |
| preprocessorAll__cat__steps | [('onehot', OneHotEncoder(handle_unknown='ignore', sparse_output=False))] |
| preprocessorAll__cat__verbose | False |
| preprocessorAll__cat__onehot | OneHotEncoder(handle_unknown='ignore', sparse_output=False) |
| preprocessorAll__cat__onehot__categories | auto |
| preprocessorAll__cat__onehot__drop | |
| preprocessorAll__cat__onehot__dtype | |
| preprocessorAll__cat__onehot__handle_unknown | ignore |
| preprocessorAll__cat__onehot__max_categories | |
| preprocessorAll__cat__onehot__min_frequency | |
| preprocessorAll__cat__onehot__sparse | deprecated |
| preprocessorAll__cat__onehot__sparse_output | False |
| preprocessorAll__num__memory | |
| preprocessorAll__num__steps | [('scale', StandardScaler())] |
| preprocessorAll__num__verbose | False |
| preprocessorAll__num__scale | StandardScaler() |
| preprocessorAll__num__scale__copy | True |
| preprocessorAll__num__scale__with_mean | True |
| preprocessorAll__num__scale__with_std | True |
| classifier__priors | |
| classifier__var_smoothing | 1e-09 |

### Model Plot
Pipeline(steps=[('preprocessorAll',ColumnTransformer(remainder='passthrough',transformers=[('cat',Pipeline(steps=[('onehot',OneHotEncoder(handle_unknown='ignore',sparse_output=False))]),['Sender_Country','Bene_Country','Transaction_Type']),('num',Pipeline(steps=[('scale',StandardScaler())]),Index(['USD_amount'], dtype='object'))])),('classifier', GaussianNB())])
## Evaluation Results

| Metric   | Value    |
|----------|----------|
| accuracy | 0.794582 |

## Model Explainability

SHAP was used to determine the important features that help the model make decisions.

![image/png](https://cdn-uploads.huggingface.co/production/uploads/6662300a0ad8c45a1ce59190/rQYxJoz86TtdkSSSGnCOr.png)

### Confusion Matrix

![Confusion Matrix](confusion_matrix.png)

# Model Card Authors

This model card was written by the following author: Seifullah Bello