---
library_name: sklearn
tags:
- sklearn
- skops
- tabular-classification
- finance
model_format: pickle
model_file: skops-ise057qg.pkl
widget:
- structuredData:
    Bene_Country:
    - COMOROS
    - CANADA
    - MOROCCO
    Sender_Country:
    - SRI-LANKA
    - USA
    - USA
    Transaction_Type:
    - MOVE-FUNDS
    - PAY-CHECK
    - MAKE-PAYMENT
    USD_amount:
    - 598.31
    - 398.72
    - 87.03
---
# Model description
This is a Gaussian Naive Bayes model trained on a synthetic dataset generated by J.P. Morgan AI Research. The dataset contains a large variety of transaction types
representing normal activities as well as abnormal/fraudulent activities. The model predicts whether a transaction is normal or fraudulent.
## Intended uses & limitations
For educational purposes.
## Training Procedure
The following data preprocessing steps were applied:
- Dropping high-cardinality features: Transaction ID, Sender ID, Sender Account, Beneficiary ID, Beneficiary Account, and Sender Sector
- Dropping zero-variance features: Sender LOB
- Dropping the time and date feature, since the model is not time-series based
- Transforming and encoding the categorical features (Sender Country, Beneficiary Country, and Transaction Type) and the target variable, Label
- Applying feature scaling to all features
- Splitting the dataset into training and test sets with an 85/15 ratio
- Handling the imbalanced dataset with the imblearn framework, applying the RandomUnderSampler method to eliminate noise, which led to a 2.5% improvement in accuracy
![image/png](https://cdn-uploads.huggingface.co/production/uploads/6662300a0ad8c45a1ce59190/BEi0CfOfJ2ytxD5VoN4IM.png)
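The split and undersampling steps listed above can be sketched in plain Python. This is only an illustration under stated assumptions, not the training code used for this model: `split_85_15` and `random_undersample` are hypothetical helpers, the toy rows and labels are made up, and the undersampler is a hand-rolled stand-in for imblearn's `RandomUnderSampler` that shrinks every class to the size of the smallest one. The card does not state that undersampling was applied to the training split only; that ordering is an assumption here.

```python
import random
from collections import Counter


def split_85_15(rows, labels, seed=0):
    """Shuffle the row indices and keep the first 85% for training."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = round(0.85 * len(indices))
    train, test = indices[:cut], indices[cut:]
    return (
        [rows[i] for i in train], [labels[i] for i in train],
        [rows[i] for i in test], [labels[i] for i in test],
    )


def random_undersample(rows, labels, seed=0):
    """Randomly drop rows so every class has as many rows as the rarest class."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)
    smallest = min(len(indices) for indices in by_class.values())
    kept = []
    for indices in by_class.values():
        kept.extend(rng.sample(indices, smallest))
    kept.sort()  # preserve the original row order
    return [rows[i] for i in kept], [labels[i] for i in kept]


# Made-up toy data: 100 transactions, 10% of them labelled fraudulent (1).
rows = [{"USD_amount": float(i)} for i in range(100)]
labels = [1 if i % 10 == 0 else 0 for i in range(100)]

X_train, y_train, X_test, y_test = split_85_15(rows, labels)
X_bal, y_bal = random_undersample(X_train, y_train)

print("before:", Counter(y_train))
print("after: ", Counter(y_bal))
```

Undersampling only the training portion leaves the test set with the original class ratio, so evaluation reflects the imbalance a deployed model would see.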
### Hyperparameters
| Hyperparameter | Value |
|----------------------------------------------|---------------------------------------------------------------------------|
| memory | |
| steps | [('preprocessorAll', ColumnTransformer(remainder='passthrough', transformers=[('cat', Pipeline(steps=[('onehot', OneHotEncoder(handle_unknown='ignore', sparse_output=False))]), ['Sender_Country', 'Bene_Country', 'Transaction_Type']), ('num', Pipeline(steps=[('scale', StandardScaler())]), Index(['USD_amount'], dtype='object'))])), ('classifier', GaussianNB())] |
| verbose | False |
| preprocessorAll | ColumnTransformer(remainder='passthrough', transformers=[('cat', Pipeline(steps=[('onehot', OneHotEncoder(handle_unknown='ignore', sparse_output=False))]), ['Sender_Country', 'Bene_Country', 'Transaction_Type']), ('num', Pipeline(steps=[('scale', StandardScaler())]), Index(['USD_amount'], dtype='object'))]) |
| classifier | GaussianNB() |
| preprocessorAll__n_jobs | |
| preprocessorAll__remainder | passthrough |
| preprocessorAll__sparse_threshold | 0.3 |
| preprocessorAll__transformer_weights | |
| preprocessorAll__transformers | [('cat', Pipeline(steps=[('onehot', OneHotEncoder(handle_unknown='ignore', sparse_output=False))]), ['Sender_Country', 'Bene_Country', 'Transaction_Type']), ('num', Pipeline(steps=[('scale', StandardScaler())]), Index(['USD_amount'], dtype='object'))] |
| preprocessorAll__verbose | False |
| preprocessorAll__verbose_feature_names_out | True |
| preprocessorAll__cat | Pipeline(steps=[('onehot', OneHotEncoder(handle_unknown='ignore', sparse_output=False))]) |
| preprocessorAll__num | Pipeline(steps=[('scale', StandardScaler())]) |
| preprocessorAll__cat__memory | |
| preprocessorAll__cat__steps | [('onehot', OneHotEncoder(handle_unknown='ignore', sparse_output=False))] |
| preprocessorAll__cat__verbose | False |
| preprocessorAll__cat__onehot | OneHotEncoder(handle_unknown='ignore', sparse_output=False) |
| preprocessorAll__cat__onehot__categories | auto |
| preprocessorAll__cat__onehot__drop | |
| preprocessorAll__cat__onehot__dtype |

Pipeline(steps=[('preprocessorAll', ColumnTransformer(remainder='passthrough', transformers=[('cat', Pipeline(steps=[('onehot', OneHotEncoder(handle_unknown='ignore', sparse_output=False))]), ['Sender_Country', 'Bene_Country', 'Transaction_Type']), ('num', Pipeline(steps=[('scale', StandardScaler())]), Index(['USD_amount'], dtype='object'))])), ('classifier', GaussianNB())])