Description

Credit card fraud detection faces two critical challenges: extreme class imbalance, with fraud accounting for less than 0.2% of transactions in typical datasets, and strict privacy regulations that prevent financial institutions from sharing sensitive transaction data for collaborative model training. This research addresses both challenges by investigating privacy-preserving synthetic data generation in federated learning settings. The study compares two methods: Federated-SMOTE, which uses secure cross-bank nearest-neighbor discovery to generate synthetic fraud cases by interpolation under a strong privacy guarantee (ε=0.3), and Federated-GAN, which uses generative adversarial networks with differential privacy to synthesize realistic fraud patterns under a more relaxed privacy constraint (ε=0.7). Using the European Credit Card dataset partitioned across five banks with a non-IID distribution, both approaches were evaluated through federated learning with FedAvg aggregation over 10 communication rounds. Experimental results show that both privacy-preserving methods significantly outperform baseline approaches without data balancing, improving PR-AUC by 17-21%. Federated-GAN achieves the best overall precision-recall balance (PR-AUC 0.827), while Federated-SMOTE provides the highest fraud detection rate (recall 0.777) with a 2.3× smaller privacy budget (ε=0.3 versus ε=0.7). This research contributes a practitioner-ready comparative framework that financial institutions can use to select a privacy-preserving synthetic data generation method based on their regulatory constraints and performance priorities, enabling collaborative fraud detection without exposing sensitive customer transaction data.
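
The two generic building blocks named in the description, SMOTE-style interpolation and FedAvg aggregation, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names `smote_interpolate` and `fedavg` and all sample values are hypothetical, and the secure cross-bank neighbor discovery and differential privacy mechanisms are omitted entirely.

```python
import random


def smote_interpolate(x, neighbor, rng):
    """Return one synthetic point on the segment between x and a neighbor.

    Standard SMOTE step: x_new = x + lam * (neighbor - x), lam in [0, 1).
    """
    lam = rng.random()
    return [xi + lam * (ni - xi) for xi, ni in zip(x, neighbor)]


def fedavg(client_weights, client_sizes):
    """Average client parameter vectors, weighted by local sample counts."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(n_params)
    ]


# Hypothetical toy data: two minority-class (fraud) feature vectors.
rng = random.Random(0)
fraud_a = [1.0, 2.0]
fraud_b = [3.0, 6.0]
synthetic = smote_interpolate(fraud_a, fraud_b, rng)

# Hypothetical toy model: two clients holding 100 and 300 local samples.
global_weights = fedavg([[0.0, 1.0], [1.0, 3.0]], [100, 300])
```

In this sketch the synthetic point always lies between the two fraud samples, and the client with more local data pulls the global parameters further toward its own values.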

Publication Date

2026

Privacy-Preserving Synthetic Data Generation for Federated Learning in Imbalanced Credit Card Fraud Detection: A Comparative Analysis of SMOTE vs. GAN Approaches
