Hybrid Intelligence for DDoS Defense: Combining Generative AI, Resampling, and Ensemble Methods


Lakshmi Prayaga (University of West Florida, USA), Chandra Prayaga (University of West Florida, USA), Rhys Misstle (University of West Florida, USA), Mariah Zuanazzi (University of West Florida, USA), and Sri Satya Harsha Pola (University of West Florida, USA)
DOI: 10.4018/IJAIML.370316

Abstract

Recent advances in machine learning, deep learning, and large language models enable the design of refined and complex algorithms to detect and prevent cybersecurity attacks. In this paper, we present a hybrid fusion approach combining Generative AI, ADASYN, Recursive Feature Elimination (RFE), and boosting algorithms to detect DDoS attacks. RFE was employed to optimize feature selection, enhancing model interpretability and performance by reducing dimensionality. The proposed model leverages (1) Packet Capture (pcap) data generated from virtual networks as real data, (2) synthetic data generated by the Synthetic Data Vault, (3) ADASYN to balance the data, and (4) boosting algorithms for training and testing. The hybrid-fusion model achieved an accuracy of 97–98%, indicating that the model is robust and reliable. Cross-validation further supported these results.

Introduction

Distributed denial-of-service (DDoS) attacks are a type of cybersecurity threat in which multiple systems, typically compromised by malware, are used against a single target. These attacks usually involve overwhelming a target server with a high volume of requests, leading to severe service disruptions. By exhausting bandwidth and computational resources, DDoS attacks render systems unavailable to legitimate users. Their effects include service interruptions, revenue loss, reputational damage, and increased operational costs, making the detection and mitigation of DDoS attacks a critical priority for organizations.

To address these challenges, we introduce a robust framework for DDoS detection. Our approach combines the following processes:

  • generating synthetic data using a variational autoencoder (VAE) synthesizer

  • capturing real data from a virtual network consisting of a server and two clients

  • balancing data with the Synthetic Minority Oversampling Technique (SMOTE) combined with Tomek links (SMOTETomek)

  • optimizing features through recursive feature elimination (RFE)

This hybrid method achieved high accuracy rates, demonstrating its effectiveness in distinguishing DDoS attacks from normal traffic.
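The balancing step listed above can be illustrated with a minimal pure-Python sketch of the SMOTE-plus-Tomek-links idea. It is not the authors' implementation, which is not shown in this preview. The function names, the two-feature toy data, and the convention that label 1 marks the minority (attack) class are all assumptions made for illustration. The sketch also assumes at least two minority samples and removes both members of every Tomek link.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of base, excluding base itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: euclidean(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def tomek_links(X, y):
    """Return index pairs (i, j), i < j, that are mutual nearest
    neighbours and carry different labels."""
    def nearest(i):
        return min(
            (j for j in range(len(X)) if j != i),
            key=lambda j: euclidean(X[i], X[j]),
        )

    nn = [nearest(i) for i in range(len(X))]
    return [
        (i, j) for i, j in enumerate(nn)
        if i < j and nn[j] == i and y[i] != y[j]
    ]


def smote_tomek(X, y, minority_label=1, k=3, seed=0):
    """Oversample the minority class up to the majority count, then drop
    every sample that takes part in a Tomek link."""
    minority = [x for x, lab in zip(X, y) if lab == minority_label]
    n_majority = len(X) - len(minority)
    new_points = smote(minority, n_majority - len(minority), k, seed)
    X_bal = [list(x) for x in X] + new_points
    y_bal = list(y) + [minority_label] * len(new_points)
    drop = {i for pair in tomek_links(X_bal, y_bal) for i in pair}
    keep = [i for i in range(len(X_bal)) if i not in drop]
    return [X_bal[i] for i in keep], [y_bal[i] for i in keep]


# Hypothetical toy data: six "normal" flows (label 0), two "attack" flows (label 1).
X_toy = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3],
         [0.2, 0.4], [0.4, 0.2], [5.0, 5.0], [5.2, 5.1]]
y_toy = [0, 0, 0, 0, 0, 0, 1, 1]
X_res, y_res = smote_tomek(X_toy, y_toy)
print(len(X_res), y_res.count(0), y_res.count(1))
```

Removing both members of a Tomek link is one common cleaning choice. A variant that drops only the majority-class member would keep more minority samples, at the cost of a less clean class boundary.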

By integrating synthetic and real data, balancing skewed datasets, and leveraging feature elimination techniques, we provide a scalable, reliable framework for detecting malicious network activity. The findings affirm the validity of this approach and underscore its potential to mitigate cyberattacks that can cause significant operational and financial losses. This work contributes to the field by offering an innovative pipeline for anomaly detection and infrastructure protection in high-dimensional datasets. The rest of the paper presents a literature review, an overview of our methodology, a discussion of the results with an interpretation of the findings, and a conclusion with recommendations for future work.


Literature Review

Advances in machine learning, deep learning, and large language models have provided open-source libraries and tools that significantly enhance the ability to detect and mitigate cyberattacks. Several studies have demonstrated the potential of synthetic data generated by generative adversarial networks (GANs) and VAEs to augment datasets when real data are scarce, imbalanced, unreliable, or skewed (Khakurel et al., 2022; Mehrabi et al., 2021). Synthetic data generated from labeled data can be used to train robust models and improve classification outcomes. Some studies (Chalé & Bastian, 2022; Nikolov, 2023) have shown that combining synthetic and real data can achieve results comparable to using real data alone, whereas models trained only on synthetic data tend to underperform. However, other researchers (Halvorsen & Gebremedhin, 2024; Llugiqi & Mayer, 2022) have reported that models trained exclusively on synthetic data perform as well as, or in some cases better than, models trained on real data. Enhanced feature extraction has also been shown to improve anomaly detection speed and accuracy (Patil et al., 2022; Wang et al., 2022).

Machine learning algorithms are commonly used to evaluate the accuracy of methods for detecting various types of cybercrime. For instance, Kilincer et al. (2022) and Oneto and Chiappa (2020) used Light Gradient-Boosting Machine (LightGBM) and Extreme Gradient Boosting (XGBoost) on the Comprehensive Cyber Security Intrusion Detection Dataset (CCiDD) and its subsets, CCiDD_A and CCiDD_B. Their findings revealed that LightGBM outperformed XGBoost in detecting cyberattacks within these datasets. Similarly, Louk and Tama (2023) and Chen et al. (2023) reported that ensemble methods such as gradient boosting machine, XGBoost, LightGBM, and CatBoost were effective for intrusion detection. Among these, CatBoost consistently achieved superior performance in identifying cyberattacks.
