Comparative Analysis of Machine Learning Algorithms for Anomaly Detection in IoT Networks Using CICIoT2023 Dataset
Abstract
Internet of Things (IoT) networks face increasing security threats due to their heterogeneous
nature and resource constraints. This study presents a comprehensive comparison of ten machine learning
algorithms for anomaly detection in IoT environments using the CICIoT2023 dataset. We evaluated six
supervised learning algorithms (Logistic Regression, Random Forest, Gradient Boosting, Linear SVC,
SGD Classifier, and MLP) and four unsupervised anomaly detection methods (Isolation Forest, SGD
One-Class SVM, Local Outlier Factor, and Elliptic Envelope) using a reproducible pipeline with Data
Version Control (DVC). Our methodology employs stratified sampling on 4.5 million records (97.7%
attacks, 2.3% benign), standardized preprocessing with 39 features, and binary classification. The
experimental framework includes rigorous statistical validation through 705 experiments across multiple
hyperparameter configurations with 5 independent runs each. Given the severe class imbalance, balanced
accuracy emerged as the critical metric, with ensemble methods (Gradient Boosting: 91.95%, Random
Forest: 91.89%) demonstrating an 8-17 percentage point advantage over linear classifiers in minority class
detection. Gradient Boosting achieved the highest F1-score (0.9964 ± 0.0004), while SGD-based methods
trained 200-600× faster with competitive performance, making them suitable for resource-constrained
deployments. Bayesian statistical analysis confirmed significant performance differences across algorithm
families. This research establishes a rigorous baseline for algorithm selection in severely imbalanced IoT
intrusion detection systems.
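
To illustrate why balanced accuracy matters at this class ratio, the following minimal sketch compares plain accuracy with balanced accuracy for a trivial classifier that labels every record as an attack. It is not the study's pipeline: the 1,000-record sample, the label encoding (1 = attack, 0 = benign), and the helper names `accuracy` and `balanced_accuracy` are assumptions made for illustration. Only the 97.7% / 2.3% split comes from the abstract.

```python
def accuracy(y_true, y_pred):
    """Fraction of records whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally regardless of size."""
    recalls = []
    for cls in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == cls)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        recalls.append(hits / total)
    return sum(recalls) / len(recalls)


# Hypothetical sample mirroring the reported class ratio: 977 attacks, 23 benign.
y_true = [1] * 977 + [0] * 23

# A degenerate model that never predicts the benign (minority) class.
always_attack = [1] * len(y_true)

plain = accuracy(y_true, always_attack)
balanced = balanced_accuracy(y_true, always_attack)

print(f"accuracy          = {plain:.3f}")
print(f"balanced accuracy = {balanced:.3f}")
```

In this sketch the degenerate model scores 0.977 on plain accuracy but only 0.5 on balanced accuracy, because its recall on the benign class is zero. This is the kind of gap that makes plain accuracy uninformative when 97.7% of records belong to one class.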
Article Details
Upon receipt of accepted manuscripts, authors will be invited to complete a copyright license to publish the paper. At least the corresponding author must send the signed copyright form for publication. It is a condition of publication that authors grant an exclusive license to the INFOCOMP Journal of Computer Science. This ensures that requests from third parties to reproduce articles are handled efficiently and consistently, and it also allows the article to be disseminated as widely as possible. In assigning the copyright license, authors may use their own material in other publications, provided that the INFOCOMP Journal of Computer Science is acknowledged as the original place of publication.