Malware Detection Using PE Headers: A Comprehensive Machine Learning Approach

Pranto Bosu; Satinder Kaur; Tajbir Singh; Satveer Kour; Sandeep Kaur

doi:10.17148/IJARCCE.2026.15641

Volume 15, Issue 6, June 2026

Abstract: The rapid proliferation of malicious software (malware) poses an increasingly severe threat to global information infrastructure, with over 1.44 billion cumulative malware samples documented by 2024. Traditional signature-based antivirus solutions are demonstrably insufficient against zero-day threats, polymorphic code, and obfuscation-heavy payloads. This paper presents a comprehensive pre-review study of malware detection techniques that leverage the static structural features embedded within the Portable Executable (PE) file header — a rich, low-overhead source of discriminative information present in every Windows executable. We conduct an extensive literature survey of over 30 published works spanning 2017–2025 and propose an end-to-end detection pipeline that extracts 57 features across the DOS Header, File Header, Optional Header, and Section Table, and evaluates them using eight classification algorithms: Naive Bayes, SVM, KNN, Decision Tree, Random Forest, XGBoost, LightGBM, and MLP. Experimental analysis on the publicly available EMBER 2018 and Meraz’18 datasets shows that ensemble and gradient-boosting methods — specifically XGBoost (97.4% accuracy, AUC 0.987) and LightGBM (97.1%, AUC 0.985) — consistently outperform conventional classifiers. Feature importance analysis using SHAP reveals that SizeOfOptionalHeader, AddressOfEntryPoint, SizeOfImage, and NumberOfSections are the most discriminative attributes. We also discuss adversarial evasion challenges including packing, obfuscation, and header manipulation, and identify future research directions toward robust, explainable, and real-time PE-header-based malware detectors.

Keywords: Malware Detection, Portable Executable (PE) Header, Static Analysis, Machine Learning, XGBoost, Random Forest, LightGBM, Feature Engineering, Cybersecurity, EMBER Dataset.
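
As an illustration of the static extraction step described in the abstract, the sketch below reads the four fields that the SHAP analysis ranks highest (SizeOfOptionalHeader, AddressOfEntryPoint, SizeOfImage, NumberOfSections) directly from the raw bytes of a PE file. It is not the authors' implementation: the function names extract_header_features and synthetic_header are hypothetical, only the Python standard library is used, and the byte offsets follow the published PE/COFF layout. The full 57-feature set and the classifiers are not reproduced here.

```python
import struct


def extract_header_features(data: bytes) -> dict:
    """Read four PE header fields from raw file bytes (hypothetical helper)."""
    if len(data) < 0x40 or data[:2] != b"MZ":
        raise ValueError("missing DOS header ('MZ' magic not found)")
    # e_lfanew, at offset 0x3C of the DOS header, holds the PE signature offset.
    (pe_offset,) = struct.unpack_from("<I", data, 0x3C)
    if data[pe_offset:pe_offset + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    # The 20-byte COFF File Header follows the 4-byte signature.
    coff = pe_offset + 4
    (_machine, n_sections, _timestamp, _sym_ptr, _n_syms,
     opt_size, _characteristics) = struct.unpack_from("<HHIIIHH", data, coff)
    if opt_size < 60:
        raise ValueError("optional header too small for the fields read here")
    # The Optional Header follows the File Header. AddressOfEntryPoint (offset 16)
    # and SizeOfImage (offset 56) sit at the same offsets in PE32 and PE32+.
    opt = coff + 20
    (entry_point,) = struct.unpack_from("<I", data, opt + 16)
    (size_of_image,) = struct.unpack_from("<I", data, opt + 56)
    return {
        "NumberOfSections": n_sections,
        "SizeOfOptionalHeader": opt_size,
        "AddressOfEntryPoint": entry_point,
        "SizeOfImage": size_of_image,
    }


def synthetic_header() -> bytes:
    """Build a header-only byte string for demonstration (made-up values)."""
    pe_offset, opt_size = 0x80, 240
    buf = bytearray(pe_offset + 4 + 20 + opt_size)
    buf[0:2] = b"MZ"
    struct.pack_into("<I", buf, 0x3C, pe_offset)
    buf[pe_offset:pe_offset + 4] = b"PE\0\0"
    struct.pack_into("<HHIIIHH", buf, pe_offset + 4,
                     0x8664, 3, 0, 0, 0, opt_size, 0x22)
    opt = pe_offset + 24
    struct.pack_into("<H", buf, opt, 0x20B)        # PE32+ magic
    struct.pack_into("<I", buf, opt + 16, 0x1000)  # AddressOfEntryPoint
    struct.pack_into("<I", buf, opt + 56, 0x5000)  # SizeOfImage
    return bytes(buf)


if __name__ == "__main__":
    print(extract_header_features(synthetic_header()))
```

In a pipeline of the kind the abstract outlines, a dictionary like this would be computed per file and the values stacked into a feature matrix for the classifiers. Real samples can carry malformed or deliberately manipulated headers, so a production extractor would need stricter bounds checks than this sketch performs.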

How to Cite:

[1] Pranto Bosu, Satinder Kaur, Tajbir Singh, Satveer Kour, Sandeep Kaur, “Malware Detection Using PE Headers: A Comprehensive Machine Learning Approach,” International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE), DOI: 10.17148/IJARCCE.2026.15641

This work is licensed under a Creative Commons Attribution 4.0 International License.