Abstract: Phishing attacks persist as one of the most financially damaging cybersecurity threats, with recent global cybersecurity reports indicating a continuous year-over-year surge driven by large-scale automated phishing kits and AI-generated scam content. Reactive defenses such as blacklists and static filters cannot keep up with modern threats; in particular, they fail against zero-day attacks that use freshly registered domains or complex social engineering. Closing this gap calls for a practical hybrid approach that uses both URL patterns and email content to identify phishing. The first layer detects malicious URLs without relying on any blacklist entry: it applies a logistic regression model to character-level TF-IDF features, using n-grams of 3 to 5 characters to identify malicious character sequences. The second layer is an email phishing detection layer that uses a Random Forest classifier trained on the UCI Spambase dataset, whose 57 features include word frequencies and capitalization patterns, to identify spam email content. A whitelist is used to avoid false flags and to quickly identify trusted websites. The system, implemented as a Flask web application, manages both models. In the training results, the system shows a high detection rate and a low false-positive rate for both phishing URLs and spam patterns.
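
As a rough illustration of the URL layer described in the abstract, the sketch below shows a whitelist check followed by extraction of character n-grams of 3 to 5 characters. It is not the authors' implementation: the whitelist entries, the function names, and the `score_fn` callable (a stand-in for the trained TF-IDF plus logistic regression model) are all assumptions made for this example, as is the 0.5 threshold.

```python
from urllib.parse import urlparse

# Hypothetical whitelist; the paper does not list its trusted domains.
WHITELIST = {"example.com", "example.org"}


def host_of(url):
    """Return the lowercased hostname, tolerating URLs without a scheme."""
    parsed = urlparse(url if "://" in url else "http://" + url)
    return (parsed.hostname or "").lower()


def is_whitelisted(url, whitelist=WHITELIST):
    """True if the host is a whitelisted domain or a subdomain of one."""
    host = host_of(url)
    return any(host == d or host.endswith("." + d) for d in whitelist)


def char_ngrams(text, n_min=3, n_max=5):
    """All contiguous character n-grams with n_min <= n <= n_max."""
    return [
        text[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(text) - n + 1)
    ]


def classify_url(url, score_fn, threshold=0.5):
    """Check the whitelist first; otherwise defer to a model score in [0, 1].

    score_fn is an assumed callable that maps a list of n-grams to a
    phishing probability, standing in for the trained classifier.
    """
    if is_whitelisted(url):
        return "legitimate (whitelisted)"
    score = score_fn(char_ngrams(url.lower()))
    return "phishing" if score >= threshold else "legitimate"
```

Matching on the parsed hostname rather than on a substring of the URL is a deliberate choice here: a lookalike host such as `example.com.evil.net` does not pass the whitelist and is still sent to the model.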

Keywords: phishing detection, TF-IDF, logistic regression, random forest, Spambase, cybersecurity, URL analysis, email security


Downloads: PDF | DOI: 10.17148/IJARCCE.2026.15130

How to Cite:

[1] Prof. K Thriveni, Praveen K, Manoj Kumar, Sharan S, Nishchal Gowda B R, "A Dual-Model Machine Learning System for Phishing Detection: URL Pattern Recognition and Email Content Analysis," International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE), DOI: 10.17148/IJARCCE.2026.15130
