Abstract: Machine learning is a field of artificial intelligence that allows computers to improve their performance on specific tasks by learning from data rather than following only predefined rules. It encompasses supervised, unsupervised, and semi-supervised learning, which differ in how much labeled data is available. In phishing detection, machine learning is used to extract features, classify URLs, adapt to new threats, and analyze data to detect clusters of phishing attacks, and it plays a crucial role in automating detection. The proliferation of malicious websites and online criminal activity has raised concerns among web users and service providers. To address this issue, we propose a learning-based approach that uses algorithms such as CatBoost, AdaBoost, Random Forest, and Support Vector Machine (SVM) to classify websites, based on their URLs, into three categories: benign, spam, and malicious. Benign websites offer legitimate services and are safe to use, while spam websites inundate users with ads, fake surveys, or dating sites. Malicious websites are created by attackers with the intent of disrupting computer operations, stealing confidential data, or gaining unauthorized access to private systems. The proposed mechanism analyzes only the Uniform Resource Locator (URL) of a website and does not access its content, reducing runtime latency and eliminating the possibility of exposing users to browser-based vulnerabilities. A large dataset of labeled URLs is used to train a classification model, which is then used to classify new URLs. Experimental evaluation on a publicly available dataset demonstrates that the approach achieves 98.3% accuracy with the Random Forest and SVM models, outperforms traditional blacklisting services in generality and coverage, and can adapt to new threats and improve its performance over time.
The approach thus presents a promising solution for accurately detecting phishing attacks from URLs and is of interest to researchers and practitioners working in cybersecurity.
Keywords: Phishing detection, Cybersecurity, CatBoost, AdaBoost, Random Forest, Support Vector Machine, Natural Language Processing, Machine Learning.
DOI: 10.17148/IJARCCE.2023.12504
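
To illustrate what URL-only analysis can look like in practice, the following is a minimal Python sketch of lexical feature extraction from a URL string, without fetching any page content. The abstract does not list the features actually used, so the function name `url_features` and every feature in it are illustrative assumptions, not the authors' feature set.

```python
import ipaddress
from urllib.parse import parse_qsl, urlsplit


def url_features(url: str) -> dict:
    """Compute hypothetical lexical features from the URL string alone.

    The feature set is an assumption made for illustration; the page
    behind the URL is never requested.
    """
    # urlsplit needs a scheme to locate the host, so add a default one.
    parts = urlsplit(url if "://" in url else "http://" + url)
    host = parts.hostname or ""

    # Raw IP addresses in place of domain names are a common warning sign.
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False

    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_hyphens": url.count("-"),
        "has_at_symbol": "@" in url,
        "host_is_ip": host_is_ip,
        "uses_https": parts.scheme == "https",
        "path_depth": len([seg for seg in parts.path.split("/") if seg]),
        "num_query_params": len(parse_qsl(parts.query)),
    }


if __name__ == "__main__":
    for example in ("https://example.com/a/b?q=1", "http://192.168.0.1/login"):
        print(example, url_features(example))
```

Feature dictionaries of this kind could then be converted to numeric vectors and passed, together with benign/spam/malicious labels, to a classifier such as scikit-learn's `RandomForestClassifier` or `SVC`; that training step is omitted here because the abstract does not specify the dataset or model settings.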