International Journal of Advanced Research in Computer and Communication Engineering

A monthly peer-reviewed online and print journal

ISSN Online 2278-1021
ISSN Print 2319-5940

Since 2012

Abstract: Phishing is the process whereby an attacker pretends to be a legitimate entity in order to obtain vital information such as personal information, credit card details and confidential passwords from a user. Phishing is usually carried out through website URLs, emails, text messages and phone calls. Once attackers successfully acquire a user's vital information, they use it to gain access to the user's account, which can lead to financial theft and loss. This paper presents a model for detecting phishing websites using a support vector classifier and a deep neural network algorithm. We used a urlset dataset comprising 48,009 legitimate website URLs and 48,009 phishing URLs, making a total of 98,019 website URLs. The dataset was pre-processed by removing all NaN and infinite values, making it clean and fit for training. After pre-processing, we used feature extraction to reduce the dataset's dimensionality and drop unwanted feature columns, reducing the dataset from 16 feature columns to 2: the domain column (which holds the domain name/website URL) and the label column (which holds the binary values 0 and 1, where 0 represents a legitimate website URL and 1 represents a phishing website URL). We also used CountVectorizer to convert the text documents (the domain column) into vectors of term/token counts. CountVectorizer also enables pre-processing of text data prior to generating the vector representation. After training, the support vector classifier achieved an accuracy of 97.21%, while our deep learning algorithm achieved 98.33%, on the total of 98,018 URLs studied. Thereafter, we saved both models and deployed them to the web using Flask.
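The token-count step described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: it uses only the Python standard library to mimic what a count vectorizer does (lowercasing, splitting on word boundaries, counting tokens per document), and the sample rows, the helper names `tokenize`, `fit_vocabulary` and `transform`, and the handling of missing values are all assumptions made for illustration. With scikit-learn, the equivalent step would be `CountVectorizer().fit_transform(domains)`.

```python
import re
from collections import Counter

# Tokens are runs of two or more word characters, the same shape as
# scikit-learn's default CountVectorizer token pattern.
TOKEN_RE = re.compile(r"\b\w\w+\b")


def tokenize(domain):
    """Lowercase a domain string and split it into word tokens."""
    return TOKEN_RE.findall(domain.lower())


def fit_vocabulary(domains):
    """Map each distinct token to a column index, in alphabetical order."""
    tokens = sorted({tok for d in domains for tok in tokenize(d)})
    return {tok: i for i, tok in enumerate(tokens)}


def transform(domains, vocabulary):
    """Turn each domain into a row of token counts; unseen tokens are ignored."""
    rows = []
    for d in domains:
        counts = Counter(tokenize(d))
        row = [0] * len(vocabulary)
        for tok, n in counts.items():
            if tok in vocabulary:
                row[vocabulary[tok]] = n
        rows.append(row)
    return rows


# Hypothetical rows in the two-column (domain, label) layout:
# 0 = legitimate, 1 = phishing. These are not taken from the paper's dataset.
samples = [
    ("www.example.com/login", 0),
    ("secure-login.example.verify-account.test/login", 1),
    (None, 1),  # a missing value, dropped during cleaning
]

# Cleaning step (assumed): drop rows whose domain is missing.
clean = [(d, y) for d, y in samples if d is not None]
domains = [d for d, _ in clean]
labels = [y for _, y in clean]

vocab = fit_vocabulary(domains)
X = transform(domains, vocab)
```

Each row of `X` is the count vector for one URL, and `labels` holds the matching 0/1 targets. A matrix of this kind is what would then be passed to a classifier such as a support vector classifier or a neural network.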

Keywords: Phishing, Support Vector Classifier, Deep Neural Network, Machine Learning

PDF | DOI: 10.17148/IJARCCE.2020.9632
