Spam Web Page Detection based on Content and Link Structure of the Site
Abstract: A spam web page is a page that contains no useful information. Spammers create such pages for fun or to inflate their page rank and thereby generate revenue. Detecting spam web pages is one of the top challenges for search engines. There are two main approaches to detection: link-based analysis and content-based analysis. In this paper, we focus mainly on content-based analysis. We use features such as the average word length, keyword stuffing, the content of the body, the number of stop words, and the count of unique words in the body and title of the page to identify spam.
Keywords: Search engine, Web mining, Spam web page, Content based analysis.
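As an illustration of the content features named in the abstract, the following is a minimal Python sketch, not the authors' implementation. The function names (`tokenize`, `content_features`), the small stop-word list, and the use of the most frequent non-stop word's share of the body as a proxy for keyword stuffing are all assumptions made for this example; the abstract does not specify how each feature is computed.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; the paper does not give one).
STOP_WORDS = frozenset({
    "a", "an", "and", "are", "as", "at", "be", "by", "for", "in",
    "is", "it", "of", "on", "or", "the", "to", "with",
})


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def content_features(title, body):
    """Compute simple content-based features for one page (hypothetical helper)."""
    title_words = tokenize(title)
    body_words = tokenize(body)
    n = len(body_words)

    # Words that are not stop words; used for the keyword-stuffing proxy.
    content_words = [w for w in body_words if w not in STOP_WORDS]
    top_count = Counter(content_words).most_common(1)[0][1] if content_words else 0

    return {
        "body_word_count": n,
        "avg_word_length": sum(len(w) for w in body_words) / n if n else 0.0,
        "stop_word_count": n - len(content_words),
        "unique_body_words": len(set(body_words)),
        "unique_title_words": len(set(title_words)),
        # Assumed proxy for keyword stuffing: share of the body taken up
        # by the single most frequent non-stop word.
        "top_keyword_share": top_count / n if n else 0.0,
    }
```

A feature dictionary of this kind could be passed to any classifier or thresholding rule; which thresholds separate spam from non-spam is not stated in the abstract and is left open here.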
How to Cite:
[1] Kiran Hunagund, Santhosh Kumar K L, “Spam Web Page Detection based on Content and Link Structure of the Site,” International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE), DOI: 10.17148/IJARCCE.2015.4875
