International Journal of Advanced Research in Computer and Communication Engineering A monthly Peer-reviewed & Refereed journal
ISSN Online 2278-1021 | ISSN Print 2319-5940 | Since 2012
Volume 5, Issue 3, March 2016

Effective Crawling In Web Forum

D. Dhivya, R. Venkadeshan, D. Vidhya

DOI: 10.17148/IJARCCE.2016.53239

Abstract: The main objective of EC web is to crawl relevant forum content from the web with minimal overhead. Forum threads contain the information content that forum crawlers target. Although forums have different layouts or styles and are powered by different forum software packages, they always have similar implicit navigation paths, connected by specific URL types, that lead users from entry pages to thread pages. We therefore reduce the web forum crawling problem to a URL-type recognition problem. Training sets are created by learning accurate and effective regular expression patterns. We apply this knowledge to unseen URLs and identify the type of each URL. After classification, all crawled URLs are stored in a log. The URL log is used to identify strong and weak URLs by eliminating duplicate URLs from the log. Finally, the effectiveness of the strong URLs is measured.
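
The classification and logging steps described in the abstract can be illustrated with a minimal Python sketch. Everything concrete in it is an assumption made for illustration and is not taken from the paper: the two URL types (`index`, `thread`), the hand-written regular expressions (the paper learns its patterns from training data), the `sid`/`s` session parameters, and the helper names `normalize`, `classify`, and `build_log`. The abstract does not define how strong and weak URLs are distinguished, so the sketch stops at duplicate elimination.

```python
import re
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical, hand-written patterns standing in for learned ones.
URL_TYPE_PATTERNS = {
    "index": re.compile(r"/forumdisplay\.php\?(?:[^#]*&)?f=\d+"),
    "thread": re.compile(r"/showthread\.php\?(?:[^#]*&)?t=\d+"),
}

# Assumed session-tracking parameters that do not change the page content.
SESSION_PARAMS = {"sid", "s"}


def normalize(url):
    """Lowercase scheme and host, drop session parameters and the fragment,
    and sort the remaining query parameters."""
    parts = urlsplit(url)
    query = sorted(
        (key, value)
        for key, value in parse_qsl(parts.query)
        if key not in SESSION_PARAMS
    )
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), parts.path, urlencode(query), "")
    )


def classify(url):
    """Return the first URL type whose pattern matches, else 'other'."""
    for url_type, pattern in URL_TYPE_PATTERNS.items():
        if pattern.search(url):
            return url_type
    return "other"


def build_log(urls):
    """Classify each URL and keep one log entry per normalized URL,
    preserving the order of first appearance."""
    seen = set()
    log = []
    for url in urls:
        canonical = normalize(url)
        if canonical in seen:
            continue  # duplicate of an already logged URL
        seen.add(canonical)
        log.append((canonical, classify(canonical)))
    return log


if __name__ == "__main__":
    crawled = [
        "http://Forum.example.com/showthread.php?t=42&sid=abc",
        "http://forum.example.com/showthread.php?sid=xyz&t=42",
        "http://forum.example.com/forumdisplay.php?f=7",
        "http://forum.example.com/member.php?u=3",
    ]
    for entry in build_log(crawled):
        print(entry)
```

In this sketch the first two crawled URLs differ only in host casing and session identifier, so they collapse into a single `thread` entry in the log. A real crawler would likely need forum-specific patterns and a more careful notion of URL equivalence.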



Keywords: Effective Crawling, Web Forum, URL Type Recognition Module, Crawling Module.

How to Cite:

[1] D. Dhivya, R. Venkadeshan, D. Vidhya, “Effective Crawling In Web Forum,” International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE), DOI: 10.17148/IJARCCE.2016.53239