Abstract: The growing volume of information available online has made it difficult to search for unique documents and relevant information. Users spend a great deal of time browsing and searching before they find the specific piece of information they need. We propose a faster and more accurate approach that makes training easier in this process. In line with current e-learning trends, the system aims to provide efficient search. It also consolidates the desired information into a single body of material and supplies additional resources, such as useful links and videos, so that it can serve as a reliable and comprehensive source for the user's needs. After the article data is collected, every article passes through a series of processing steps: data cleaning, tokenization, frequency calculation, and a log-likelihood function. Rather than comparing texts by their byte structure, the method takes the properties of natural language into account, so that similarity can be measured more accurately. When estimating term frequency, it performs pre-processing and computes inverse document frequency. Pre-processing uses a standard set of stop words, which also yields nearly accurate measurements. The framework makes it possible to apply different data mining techniques to pre-process a document before it is uploaded and to identify the main topic it covers. At search time, the system returns the article whose feature vector best matches the feature vector of the search query; this is another step forward for e-learning.
Keywords: Data Cleaning, e-Learning, Tokenization, Natural Language
DOI: 10.17148/IJARCCE.2019.81010
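The retrieval steps named in the abstract can be sketched in Python as follows: data cleaning, tokenization with stop-word removal, term frequency with inverse document frequency, and picking the article whose feature vector best matches the query. This is a minimal illustration under stated assumptions and not the authors' implementation. The stop-word list and all function names are hypothetical. Cosine similarity is assumed as the matching measure, because the abstract does not name one. The log-likelihood step is omitted.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (assumption: the paper's
# "standard set of stop words" is not specified in the abstract).
STOP_WORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}


def clean(text: str) -> str:
    """Data cleaning: lowercase and replace non-alphanumeric characters with spaces."""
    return re.sub(r"[^a-z0-9\s]", " ", text.lower())


def tokenize(text: str) -> list[str]:
    """Tokenization followed by stop-word removal."""
    return [tok for tok in clean(text).split() if tok not in STOP_WORDS]


def build_idf(docs: list[list[str]]) -> dict[str, float]:
    """Inverse document frequency: log(N / df) for every term in the corpus."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / count) for term, count in df.items()}


def tfidf(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Sparse TF-IDF feature vector; terms unseen in the corpus are dropped."""
    counts = Counter(tokens)
    total = len(tokens) or 1
    return {t: (c / total) * idf[t] for t, c in counts.items() if t in idf}


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors (assumed matching measure)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def search(query: str, articles: list[str]) -> tuple[int, float]:
    """Return (index, score) of the article whose vector best matches the query."""
    docs = [tokenize(a) for a in articles]
    idf = build_idf(docs)
    vectors = [tfidf(d, idf) for d in docs]
    q_vec = tfidf(tokenize(query), idf)
    scores = [cosine(q_vec, vec) for vec in vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


if __name__ == "__main__":
    articles = [
        "Tokenization splits the text of an article into words.",
        "Video lectures and useful links support e-learning.",
        "Stop words are removed before the frequency calculation.",
    ]
    print(search("how are stop words removed", articles))
```

With these three sample articles, the query "how are stop words removed" ranks the third article (index 2) highest. That article shares the rare terms "stop" and "removed" with the query, and IDF weights those terms more heavily than "words", which appears in two articles.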