Abstract: Online news has become a major source of information, yet much of what appears on the internet is dubious or even intended to mislead. Automated fake news detection tools based on machine learning and deep learning models have therefore become essential. We apply stemming, lemmatization, and stop-word removal to obtain text representations for these models. We use a Kaggle dataset to define fake news, which would allow a filtered subset of fake news to be provided to end users. The advent of the World Wide Web and the rapid adoption of social media platforms (such as Facebook and Twitter) paved the way for information dissemination on a scale never before witnessed in human history. With the current usage of social media platforms, consumers are creating and sharing more information than ever before, some of which is misleading and has no relevance to reality. Automated classification of a text article as misinformation or disinformation is a challenging task: even an expert in a particular domain has to explore multiple aspects before giving a verdict on the truthfulness of an article. In this work, we propose a machine learning ensemble approach for automated classification of news articles. Our study explores different textual properties that can be used to distinguish fake content from real. Using those properties, we train a combination of different machine learning algorithms with various ensemble methods and evaluate their performance on four real-world datasets. Experimental evaluation confirms the superior performance of the proposed ensemble learner approach in comparison to individual learners. As the available data grows, so does our understanding of AI, and increased computing power enables very large and complex models to be trained faster. Fake news has recently been gathering a lot of attention worldwide. Its effects can be political, economic, organizational, or even personal.
This paper discusses a natural language processing and machine learning approach to this problem. Bag-of-words, n-grams, a count vectorizer, and TF-IDF are used to represent the text, and five classifiers are trained on the data to investigate which of them works well for this specific dataset of labelled news statements. Precision, recall, and F1 scores help determine which model works best.
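As an illustration only, the feature extraction and scoring steps named above can be sketched in plain Python. This is a minimal sketch, not the implementation used in the paper: the function names (`ngrams`, `bag_of_ngrams`, `tfidf`, `precision_recall_f1`), the whitespace tokenizer, the smoothed IDF formula, and the label string `"fake"` are all assumptions made for this example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list, joined by spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_ngrams(text, max_n=2):
    """Count every 1-gram up to max_n-gram in a lowercased text.

    Assumption: tokens are split on whitespace; a real pipeline would
    also apply stemming, lemmatization, and stop-word removal first.
    """
    tokens = text.lower().split()
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(ngrams(tokens, n))
    return counts


def tfidf(docs, max_n=2):
    """Weight raw term counts by a smoothed inverse document frequency.

    Assumption: idf(t) = ln((1 + N) / (1 + df(t))) + 1, one common
    smoothed variant; no length normalization is applied here.
    """
    bags = [bag_of_ngrams(doc, max_n) for doc in docs]
    # Iterating a Counter yields each distinct term once per document,
    # so this tallies document frequency rather than total frequency.
    df = Counter(term for bag in bags for term in bag)
    n_docs = len(docs)
    return [
        {term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
         for term, count in bag.items()}
        for bag in bags
    ]


def precision_recall_f1(y_true, y_pred, positive="fake"):
    """Compute precision, recall, and F1 for the chosen positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

In this sketch, a term that occurs in every document (such as "news" in a news corpus) receives the minimum IDF of 1, while rarer terms are weighted more heavily. The resulting weight dictionaries could then be fed to any of the classifiers under comparison, with `precision_recall_f1` applied to each classifier's predictions.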

Keywords: Fake news analysis, Real news, Internet, Social media, Fake news, Classification, Machine learning.


PDF | DOI: 10.17148/IJARCCE.2022.11541
