Exemplifying the Significance of Tuning Tf-Idf for Sentiment Mining Online Consumer Review

Nandhini.S; S.Prema

doi:10.17148/IJARCCE.2018.71102

Volume 7, Issue 11, November 2018

Abstract: Text mining has gained huge momentum in recent years, with user-generated content becoming widely available. One key use is comment mining, with much attention being given to sentiment analysis and opinion mining. An essential step in the process of comment mining is text pre-processing, a step in which each linguistic term is assigned a weight that commonly increases with its appearance in the studied text, yet is offset by the occurrence of the term in the domain of interest. A common practice is to use the well-known tf-idf formula to calculate these weights. This paper reveals the bias introduced by between-participants’ discourse to the study of comments in social media, and proposes an adjustment. We find that content extracted from discourse is often highly correlated, resulting in dependence structures between observations in the study, thus introducing a statistical bias. Ignoring this bias can result in a non-robust analysis at best and can lead to an entirely wrong conclusion at worst. We propose a change to tf-idf that accounts for this bias. We show the effects of both the bias and the correction using data from seven Facebook fan pages, covering different domains, including news, finance, politics, sport, shopping, and entertainment.
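
For readers unfamiliar with the weighting step the abstract mentions, the following is a minimal sketch of the standard, unadjusted tf-idf formula only. The abstract does not specify the proposed adjustment, so it is not reproduced here. The function name `tf_idf`, the choice of length-normalized term frequency with a natural-log inverse document frequency, and the sample comments are all illustrative assumptions rather than material from the paper.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / document length,
    idf = ln(number of documents / documents containing the term).
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical comments, invented purely for illustration.
comments = [
    "great phone great battery".split(),
    "battery died fast".split(),
    "great service".split(),
]
weights = tf_idf(comments)
```

In this sketch a term's weight grows with its frequency inside a comment and shrinks as the term appears in more comments; a term that occurs in every comment receives a weight of zero.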

Keywords: Sentiment Analysis, Text Mining, Statistical Bias, Discourse, TF-IDF

How to Cite:

[1] Nandhini.S, Dr.S.Prema, “Exemplifying the Significance of Tuning Tf-Idf for Sentiment Mining Online Consumer Review,” International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE), DOI: 10.17148/IJARCCE.2018.71102

This work is licensed under a Creative Commons Attribution 4.0 International License.