Abstract: Hate speech is any communication that disparages an individual or a group on the basis of some characteristic, such as race, identity, sex, sexual orientation, ethnicity, religion, or another characteristic. Harmful language (e.g., hate speech, offensive speech, or other hostile speech) mainly targets members of minority groups and can incite real violence against them. This paper proposes an improved framework for hate speech detection using a machine learning approach. The system uses a Twitter dataset of tweets labeled as hate speech, as offensive language, or as neither. The dataset, downloaded from kaggle.com, contains a total of 24,784 tweets. It has 8 columns, which we reduced to two by means of feature extraction: the tweet column, which contains the text of each tweet, and the class column, which contains the labels 0, 1, and 2, where 0 denotes hate speech, 1 denotes offensive language, and 2 denotes neither hate speech nor offensive language. We trained our model using a support vector machine and a random forest classifier, which achieved accuracies of 95% and 99%, respectively. We then deployed the model to the web using Python Flask for easy evaluation and testing. Our experimental results show that the proposed system achieved better performance in classifying text as hate speech.
Keywords: Hate Speech, Offensive Language, Random Forest Classifier, Support Vector Machine, Machine Learning
DOI: 10.17148/IJARCCE.2021.10332
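The abstract does not state the feature representation, the SVM variant, or any hyperparameters. The following is a minimal sketch of the described pipeline (keep only the tweet and class columns, then train an SVM and a random forest on the labels 0, 1, 2). It assumes scikit-learn, TF-IDF features, `LinearSVC` as the support vector machine, and mostly default settings. The toy rows, the `extra_col` column, and the `classify` helper are hypothetical placeholders, not the Kaggle data or the authors' code, and the sketch reproduces none of the reported accuracies.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Label scheme as described in the abstract.
LABELS = {0: "hate speech", 1: "offensive language", 2: "neither"}

# Hypothetical stand-in for the Kaggle CSV. "extra_col" represents the
# columns that are dropped; the tokens are neutral placeholders.
raw = pd.DataFrame({
    "extra_col": range(9),
    "tweet": [
        "hateterm_a aimed at group_x",
        "hateterm_b against group_y",
        "hateterm_a hateterm_b group_z",
        "offterm_a you fool",
        "offterm_b this is rubbish",
        "offterm_a offterm_b again",
        "nice weather today friends",
        "watching the match tonight",
        "coffee with my family",
    ],
    "class": [0, 0, 0, 1, 1, 1, 2, 2, 2],
})

# Keep only the two columns used for training: text and label.
data = raw[["tweet", "class"]]

# One pipeline per classifier: TF-IDF vectorizer followed by the model.
models = {
    "svm": make_pipeline(TfidfVectorizer(), LinearSVC()),
    "random_forest": make_pipeline(
        TfidfVectorizer(),
        RandomForestClassifier(n_estimators=100, random_state=0),
    ),
}
for model in models.values():
    model.fit(data["tweet"], data["class"])


def classify(text, model_name="svm"):
    """Hypothetical helper: return the label name predicted for one text."""
    prediction = int(models[model_name].predict([text])[0])
    return LABELS[prediction]


print(classify("nice weather today"))
print(classify("offterm_a words", model_name="random_forest"))
```

A web front end such as the Flask deployment mentioned in the abstract would, under these assumptions, simply call a function like `classify` on the submitted text. That serving layer is not shown here.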