Abstract: This study presents a method for identifying Vietnamese-language articles on the internet that contain reactionary viewpoints against the Government of Vietnam and the leadership of the Communist Party of Vietnam. Such articles often contain errors such as misspellings, typos, and misplaced punctuation, as well as newly coined “terms” unfamiliar to Vietnamese readers. Methods based on grammatical and lexical analysis are therefore poorly suited to them. We instead propose using word orders in triplet form (Subject, Verb, Object) and its reduced variants to screen these articles. These variants are the doublet forms (Subject, Verb, null) and (Verb, Object, null) and the singlet form (Subject, null, null). Screening follows one principle: if at least one sentence of an article contains the elements of such a word order, the article is classified as containing reactionary viewpoints. The original triplets are built from the training corpus (dataset) and then extended with synonyms from VietWordNet. This extension increases the accuracy of the algorithm significantly. The program can help professional security units reduce the human resources required and improve operational effectiveness.
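The screening rule above can be illustrated with a minimal sketch. It is not the authors' implementation, which the keywords suggest is built on Spark GraphX. The sketch makes several assumptions. Sentences are already segmented into words. A `None` slot plays the role of "null" and is ignored during matching. "Contains the elements of a word order" is read as "the non-null elements occur in the sentence in the same order, not necessarily adjacent". A plain dictionary stands in for VietWordNet. The function names and the English placeholder tokens are hypothetical.

```python
from itertools import product
from typing import Optional

# A pattern is a triplet whose slots may be None ("null"):
# (subject, verb, object), (subject, verb, None), (None, verb, object), (subject, None, None).
Pattern = tuple[Optional[str], Optional[str], Optional[str]]


def expand(pattern: Pattern, synonyms: dict[str, set[str]]) -> set[Pattern]:
    """Return the pattern plus every variant obtained by replacing each non-null
    slot with one of its synonyms (the dict is a hypothetical VietWordNet stand-in)."""
    options = [
        [None] if slot is None else sorted({slot} | synonyms.get(slot, set()))
        for slot in pattern
    ]
    return set(product(*options))


def matches(sentence: list[str], pattern: Pattern) -> bool:
    """True if the non-null slots occur in the sentence in the same order
    (gaps between them are allowed)."""
    remaining = iter(sentence)  # shared iterator enforces left-to-right order
    return all(
        any(token == slot for token in remaining)
        for slot in pattern
        if slot is not None
    )


def flag_article(
    sentences: list[list[str]],
    patterns: list[Pattern],
    synonyms: dict[str, set[str]],
) -> bool:
    """Flag the article if at least one sentence matches at least one expanded pattern."""
    expanded = set().union(*(expand(p, synonyms) for p in patterns))
    return any(matches(s, p) for s in sentences for p in expanded)


# Usage with neutral placeholder tokens (not data from the study).
patterns = [("cat", "eat", "fish"), ("dog", None, None)]
synonyms = {"eat": {"devour"}}
print(flag_article([["the", "cat", "quickly", "devour", "the", "fish"]], patterns, synonyms))
print(flag_article([["the", "fish", "eat", "the", "cat"]], patterns, synonyms))
```

The first article is flagged only because synonym expansion adds ("cat", "devour", "fish"). The second is not flagged, because its words appear in a different order. This mirrors the abstract's point that extending triplets with synonyms raises recall over the original corpus-derived triplets.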

Keywords: document analysis, document classification, reactionary viewpoint, triple, edge triplet, triplet finding, Spark GraphX, VietWordNet.

DOI: 10.17148/IJARCCE.2021.10434
