Abstract: Automatic text classification is an important natural language processing (NLP) task, with applications in sentiment analysis, data organization, and spam filtering. Traditional methods based on bag-of-words (BOW) or TF-IDF representations often struggle to capture relationships between words. This limitation can lead to misclassification, especially for short or ambiguous texts. This work exploits the synergy of word embedding techniques with gradient boosting for text classification. Word2Vec converts each word into a numeric vector that captures semantic similarities and relationships. By feeding these vectors to XGBoost, the model can use this rich semantic information to predict categories. Because Word2Vec captures relationships between words, the model can understand context and distinguish between word senses: the vectors for “king” and “queen” are similar, while those for “king” and “bank” are far apart. This improves classification accuracy compared to traditional methods. The combination of Word2Vec and XGBoost also handles noisy or incomplete text better than traditional methods: Word2Vec’s dense representations reduce the impact of misspellings or inconsistent content, increasing robustness in real-world applications. Additionally, XGBoost’s ability to handle missing values and to focus on the most important features improves model interpretability. The framework can be extended to multiple classification tasks, making it adaptable to a wide range of text challenges. Finally, XGBoost’s scalability ensures that the method can be applied effectively to large datasets without sacrificing performance.
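As a minimal sketch of the pipeline the abstract describes (the tiny embedding table and all names below are illustrative assumptions, not taken from the paper): each document is represented by the average of its words' Word2Vec-style embeddings, and the resulting dense feature vectors would then be passed to a gradient-boosted classifier such as `xgboost.XGBClassifier`. In practice the embeddings would come from a trained gensim Word2Vec model (`model.wv[word]`); here a hand-made table stands in so the example is self-contained.

```python
# Hypothetical embedding table standing in for a trained Word2Vec model.
# Semantically close words ("king", "queen") get close vectors; an
# unrelated word ("bank") points in a different direction.
EMBEDDINGS = {
    "king":  [0.90, 0.80, 0.10],
    "queen": [0.85, 0.75, 0.15],
    "bank":  [0.10, 0.20, 0.90],
    "money": [0.15, 0.25, 0.85],
}

def doc_vector(text, dim=3):
    """Average the embeddings of known words; zeros if none are known."""
    vectors = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vectors:
        return [0.0] * dim
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sum(x * x for x in a) ** 0.5
    norm_b = sum(x * x for x in b) ** 0.5
    return dot / (norm_a * norm_b)

# Dense document features like these are what would be fed to XGBoost,
# e.g. xgboost.XGBClassifier().fit(X, y) with X a matrix of doc vectors.
royal = doc_vector("the king and queen")
finance = doc_vector("bank money")
print(cosine(EMBEDDINGS["king"], EMBEDDINGS["queen"]))  # high similarity
print(cosine(EMBEDDINGS["king"], EMBEDDINGS["bank"]))   # low similarity
```

This mirrors the “king”/“queen” versus “king”/“bank” contrast in the abstract: averaging embeddings yields the dense, noise-tolerant document features that the classifier consumes.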
Index Terms: XGBoost, Word2Vec, text categorization, semantic relationships, NLP, gradient boosting, word embeddings, accuracy, machine learning, document classification.
DOI: 10.17148/IJARCCE.2025.14459