Volume 15, Issue 5, May 2026
This work is licensed under a Creative Commons Attribution 4.0 International License.
Hybrid Embedding Model for Document Classification
Ranjana S. Chakrasali, Chandana A. Athreyesa, K. Shridevi B. Adiga, Vathsala, Vishnupriya
Abstract: Managing large collections of digital documents has become increasingly difficult in academic and professional environments. Files such as research papers, reports, PDFs, and project documents are often stored without consistent organization, which makes retrieval slow and inefficient. This work proposes a hybrid document classification framework that combines TF-IDF statistical features with contextual embeddings generated by BERT. The combined representation allows the model to capture both salient keywords and the semantic meaning of a document. A lightweight classification layer assigns uploaded files to categories such as Business, Politics, Sports, Health, and Technology. In addition, a rule-based classifier that uses file extensions is integrated so that commonly identifiable file types can be handled more efficiently. A Flask-based web interface lets users upload documents, which are then automatically organized into category folders. Experimental evaluation on the BBC News dataset shows that the proposed hybrid model achieves higher classification accuracy and Macro F1-score than standalone TF-IDF and BERT models.
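The abstract does not state how the two representations are combined. One common approach, assumed here purely for illustration, is to L2-normalize each feature block and concatenate them. The stdlib-only Python sketch below shows that idea. `tfidf_vector` computes TF-IDF weights with a smoothed IDF, ln((1 + n) / (1 + df)) + 1. The hypothetical `contextual_embedding` stub stands in for a BERT sentence vector so that the example runs without downloading a model. A nearest-centroid rule substitutes for the paper's lightweight classification layer. All function names and the toy corpus are illustrative assumptions, not the authors' implementation.

```python
import math
from collections import Counter


def tfidf_vector(doc_tokens, corpus_tokens, vocab):
    """TF-IDF weights for one document over a fixed vocabulary (smoothed IDF)."""
    n_docs = len(corpus_tokens)
    counts = Counter(doc_tokens)
    total = len(doc_tokens) or 1
    vec = []
    for term in vocab:
        df = sum(1 for d in corpus_tokens if term in d)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        vec.append(counts[term] / total * idf)
    return vec


def contextual_embedding(text, dim=8):
    # HYPOTHETICAL stand-in for a BERT sentence vector (e.g. a pooled output).
    # Produces deterministic toy values so the sketch runs without a model.
    vec = [0.0] * dim
    for i, ch in enumerate(text):
        vec[i % dim] += ord(ch) / 1000.0
    return vec


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def hybrid_features(text, corpus, vocab):
    # ASSUMPTION: fusion by normalizing each block, then concatenating.
    tokens = text.lower().split()
    corpus_tokens = [set(d.lower().split()) for d in corpus]
    sparse = l2_normalize(tfidf_vector(tokens, corpus_tokens, vocab))
    dense = l2_normalize(contextual_embedding(text))
    return sparse + dense


def train_centroids(docs, labels, corpus, vocab):
    # Stand-in classifier: average the fused vectors of each category.
    sums, counts = {}, Counter(labels)
    for doc, label in zip(docs, labels):
        feats = hybrid_features(doc, corpus, vocab)
        acc = sums.setdefault(label, [0.0] * len(feats))
        for i, x in enumerate(feats):
            acc[i] += x
    return {lab: [x / counts[lab] for x in vec] for lab, vec in sums.items()}


def predict(text, centroids, corpus, vocab):
    # Assign the category whose centroid is closest in squared Euclidean distance.
    feats = hybrid_features(text, corpus, vocab)

    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(feats, c))

    return min(centroids, key=lambda lab: dist(centroids[lab]))


# Toy usage with one hypothetical training document per category.
corpus = ["shares rose as the market rallied",
          "the team won the football match",
          "new vaccine trial shows promise"]
labels = ["Business", "Sports", "Health"]
vocab = sorted({t for d in corpus for t in d.lower().split()})
centroids = train_centroids(corpus, labels, corpus, vocab)
print(predict("the football team won again", centroids, corpus, vocab))
```

In a real pipeline, `contextual_embedding` would be replaced by a BERT encoder's pooled output, which is 768-dimensional for BERT-base. Normalizing or weighting each block keeps that dense vector from overwhelming a small TF-IDF vocabulary.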
Keywords: document classification, hybrid embedding, TF-IDF, BERT, natural language processing, feature fusion, Flask, text categorization
How to Cite:
[1] Ranjana S. Chakrasali, Chandana A. Athreyesa, K. Shridevi B. Adiga, Vathsala, Vishnupriya, “Hybrid Embedding Model for Document Classification,” International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE), DOI: 10.17148/IJARCCE.2026.155165
