International Journal of Advanced Research in Computer and Communication Engineering
A monthly peer-reviewed and refereed journal
ISSN Online 2278-1021 | ISSN Print 2319-5940 | Since 2012
Volume 15, Issue 5, May 2026

Hybrid Embedding Model for Document Classification

Ranjana S. Chakrasali, Chandana A. Athreyesa, K. Shridevi B. Adiga, Vathsala, Vishnupriya

πŸ‘ 3 viewsπŸ“₯ 1 download
Share: 𝕏 f in ✈ βœ‰
Abstract: Managing large collections of digital documents has become increasingly difficult in academic and professional environments. Files such as research papers, reports, PDFs, and project documents are often stored without proper organization, which makes retrieval slow and inefficient. This work proposes a hybrid document classification framework that combines TF-IDF statistical features with contextual embeddings generated by BERT. The combined representation lets the model capture both discriminative keywords and the semantic meaning of a document. A lightweight classification layer assigns each uploaded file to a category such as Business, Politics, Sports, Health, or Technology. In addition, a rule-based classifier sorts commonly identifiable file types by their extension alone, which improves efficiency. A Flask-based web interface lets users upload documents and automatically organizes them into category folders. Experiments on the BBC News dataset show that the hybrid model achieves higher classification accuracy and Macro F1-score than standalone TF-IDF and BERT models.

Keywords: document classification, hybrid embedding, TF-IDF, BERT, natural language processing, feature fusion, Flask, text categorization
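The abstract describes fusing TF-IDF features with BERT embeddings and passing the result to a lightweight classification layer, but does not specify the fusion scheme or the classifier. The sketch below is one possible reading, not the authors' implementation: it assumes plain concatenation of two L2-normalized blocks and uses a nearest-centroid rule in place of the paper's classification layer. `placeholder_embed` is a hypothetical hashed character-trigram stand-in for BERT; a real pipeline would substitute contextual embeddings (for example, mean-pooled hidden states from a BERT model loaded with Hugging Face `transformers`). All names here (`Tfidf`, `hybrid_vector`, `NearestCentroid`) are illustrative.

```python
import math
import re
import zlib
from collections import Counter
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def tokenize(text: str) -> List[str]:
    # Lowercase and keep alphanumeric runs; a stand-in for a real tokenizer.
    return re.findall(r"[a-z0-9]+", text.lower())


def l2_normalize(v: Vector) -> Vector:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def cosine(a: Vector, b: Vector) -> float:
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


class Tfidf:
    """TF-IDF with smoothed IDF: idf(w) = ln((1 + N) / (1 + df(w))) + 1."""

    def fit(self, docs: Sequence[str]) -> "Tfidf":
        doc_sets = [set(tokenize(d)) for d in docs]
        df = Counter(w for s in doc_sets for w in s)
        vocab = sorted(df)
        self.index = {w: i for i, w in enumerate(vocab)}
        n = len(docs)
        self.idf = [math.log((1 + n) / (1 + df[w])) + 1.0 for w in vocab]
        return self

    def transform(self, doc: str) -> Vector:
        vec = [0.0] * len(self.index)
        for w, count in Counter(tokenize(doc)).items():
            i = self.index.get(w)
            if i is not None:  # out-of-vocabulary words are ignored
                vec[i] = count * self.idf[i]
        return l2_normalize(vec)


def placeholder_embed(text: str, dim: int = 64) -> Vector:
    # HYPOTHETICAL stand-in for a BERT document embedding: counts of hashed
    # character trigrams. Replace with real contextual embeddings.
    vec = [0.0] * dim
    padded = f"  {text.lower()} "
    for i in range(len(padded) - 2):
        vec[zlib.crc32(padded[i:i + 3].encode()) % dim] += 1.0
    return vec


def hybrid_vector(doc: str, tfidf: Tfidf, embed: Callable[[str], Vector]) -> Vector:
    # Assumed fusion: concatenate the sparse and dense blocks, each
    # L2-normalized first so neither dominates purely because of its scale.
    return tfidf.transform(doc) + l2_normalize(embed(doc))


class NearestCentroid:
    """Stand-in classifier: pick the class whose mean vector is most cosine-similar."""

    def fit(self, vectors: Sequence[Vector], labels: Sequence[str]) -> "NearestCentroid":
        sums: Dict[str, Vector] = {}
        counts: Counter = Counter()
        for v, y in zip(vectors, labels):
            acc = sums.setdefault(y, [0.0] * len(v))
            for i, x in enumerate(v):
                acc[i] += x
            counts[y] += 1
        self.centroids = {y: [x / counts[y] for x in acc] for y, acc in sums.items()}
        return self

    def predict(self, v: Vector) -> str:
        return max(self.centroids, key=lambda y: cosine(v, self.centroids[y]))
```

With real BERT-base vectors the dense block has 768 dimensions, while the TF-IDF block grows with the vocabulary; normalizing each block before concatenation keeps the two on comparable scales regardless of their sizes.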
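The abstract states that a rule-based extension classifier handles commonly identifiable file types and that uploaded files are organized into category folders. A minimal standard-library sketch of that routing follows, assuming the extension rule runs before the content model. The rule table, category names such as `Images` and `Archives`, and the `organize` and `classify_text` names are hypothetical; `classify_text` would wrap the hybrid text model.

```python
import shutil
from pathlib import Path
from typing import Callable, Optional

# HYPOTHETICAL rule table: extensions whose category is obvious from the name.
EXTENSION_RULES = {
    ".jpg": "Images", ".jpeg": "Images", ".png": "Images", ".gif": "Images",
    ".mp3": "Audio", ".wav": "Audio",
    ".mp4": "Videos", ".mkv": "Videos",
    ".zip": "Archives", ".rar": "Archives", ".gz": "Archives",
    ".py": "Code", ".java": "Code",
}


def classify_by_extension(filename: str) -> Optional[str]:
    """Return a category for readily identifiable file types, else None."""
    # Path.suffix is only the last extension, so "a.tar.gz" yields ".gz".
    return EXTENSION_RULES.get(Path(filename).suffix.lower())


def organize(path: Path, dest_root: Path,
             classify_text: Callable[[Path], str]) -> Path:
    """Move `path` into dest_root/<category>/ and return its new location.

    The cheap extension rule runs first; only files it cannot place
    (for example .pdf, .txt, .docx) are sent to the content classifier.
    """
    category = classify_by_extension(path.name) or classify_text(path)
    target_dir = dest_root / category
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.move(str(path), str(target_dir / path.name)))
```

In a Flask app, the upload view could save the incoming file (for example with Werkzeug's `FileStorage.save`) and then call a function like `organize`; passing the client-supplied name through `werkzeug.utils.secure_filename` first is advisable.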
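The abstract reports results as classification accuracy and Macro F1-score. Macro F1 is the unweighted mean of per-class F1 scores, so every category counts equally regardless of how many documents it contains. The helper names below are illustrative; the per-class formula F1 = 2TP / (2TP + FP + FN) is standard, and with labels taken as the union of true and predicted classes the result should agree with scikit-learn's `f1_score(..., average="macro")`.

```python
from typing import Dict, Sequence, Tuple


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    # Fraction of documents whose predicted category matches the true one.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> Tuple[float, Dict[str, float]]:
    """Return (macro F1, per-class F1) using F1 = 2TP / (2TP + FP + FN)."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class: Dict[str, float] = {}
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        denom = 2 * tp + fp + fn
        per_class[c] = 2 * tp / denom if denom else 0.0
    # Every class gets equal weight in the average.
    return sum(per_class.values()) / len(labels), per_class
```

For example, with true labels Sports, Sports, Business, Tech and predictions Sports, Business, Business, Tech, the per-class F1 scores are 2/3, 2/3, and 1, giving a Macro F1 of 7/9 (about 0.778), while accuracy is 0.75.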

How to Cite:

[1] Ranjana S. Chakrasali, Chandana A. Athreyesa, K. Shridevi B. Adiga, Vathsala, Vishnupriya, “Hybrid Embedding Model for Document Classification,” International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE), DOI: 10.17148/IJARCCE.2026.155165

This work is licensed under a Creative Commons Attribution 4.0 International License.