Abstract: Digital documents are growing exponentially across domains, making effective document retrieval a challenging task. Traditional keyword search does not capture the meaning of words; it only matches terms literally, so it often returns irrelevant results and misses the context of the question, which leads to inaccurate answers. Recent advances in Generative Artificial Intelligence and Natural Language Processing make this task easier and more effective.
The objective of our project is to develop an intelligent document querying system that enables efficient question answering by integrating document retrieval methods with Large Language Models through a Retrieval-Augmented Generation (RAG) framework. The system focuses on context-aware answer generation by using semantic search rather than conventional keyword-based retrieval.
Our proposed system supports ingestion of multiple document formats, such as PDFs, text files, and Docs. These documents are divided into multiple segments, which are converted into embeddings and stored in a Vector Database to enable faster retrieval. When a user submits a query, the most relevant segments are retrieved and forwarded to the Language Model for accurate answer generation.
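The ingestion and retrieval steps described above can be illustrated with a minimal sketch. It is not the implementation used in this work: the names `chunk_text`, `embed`, `cosine`, and `VectorIndex` are hypothetical, the chunk size and overlap are arbitrary, and a toy bag-of-words vector stands in for a pretrained embedding model so that the example runs with the Python standard library alone.

```python
import math
import re
from collections import Counter


def chunk_text(text, size=40, overlap=10):
    """Split text into overlapping word windows (sizes are illustrative)."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def embed(text):
    """Toy embedding: term counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorIndex:
    """In-memory stand-in for a vector database (brute-force search)."""

    def __init__(self):
        self.entries = []  # list of (chunk, vector) pairs

    def add(self, chunk):
        self.entries.append((chunk, embed(chunk)))

    def search(self, query, k=2):
        query_vec = embed(query)
        ranked = sorted(
            self.entries,
            key=lambda entry: cosine(query_vec, entry[1]),
            reverse=True,
        )
        return [chunk for chunk, _ in ranked[:k]]


# Example usage with three short made-up passages.
index = VectorIndex()
for passage in [
    "Invoices must be paid within thirty days of receipt.",
    "The library opens at nine in the morning on weekdays.",
    "Refunds are issued to the original payment method.",
]:
    for chunk in chunk_text(passage):
        index.add(chunk)

top_chunks = index.search("When does the library open?", k=1)
print(top_chunks)
```

In a full pipeline, the chunks returned by `search` would be placed into the prompt sent to the Language Model. Brute-force search is used here only for clarity; a vector database would normally use an approximate nearest-neighbor index.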
The system is implemented using Python for backend development, vector indexing techniques for semantic retrieval, pretrained embedding models for representation learning, and Large Language Models for answer generation. The modular architecture ensures scalability and allows the system to be adapted to different document domains with minimal modification.
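One way to picture the modular architecture is a pipeline that accepts its retriever and its answer generator as interchangeable components. The sketch below is an assumption about how such a design could look, not the authors' code: `RagPipeline`, `build_prompt`, `make_keyword_retriever`, and `echo_generator` are hypothetical names, and the two stub components stand in for a vector index and a Large Language Model.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical component signatures: any callable with these shapes can be plugged in.
Retriever = Callable[[str, int], List[str]]
Generator = Callable[[str], str]


def build_prompt(question: str, passages: List[str]) -> str:
    """Assemble a prompt that restricts the model to the retrieved context."""
    context = "\n".join(f"[{i}] {p}" for i, p in enumerate(passages, start=1))
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


@dataclass
class RagPipeline:
    retrieve: Retriever
    generate: Generator
    top_k: int = 3

    def answer(self, question: str) -> str:
        passages = self.retrieve(question, self.top_k)
        return self.generate(build_prompt(question, passages))


def make_keyword_retriever(passages: List[str]) -> Retriever:
    """Stub retriever: ranks passages by shared words (placeholder for vector search)."""
    def retrieve(query: str, k: int) -> List[str]:
        terms = set(query.lower().split())
        ranked = sorted(
            passages,
            key=lambda p: len(terms & set(p.lower().split())),
            reverse=True,
        )
        return ranked[:k]
    return retrieve


def echo_generator(prompt: str) -> str:
    """Stub generator: returns the prompt unchanged (placeholder for an LLM call)."""
    return prompt


pipeline = RagPipeline(
    retrieve=make_keyword_retriever([
        "Leave requests need manager approval.",
        "The cafeteria closes at 5 pm.",
    ]),
    generate=echo_generator,
    top_k=1,
)
print(pipeline.answer("Who approves leave requests?"))
```

Because the pipeline depends only on the two callable signatures, adapting it to a new document domain or a different model would mean swapping one component while leaving the rest untouched.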
The expected outcome of this project is a reliable question-answering system that reduces ambiguity and minimizes the hallucinated outputs usually associated with standalone Language Models. The proposed system can be effectively deployed in educational institutions, research environments, legal documentation systems, and enterprise knowledge management platforms to support intelligent, data-driven decision making.
Index Terms: Retrieval-Augmented Generation, Generative Artificial Intelligence, Natural Language Processing, Semantic Search, Vector Database, Document Question Answering, Large Language Models
DOI: 10.17148/IJARCCE.2026.15235
[1] M. Ayyappa Chakravarthi, Yayavaram Raja Sri, Moparthi Asha, Shaik Samirin Kousar, Tammuluri Reena Prashanthi, "Retrieval-Augmented Document Querying and Context-Aware Answer Generation Using Vector Indexing and Large Language Models," International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE), DOI: 10.17148/IJARCCE.2026.15235