Abstract: This survey reviews techniques and methodologies used in document processing and natural language processing (NLP), examining published studies and the specific problems each one addresses. The techniques covered include template matching, image processing, deep learning models such as YOLOv5 and BERT, optical character recognition (OCR), convolutional neural networks (CNNs), named entity recognition (NER), and machine translation. The survey highlights the challenges of manual invoice processing and proposes an automatic system based on key-field extraction from invoices. It also addresses the complexity of handling diverse document layouts, including invoices, purchase orders, and newspaper articles, using template-based, rule-based, and OCR techniques. Handwritten text recognition for South Indian languages is examined in light of the cursive, complex structure of the handwriting and the unavailability of temporal information. The paper further considers the need for annotated datasets and the application of AI approaches to unstructured invoice documents, the use of image segmentation, OCR, and NLP for summarizing newspaper articles, and the efficient processing of unstructured documents with AI techniques. The challenges of OCR on low-quality images and of intelligent handwriting recognition are examined as well.
On the language-understanding side, the survey covers NLP techniques for information extraction, namely named entity recognition, coreference resolution, relation extraction, and knowledge base reasoning, along with the challenges and applications of NER in finance and biomedicine. It investigates transformer-based deep learning models such as BERT for semantic keyphrase extraction and presents an overview of speech synthesis techniques for Indian languages. Finally, the paper explores challenges in text-to-speech training, machine translation, and Indian regional language processing. It discusses the limitations of parallel training data for voice conversion and the lack of linguistic grounding in autoencoder-based voice conversion methods. Taken together, the survey gives an overview of the techniques, challenges, and advances in document processing and language understanding as a basis for future research and development.
Keywords: Natural Language Processing (NLP), Convolutional Neural Networks (CNNs), non-native speakers, Optical Character Recognition (OCR), key insights, inclusivity, marginalized communities.
Cite:
S R Suresh, Shraddha C, Sai kiran, Sharmila Chidaravalli, "A Systematic Survey of Techniques for Document Processing and Natural Language Understanding", IJARCCE International Journal of Advanced Research in Computer and Communication Engineering, vol. 13, no. 1, 2024, Crossref https://doi.org/10.17148/IJARCCE.2024.13126.
DOI: 10.17148/IJARCCE.2024.13126