International Journal of Advanced Research in Computer and Communication Engineering
A monthly peer-reviewed and refereed journal
ISSN Online 2278-1021 | ISSN Print 2319-5940 | Since 2012
Volume 13, Issue 4, April 2024

Image Captioning using CNN and Transformers

K Lakshmipathi Raju, Venkat Rayidu, P Surendra, V Sai Satish, M. Sai Harsha

DOI: 10.17148/IJARCCE.2024.13469

Abstract: Image captioning is the task of automatically describing an image in natural language, and it has attracted attention from researchers in both natural language processing (NLP) and computer vision. Recent work mostly adopts an encoder-decoder framework in which a convolutional neural network (CNN) extracts image features and a decoder generates the description. Integrating attention mechanisms into this framework has notably improved performance. Building on the Transformer model, whose attention mechanisms make it effective and efficient on NLP tasks, we propose an approach that combines CNNs and Transformers for image captioning. Our model uses a Transformer encoder to refine the image feature representations, so that the Transformer decoder can focus on the relevant image details when generating a caption. In addition, adaptive attention in the Transformer decoder determines how much image information is used at each step of caption generation. Trained on the Flickr8k dataset, our model achieves 86.21% accuracy, demonstrating its efficacy for image captioning tasks.
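
As a rough illustration of the attention step described in the abstract, the sketch below shows a single decoder query attending over a set of image region features with scaled dot-product attention. It is not the authors' implementation: the function names, the toy feature vectors, and the use of the raw region features as both keys and values (with no learned projections, multiple heads, or adaptive gating) are simplifying assumptions made only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(query, region_features):
    """Scaled dot-product attention of one decoder query over image regions.

    Assumption: region features serve as both keys and values; a real
    Transformer would apply learned linear projections first.
    """
    d = len(query)
    # Similarity of the query to each region, scaled by sqrt(d).
    scores = [
        sum(q * k for q, k in zip(query, region)) / math.sqrt(d)
        for region in region_features
    ]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the region features.
    context = [
        sum(w * region[i] for w, region in zip(weights, region_features))
        for i in range(d)
    ]
    return weights, context


# Hypothetical toy data: three image regions with 2-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [2.0, 0.0]  # hypothetical decoder state for the word being generated
weights, context = cross_attention(query, regions)
```

In this toy case the query aligns with the first feature dimension, so the first and third regions receive equal weight and the second region receives less. The context vector is what a decoder layer would then combine with its language state to predict the next word.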

Keywords: Image Captioning, CNN, Deep Learning, Transformer, Attention Mechanism, Flickr8k Dataset.

How to Cite:

[1] K Lakshmipathi Raju, Venkat Rayidu, P Surendra, V Sai Satish, M. Sai Harsha, “Image Captioning using CNN and Transformers,” International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE), DOI: 10.17148/IJARCCE.2024.13469