Comparison of Different Encoder Techniques in Image Caption

Pooja Negi; Sanjay Buch

doi:10.17148/IJARCCE.2022.11810

Abstract: Image captioning is one of the most intriguing topics in deep learning. It combines knowledge from both image processing and natural language processing. Most current approaches are built on neural networks: a predefined convolutional neural network (CNN) model extracts the features of an image, and a bidirectional or unidirectional recurrent neural network (RNN) serves as the decoder that generates the sentence. This paper compares models commonly used as the image encoder, namely VGG16, VGG19, Inception-V3 and InceptionResNetV2, each paired with a unidirectional LSTM for sentence generation. The comparative analysis is based on BLEU scores obtained on the Flickr8k dataset.

Keywords: Image Captioning, CNN, LSTM, BLEU
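
Since the comparison rests on BLEU, a minimal sketch of a sentence-level BLEU computation may help readers interpret the scores. The abstract does not say which BLEU implementation or smoothing the authors used, so everything below is an assumption made for illustration: the function names (`ngrams`, `modified_precision`, `bleu`), the uniform n-gram weights, and the absence of smoothing are not taken from the paper.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of a candidate against one or more references."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    # For each n-gram, record the maximum count seen in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    # A candidate n-gram is credited at most as often as it occurs in a reference.
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU with uniform weights and no smoothing (illustrative)."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # without smoothing, any zero precision gives a zero score
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_mean)


# Hypothetical captions, not taken from Flickr8k.
generated = "a dog runs on the grass".split()
human = ["a dog is running on the grass".split(),
         "a brown dog runs across the grass".split()]
print(bleu(generated, human, max_n=1))  # BLEU-1
print(bleu(generated, human, max_n=2))  # BLEU-2
```

Flickr8k provides several human captions per image, which is why the sketch accepts a list of references. Published results are usually corpus-level scores from an established library, so values from this sketch are not directly comparable to those in the paper.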

How to Cite:

[1] Pooja Negi, Sanjay Buch, "Comparison of Different Encoder Techniques in Image Caption," International Journal of Advanced Research in Computer and Communication Engineering (IJARCCE), DOI: 10.17148/IJARCCE.2022.11810

International Journal of Advanced Research in Computer and Communication Engineering
