Comparison of Different Encoder Techniques in Image Caption

Pooja Negi; Sanjay Buch

doi:10.17148/IJARCCE.2022.11810

International Journal of Advanced Research in Computer and Communication Engineering

A monthly Peer-reviewed & Refereed journal

ISSN Online 2278-1021
ISSN Print 2319-5940

Since 2012

Abstract: Image captioning has been one of the most intriguing topics in deep learning. It draws on both image processing and natural language processing. Most current approaches are built on neural networks: a pre-defined convolutional neural network (CNN) model extracts the features of an image, and a bi-directional or uni-directional recurrent neural network (RNN) acts as the decoder that generates the sentence. This paper compares models commonly used as the image encoder, namely VGG16, VGG19, Inception-V3 and InceptionResNetV2, each paired with a uni-directional LSTM for sentence generation. The comparative analysis is based on BLEU scores obtained on the Flickr8k dataset.

Keywords: Image Captioning, CNN, LSTM, BLEU
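
Since the comparison rests on the BLEU score, the following minimal sketch shows how a sentence-level BLEU value can be computed for one generated caption against its reference captions. It assumes the standard formulation (clipped n-gram precision, uniform weights, brevity penalty, no smoothing); the paper does not state which BLEU variant or tooling was used, and the function names here are illustrative only.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision of the candidate against all references."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    # For each n-gram, the largest count seen in any single reference.
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def brevity_penalty(candidate, references):
    """Penalise candidates shorter than the closest reference length."""
    c = len(candidate)
    if c == 0:
        return 0.0
    # Closest reference length; ties go to the shorter reference.
    r = min((len(ref) for ref in references), key=lambda length: (abs(length - c), length))
    return 1.0 if c > r else math.exp(1 - r / c)


def bleu(candidate, references, max_n=4):
    """Sentence-level BLEU with uniform weights over 1..max_n grams (assumed variant)."""
    precisions = [modified_precision(candidate, references, n) for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # no smoothing in this sketch
    log_avg = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty(candidate, references) * math.exp(log_avg)


if __name__ == "__main__":
    # Hypothetical captions, not taken from Flickr8k.
    generated = "a dog runs on the grass".split()
    refs = ["a dog is running on the grass".split(),
            "a brown dog runs across the grass".split()]
    print(bleu(generated, refs, max_n=1), bleu(generated, refs, max_n=2))
```

Flickr8k provides several reference captions per image, which is why the sketch takes a list of references. Reporting BLEU-1 through BLEU-4 corresponds to calling `bleu` with `max_n` set from 1 to 4.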
