Abstract: Image captioning is the task of generating captions for given photographs by combining computer vision and natural language processing. It is a two-step process that requires both precise image recognition and appropriate syntactic and semantic language comprehension. Because the volume of published work on this subject keeps growing, keeping up with the newest research and findings in image captioning is becoming increasingly difficult.
Current research in the field focuses mostly on deep learning-based methods, with attention mechanisms, deep reinforcement learning, and adversarial learning at the forefront. In this paper, we review research papers that focus on deep learning models and use the COCO or Flickr datasets.

