Abstract: Deep learning methods offer great potential for applications that automatically generate captions or descriptions for images and video frames. Recent advances in neural networks have driven progress in object detection, image description, and video captioning. Our work aims to read the content of its input and automatically generate a list of the objects in an image when an image is given, and a caption for a video when a video is given. At present, images and videos are annotated with human intervention, and manually captioning every photo and video in a large commercial database is difficult, if not practically impossible. Image object detection and video captioning are useful in many applications, such as generating captions or descriptions in real time, and they are also used in advanced machine learning and deep learning applications.

Keywords: Deep learning, Object detection, Neural networks, Video captioning.

DOI: 10.17148/IJARCCE.2022.11726

