A Comprehensive Investigation on Image Caption Generation using Deep Neural Networks


Haraprasad Naik
Saswati Tripathy
Laxmi Priya
Prajna Paramita Sahu


Caption generation from an image is a task closely related to object detection and natural language processing. Object detection has evolved tremendously over the past decade. The traditional approach to object detection involves a three-step process: (1) region selection, (2) feature extraction, and (3) classification, where hand-crafted features are passed to classification algorithms such as SVM, AdaBoost, and the Deformable Part Model (DPM). Current research trends instead use neural networks to replace both hand-crafted feature extraction and the separate classification stage. Similarly, captions can be generated with the help of neural networks, specifically recurrent neural networks (RNNs); most caption generators use a variant of the RNN known as the Long Short-Term Memory (LSTM) network. Many researchers adopt either sampling or beam search to decode a valid sentence/caption. In this paper, our aim is to capture the fundamental ideas behind currently available image caption generators and compare their architectural designs with their performance.
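The beam search decoding mentioned in the abstract can be illustrated with a minimal sketch. The `next_token_probs` function below is a hypothetical stand-in for an LSTM decoder's softmax output; beam search itself keeps only the `beam_width` highest-probability partial captions at each step instead of greedily taking the single best token.

```python
import math

# Hypothetical toy language model: given a partial token sequence,
# return {token: probability}. In a real caption generator this would
# be the softmax output of an LSTM decoder conditioned on image features.
def next_token_probs(seq):
    table = {
        ("<s>",): {"a": 0.6, "the": 0.4},
        ("<s>", "a"): {"dog": 0.5, "cat": 0.3, "</s>": 0.2},
        ("<s>", "the"): {"dog": 0.7, "</s>": 0.3},
    }
    return table.get(tuple(seq), {"</s>": 1.0})

def beam_search(beam_width=2, max_len=5):
    # Each beam entry is (cumulative log-probability, token sequence).
    beams = [(0.0, ["<s>"])]
    completed = []
    for _ in range(max_len):
        candidates = []
        for logp, seq in beams:
            if seq[-1] == "</s>":          # sentence already finished
                completed.append((logp, seq))
                continue
            for tok, p in next_token_probs(seq).items():
                candidates.append((logp + math.log(p), seq + [tok]))
        if not candidates:
            break
        # Prune: keep only the top beam_width partial captions.
        beams = sorted(candidates, reverse=True)[:beam_width]
    completed.extend(b for b in beams if b[1][-1] == "</s>")
    return max(completed)[1] if completed else sorted(beams, reverse=True)[0][1]

print(beam_search())  # → ['<s>', 'a', 'dog', '</s>']
```

With `beam_width=1` this reduces to greedy decoding; wider beams trade decoding cost for a better chance of finding the globally most probable caption.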

Article Details

How to Cite
Naik, H., Tripathy, S., Laxmi Priya, & Sahu, P. P. (2022). A Comprehensive Investigation on Image Caption Generation using Deep Neural Networks. INFOCOMP Journal of Computer Science, 21(1). Retrieved from https://infocomp.dcc.ufla.br/index.php/infocomp/article/view/1495