Abstract: Object classification in digital images and videos has recently been addressed in various research works. Convolutional neural networks (CNNs) are effective at processing image data, while long short-term memory (LSTM) networks are effective at processing sequence data; combining the two yields a solution to challenging computer vision problems such as video classification. Image classification consists of passing an image to a classifier, which can be either a trained CNN or a classical classifier, and obtaining class predictions. An LSTM is engineered specifically to operate on a data sequence, as it considers all previous inputs when producing an output. LSTMs are a form of recurrent neural network (RNN). In general, RNNs are not effective at capturing long-term dependencies in the input sequence because of the vanishing gradient problem; LSTMs are designed to circumvent the vanishing gradient, enabling an LSTM cell to retain context over long input sequences. This paper aims to rapidly design a novel classification method based on the Conv2D and LSTM algorithms. The Conv2D function accepts numerous parameters, among them the number of filters, the filter size, the activation function, and the padding mode. The HiSVidClassifer method applies time-distributed Conv2D layers, followed by MaxPooling2D and Dropout layers. A Flatten layer flattens the features extracted by the Conv2D layers, which are then passed to an LSTM layer to classify and detect the object. The HiSVidClassifer method is trained and evaluated on the UCF50 dataset. It achieved outstanding results, with a low loss of 0.1935 and an accuracy of 95.08%, compared to the ConvLSTM method, which obtained a loss of 0.3773 and an accuracy of 87.70%.
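
The layer names in the abstract correspond to the Keras API, so the described architecture can be expressed as a minimal Keras sketch. The hyperparameters below (sequence length, frame size, filter counts, kernel sizes, pooling sizes, dropout rates, and LSTM width) are illustrative assumptions, not values reported in the paper; only the layer ordering follows the abstract.

```python
# Sketch of a time-distributed Conv2D + LSTM video classifier,
# in the spirit of the architecture described in the abstract.
# All hyperparameters are assumed for illustration only.
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (TimeDistributed, Conv2D, MaxPooling2D,
                                     Dropout, Flatten, LSTM, Dense)

SEQUENCE_LENGTH = 20   # frames sampled per video (assumed)
FRAME_HEIGHT = 64      # resized frame height (assumed)
FRAME_WIDTH = 64       # resized frame width (assumed)
NUM_CLASSES = 50       # UCF50 contains 50 action classes

model = Sequential([
    # Time-distributed Conv2D layers extract spatial features from each frame
    # with the same convolutional weights applied across the whole sequence.
    TimeDistributed(Conv2D(16, (3, 3), padding='same', activation='relu'),
                    input_shape=(SEQUENCE_LENGTH, FRAME_HEIGHT, FRAME_WIDTH, 3)),
    TimeDistributed(MaxPooling2D((4, 4))),
    TimeDistributed(Dropout(0.25)),

    TimeDistributed(Conv2D(32, (3, 3), padding='same', activation='relu')),
    TimeDistributed(MaxPooling2D((4, 4))),
    TimeDistributed(Dropout(0.25)),

    # Flatten each frame's feature maps into a vector before the LSTM.
    TimeDistributed(Flatten()),

    # The LSTM models temporal dependencies across the frame sequence.
    LSTM(32),
    Dense(NUM_CLASSES, activation='softmax'),
])

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()
```

Wrapping the convolutional stack in TimeDistributed is what lets a single set of Conv2D weights process every frame independently, so only the final LSTM layer has to reason about order in the sequence.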


DOI: 10.17148/IJARCCE.2025.14104
