Abstract: This paper presents an automated person-counting system for video surveillance that combines deep learning and computer vision. The system uses the YOLO (You Only Look Once) v3 model, pre-trained on the COCO dataset, to detect persons in video frames efficiently and accurately by generating a bounding box around each detected individual. Non-maximum suppression (NMS) is then applied to eliminate redundant, overlapping bounding boxes so that each person is detected only once per frame. Following detection, the VGG16 convolutional neural network, pre-trained on ImageNet, extracts deep semantic features from each detected person's region of interest (ROI); these features are used to distinguish between individuals. Video frames are processed at specified intervals to balance computational cost against detection accuracy. To identify distinct individuals across the video, KMeans clustering is applied to the extracted features. The optimal number of clusters, which represents the estimated number of unique individuals in the video, is determined empirically, and the clustering step yields the total number of distinct persons. The implementation offers a robust and scalable approach to automated person counting in surveillance videos, supporting security and monitoring applications. By accurately detecting and distinguishing between individuals, the system can improve the effectiveness of surveillance operations and contribute to safety and situational awareness.

Keywords: YOLO v3, VGG16, KMeans, COCO dataset
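The NMS and clustering steps described in the abstract can be sketched in plain Python as follows. This is a minimal, hypothetical illustration, not the authors' implementation: it assumes boxes are given as (x1, y1, x2, y2) pixel coordinates, uses an IoU threshold of 0.4 chosen only for illustration, and substitutes small toy vectors for the VGG16 ROI features. The YOLO v3 detector, the VGG16 feature extractor and the frame sampling are not shown, and the helper names (iou, nms, kmeans, count_people) are invented for this sketch.

```python
import math
from typing import List, Sequence, Tuple

# Assumed box format for this sketch: (x1, y1, x2, y2) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_thresh: float = 0.4) -> List[int]:
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping it
    by more than iou_thresh (0.4 is an illustrative assumption), repeat."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thresh]
    return keep


def _farthest_point_init(points: Sequence[Sequence[float]], k: int) -> List[List[float]]:
    """Deterministic seeding (an assumption for this sketch): start from the
    first point, then repeatedly add the point farthest from all centroids."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))
    return centroids


def kmeans(points: Sequence[Sequence[float]], k: int, iters: int = 50) -> List[int]:
    """Lloyd's KMeans on feature vectors; returns one cluster label per point."""
    centroids = _farthest_point_init(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each feature vector goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def count_people(features: Sequence[Sequence[float]], k: int) -> int:
    """Estimated number of distinct persons = number of non-empty clusters."""
    return len(set(kmeans(features, k)))


if __name__ == "__main__":
    # Two heavily overlapping detections of one person plus one separate person.
    boxes = [(10, 10, 50, 100), (12, 12, 52, 102), (200, 10, 240, 100)]
    print(nms(boxes, [0.9, 0.8, 0.7]))  # -> [0, 2]
    # Toy 2-D stand-ins for VGG16 ROI features: three well-separated identities.
    feats = [(0, 0), (0.1, 0), (5, 5), (5.1, 5), (10, 0), (10, 0.1)]
    print(count_people(feats, k=3))  # -> 3
```

Farthest-point seeding is used here instead of random or k-means++ initialization only so that the toy example is deterministic. As in the abstract, k is fixed empirically; the returned count is the number of non-empty clusters, which is at most k.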


DOI: 10.17148/IJARCCE.2024.13626