Abstract: Object detection is the process of determining the presence, location, and class of at least one object, with the location expressed as a bounding box. The person detection process described here produces a bounding box and assigns the class label "person" using YOLOv3. YOLOv3 learns the features, divides the image into cells, and has each cell directly predict a bounding box and an entity classification. More than one bounding box may be predicted per person, so the system applies non-maximum suppression to reduce the number of bounding boxes to one per person. Finally, the number of persons in an image or video is calculated as the count of the remaining bounding boxes. The INRIA and ShanghaiTech datasets are used for static pedestrian detection. Yolo_Mark is used to mark bounding boxes around persons and to generate the annotation files for 243 images from the INRIA dataset. Darknet is used as the framework for implementing YOLOv3. From the INRIA dataset, 120 images are used for testing, resulting in an accuracy of 96.1%. From the ShanghaiTech-B dataset, 56 images are used for testing, resulting in an accuracy of 87.3%.
Keywords: YOLO, CNN, CUDA.
DOI: 10.17148/IJARCCE.2022.115172
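
The suppression-and-count step summarized in the abstract can be illustrated with a minimal Python sketch. This is not the Darknet implementation used in the work; it is a generic greedy non-maximum suppression written under stated assumptions. The function names (`iou`, `non_max_suppression`, `count_persons`), the corner-coordinate box format, and the threshold values 0.5 and 0.45 are all hypothetical choices made for illustration only.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x1, y1, x2, y2) corner coordinates in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(
    boxes: Sequence[Box],
    scores: Sequence[float],
    iou_threshold: float = 0.45,  # illustrative value, not from the paper
) -> List[int]:
    """Greedy NMS: return indices of the boxes that survive suppression."""
    # Visit boxes from the highest to the lowest confidence score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


def count_persons(
    boxes: Sequence[Box],
    scores: Sequence[float],
    score_threshold: float = 0.5,  # illustrative value, not from the paper
    iou_threshold: float = 0.45,
) -> int:
    """Count persons as the number of boxes left after thresholding and NMS."""
    candidates = [i for i, s in enumerate(scores) if s >= score_threshold]
    kept = non_max_suppression(
        [boxes[i] for i in candidates],
        [scores[i] for i in candidates],
        iou_threshold,
    )
    return len(kept)


if __name__ == "__main__":
    # Two overlapping detections of one person, one separate person,
    # and one low-confidence detection that is discarded.
    demo_boxes = [
        (10, 10, 50, 100),
        (12, 12, 52, 102),
        (200, 20, 240, 110),
        (300, 300, 310, 310),
    ]
    demo_scores = [0.9, 0.8, 0.7, 0.2]
    print(count_persons(demo_boxes, demo_scores))
```

In this sketch the two heavily overlapping boxes collapse into one, so the person count equals the number of boxes that remain after suppression, which mirrors the counting rule stated in the abstract.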