Abstract: Deep learning has substantially advanced computer vision by enabling models to learn complex features from large datasets. Convolutional Neural Networks (CNNs) and more recent architectures have driven significant progress in object detection and human attribute recognition. This paper presents a real-time, Flask-based web system that detects persons using YOLOv8, estimates their age and gender with pre-trained Caffe models, identifies clothing color through K-Means clustering, and estimates each person's distance in steps geometrically from the height of the detected object. The system processes live webcam video streams and announces its results through a text-to-speech engine, making it accessible to visually impaired users. By combining computer vision with audio feedback, the solution serves as a practical assistant for real-world scenarios. The system achieves reliable performance, with an overall accuracy of 94.44%.
Keywords: Deep Learning, Computer Vision, YOLOv8, Flask, Face Detection, Age and Gender Prediction, Clothing Color Detection, Distance Estimation, Text-to-Speech, Accessibility, Real-Time System.
DOI: 10.17148/IJARCCE.2025.14531
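The abstract says distance in steps is obtained by "geometric estimation based on object height" but does not give the formula. The sketch below is one common way to do this and is not necessarily the authors' method. It uses the pinhole-camera similar-triangles relation Z = f · H / h, where f is the focal length in pixels, H is an assumed real-world person height, and h is the bounding-box height in pixels. It then divides Z by an assumed stride length. The constants and the function names (`estimate_distance_m`, `distance_in_steps`) are hypothetical illustrations and are not taken from the paper.

```python
# Hypothetical constants: assumed values for illustration, not from the paper.
AVERAGE_PERSON_HEIGHT_M = 1.7   # assumed real-world height of a standing adult
FOCAL_LENGTH_PX = 700.0         # assumed webcam focal length in pixels (from calibration)
STEP_LENGTH_M = 0.75            # assumed average stride length in metres


def estimate_distance_m(bbox_height_px: float,
                        real_height_m: float = AVERAGE_PERSON_HEIGHT_M,
                        focal_length_px: float = FOCAL_LENGTH_PX) -> float:
    """Pinhole-camera similar triangles: Z = f * H / h."""
    if bbox_height_px <= 0:
        raise ValueError("bounding-box height must be positive")
    return focal_length_px * real_height_m / bbox_height_px


def distance_in_steps(bbox_height_px: float,
                      step_length_m: float = STEP_LENGTH_M) -> int:
    """Convert the metric distance to a whole number of steps for spoken feedback."""
    return round(estimate_distance_m(bbox_height_px) / step_length_m)


if __name__ == "__main__":
    # A person whose detection box is 350 px tall:
    # 700 * 1.7 / 350 = 3.4 m, and 3.4 / 0.75 is about 4.53, which rounds to 5 steps.
    print(distance_in_steps(350))
```

Under these assumptions, a taller bounding box means a closer person. The focal length would normally be calibrated once by photographing a person at a known distance. The estimate is only as good as the assumed height, so children or seated people will be misjudged.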