Abstract: Lip-reading technology analyzes lip movements to recognize what a speaker is saying, and it is widely used in many aspects of daily life. Because the quality of the dataset affects the performance of the entire lip-reading system, this study investigates dataset construction for lip reading. scikit-video is used to extract frames from the source videos, and Dlib is then used to perform face detection. Lip cropping is accomplished by processing the detected facial landmark points to obtain lip images. The dataset is then expanded through data augmentation. The resulting collection contains 33 speakers, with each speaker's lips represented by 7,000 images. A technique for creating such datasets is proposed, beginning with the decomposition of the processed videos into frames using the scikit-video library.
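
As a rough illustration of the pipeline described above, the sketch below extracts frames with scikit-video, detects the face and its landmarks with Dlib, and crops the mouth region from the landmark points. The file paths, the standard Dlib 68-point landmark model, the crop margin, the output size, and the use of OpenCV for image operations are assumptions for illustration, not the paper's exact implementation.

```python
import skvideo.io   # scikit-video: frame extraction
import dlib         # face detection and 68-point facial landmarks
import cv2          # image cropping/resizing (assumed helper, not stated in the paper)
import numpy as np

# Hypothetical paths; the predictor file is Dlib's standard 68-landmark model.
VIDEO_PATH = "speaker_01.mp4"
PREDICTOR_PATH = "shape_predictor_68_face_landmarks.dat"

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor(PREDICTOR_PATH)

def extract_lip_images(video_path, out_size=(64, 64), margin=10):
    """Decompose a video into frames and crop the lip region of each frame."""
    frames = skvideo.io.vread(video_path)      # array of shape (num_frames, H, W, 3), RGB
    lip_images = []
    for frame in frames:
        gray = cv2.cvtColor(frame, cv2.COLOR_RGB2GRAY)
        faces = detector(gray, 1)               # detect faces in the frame
        if not faces:
            continue                            # skip frames with no detected face
        shape = predictor(gray, faces[0])       # 68 facial landmarks for the first face
        # Landmarks 48-67 outline the mouth; take their bounding box with a small margin.
        pts = np.array([(shape.part(i).x, shape.part(i).y) for i in range(48, 68)],
                       dtype=np.int32)
        x, y, w, h = cv2.boundingRect(pts)
        crop = frame[max(0, y - margin): y + h + margin,
                     max(0, x - margin): x + w + margin]
        lip_images.append(cv2.resize(crop, out_size))
    return lip_images
```

The per-speaker lip images produced this way could then be expanded by data augmentation (e.g. flips, brightness changes) to reach the image counts reported in the abstract; the specific augmentation operations are not detailed here.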

Keywords: Lip reading, Dlib, scikit-video, lip images, lip cropping.


DOI: 10.17148/IJARCCE.2023.12229
