Abstract: Many studies support the non-invasive detection of aberrant speech using machine learning feature descriptors and classifiers. Deep learning applied to feature descriptors and time-frequency images is a better option. However, most deep learning frameworks for speech-language pathology use a binary classification model, whereas constructing a hardware system requires a network that can identify the specific medical condition. A thorough examination of time-frequency analysis using advanced deep learning algorithms is therefore essential. The present research focuses on creating a non-invasive, dependable, and computationally inexpensive architecture for detecting multiclass laryngeal lesions. For application in a realistic environment, the capabilities of a fully connected network and a fully convolutional deep learning voice denoiser network are first investigated. The denoised training samples are then used to create three different time-frequency image corpora. These multivariate image datasets are used to train three upgraded forms of a state-of-the-art convolutional neural network model using a 3D convolution kernel.
Keywords: Deep neural network, pathology classification, non-invasive, random forest algorithm, speech-language pathology, CNN algorithm, binary classification model, voice dataset, pre-processing, vocal issues, detection or identification.
DOI: 10.17148/IJARCCE.2023.12224
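
As an illustration of the time-frequency images mentioned in the abstract, the following is a minimal sketch of how a voice signal might be turned into a normalised log-magnitude spectrogram. It is not the authors' pipeline: the abstract does not state which three time-frequency representations are used, so the spectrogram, the function name `spectrogram_image`, the frame length, the hop size, and the synthetic test tone are all assumptions made for illustration.

```python
import numpy as np


def spectrogram_image(signal, frame_len=256, hop=128):
    """Return a log-magnitude spectrogram scaled to [0, 1].

    Hypothetical helper: frame_len and hop are illustrative defaults,
    not values taken from the paper.
    Output shape is (frame_len // 2 + 1, number_of_frames).
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.size < frame_len:
        raise ValueError("signal is shorter than one analysis frame")

    # Slice the signal into overlapping frames and apply a Hann window.
    window = np.hanning(frame_len)
    n_frames = 1 + (signal.size - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )

    # One-sided FFT per frame; transpose so rows are frequency bins.
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T

    # Convert to decibels (small offset avoids log of zero), then min-max scale.
    log_mag = 20.0 * np.log10(magnitude + 1e-10)
    lo, hi = log_mag.min(), log_mag.max()
    if hi == lo:
        return np.zeros_like(log_mag)
    return (log_mag - lo) / (hi - lo)


# Synthetic stand-in for a voice recording: one second of a 440 Hz tone at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2.0 * np.pi * 440.0 * t)
image = spectrogram_image(tone)
```

An array of this kind can be saved as an image or stacked with other representations to form a multichannel input for a convolutional network; how the paper combines its three corpora is not specified in the abstract.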