Abstract—Language resources are essential to the development of text-to-speech (TTS) systems. The goal of TTS conversion is no longer simply to make machines talk, but to make them sound like people of different ages and genders. The quality of TTS synthesizers is evaluated from a variety of perspectives, including the intelligibility, naturalness, and preference of the synthesized speech, as well as human perception factors such as comprehensibility. Concatenative TTS relies on high-quality audio clips, which are combined to form the output speech. In the first step, the voice is recorded manually and stored in the system as a source of speech units for later search and conversion. The recordings, ranging from whole sentences down to syllables, are labeled and segmented into linguistic units from phones to phrases and sentences, forming a large database. During speech synthesis, the TTS engine searches this database for speech units that match the input text, concatenates them, and writes the final output to an audio file in the same directory.
Index Terms—Concatenative speech synthesis, Unit Size, Syllabification, Spectral Noise Reduction, Statistical Parametric Synthesis model.
DOI: 10.17148/IJARCCE.2022.11575
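The search-and-concatenate step summarized in the abstract can be illustrated with a minimal sketch. It is not the system described in this paper: the unit database `UNIT_DB`, the placeholder clips produced by `tone`, the greedy longest-match rule in `select_units`, and the 16 kHz sampling rate are all assumptions made for illustration. A real concatenative synthesizer would store recorded waveforms and would typically smooth the joins between units.

```python
import struct
import wave

SAMPLE_RATE = 16000  # assumed sampling rate in Hz


def tone(n_samples, level):
    """Placeholder 'recording': a constant-level clip standing in for real audio."""
    return [level] * n_samples


# Hypothetical unit database: text unit -> 16-bit PCM samples.
UNIT_DB = {
    "hello": tone(4, 100),
    "hel": tone(2, 200),
    "lo": tone(2, 300),
    "world": tone(5, 400),
    " ": tone(1, 0),
}


def select_units(text, db):
    """Greedy longest-match search: at each position take the longest unit in db."""
    units = []
    i = 0
    while i < len(text):
        for j in range(len(text), i, -1):
            if text[i:j] in db:
                units.append(text[i:j])
                i = j
                break
        else:
            raise KeyError(f"no unit covers the text at position {i}: {text[i:]!r}")
    return units


def synthesize(text, db):
    """Concatenate the samples of the selected units, in order."""
    samples = []
    for unit in select_units(text, db):
        samples.extend(db[unit])
    return samples


def write_wav(path, samples, rate=SAMPLE_RATE):
    """Write mono 16-bit PCM samples to a WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit audio
        wav.setframerate(rate)
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))


if __name__ == "__main__":
    # Writes the result next to the script, mirroring the "same directory" output.
    write_wav("output.wav", synthesize("hello world", UNIT_DB))
```

Preferring the longest matching unit reflects the common intuition that larger units mean fewer joins and therefore fewer audible discontinuities; whether the paper's system selects units this way is not stated in the abstract.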