Vlado Delić
Fakultet tehničkih nauka, Univerzitet u Novom Sadu
Keywords:
Emotions in voice, pitch (F0), intensity, duration
Abstract
The general aim of this paper is to investigate the possibilities of including emotions in synthesized speech. The goal was to compare recordings of human speakers producing several selected utterances in Arabic, with and without emotion, and to compare these human utterances with synthesized speech obtained from an Arabic TTS system. In the experimental part of the paper, several sentences were recorded both as neutral utterances and with the corresponding emotions. The recordings were then compared with each other and with synthesized versions of the same sentences. Speech features such as F0, duration and intensity were analyzed using PRAAT, and an audio-visual analysis of the recorded sentences with and without emotions was conducted. Five emotions are analyzed in natural and synthesized speech: anger, joy, sadness, fear and surprise. The paper shows the differences between emotional and neutral speech that should also be expressed in synthesized speech. Some peculiarities of Arabic text that are significant for the TTS process are also presented.