COMPARISON OF THE AUTOMATIC SPEAKER RECOGNITION PERFORMANCE OVER STANDARD FEATURES

  • Milan Dobrović, Telekom Srbija
  • Vlado Delić, Faculty of Technical Sciences, University of Novi Sad
  • Nikša Jakovljević, Faculty of Technical Sciences, University of Novi Sad
  • Ivan Jokić, Faculty of Technical Sciences, University of Novi Sad
Keywords: Automatic Speaker Recognition, Gaussian Mixture Models, Mel-Frequency Cepstral Coefficients, Linear Prediction Coefficients, Perceptual Linear Prediction, Hidden Markov Model, HTK

Abstract

This paper studies how speaker recognition accuracy depends on the choice of features, window width and model complexity. Three standard feature types were considered: linear prediction coefficients (LPC), perceptual linear prediction (PLP) coefficients and mel-frequency cepstral coefficients (MFCC). In addition, heteroscedastic linear discriminant analysis (HLDA) was examined as a means of increasing the separation between speaker models. Speakers were modelled with Gaussian mixture models (GMM) built with the HTK tools. Thirty speakers from the speech database S70W100s120 were used for system training and testing. The system performed better with MFCC and PLP features than with LPC features. HLDA improved accuracy in most cases, but the improvement was smaller the higher the accuracy of the reference system (the one with the same features but without HLDA).
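
As an illustration of the decision step in GMM-based speaker identification, the following minimal sketch scores a sequence of feature frames against one diagonal-covariance GMM per speaker and selects the speaker with the highest average log-likelihood. It is not the HTK-based system used in the study: the function names, the speaker labels and all model parameters are hypothetical toy values, the two-dimensional frames stand in for real MFCC, PLP or LPC vectors, and model training and HLDA are omitted.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood of the frames under one GMM.

    gmm is a list of (weight, mean, var) tuples, one per mixture component.
    """
    total = 0.0
    for x in frames:
        # Log-sum-exp over the components for numerical stability.
        terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        peak = max(terms)
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total / len(frames)


def identify(frames, speaker_models):
    """Return the best-scoring speaker label and the scores of all speakers."""
    scores = {
        name: gmm_log_likelihood(frames, gmm)
        for name, gmm in speaker_models.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical toy models: two speakers, two mixture components each.
speaker_models = {
    "spk_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [2.0, 2.0], [1.0, 1.0])],
    "spk_b": [(0.5, [6.0, 6.0], [1.0, 1.0]), (0.5, [8.0, 8.0], [1.0, 1.0])],
}

# Made-up test frames lying close to the components of "spk_a".
test_frames = [[0.1, -0.2], [1.9, 2.1], [0.4, 0.3]]

best, scores = identify(test_frames, speaker_models)
print(best, scores)
```

In this sketch, model complexity corresponds to the number of (weight, mean, var) components per speaker, and the choice of features only changes the dimensionality and content of the frames passed to identify.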
Published: 2019-01-15
Section: Articles