I've run the system using the following pipeline.
For training: speech data (NTIMIT) -> MFCC (feature extraction) -> GMM (modeling)
For testing: speech data (NTIMIT) -> MFCC (feature extraction) -> EM (scores)
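To make the test-side step concrete, here is a minimal sketch of closed-set speaker identification, assuming the "scores" are average per-frame log-likelihoods under each speaker's diagonal-covariance GMM and that the speaker with the highest score wins. The function names, the model layout (weights, means, variances) and the toy 2-D "features" are all hypothetical stand-ins for illustration, not the actual MFCC/GMM code used here.

```python
import math

def gmm_frame_loglik(frame, weights, means, variances):
    """Log-likelihood of one feature frame under a diagonal-covariance GMM."""
    component_logs = []
    for w, mu, var in zip(weights, means, variances):
        # log(weight) + log N(frame | mu, diag(var)), summed over dimensions
        log_p = math.log(w)
        for x, m, v in zip(frame, mu, var):
            log_p += -0.5 * (math.log(2.0 * math.pi * v) + (x - m) ** 2 / v)
        component_logs.append(log_p)
    # log-sum-exp over components, shifted by the max for numerical stability
    peak = max(component_logs)
    return peak + math.log(sum(math.exp(c - peak) for c in component_logs))

def utterance_score(frames, model):
    """Average per-frame log-likelihood of an utterance under one speaker model."""
    weights, means, variances = model
    total = sum(gmm_frame_loglik(f, weights, means, variances) for f in frames)
    return total / len(frames)

def identify(frames, speaker_models):
    """Return the speaker ID whose model gives the highest average score."""
    return max(speaker_models,
               key=lambda spk: utterance_score(frames, speaker_models[spk]))

# Toy models (hypothetical values): two speakers, two mixtures, 2-D features.
speaker_models = {
    "spk_a": ([0.5, 0.5], [[0.0, 0.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]),
    "spk_b": ([0.5, 0.5], [[5.0, 5.0], [6.0, 6.0]], [[1.0, 1.0], [1.0, 1.0]]),
}
test_frames = [[0.2, -0.1], [0.9, 1.1], [0.4, 0.6]]
predicted = identify(test_frames, speaker_models)
```

In a real run, each frame would be an MFCC vector and each model would come from EM training on that speaker's enrollment data. Identification accuracy is then the fraction of test utterances for which `identify` returns the true speaker.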
The accuracy I am getting is 44% for 461 speakers. At least two sources (1. Reynolds, 2. Patra) confirm that running such a system should give an accuracy of 60.8% for 630 speakers. I have tried many changes to the sampling frequency (mainly 8000 or 16000 Hz), the number of MFCC cepstral coefficients, the number of GMM mixtures and iterations, and the window size, and that was the best percentage I could get.
I am using MFCC and GMM code that gave good results with TIMIT.
Any advice would be really appreciated.
Best Answer