Solved – What are the senones in a Deep Neural Network

deep learning, hidden markov model, natural language, neural networks, terminology

I am reading this paper: skype translator, where they use CD-DNN-HMMs (context-dependent deep neural networks with hidden Markov models). I understand the idea of the project and the architecture they designed, but I don't understand what senones are. I have been looking for a definition but haven't found one.

We propose a novel context-dependent (CD) model for
large-vocabulary speech recognition (LVSR) that leverages recent
advances in using deep belief networks for phone recognition. We
describe a pre-trained deep neural network hidden Markov model
(DNN-HMM) hybrid architecture that trains the DNN to produce
a distribution over senones (tied triphone states) as its output

I would really appreciate an explanation.

EDIT:

I've found this definition in this paper:

We propose
to model subphonetic events with Markov states and treat the
state in phonetic hidden Markov models as our basic subphonetic
unit — senone. A word model is a concatenation
of state-dependent senones and senones can be shared across
different word models.

I guess they are used in the Hidden Markov Model part of the architecture in the first paper. Are they the states of the HMM? The outputs of the DNN?

Best Answer

I coined the name "senone" in 1992; see my ICASSP 1992 paper [1]. It's just a fancy name for a cluster of shared Markov states that represent similar acoustic events. The name was chosen by contrast with IBM's fenones: their "f" stands for "frame", and my "s" stands for "state".
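As a rough illustration of what "a cluster of shared Markov states" means (this is not code from either paper), state tying can be pictured as a lookup table from each context-dependent HMM state to a senone id. The triphone names, the three-state topology, and the ids below are all made up for the example. In the hybrid setup quoted in the question, the DNN would then have one output unit per senone, so senones are both the tied HMM states and the classes the DNN predicts.

```python
# Hypothetical tying table: (triphone, HMM state index) -> senone id.
# Triphones are written left-center+right; every name and id here is invented.
TYING = {
    ("k-ae+t", 0): 0, ("k-ae+t", 1): 1, ("k-ae+t", 2): 2,
    ("k-ae+p", 0): 0, ("k-ae+p", 1): 1, ("k-ae+p", 2): 3,
    ("b-ae+t", 0): 4, ("b-ae+t", 1): 1, ("b-ae+t", 2): 2,
}


def senone_sequence(triphone, n_states=3):
    """Return the senone ids used by the states of one triphone HMM."""
    return [TYING[(triphone, state)] for state in range(n_states)]


def num_senones():
    """Number of distinct senones, i.e. the size of the DNN output layer."""
    return len(set(TYING.values()))


if __name__ == "__main__":
    for tri in ("k-ae+t", "k-ae+p", "b-ae+t"):
        print(tri, senone_sequence(tri))
    print("triphone states:", len(TYING), "senones:", num_senones())
```

Here nine triphone states share five senones: the middle state of all three triphones is tied to the same senone, while the first and last states are shared only between triphones with the same left or right context.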

The initial idea came from my 1991 Eurospeech (now called Interspeech) work, where I used top-down clustering on Markov states. My 1991 CMU tech report is available here: https://www.semanticscholar.org/paper/Shared-distribution-hidden-Markov-models-for-speech-Hwang-Huang/33ea989f1655636162b7e9b8e0cfe3fcce92c37d

In 1992, I decided to switch to tree clustering so that one could model unseen context-dependent (CD) phones as well.
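To see why a tree helps with unseen context-dependent phones, here is a minimal sketch with a hand-written tree. It is an assumption-laden toy: in a real system the questions and splits are learned from training data, and the phone classes, the questions, and the senone ids below are invented for illustration. The point is only that any triphone, seen in training or not, can answer the questions about its context and so lands in some leaf, which is a senone.

```python
# Toy phone classes used as phonetic questions (illustrative only).
NASALS = {"m", "n", "ng"}
VOICED_STOPS = {"b", "d", "g"}


def parse(triphone):
    """Split 'left-center+right' into its three phones."""
    left, rest = triphone.split("-")
    center, right = rest.split("+")
    return left, center, right


def senone_for_ae_state0(triphone):
    """Walk a hand-written tree for state 0 of center phone 'ae'.

    Each leaf returns a made-up senone id.
    """
    left, center, _right = parse(triphone)
    if center != "ae":
        raise ValueError("this toy tree only covers center phone 'ae'")
    if left in NASALS:          # question: is the left context a nasal?
        return 10
    if left in VOICED_STOPS:    # question: is the left context a voiced stop?
        return 11
    return 12                   # every other left context


if __name__ == "__main__":
    # 'g-ae+n' need not appear in training data to receive a senone.
    for tri in ("m-ae+t", "g-ae+n", "k-ae+t"):
        print(tri, "->", senone_for_ae_state0(tri))
```

With a flat lookup table, a triphone that never occurred in training has no entry; with a tree, it shares the senone of whichever leaf its context falls into.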

[1] Hwang, Mei-Yuh, and Xuedong Huang. "Subphonetic modeling with Markov states-Senone." In [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 33-36. IEEE, 1992. link: https://ieeexplore.ieee.org/document/225979