What's in a name: hyperparameters

Tags: definition, hyperparameter, parameterization, terminology

A normal distribution has two parameters: the mean $\mu$ and the variance $\sigma^2$. In the book Pattern Recognition and Machine Learning, a hyperparameter $\lambda$ suddenly appears in the regularization term of the error function.

What are hyperparameters? Why are they named as such? And how are they intuitively different from parameters in general?

Best Answer

The term hyperparameter is pretty vague. I will use it to refer to a parameter that sits at a higher level of the hierarchy than the other parameters. For example, consider a regression model with a known variance (1 in this case):

$$ y \sim N(X\beta,I) $$

and then a prior on the parameters, e.g.

$$ \beta \sim N(0,\lambda I) $$

Here $\lambda$ determines the distribution of $\beta$ and $\beta$ determines the distribution for $y$. When I want to just refer to $\beta$ I may call it the parameter and when I want to just refer to $\lambda$, I may call it the hyperparameter.
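
To make the two levels concrete, here is a minimal sketch in Python with NumPy. The sample size, dimension, seed, and the value $\lambda = 4$ are illustrative assumptions of mine, not part of the model above. It draws $\beta$ from its prior, draws $y$ given $\beta$, and then computes the posterior mode of $\beta$ with $\lambda$ held fixed.

```python
import numpy as np

rng = np.random.default_rng(0)   # seed chosen arbitrarily for reproducibility

n, p = 50, 3                     # illustrative sizes (assumptions)
lam = 4.0                        # hyperparameter: prior variance of beta (assumed value)

X = rng.normal(size=(n, p))      # design matrix, treated as fixed

# Top level: the hyperparameter lam fixes the distribution of beta.
beta = rng.normal(loc=0.0, scale=np.sqrt(lam), size=p)   # beta ~ N(0, lam * I)

# Bottom level: the parameter beta fixes the distribution of y.
y = X @ beta + rng.normal(size=n)                        # y ~ N(X beta, I)

# Posterior mode of beta given y and lam. It minimizes
#   ||y - X b||^2 + (1 / lam) * ||b||^2,
# so the normal equations are (X'X + I / lam) b = X'y.
beta_map = np.linalg.solve(X.T @ X + np.eye(p) / lam, X.T @ y)

# Ordinary least squares for comparison (no prior, no penalty).
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
```

In this sketch the penalty weight on $\|\beta\|^2$ is $1/\lambda$, so a regularization coefficient like the one in the question can be read as playing the role of an inverse prior variance here. That reading assumes the unit noise variance used above.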

The naming gets more complicated when parameters show up on multiple levels or when there are more hierarchical levels (and you don't want to use the term hyperhyperparameters). It is best if authors specify exactly what they mean when they use the term hyperparameter, or parameter for that matter.
