In Goodfellow, Bengio, and Courville's book Deep Learning, they state some variation of the following at several points (e.g., Section 10.2.3):

In general, we can think of the neural network as representing a function $f(x; \theta)$. The outputs of this function are not direct predictions of the value $y$. Instead, $f(x; \theta) = \omega$ provides the parameters for a distribution over $y$. Our loss function can then be interpreted as $-\log p(y;\omega(x))$. [p. 182]

I take this to mean that although a basic, standard neural network (i.e. feedforward with one output unit) outputs a single value, the training process and the suggested use of the maximum likelihood principle mean that inference is equivalent to sampling from some probability distribution whose structure is specified by the input and the network's parameters.

Am I interpreting this correctly? If so, what exactly is the relationship between the probability distribution and the network or the network's parameters? What is the relationship between $\omega$ and the network or the network's parameters?

Or are the distribution and parameters $\omega$ just hypotheticals and not meant to be interpreted so literally?

The part you quote deals with the scenario where the network is trained not to predict the value of $y$ directly (which is the most common use of neural networks these days) but to predict the parameters of the distribution of $y$. Say we know that $p(y|x)$ is a normal distribution, but we don't know the parameters of the Gaussian. We do know, however, that they depend on $x$ (if $x$ is age, maybe $y$ has higher variance in some age groups). So instead of letting $y=f(x;\theta)$, we let $\omega=f(x;\theta)$ and $y\sim \mathcal{N}(\omega)$.

As stated in the book, you can train such a model using the maximum likelihood principle. The trained model outputs the parameters of the distribution, which let you evaluate $p(y|x)$ for any value of $y$.
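To make this concrete, here is a minimal NumPy sketch of maximum likelihood training for the Gaussian case. Everything in it is an assumption made for illustration, not something from the book: the "network" `f` is just a pair of linear maps that output the mean and the log-variance, the data are synthetic with noise that grows with $x$, and the names `gaussian_nll` and `nll_grad` are hypothetical helpers.

```python
import numpy as np

def f(x, theta):
    """Toy 'network' (assumed linear): maps x to omega = (mean, log-variance)."""
    w_mu, b_mu, w_s, b_s = theta
    return w_mu * x + b_mu, w_s * x + b_s

def gaussian_nll(y, mu, log_var):
    """Batch mean of -log N(y; mu, exp(log_var)), i.e. the loss -log p(y; omega(x))."""
    return np.mean(0.5 * (np.log(2 * np.pi) + log_var
                          + (y - mu) ** 2 * np.exp(-log_var)))

def nll_grad(x, y, theta):
    """Gradient of the mean negative log-likelihood with respect to theta."""
    mu, log_var = f(x, theta)
    inv_var = np.exp(-log_var)
    d_mu = -(y - mu) * inv_var                    # dL/dmu per sample
    d_lv = 0.5 * (1.0 - (y - mu) ** 2 * inv_var)  # dL/dlog_var per sample
    return np.array([np.mean(d_mu * x), np.mean(d_mu),
                     np.mean(d_lv * x), np.mean(d_lv)])

# Synthetic data (assumed): mean 2x + 1, noise standard deviation 0.5 + x.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=500)
y = 2.0 * x + 1.0 + (0.5 + x) * rng.standard_normal(500)

# Plain gradient descent on the negative log-likelihood.
theta = np.zeros(4)
initial_loss = gaussian_nll(y, *f(x, theta))
for _ in range(2000):
    theta -= 0.05 * nll_grad(x, y, theta)
final_loss = gaussian_nll(y, *f(x, theta))

# After training, f(x_new, theta) gives omega for a new input: the
# predicted mean and log-variance of y, not a single predicted value.
mu_new, log_var_new = f(0.8, theta)
```

Predicting the log-variance rather than the variance is a design choice in this sketch: it keeps the variance positive without adding a constraint to the optimization.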

To answer your question:

what exactly is the relationship between the probability distribution and the network or the network's parameters? What is the relationship between $\omega$ and the network or the network's parameters ($\theta$)?

The network parameters $\theta$ parametrize the network function $f(x; \theta)$. The output of this function, $f(x;\theta)=\omega$, gives the parameters of the distribution of $y$. In the example I gave above, these are the mean and the variance of the normal distribution.
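As a worked version of the Gaussian example, write $\omega(x) = (\mu(x), \sigma^2(x))$ for the two outputs of the network (the symbols $\mu$ and $\sigma^2$ are notation I am introducing here). The loss from the quoted passage then becomes:

$$-\log p(y;\omega(x)) = \frac{1}{2}\log\big(2\pi\sigma^2(x)\big) + \frac{\big(y-\mu(x)\big)^2}{2\sigma^2(x)}$$

If the variance is held fixed and the network outputs only $\mu(x)$, the first term is a constant and minimizing this loss is the same as minimizing the squared error $(y-\mu(x))^2$. In that sense even an ordinary regression network with a single output can be read as predicting a parameter of a distribution, namely the mean of a Gaussian.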

