[Math] Neural Networks – Are these functions Lipschitz continuous?

Tags: functions, lipschitz-functions

Assume, for simplicity, a neural network with a single parameter. Let $x \in R$ be a training pattern, $t \in R$ the target variable, $w \in R$ the parameter, and $g: R \rightarrow R$ the activation function. Consider the regularized loss function:

$$ f(x;w) = \frac{1}{2}(t - g(xw))^2 + \frac{1}{2} \lambda w^2$$

The activation function $g$ can be linear, sigmoid, tanh or ReLU.

Depending on the choice of $g$, are $f$, $\nabla f$ and $\nabla^2 f$ Lipschitz-continuous?

PS:

I need to check whether some assumptions of optimization algorithms hold, and I believe they require global Lipschitz continuity.
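To get a rough feel for the question, one can evaluate $f(w)$ on a grid for each activation and look at the finite-difference slopes. The constants $x = 1.5$, $t = 0.7$, $\lambda = 0.1$ and the $w$-range in the sketch below are arbitrary choices, not given in the question.

```python
# Crude numerical look at how steep f(w) can get for each activation g.
# The constants x, t, lambda and the w-range are arbitrary assumptions.
import numpy as np

x, t, lam = 1.5, 0.7, 0.1

activations = {
    "linear":  lambda a: a,
    "sigmoid": lambda a: 1.0 / (1.0 + np.exp(-a)),
    "tanh":    np.tanh,
    "relu":    lambda a: np.maximum(a, 0.0),
}

w = np.linspace(-50.0, 50.0, 100001)
for name, g in activations.items():
    f = 0.5 * (t - g(w * x)) ** 2 + 0.5 * lam * w ** 2
    # Finite-difference slope as a proxy for |df/dw| on this range.
    slope = np.abs(np.diff(f) / np.diff(w))
    print(f"{name:8s} max |df/dw| on [-50, 50] ~ {slope.max():.3f}")
```

Widening the $w$-range shows whether the slope keeps growing, which is exactly the boundedness question asked above.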

Best Answer

I answer my own question for the linear activation function $g(wx) = wx$, using the theorem:

A differentiable $f: I \rightarrow R$ is Lipschitz on $I$ if and only if $\nabla f$ is bounded on $I$.

$$f(x;w) = \frac{1}{2} (t - wx)^2 + \frac{1}{2} \lambda w^2$$

Differentiating with respect to $w$:

$$\nabla f(x;w) = -tx + w(x^2 + \lambda)$$
$$\nabla^2 f(x;w) = x^2 + \lambda$$
$$\nabla^3 f(x;w) = 0$$

So $\nabla f$ is unbounded (it grows linearly in $w$), and therefore $f$ is not globally Lipschitz. $\nabla^2 f$ is a constant (since $x$ and $\lambda$ are fixed), so $\nabla f$ is Lipschitz with constant $x^2 + \lambda$. Likewise $\nabla^2 f$ is Lipschitz because its derivative $\nabla^3 f = 0$ is bounded.
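A minimal numerical check of these three claims for the linear case (again with arbitrary values for $x$, $t$ and $\lambda$): the finite-difference slope of $\nabla f$ is exactly $x^2 + \lambda$ everywhere, while the slope of $f$ itself grows with $|w|$.

```python
# Numerical check of the linear-activation case; x, t, lambda are arbitrary.
import numpy as np

x, t, lam = 1.5, 0.7, 0.1
L = x ** 2 + lam  # claimed Lipschitz constant of grad f

def f(w):
    return 0.5 * (t - w * x) ** 2 + 0.5 * lam * w ** 2

def grad_f(w):
    return -t * x + w * (x ** 2 + lam)

w = np.linspace(-1e4, 1e4, 2001)

# Slope of f between grid points grows with |w|: f is not globally Lipschitz.
print("max slope of f     :", np.abs(np.diff(f(w)) / np.diff(w)).max())

# Slope of grad f is exactly x^2 + lambda: grad f is L-Lipschitz.
print("max slope of grad f:", np.abs(np.diff(grad_f(w)) / np.diff(w)).max(), "vs L =", L)
```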

I hope at least this much is correct, and that there is a way to check the other activation functions less explicitly.
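One possible way to check the other activations less explicitly is to let a computer algebra system do the differentiation and then inspect the resulting expressions for boundedness in $w$. A small sympy sketch (ReLU is omitted because it is not differentiable at $0$):

```python
# Symbolic derivatives of f with respect to w for the smooth activations.
# ReLU is left out since it is not differentiable at 0.
import sympy as sp

w, x, t, lam = sp.symbols("w x t lambda", real=True)

activations = {
    "linear":  w * x,
    "sigmoid": 1 / (1 + sp.exp(-w * x)),
    "tanh":    sp.tanh(w * x),
}

for name, g in activations.items():
    f = sp.Rational(1, 2) * (t - g) ** 2 + sp.Rational(1, 2) * lam * w ** 2
    grad = sp.simplify(sp.diff(f, w))
    hess = sp.simplify(sp.diff(f, w, 2))
    print(name)
    print("  grad_w f  :", grad)
    print("  grad_w^2 f:", hess)
```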