Could someone take a look at my attempt to compute the gradient for:
$$f(x) = \lambda \sum_{i = 1}^n g(x_i)$$
where $x \in \mathbb{R}^n$, $\lambda \in \mathbb{R}$, and
$$g(x_i) = \begin{cases}
x_i - \varepsilon/2 & \text{if } |x_i| \geq \varepsilon\\
x_i^2 / (2\varepsilon) & \text{if } |x_i| < \varepsilon
\end{cases}$$
This is what I have done so far:
The function $g(x_i)$ is not differentiable at $x_i = -\varepsilon$ (the two cases do not even agree in value there); for all other points:
$$
\frac{\partial}{\partial x_i}\sum_{j=1}^n g(x_j)=
\begin{cases}
1 & |x_i|\ge\varepsilon\;,\\
x_i/\varepsilon & |x_i|<\varepsilon\;.
\end{cases}
$$
For $f(x)$ I would apply the product rule:
\begin{align*}
\frac{\partial}{\partial x_i} f(x) &= \left(\frac{\partial}{\partial x_i} \lambda\right) \cdot \sum_{j = 1}^n g(x_j) + \lambda \cdot \left(\frac{\partial}{\partial x_i} \sum_{j = 1}^n g(x_j)\right)\\
&= 0 \cdot \sum_{j = 1}^n g(x_j) + \lambda \cdot \left(\frac{\partial}{\partial x_i} \sum_{j = 1}^n g(x_j)\right)\\
&= \lambda \cdot \frac{\partial}{\partial x_i} \sum_{j = 1}^n g(x_j)
\end{align*}
If this is correct, then my question is: in which space does $\frac{\partial}{\partial x} f(x)$ live? Is it $\mathbb{R}^n$ or $\mathbb{R}$? Since this is a gradient I would say $\mathbb{R}^n$, but how are the components of the resulting vector computed?
$$\begin{pmatrix}
???\\
???\\
\vdots\\
???
\end{pmatrix}$$
Best Answer
First, $\frac{\partial f(x)}{\partial x_i} = \lambda g'(x_i)$, so the gradient is given by $\nabla f(x) = \lambda (g'(x_1),\dots,g'(x_n))^T$.
Second, $g'(t) = 1$ when $|t| > \varepsilon$, and $g'(t) = \frac{t}{\varepsilon}$ when $|t| < \varepsilon$.
Hence the gradient is either $\frac{\lambda}{\varepsilon} x$ if $\|x\|_{\infty} < \varepsilon$, or $(\lambda,\dots,\lambda)^T$ if $|x_i| \geq \varepsilon$ for all $i$. For other $x$, you need to use the appropriate value in each component, depending on $x_i$.
No product rule is needed.
And $\nabla f(x) \in \mathbb{R}^n$.
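As a sanity check, the componentwise gradient can be compared against central finite differences numerically. The sketch below (all names such as `g`, `f`, `grad_f` and the specific values of $\lambda$ and $\varepsilon$ are illustrative, not from the post) evaluates points away from the kink at $x_i = -\varepsilon$:

```python
import numpy as np

eps = 0.5   # the epsilon in the definition of g
lam = 2.0   # the scalar lambda

def g(t):
    # g as defined in the question: t - eps/2 if |t| >= eps, else t^2/(2*eps)
    return np.where(np.abs(t) >= eps, t - eps / 2, t**2 / (2 * eps))

def f(x):
    # f(x) = lambda * sum_i g(x_i)
    return lam * np.sum(g(x))

def grad_f(x):
    # componentwise gradient: lambda if |x_i| >= eps, else lambda * x_i / eps
    return np.where(np.abs(x) >= eps, lam, lam * x / eps)

# central finite differences, sampling points that avoid x_i = -eps
rng = np.random.default_rng(0)
x = rng.uniform(-0.4, 2.0, size=5)
h = 1e-6
num = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h)
                for e in np.eye(len(x))])
print(np.max(np.abs(num - grad_f(x))))   # close to machine precision
```

The numerical and analytic gradients agree at differentiable points, which also confirms that $\nabla f(x)$ is a vector in $\mathbb{R}^n$, one component per $x_i$.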