This is just a comment to help you see why $k$-linear functions on finite-dimensional spaces are bounded.
To see that bilinear forms on finite-dimensional spaces are bounded, you can argue as follows (using your notation):
Let ${\bf x}=(x_1,\ldots,x_n)$ and ${\bf y}=(y_1,\ldots,y_m)$ be unit vectors in $\mathbb{R}^n$ and $\mathbb{R}^m$, respectively. Then
$$
\left\|{\bf f}({\bf x},{\bf y})\right\|=\left\|\sum_{i=1}^n\sum_{j=1}^m x_iy_j\ {\bf f}(e_i,e_j)\right\|
\leq \max_{i,j}\ \|{\bf f}(e_i,e_j)\|\sum_{i=1}^n\sum_{j=1}^m |x_i|\,|y_j|
\leq \frac{1}{2}\,\max_{i,j}\ \|{\bf f}(e_i,e_j)\|\sum_{i=1}^n\sum_{j=1}^m \left(x^2_i+y^2_j\right),
$$
where the last step uses $|x_i|\,|y_j|\leq\tfrac{1}{2}(x_i^2+y_j^2)$. If we call $M=\max_{i,j}\ \|{\bf f}(e_i,e_j)\|$ and note that $\sum_{i=1}^n\sum_{j=1}^m (x^2_i+y^2_j)=m\|{\bf x}\|^2+n\|{\bf y}\|^2$, we have from the above inequality that
$$
\|{\bf f}({\bf x},{\bf y})\| \leq \frac{M}{2}\left(m\|{\bf x}\|^2+n\|{\bf y}\|^2\right).
$$
Since ${\bf x}$ and ${\bf y}$ are unit vectors, it follows that ${\bf f}$ is bounded by $\frac{M(m+n)}{2}$ on unit vectors. I hope you can extend this to any $k$-linear function.
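If it helps, here is a small numerical sanity check of this bound (a sketch only; the array `A`, the dimensions, and the use of NumPy are illustrative assumptions, not part of the argument): a bilinear map $\mathbb{R}^n\times\mathbb{R}^m\to\mathbb{R}^p$ is stored as a coefficient tensor, $M$ is computed from the values $f(e_i,e_j)$, and random unit vectors are tested against $\frac{M(m+n)}{2}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, p = 4, 6, 3
A = rng.normal(size=(p, n, m))          # f(x, y)_k = sum_{i,j} A[k, i, j] x_i y_j

# M = max_{i,j} ||f(e_i, e_j)||: the largest norm among the "columns" A[:, i, j]
M = np.linalg.norm(A, axis=0).max()

for _ in range(10_000):
    x = rng.normal(size=n); x /= np.linalg.norm(x)   # random unit vector in R^n
    y = rng.normal(size=m); y /= np.linalg.norm(y)   # random unit vector in R^m
    fxy = np.einsum("kij,i,j->k", A, x, y)           # evaluate f(x, y)
    assert np.linalg.norm(fxy) <= M * (m + n) / 2 + 1e-12
print("bound M(m+n)/2 =", M * (m + n) / 2, "never violated")
```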
For a multilinear mapping, it suffices to consider its Fréchet derivative. Let $W$ be an $n$-dimensional vector space, and let each $V_i$ be an $m_i$-dimensional vector space, $i=1,2,...,N$. Let $f:V_1\times V_2\times\cdots\times V_N\to W$ be multilinear. Then for every $\left(v_1,v_2,...,v_N\right)\in V_1\times V_2\times\cdots\times V_N$, the Fréchet derivative of $f$ at this point, denoted by $({\rm d}f)(v_1,v_2,...,v_N)$, is a linear mapping, i.e.,
$$
({\rm d}f)(v_1,v_2,...,v_N):V_1\times V_2\times\cdots\times V_N\to W.
$$
By the definition of the Fréchet derivative and the multilinearity of $f$, it follows that
\begin{align}
&({\rm d}f)(v_1,v_2,...,v_N)(h_1,h_2,...,h_N)\\
&=f(h_1,v_2,...,v_N)\\
&+f(v_1,h_2,...,v_N)\\
&+\cdots\\
&+f(v_1,v_2,...,h_N).
\end{align}
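To see where this formula comes from, consider the simplest case $N=2$: by bilinearity,
$$
f(v_1+h_1,\,v_2+h_2)=f(v_1,v_2)+f(h_1,v_2)+f(v_1,h_2)+f(h_1,h_2),
$$
and the last term is $o\!\left(\|(h_1,h_2)\|\right)$, so the linear part in $(h_1,h_2)$ is exactly $f(h_1,v_2)+f(v_1,h_2)$.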
Recall that, if $g$ is linear, its entry-wise form reads
$$
g_i(v)=\sum_ja_{ij}v_j,
$$
and if $g$ is bilinear, its entry-wise form reads
$$
g_i(v_1,v_2)=\sum_{j_1,j_2}a_{ij_1j_2}v_{1j_1}v_{2j_2}.
$$
Inductively and formally, the above multilinear $f$ admits the following entry-wise form
$$
f_i(v_1,v_2,...,v_N)=\sum_{j_1=1}^{m_1}\sum_{j_2=1}^{m_2}\cdots\sum_{j_N=1}^{m_N}a_{ij_1j_2...j_N}v_{1j_1}v_{2j_2}...v_{Nj_N}
$$
for $i=1,2,...,n$, where each $v_{kj_k}$ denotes the $j_k$-th entry of $v_k\in V_k$, and the $a_{ij_1j_2...j_N}$'s are the coefficients of $f$.
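This entry-wise form translates directly into code. Here is a minimal sketch (the dimensions and the coefficient array `a` are arbitrary choices for illustration) that evaluates a trilinear $f$ exactly as the nested sums above:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m1, m2, m3 = 2, 3, 4, 5
a = rng.normal(size=(n, m1, m2, m3))    # hypothetical coefficients a_{i j1 j2 j3}

def f(v1, v2, v3):
    # einsum spells out exactly the nested sums over j1, j2, j3
    return np.einsum("ijkl,j,k,l->i", a, v1, v2, v3)

v1, v2, v3 = rng.normal(size=m1), rng.normal(size=m2), rng.normal(size=m3)
print(f(v1, v2, v3))                    # an n-vector, one entry per index i
```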
Thanks to this entry-wise form, we may then write down the entry-wise form of ${\rm d}f$ as well, which reads
\begin{align}
&({\rm d}f)_i(v_1,v_2,...,v_N)(h_1,h_2,...,h_N)\\
&=\sum_{j_1=1}^{m_1}\sum_{j_2=1}^{m_2}\cdots\sum_{j_N=1}^{m_N}a_{ij_1j_2...j_N}h_{1j_1}v_{2j_2}...v_{Nj_N}\\
&+\sum_{j_1=1}^{m_1}\sum_{j_2=1}^{m_2}\cdots\sum_{j_N=1}^{m_N}a_{ij_1j_2...j_N}v_{1j_1}h_{2j_2}...v_{Nj_N}\\
&+\cdots\\
&+\sum_{j_1=1}^{m_1}\sum_{j_2=1}^{m_2}\cdots\sum_{j_N=1}^{m_N}a_{ij_1j_2...j_N}v_{1j_1}v_{2j_2}...h_{Nj_N}.
\end{align}
In other words, once the $a_{ij_1j_2...j_N}$'s are known, the entry-wise form of ${\rm d}f$ can be written down directly as above.
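As a hedged numerical check of this entry-wise form of ${\rm d}f$ (again with made-up dimensions and random coefficients), one can compare the $N$-term sum against a central finite difference of $t\mapsto f(v_1+th_1,\ldots,v_N+th_N)$ for $N=3$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m1, m2, m3 = 2, 3, 4, 5
a = rng.normal(size=(n, m1, m2, m3))

def f(v1, v2, v3):
    return np.einsum("ijkl,j,k,l->i", a, v1, v2, v3)

v1, v2, v3 = (rng.normal(size=m) for m in (m1, m2, m3))
h1, h2, h3 = (rng.normal(size=m) for m in (m1, m2, m3))

# entry-wise form of df: one term per argument, with v_k replaced by h_k
df = f(h1, v2, v3) + f(v1, h2, v3) + f(v1, v2, h3)

# central finite difference of t -> f(v1 + t h1, v2 + t h2, v3 + t h3) at t = 0
t = 1e-6
fd = (f(v1 + t*h1, v2 + t*h2, v3 + t*h3)
      - f(v1 - t*h1, v2 - t*h2, v3 - t*h3)) / (2 * t)
print(np.allclose(df, fd, atol=1e-6))   # True
```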
Finally, the "$+$" in the OP's original post, i.e., $(h_1+h_2+\cdots+h_N)$, is a notational convention used in some contexts; it means exactly $(h_1,h_2,...,h_N)$ here. When there is no risk of ambiguity, either expression can be used according to one's preference.
Best Answer
If $f \colon V \rightarrow \mathbb{R}^p$, where $V \subseteq \mathbb{R}^n$ is open, then $Df \colon V \rightarrow \operatorname{Lin}(\mathbb{R}^n,\mathbb{R}^p)$ and so $D^2f \colon V \rightarrow \operatorname{Lin}(\mathbb{R}^n, \operatorname{Lin}(\mathbb{R}^n, \mathbb{R}^p))$. Let us try to unravel what this means.
First, note that a linear map $T \colon \mathbb{R}^n \rightarrow \operatorname{Lin}(\mathbb{R}^n, \mathbb{R}^p)$ is the same thing as a bilinear map $S \colon \mathbb{R}^n \times \mathbb{R}^n \rightarrow \mathbb{R}^p$. More precisely, we can define a map $\varphi \colon \operatorname{Lin}(\mathbb{R}^n, \operatorname{Lin}(\mathbb{R}^n, \mathbb{R}^p)) \rightarrow \operatorname{Lin}^2(\mathbb{R}^n, \mathbb{R}^p)$ by setting $\varphi(T)(v,w) := T(v)(w)$ and this map is an isomorphism. More generally, one can construct a similar identification
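A small sketch may make this identification concrete (the array `B` and the dimensions are illustrative assumptions): the curried map $T$ and the bilinear map $S=\varphi(T)$ are two views of the same data.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 3, 2
B = rng.normal(size=(p, n, n))          # the data behind both T and S = phi(T)

def T(v):
    # T(v) is itself a linear map R^n -> R^p, represented by its p x n matrix
    return np.einsum("kij,i->kj", B, v)

def S(v, w):
    # the curried bilinear map: phi(T)(v, w) := T(v)(w)
    return T(v) @ w

v, w = rng.normal(size=n), rng.normal(size=n)
print(np.allclose(S(v, w), np.einsum("kij,i,j->k", B, v, w)))   # True
```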
$$ \underbrace{\operatorname{Lin}(\mathbb{R}^n, \operatorname{Lin}(\mathbb{R}^n, \dots \, (\operatorname{Lin}(\mathbb{R}^n, \mathbb{R}^p) \, \dots )}_{k \text{ times}} \approx \operatorname{Lin}^k(\mathbb{R}^n, \mathbb{R}^p) $$
which allows you to identify the $k$-th derivative $D^kf|_{q}$ at a point $q$ (the bar notation distinguishes the point from the vector arguments and reduces the clutter of parentheses) with a $k$-multilinear map.
Now, consider the case where $p = 1$, so that $f$ is a scalar function. The first derivative $Df \colon V \rightarrow \operatorname{Lin}(\mathbb{R}^n,\mathbb{R}) = \left( \mathbb{R}^{n} \right)^{*}$ sends each point $q \in V$ to a linear functional $(Df)(q) = Df|_{q}$ which acts as a directional derivative:
$$ (Df|_{q})(v) = \lim_{t \to 0} \frac{f(q + tv) - f(q)}{t}. $$
In particular, if we take $v = e_i$ (where $(e_1,\dots,e_n)$ is the standard basis of $\mathbb{R}^n$) we get $(Df|_{q})(e_i) = \frac{\partial f}{\partial x^i}(q) = \frac{\partial f}{\partial x^i}|_{q}$. Thus, if we represent each $Df|_{q}$ by the (row) vector $(Df|_{q}(e_1), \dots, Df|_{q}(e_n))$, we have $Df "=" \nabla f$ and we recover the usual notion of the gradient of a function.
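As a quick illustration (with a made-up scalar function; not part of the argument), the row vector of directional derivatives along the standard basis agrees numerically with the usual gradient:

```python
import numpy as np

# example scalar function f(q) = sin(q1) * q2, with its gradient in closed form
f = lambda q: np.sin(q[0]) * q[1]
grad = lambda q: np.array([np.cos(q[0]) * q[1], np.sin(q[0])])

q, t = np.array([0.7, -1.3]), 1e-7
# row vector of directional derivatives Df|_q(e_i) along the standard basis
Df = np.array([(f(q + t * e) - f(q)) / t for e in np.eye(2)])
print(np.allclose(Df, grad(q), atol=1e-6))   # True
```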
Let us move to the second derivative. By the identification above, we can think of $D(Df)(q) = D(Df)|_{q} = D^2f|_{q}$ (the second derivative at a point $q \in V$) as a bilinear map $\varphi(D^2f|_{q}) \colon \mathbb{R}^n \times \mathbb{R}^n \rightarrow \mathbb{R}$ which is usually also denoted by $D^2f$ (making the identification above invisible) and is simply a bilinear form on $\mathbb{R}^n$. Any bilinear form is completely determined by the matrix representing it with respect to some basis, so let us consider the matrix $A_{ij} = \varphi(D^2f|_{q})(e_i, e_j)$ where $(e_i)$ is the standard basis of $\mathbb{R}^n$. I claim that $A = \operatorname{Hess}(f)|_{q}$. To verify it, we unravel all the relevant definitions and properties of the derivative:
$$ A_{ij} = \varphi(D^2f|_{q})(e_i,e_j) = ((D(Df)|_{q})(e_i))(e_j) = \left( \lim_{t \to 0} \frac{Df|_{q + te_i} - Df|_{q}}{t} \right)(e_j) = \lim_{t \to 0} \frac{Df|_{q + te_i}(e_j) - Df|_{q}(e_j)}{t} = \lim_{t \to 0} \frac{\frac{\partial f}{\partial x^j}(q + te_i) - \frac{\partial f}{\partial x^j}(q)}{t} = \frac{\partial^2 f}{\partial x^i \partial x^j}(q).$$
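The same computation can be checked numerically for the illustrative function above (a hedged sketch, reusing $f(q)=\sin(q_1)\,q_2$): differencing the first derivatives along $e_i$, exactly as in the display, reproduces the Hessian.

```python
import numpy as np

# same scalar example: f(q) = sin(q1) * q2
grad = lambda q: np.array([np.cos(q[0]) * q[1], np.sin(q[0])])    # Df|_q as a row vector
hess = lambda q: np.array([[-np.sin(q[0]) * q[1], np.cos(q[0])],
                           [ np.cos(q[0]),        0.0          ]])

q, t = np.array([0.7, -1.3]), 1e-5
# A_ij = (Df|_{q + t e_i}(e_j) - Df|_q(e_j)) / t, as in the display above
A = np.array([(grad(q + t * ei) - grad(q)) / t for ei in np.eye(2)])
print(np.allclose(A, hess(q), atol=1e-4))    # True; note A is (numerically) symmetric
```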
More generally, you should think of $(D^k f)|_{q}(v_1, \dots, v_k)$ as taking first the directional derivative of $f$ in the direction $v_1$, then the directional derivative of the result in the direction $v_2$, and so on, finally evaluating the result at the point $q$.
Regarding the theorem you quote, let me demonstrate it in the case $k = 2$. Thus, we consider a function $f \colon \mathbb{R}^n \times \mathbb{R}^n \rightarrow \mathbb{R}^p$ which is bilinear and want to understand the derivative. For example, if $n = 2$ and $p = 1$ we can consider
$$ f((x,y),(u,v)) = 2xu + 4 xv + 5yu + 6yv. $$
The derivative should be a map $Df \colon \mathbb{R}^n \times \mathbb{R}^n \rightarrow \operatorname{Lin}(\mathbb{R}^n \times \mathbb{R}^n, \mathbb{R}^p)$ and we have
$$ Df|_{(q_1,q_2)}(v_1,v_2) = f(q_1,v_2) + f(v_1,q_2). $$
How does this work for our function $f$? For example,
$$ Df|_{((x_0,y_0),(u_0,v_0))}((1,0),(0,0)) = \frac{\partial f}{\partial x}\Big|_{((x_0,y_0),(u_0,v_0))} = (2u + 4v)\big|_{((x_0,y_0),(u_0,v_0))} = 2u_0 + 4v_0 \\ = f((x_0,y_0),(0,0)) + f((1,0),(u_0,v_0)).$$
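For completeness, here is a short numerical sketch of this last computation (using the concrete $f$ from the example; the sample points are arbitrary): the finite difference along the direction $((1,0),(0,0))$ matches $f(q_1,v_2)+f(v_1,q_2)=2u_0+4v_0$.

```python
import numpy as np

def f(a, b):
    (x, y), (u, v) = a, b
    return 2*x*u + 4*x*v + 5*y*u + 6*y*v

q1, q2 = np.array([0.3, -1.2]), np.array([2.0, 0.5])   # (x0, y0) and (u0, v0)
v1, v2 = np.array([1.0, 0.0]), np.array([0.0, 0.0])    # the direction ((1,0),(0,0))

# central finite difference of t -> f(q1 + t v1, q2 + t v2) at t = 0
t = 1e-6
fd = (f(q1 + t*v1, q2 + t*v2) - f(q1 - t*v1, q2 - t*v2)) / (2*t)
print(fd, f(q1, v2) + f(v1, q2))    # both equal 2*u0 + 4*v0 = 6.0
```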