This answer is a response to the prompt
Any help or general information about the relationship between Schur-Weyl duality and symmetric functions you could provide would be greatly appreciated.
If you have more questions (e.g. about the free Lie algebra), feel free to ask.
For convenience I will let $n = \dim(V)$.
Step 1: A bitrace formula.
As discussed, we have an action of $GL(V) \times S_k$ on $V^{\otimes k}$. We will compute the trace of an element $(M, g) \in GL(V) \times S_k$ on $V^{\otimes k}$. Conceptually we will think about $GL(V)$ and $S_k$ as separate (rather than combined into $GL(V) \times S_k$), which is why I use the term "bitrace" (it is the synthesis of two traces).
We compute the trace directly, and for this it suffices to treat diagonalisable $M$. Indeed, the trace of $(M,g)$ on $V^{\otimes k}$ is a polynomial in the entries of $M$, and so is the answer we will find (a product of traces of powers of $M$). Two polynomial functions that agree on the diagonalisable matrices, which are dense in $GL(V)$, agree everywhere. I say all this only to justify restricting to diagonalisable matrices.
Now, let $v_1, \ldots, v_n$ be an eigenbasis of $M$, so that $Mv_i = x_i v_i$ for some complex numbers $x_i$ (i.e. $M = diag(x_1,\ldots,x_n)$).
This induces a basis $v_I = v_{i_1} \otimes \cdots \otimes v_{i_k}$ of $V^{\otimes k}$ indexed by words $I = (i_1, \ldots, i_k)$ (where $1 \leq i_j \leq n$). Conveniently, the action of $(M,g)$ on $v_I$ is easy to compute:
$$
(M,g) \cdot (v_{i_1} \otimes \cdots \otimes v_{i_k}) = g \cdot (Mv_{i_1} \otimes \cdots \otimes Mv_{i_k}) = g \cdot (x_{i_1}v_{i_1} \otimes \cdots \otimes x_{i_k}v_{i_k}) \\ = x_{i_1}x_{i_2}\cdots x_{i_k} (v_{i_{g^{-1}(1)}} \otimes \cdots \otimes v_{i_{g^{-1}(k)}})
$$
Side note: whether you apply $g^{-1}$ or $g$ to the indices depends on whether you view the symmetric group as having a left or right action; it does not affect the trace.
This value of $(M,g) \cdot v_I$ is a scalar multiple of another basis element, which we might write $v_{g(I)}$ by using the induced action of $g \in S_k$ on tuples of length $k$. To compute the trace of $(M,g)$ we need to sum the "diagonal entries", i.e. the scalars corresponding to those $I$ with $v_I = v_{g(I)}$. This computation becomes
$$
\sum_{g(I) = I} x_{i_1} \cdots x_{i_k}.
$$
Now, suppose for example that $(2,5,6)$ is a cycle of $g \in S_k$. Then the condition $g(I) = I$ implies the equality of indices $i_2 = i_5 = i_6$, but the actual value could be anything in $1, \ldots, n$. This condition also implies that $x_{i_2} = x_{i_5} = x_{i_6}$. The same reasoning extends to cycles of all sizes. The only nonzero "diagonal terms" to be summed are those where all indices acted on by a cycle of $g$ are the same. The actual index associated to each cycle is arbitrary, so we need to sum over those. If the cycles of $g$ have sizes $\mu_1, \ldots, \mu_l$, the trace becomes
$$
\sum_{j_1=1}^{n} \cdots \sum_{j_l=1}^{n} x_{j_1}^{\mu_1} \cdots x_{j_l}^{\mu_l} = (\sum_{j_1=1}^{n} x_{j_1}^{\mu_1}) \cdots (\sum_{j_l=1}^{n} x_{j_l}^{\mu_l}) = p_{\mu_1}(x) \cdots p_{\mu_l}(x) = p_\mu(x)
$$
where I am now using the standard notation for power-sum symmetric functions.
Conclusion: if the eigenvalues of $M$ are $x_i$ and $g$ has cycle type $\mu$, then the bitrace of $(M,g)$ acting on $V^{\otimes k}$ is $p_\mu(x)$.
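The conclusion can be checked numerically for small cases. The following is a minimal sketch, assuming $M$ is already diagonal with eigenvalues `x` and that $g$ is given as a list `perm` in which tensor position `j` is sent to position `perm[j]` (the function names are mine, chosen for illustration). It sums the diagonal entries over all words $I$ exactly as in the argument above and compares the result with $p_\mu(x)$.

```python
import itertools
import math

def cycle_type(perm):
    """Cycle lengths of a permutation of range(k), in weakly decreasing order."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        length, j = 0, start
        while j not in seen:
            seen.add(j)
            j = perm[j]
            length += 1
        lengths.append(length)
    return sorted(lengths, reverse=True)

def power_sum(x, mu):
    """p_mu(x) = product over parts m of mu of (sum_i x_i^m)."""
    return math.prod(sum(xi ** m for xi in x) for m in mu)

def bitrace(x, perm):
    """Trace of (diag(x), g) on V^{tensor k}, summed over fixed basis words."""
    n, k = len(x), len(perm)
    total = 0
    for word in itertools.product(range(n), repeat=k):
        # g moves the letter in position j to position perm[j].
        image = [None] * k
        for j in range(k):
            image[perm[j]] = word[j]
        # Only words fixed by g contribute a diagonal entry.
        if tuple(image) == word:
            total += math.prod(x[i] for i in word)
    return total

x = [2, 3, 5]            # eigenvalues of M, so n = 3
perm = [1, 2, 0, 4, 3]   # a 3-cycle and a 2-cycle in S_5
print(cycle_type(perm), bitrace(x, perm), power_sum(x, cycle_type(perm)))
```

Using the opposite convention for `perm` (applying $g^{-1}$ instead of $g$) gives the same set of fixed words, which matches the side note above.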
Step 2: Frobenius characteristic and Cauchy identity.
The Frobenius characteristic, $ch$, is an isomorphism between the space of class functions on $S_k$ (equivalently, the complexified Grothendieck group of representations of $S_k$) and symmetric functions of degree $k$ (here we work over $\mathbb{C}$). It is convenient to define $ch(f)$ for all functions on $S_k$ (not just class functions) by saying that if $g^*$ is the indicator function of $g \in S_k$, then $ch(g^*) = \frac{1}{k!} p_\mu(y)$, where $\mu$ is the cycle type of $g$, and I write $y$ for the symmetric function variables in order to distinguish them from the discussion in the previous step. So for example, if $C_{\mu}^*$ is the indicator function of the conjugacy class $C_\mu$ of elements of cycle type $\mu$, then $ch(C_{\mu}^*) = \frac{|C_\mu|}{k!} p_\mu(y) = \frac{1}{z_\mu}p_{\mu}(y)$, where $z_\mu$ has its usual meaning.
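For readers who want the "usual meaning" spelled out: $z_\mu = \prod_i i^{m_i}\, m_i!$, where $m_i$ is the number of parts of $\mu$ equal to $i$. The sketch below (helper names are illustrative, not from any library) checks the identity $|C_\mu| = k!/z_\mu$ used above by brute-force enumeration of $S_4$.

```python
import itertools
from collections import Counter
from math import factorial

def z(mu):
    """z_mu = product over part sizes i of i^{m_i} * m_i!."""
    result = 1
    for part, mult in Counter(mu).items():
        result *= part ** mult * factorial(mult)
    return result

def cycle_type(perm):
    """Cycle lengths of a permutation of range(k), as a decreasing tuple."""
    seen, lengths = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        length, j = 0, start
        while j not in seen:
            seen.add(j)
            j = perm[j]
            length += 1
        lengths.append(length)
    return tuple(sorted(lengths, reverse=True))

k = 4
# Count the permutations of each cycle type, i.e. the class sizes |C_mu|.
class_sizes = Counter(cycle_type(p) for p in itertools.permutations(range(k)))
for mu, size in sorted(class_sizes.items()):
    print(mu, size, z(mu), size * z(mu) == factorial(k))
```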
Now if we fix $M \in GL(V)$, then the bitrace of $(M,g)$ may be viewed as a (class) function on $S_k$, and so we may apply the Frobenius characteristic. If we write $cyc(g)$ for the cycle type of $g$, the result of this calculation is
$$
ch(tr(M,g)) = \sum_{g \in S_k} p_{cyc(g)}(x) \frac{1}{k!} p_{cyc(g)}(y) = \sum_{\mu \vdash k} \frac{1}{z_\mu} p_\mu (x) p_\mu(y).
$$
Now, the famous Cauchy identity implies that we have
$$
ch(tr(M,g)) = \sum_{\mu \vdash k} \frac{1}{z_\mu} p_\mu (x) p_\mu(y) = \sum_{\lambda \vdash k} s_\lambda(x) s_\lambda(y),
$$
where $s_\lambda$ is the Schur function indexed by $\lambda$.
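As a sanity check, the case $k = 2$ can be verified by hand. This uses the standard expansions $s_{(2)} = \frac{1}{2}(p_1^2 + p_2)$ and $s_{(1,1)} = \frac{1}{2}(p_1^2 - p_2)$, which are not derived in this answer, together with $z_{(1,1)} = z_{(2)} = 2$:
$$
s_{(2)}(x)s_{(2)}(y) + s_{(1,1)}(x)s_{(1,1)}(y) = \frac{1}{4}\left[(p_1(x)^2 + p_2(x))(p_1(y)^2 + p_2(y)) + (p_1(x)^2 - p_2(x))(p_1(y)^2 - p_2(y))\right] = \frac{1}{2} p_1(x)^2 p_1(y)^2 + \frac{1}{2} p_2(x) p_2(y).
$$
The cross terms cancel, leaving exactly $\sum_{\mu \vdash 2} \frac{1}{z_\mu} p_\mu(x) p_\mu(y)$.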
This may be viewed as a symmetric-function-theoretic formulation of Schur-Weyl duality for the following reasons. Suppose we know that the Frobenius characteristic of the Specht module $S^\lambda$ is the Schur function $s_\lambda(y)$, and the character of the Schur functor $\mathbb{S}^\lambda(V)$ is the Schur function $s_\lambda(x)$ (the meaning of which is discussed in the comments to the original post). Then we have found that the bitrace (i.e. $GL(V) \times S_k$-character) of $V^{\otimes k}$ agrees with that of $\bigoplus_{\lambda \vdash k} \mathbb{S}^\lambda(V) \otimes S^\lambda$. By semisimplicity, these must be isomorphic.
For example, we can recover the multiplicity of the Specht module $S^\lambda$ in $V^{\otimes k}$ by computing $\dim(\mathbb{S}^\lambda(V))$, which is nothing but the character of $\mathbb{S}^\lambda(V)$ evaluated at the identity of $GL(V)$. But the identity element of $GL(V)$ (viewed as a matrix) has $1$ as an eigenvalue repeated $n$ times, so the dimension is the evaluation $s_\lambda(1,\ldots,1)$ (where there are $n$ $1$-s), as you mention in your post.
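One way to see these multiplicities in action is to count dimensions: the decomposition predicts $\sum_{\lambda \vdash k} s_\lambda(1,\ldots,1)\cdot \dim S^\lambda = n^k$. The sketch below checks this for small $n$ and $k$. It relies on two standard formulas that are not discussed in this answer, so treat them as assumptions of the example: the hook content formula $s_\lambda(1^n) = \prod_{(r,c) \in \lambda} \frac{n + c - r}{h(r,c)}$ and the hook length formula $\dim S^\lambda = k!/\prod h(r,c)$. All function names are illustrative.

```python
from math import factorial

def partitions(k, largest=None):
    """Yield the partitions of k as weakly decreasing tuples."""
    if largest is None:
        largest = k
    if k == 0:
        yield ()
        return
    for first in range(min(k, largest), 0, -1):
        for rest in partitions(k - first, first):
            yield (first,) + rest

def hooks(lam):
    """List of ((row, col), hook length) for each cell, 0-indexed."""
    conj = [sum(1 for part in lam if part > c) for c in range(lam[0])] if lam else []
    return [((r, c), (lam[r] - c) + (conj[c] - r) - 1)
            for r in range(len(lam)) for c in range(lam[r])]

def schur_at_ones(lam, n):
    """s_lambda(1,...,1) with n ones, via the hook content formula (assumed)."""
    num, den = 1, 1
    for (r, c), h in hooks(lam):
        num *= n + c - r   # n plus the content of the cell
        den *= h
    return num // den      # vanishes when lam has more than n rows

def specht_dim(lam):
    """dim S^lambda via the hook length formula (assumed)."""
    den = 1
    for _, h in hooks(lam):
        den *= h
    return factorial(sum(lam)) // den

for n in range(1, 4):
    for k in range(1, 5):
        total = sum(schur_at_ones(lam, n) * specht_dim(lam) for lam in partitions(k))
        print(n, k, total, total == n ** k)
```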
Best Answer
As noted by Iosif Pinelis, we can reformulate the problem in terms of classical set systems in the following way:
Let $\mathcal{H}$ be a set system on $\mathbb{R}^n$ with VC-dimension $d$. Fix $k\geq 1$ and define $\mathcal{H}_k$ to be the collection of all sets of the form
$$ \bigcup_{i=1}^{k_0}H_i\cap C_i $$ where $k_0\leq k$, each $H_i$ is in $\mathcal{H}$, and $(C_1,\ldots,C_{k_0})$ is the Voronoi partition of cells corresponding to some $p_1,\ldots,p_{k_0}$ in $\mathbb{R}^n$.
The proof relies on the following tools.
The Fact is a standard exercise, and the Lemma is Lemma 3.2.3 in "Learnability and the Vapnik-Chervonenkis Dimension" by Blumer, Ehrenfeucht, Haussler, and Warmuth. The proof is also sketched in my answer here.
Some additional remarks.
The bound is certainly not tight since there is quite a bit of inefficiency in the proof; especially since $\mathcal{H}''$ is much bigger than $\mathcal{H}_k$. In particular, the proof is not actually using the partition feature of the $(C_1,\ldots,C_k)$ sequences (this is only used in checking that the Theorem accurately represents the original question about functions).
In response to ABIM's comments, the bound cannot be improved to $kd$ in general. See Example 1 below, which also shows that the bound must depend on $n$.
If one does not refine the original sequences $(\hat{C}_1,\ldots,\hat{C}_k)$ into partitions, then the Theorem is still true, but it does not accurately capture the original question. In fact, in this case the VC-dimension of the set systems from the question can be infinite, even when one just filters over a single sequence. See Example 2 below.
Example 2 is not very satisfying since it is just exploiting the signal loss in the definition of VC-dimension for a class of real-valued functions from the cited reference (which I find a bit peculiar, but perhaps I'm not the right one to judge). There are different notions of VC-dimension for sets of functions which seem more likely to preserve finiteness in this situation (namely, not requiring the cells to partition and not even requiring the points to be pairwise distinct).
In the next example, I will let $\hat{\mathcal{H}}_k$ denote the set system of functions as defined in the original question, but using the non-partitioned sequences $(\hat{C}_1,\ldots,\hat{C}_k)$.
Note that $n\geq 2$ is necessary in the previous example since in dimension $1$ there are only $k-1$ intersection points in any given sequence (defined using pairwise distinct points), and hence the perturbation to the VC-dimension contributed by these points is finite. On the other hand, if one considers a more degenerate notion in which the $p_i$'s are not assumed to be pairwise distinct, then an example similar to the above can be built to obtain infinite VC-dimension (with a single sequence) even in dimension $1$.