Background: I work in the field of numerical relativity. I've read Carroll's book, but not recently.
It's pretty common for physics students to reach this point in their education without really knowing anything about what tensors are or how they're talked about in higher mathematics. That's not really the students' fault. If your education was anything like mine, your first exposure to this stuff probably came from an electromagnetism course, or maybe a classical mechanics course. You stuck with vector calculus, and maybe the odd matrix now and then to do transformations, and that's all you needed.
Let's start with matrices, though: you might've thought of matrices as arrays of numbers, just with some funny "matrix multiplication" operation that lets you multiply matrices and vectors to get other vectors. That's good enough to do the computation, but it's a very narrow way of looking at things.
Instead, think of the matrix abstractly as corresponding to a vector-valued linear function of a vector. It's a vector field! Right? A vector-valued function of a vector is, according to everything you've been taught, a vector field. The only additional property we're imposing is that this function be linear.
Example: consider the matrix
$$T = \begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}$$
You can write $T$ as a function. Given a vector $\vec v$ and an orthonormal basis $\vec u_1, \vec u_2, \vec u_3$, you could write
$$\begin{align*} T(\vec v) &= [(a \vec u_1 + b \vec u_2 + c \vec u_3) \cdot \vec v ] \vec u_1 \\ & + [(d \vec u_1 + e \vec u_2 + f \vec u_3) \cdot \vec v ] \vec u_2 \\ & + [(g \vec u_1 + h \vec u_2 + i \vec u_3) \cdot \vec v ] \vec u_3\end{align*}$$
Each of those dot products is just doing the row-column approach to matrix multiplication that you already know. This expression, for a general matrix, is rather tedious and tiresome, but most geometric transformations can be written more compactly.
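The dot-product expansion can be checked numerically. What follows is a minimal sketch in plain Python, assuming the $\vec u_i$ are the standard orthonormal basis of $\mathbb{R}^3$; the helper names (`dot`, `matvec`, `T_as_function`) and the numeric entries are made up for illustration.

```python
# Sketch: a 3x3 matrix viewed as a linear function built from dot products.
# Assumption: u1, u2, u3 are the standard orthonormal basis of R^3.

def dot(x, y):
    """Euclidean dot product of two equal-length sequences."""
    return sum(xi * yi for xi, yi in zip(x, y))

def matvec(M, v):
    """Ordinary row-times-column matrix-vector multiplication."""
    return [sum(M[i][j] * v[j] for j in range(len(v))) for i in range(len(M))]

def T_as_function(M, basis, v):
    """Evaluate T(v) = sum_i [(sum_j M[i][j] u_j) . v] u_i."""
    n = len(basis)
    result = [0] * n
    for i in range(n):
        # Row i of the matrix, assembled as a vector in the chosen basis.
        row_vec = [sum(M[i][j] * basis[j][k] for j in range(n)) for k in range(n)]
        coeff = dot(row_vec, v)
        # Add coeff * u_i to the running total.
        result = [r + coeff * basis[i][k] for k, r in enumerate(result)]
    return result

M = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]      # illustrative entries a..i
basis = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
v = [1, -1, 2]

print(T_as_function(M, basis, v))  # [5, 11, 19]
print(matvec(M, v))                # [5, 11, 19]
```

Both calls agree because, in an orthonormal basis, each bracketed dot product is exactly one row of the matrix times the column of components of $\vec v$.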
So, a matrix isn't just an array of numbers with some arcane multiplication rule attached. It corresponds to a linear function--a linear map, as mathematicians would say. You can see in the above example that the components of the matrix correspond with the basis we used to write out the function $T$. If you change basis, you change components. That much becomes obvious when written this way.
General tensors correspond to maps just as matrices do. Here, we showed a matrix can correspond to a map from a vector to a vector. A tensor could map a vector to another vector, or a vector to a covector, or several vectors to a scalar, for instance.
On component transformation laws: physicists usually have the point of view that a change of basis doesn't change the underlying vector being described; it merely changes the basis used to describe that vector. The change of basis means you have different vector components, but the vector itself hasn't changed. When you think of a tensor as a map, that is, as some linear function, you ought to be able to describe the arguments in any basis you like. This changes the components of the tensor as described in that basis, but not the tensor itself.
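Here is a small sketch of that statement for a linear map on $\mathbb{R}^2$, in plain Python. It assumes the new basis vectors are the columns of a matrix `P` written in the old basis, so vector components transform with $P^{-1}$ and the map's components become $P^{-1} T P$. The specific numbers and helper names are illustrative only.

```python
# Sketch: components change under a change of basis, the map itself does not.
# Assumption: the new basis vectors are the columns of P, written in the old basis.

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matvec(A, v):
    """Matrix-vector product."""
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

P = [[1, 1], [0, 1]]        # new basis: e1' = e1, e2' = e1 + e2
P_inv = [[1, -1], [0, 1]]   # inverse of P, written out by hand

T_old = [[2, 0], [1, 3]]    # components of a linear map in the old basis
v_old = [4, 5]              # components of a vector in the old basis

v_new = matvec(P_inv, v_old)             # vector components transform with P^-1
T_new = matmul(P_inv, matmul(T_old, P))  # map components transform as P^-1 T P

# Apply the map in each basis, then express both results in the old basis.
w_old = matvec(T_old, v_old)
w_from_new = matvec(P, matvec(T_new, v_new))

print(T_new)               # [[1, -2], [1, 4]]: different components
print(w_old, w_from_new)   # [8, 19] [8, 19]: same output vector
```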
Now, even this answer is only just the tip of the iceberg. I would definitely criticize physicists for not presenting tensors as linear functions; if they had put more emphasis on this, the transformation laws would be obvious from the chain rule and hardly need comment.
However, I think a physicist should not be so eager to treat geometric objects (like vectors and such) as general tensors. You can do this, but doing so deprives you of the geometric intuition you have probably built up. Geometric objects like tangent directions to curves, tangent planes to surfaces, and the like should instead be thought of as elements of an exterior (or Clifford) algebra. These formalisms let you ignore the "map" definition of vectors and such, so you can focus on building planes, volumes, and the like.
For calculus at this level, the mathematician's tool of choice seems to be differential forms. A physicist might find forms inelegantly integrated into Carroll's text alongside the vanilla, index-manipulation sludge of plain old tensor calculus. Do yourself a favor: at the least, learn forms. They make all the calculus here as easy as electromagnetism's vector calculus was. I have issues with some of the conventions that forms people tend to use. For reasons totally irrelevant to general relativity, forms people prefer to do everything in terms of forms and not in terms of actual $k$-vector fields. That is an arbitrary choice, and it leads to circuitous garbage like defining inner products in terms of the Hodge star, which is backwards as sin. Still, forms are a big improvement over index manipulation.
If $V$ and $W$ are vector spaces, you can form a third vector space from them called their tensor product $V \otimes W$. The tensor product consists of sums of certain vectors called "pure tensors," which are written $v \otimes w$ where $v \in V, w \in W$, subject to certain rules, e.g. $(v_1 + v_2) \otimes w = v_1 \otimes w + v_2 \otimes w$. For a complete list of these rules see Wikipedia. In practice you'll do fine if you remember the following:
If $v_1, \dots v_n$ is a basis of $V$ and $w_1, \dots w_m$ is a basis of $W$, then the pure tensors $v_i \otimes w_j, 1 \le i \le n, 1 \le j \le m$ form a basis of $V \otimes W$. In particular, $\dim V \otimes W = \dim V \times \dim W$.
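To make the basis statement concrete, here is a minimal sketch in plain Python. It assumes $V = \mathbb{R}^2$ and $W = \mathbb{R}^3$ with standard bases, and stores an element of $V \otimes W$ as an $n \times m$ table of coefficients on the basis $v_i \otimes w_j$. The function names `pure` and `add` are hypothetical helpers.

```python
# Sketch: elements of V (x) W stored as an n-by-m table of coefficients
# on the basis v_i (x) w_j. Assumption: V = R^2, W = R^3 with standard bases.

def pure(v, w):
    """Coefficients of the pure tensor v (x) w: entry (i, j) is v[i] * w[j]."""
    return [[vi * wj for wj in w] for vi in v]

def add(s, t):
    """Entrywise sum of two coefficient tables."""
    return [[a + b for a, b in zip(rs, rt)] for rs, rt in zip(s, t)]

v1, v2, w = [1, 2], [3, -1], [1, 0, 4]

# One of the defining rules: (v1 + v2) (x) w = v1 (x) w + v2 (x) w.
lhs = pure([a + b for a, b in zip(v1, v2)], w)
rhs = add(pure(v1, w), pure(v2, w))
print(lhs == rhs)   # True

# dim(V (x) W) = dim V * dim W: one coefficient per pair (i, j).
table = pure(v1, w)
print(len(table) * len(table[0]))   # 6

# A sum of pure tensors need not be pure: every 2x2 minor of a pure
# tensor's table vanishes, but this sum has a nonzero minor.
t = add(pure([1, 0], [1, 0, 0]), pure([0, 1], [0, 1, 0]))
minor = t[0][0] * t[1][1] - t[0][1] * t[1][0]
print(minor)   # 1
```

The last check is why the tensor product is defined as sums of pure tensors: the pure tensors span $V \otimes W$ but do not exhaust it.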
If $T : V_1 \to V_2$ and $S : W_1 \to W_2$ are two linear maps, you can form a third linear map from them which is also called their tensor product
$$T \otimes S : V_1 \otimes W_1 \to V_2 \otimes W_2.$$
It is completely determined by how it behaves on pure tensors, which is
$$(T \otimes S)(v \otimes w) = T(v) \otimes S(w).$$
The relationship between these two uses of the term "tensor product" is given formally by the notion of a functor.
Tensor product notation for linear maps is compatible with the notation $v \otimes w$ for pure tensors in the following sense. A vector $v \in V$ in a vector space is the same thing as a linear map $v : 1 \to V$ from the one-dimensional vector space $1$ given by the underlying field to $V$, and if $v : 1 \to V$ and $w : 1 \to W$ are two vectors in $V, W$, then their tensor product as linear maps $v \otimes w : 1 \otimes 1 \to V \otimes W$ corresponds to the pure tensor $v \otimes w$, where we use that there's a canonical isomorphism $1 \otimes 1 \cong 1$.
The Kronecker product is a description of the tensor product of linear maps with respect to a choice of basis for all of the vector spaces involved. Formally, with notation as above, if
- $B_1, B_2$ are bases for $V_1, V_2$,
- $C_1, C_2$ are bases for $W_1, W_2$,
- given bases $B_i, C_i$ of $V_i, W_i$, we write $B_i \otimes C_i$ for the corresponding basis of $V_i \otimes W_i$ consisting of pure tensors of basis vectors, as described above, and
- we write $_{B_2}[T]_{B_1}$ to refer to the matrix of a linear transformation $T : V_1 \to V_2$ with respect to a basis $B_1$ of $V_1$ and a basis $B_2$ of $V_2$,
then we have
$$_{B_2 \otimes C_2}[T \otimes S]_{B_1 \otimes C_1} = \, _{B_2}[T]_{B_1} \otimes \, _{C_2}[S]_{C_1}$$
where on the LHS $\otimes$ means the tensor product of linear maps and on the RHS $\otimes$ means the Kronecker product.
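A small sketch of this identity in plain Python, under the assumption that the basis pairs $(i, j)$ of the tensor product are ordered with $i$ as the slow index (the usual Kronecker convention). The helpers `kron`, `kron_vec`, and `matvec` are illustrative names, and the matrices are arbitrary.

```python
# Sketch: Kronecker product of matrices and the rule
# (T (x) S)(v (x) w) = T(v) (x) S(w), with pure tensors flattened to lists.
# Assumption: basis pairs (i, j) are ordered with i as the slow index.

def kron(A, B):
    """Kronecker product of two matrices given as lists of rows."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]

def kron_vec(v, w):
    """Components of the pure tensor v (x) w in the product basis."""
    return [vi * wj for vi in v for wj in w]

def matvec(A, v):
    """Matrix-vector product."""
    return [sum(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]

T = [[1, 2], [3, 4]]
S = [[0, 1], [1, 0]]
v, w = [1, 1], [2, 3]

lhs = matvec(kron(T, S), kron_vec(v, w))     # (T (x) S) applied to v (x) w
rhs = kron_vec(matvec(T, v), matvec(S, w))   # T(v) (x) S(w)

print(kron(T, S))   # [[0, 1, 0, 2], [1, 0, 2, 0], [0, 3, 0, 4], [3, 0, 4, 0]]
print(lhs, rhs)     # [9, 6, 21, 14] [9, 6, 21, 14]
```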
One final remark: the definition of spaces of tensors you give in 2) is a terrible definition that I've only seen in some textbooks on differential geometry. It is absolutely the wrong way to think about tensors.
Best Answer
Let's first look at a very special type of tensor, namely the $(0,1)$ tensor. What is it? Well, it is the tensor product of $0$ copies of members of $V$ and one copy of members of $V^*$. That is, it is a member of $V^*$.
But what is a member of $V^*$? Well, by the very definition of $V^*$ it is a linear function $\phi:V\to K$. Let's write this explicitly: $$T^0_1V = V^* = \{\phi:V\to K\mid\phi \text{ is linear}\}$$ You see, already at this point, where we didn't even use a tensor product, we get a $V^*$ on one side, and a $V$ on the other, simply by inserting the definition of $V^*$.
From this, it is obvious why $(0,q)$-tensors have $q$ copies of $V^*$ in the tensor product $(2)$, but $q$ copies of $V$ in the domain of the multilinear function in $(3)$.
OK, but why do you have a $V^*$ in the map in $(3)$ for each factor $V$ in the tensor product? After all, vectors are not functions, are they?
Well, in some sense they are: there is a natural linear map from $V$ to its double dual $V^{**}$, that is, the set of linear functions from $V^*$ to $K$. This natural map is defined by the condition that applying the image of $v$ to $\phi\in V^*$ gives the same value as applying $\phi$ to $v$. For finite-dimensional vector spaces, this map is even an isomorphism, so $V^{**} \cong V$. I suspect that the lecture assumes finite-dimensional vector spaces. In that case, you can identify $V$ with $V^{**}$, and therefore you get $$T^1_0V = V = V^{**} = \{T:V^*\to K\mid T \text{ is linear}\}$$ Here the second equality is exactly that identification.
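The natural map into the double dual is short enough to write down directly. The following is a sketch only, assuming covectors are modelled as Python functions on lists of components; `to_double_dual` and `covector` are hypothetical helper names.

```python
# Sketch: the natural map V -> V** sends v to "evaluate at v".
# Assumption: covectors are modelled as Python functions from lists to numbers.

def covector(components):
    """Build the linear function v -> sum_i components[i] * v[i]."""
    return lambda v: sum(c * x for c, x in zip(components, v))

def to_double_dual(v):
    """Return the element of V** that sends a covector phi to phi(v)."""
    return lambda phi: phi(v)

phi = covector([2, -1, 0])
v = [1, 4, 7]
v_hat = to_double_dual(v)

# Applying the image of v to phi gives the same value as applying phi to v.
print(v_hat(phi), phi(v))   # -2 -2
```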
Now again it should be obvious why $p$ copies of $V$ in the tensor product $(2)$ give $p$ factors of $V^*$ for the domain of the multilinear functions in $(3)$.
Edit: On request in the comments, something about the relations of those terms to the Kronecker product.
The tensor product $\color{darkorange}{\otimes}$ in $(2)$ is a tensor product not of (co)vectors, but of (co)vector spaces. The result of that tensor product describes not one tensor, but the set of all tensors of a given type. The tensors are then elements of the corresponding set. And given a basis of $V$, the tensors can then be specified by giving their coefficients in that basis.
This is completely analogous to the vector space itself. We have the vector space $V$, which contains vectors $v\in V$, and given a basis $\{e_i\}$ of $V$, we can write each vector in components, $v = \sum_i v^i e_i$.
Similarly for $V^*$, we can write each member $\phi\in V^*$ in the dual basis $\omega^i$ (defined by $\omega^i(e_j)=\delta^i_j$) as $\sum_i \phi_i \omega^i$. An alternative way to get the components $\phi_i$ is to notice that $\phi(e_k) = \sum_i \phi_i \omega^i(e_k) = \sum_i \phi_i \delta^i_k = \phi_k$. That is, the components of the covector are just the function values at the basis vectors.
This way one also sees immediately that $\phi(v) = \sum_i \phi(v^i e_i) = \sum_i v^i\phi(e_i) = \sum_i v^i \phi_i$, which is sort of like an inner product, but not exactly, because it behaves differently under a change of basis.
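A short sketch of both points, in plain Python with a made-up covector on $\mathbb{R}^2$ and a made-up non-orthonormal second basis. It reads off the components as values on basis vectors, and shows that the pairing $\sum_i v^i\phi_i$ is the same in both bases while a naive sum of squared components is not.

```python
# Sketch: covector components are the values on the basis vectors, and the
# pairing sum_i v^i phi_i does not depend on the basis.
# Assumption: phi and both bases below are arbitrary illustrative choices.

def phi(v):
    """A hypothetical linear function on R^2."""
    return 3 * v[0] - 2 * v[1]

old_basis = [[1, 0], [0, 1]]
new_basis = [[1, 0], [1, 1]]   # e1' = e1, e2' = e1 + e2 (not orthonormal)

phi_old = [phi(e) for e in old_basis]   # [3, -2]
phi_new = [phi(e) for e in new_basis]   # [3, 1]

v_old = [4, 5]    # components of v in the old basis
v_new = [-1, 5]   # components of the same v in the new basis

# Check that v_new really describes the same vector: -1*e1' + 5*e2' = [4, 5].
rebuilt = [sum(c * e[k] for c, e in zip(v_new, new_basis)) for k in range(2)]

pair_old = sum(vi * pi for vi, pi in zip(v_old, phi_old))
pair_new = sum(vi * pi for vi, pi in zip(v_new, phi_new))
print(pair_old, pair_new)   # 2 2

# By contrast, summing squared vector components is basis dependent.
print(sum(c * c for c in v_old), sum(c * c for c in v_new))   # 41 26
```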
Now let's look at a $(0,2)$ tensor, that is, a bilinear function $f:V\times V\to K$. Note that $f\in V^*\color{darkorange}{\otimes} V^*$, as $V^*\color{darkorange}{\otimes} V^*$ is by definition the set of all such functions (see eq. $(3)$). Now by being a bilinear function, one again only needs to know the values at the basis vectors, as $$f(v,w) = f(\sum_i v^i e_i, \sum_j w^j e_j) = \sum_{i,j}v^i w^j f(e_i,e_j)$$ and therefore we can define as components $f_{ij} = f(e_i,e_j)$ and get $f(v,w)=\sum_{i,j}f_{ij}v^i w^j$.
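The same recipe can be sketched for a bilinear function. The function `f` below is an arbitrary example on $\mathbb{R}^2$, chosen only for illustration; its components are read off on the standard basis and then used to recompute $f(v,w)$.

```python
# Sketch: components of a (0,2)-tensor are its values on pairs of basis vectors.
# Assumption: f is an arbitrary illustrative bilinear function on R^2.

def f(v, w):
    """A hypothetical bilinear function of two vectors."""
    return v[0] * w[0] + 2 * v[0] * w[1] - v[1] * w[1]

basis = [[1, 0], [0, 1]]

# f_ij = f(e_i, e_j)
f_comp = [[f(ei, ej) for ej in basis] for ei in basis]
print(f_comp)   # [[1, 2], [0, -1]]

v, w = [2, 3], [1, -4]

direct = f(v, w)
from_components = sum(f_comp[i][j] * v[i] * w[j]
                      for i in range(2) for j in range(2))
print(direct, from_components)   # -2 -2
```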
This goes also for general tensors: A single tensor $T\in T^p_qV$ is a multilinear function $T:(V^*)^p\times V^q\to K$, and it is completely determined by the values you get when inserting basis vectors and basis covectors everywhere, giving the components $$T^{i\ldots j}_{k\ldots l}=T(\underbrace{\omega^i,\ldots,\omega^j}_{p},\underbrace{e_k,\ldots,e_l}_{q})$$
OK, we now have components, but we have still not defined the tensor product of tensors. But that is actually quite easy:
Let $x\in T^p_qV$ and $y\in T^r_sV$. That is, $x$ is a function that takes $p$ covectors and $q$ vectors and gives a scalar, while $y$ takes $r$ covectors and $s$ vectors to a scalar. Then the tensor product $x\color{blue}{\otimes} y$ is a function that takes $p+r$ covectors and $q+s$ vectors, feeds the first $p$ covectors and the first $q$ vectors to $x$, and the remaining $r$ covectors and $s$ vectors to $y$, and then multiplies the results. That is, $$(x\color{blue}{\otimes} y)(\underbrace{\kappa,\ldots,\lambda,\mu,\ldots,\nu}_{p+r},\underbrace{u,\ldots,v,w,\ldots,z}_{q+s}) = x(\underbrace{\kappa,\ldots,\lambda}_p,\underbrace{u,\ldots,v}_q)\cdot y(\underbrace{\mu,\ldots,\nu}_{r},\underbrace{w,\ldots,z}_{s})$$ It is not hard to check that this function is indeed also multilinear, and therefore $x\color{blue}{\otimes} y\in T^{p+r}_{q+s}V$.
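This definition translates almost word for word into code. The sketch below assumes a $(p,q)$-tensor is modelled as a Python function taking $p$ covectors followed by $q$ vectors, with vectors as lists and covectors as functions. The name `tensor_product` and the example tensors `alpha`, `beta`, and `vec` are hypothetical.

```python
# Sketch: the tensor product of two multilinear functions.
# Assumptions: a (p, q)-tensor is a Python function taking p covectors followed
# by q vectors; vectors are lists; covectors are functions on lists.

def tensor_product(x, x_type, y, y_type):
    """Return x (x) y for x of type (p, q) and y of type (r, s)."""
    p, q = x_type
    r, s = y_type

    def product(*args):
        covectors, vectors = args[:p + r], args[p + r:]
        # The first p covectors and first q vectors go to x, the rest go to y.
        return (x(*covectors[:p], *vectors[:q])
                * y(*covectors[p:], *vectors[q:]))

    return product

# Two hypothetical (0,1)-tensors (covectors) on R^2.
alpha = lambda v: v[0] + 2 * v[1]
beta = lambda w: 3 * w[0] - w[1]

# Their product is a (0,2)-tensor: a bilinear function of two vectors.
ab = tensor_product(alpha, (0, 1), beta, (0, 1))
print(ab([1, 1], [2, 5]))     # alpha([1, 1]) * beta([2, 5]) = 3 * 1 = 3

# A hypothetical (1,0)-tensor: the vector [1, 2] acting on covectors.
vec = lambda kappa: kappa([1, 2])
mixed = tensor_product(vec, (1, 0), beta, (0, 1))   # type (1,1)
print(mixed(alpha, [2, 5]))   # alpha([1, 2]) * beta([2, 5]) = 5 * 1 = 5
```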
And now finally, we get to the question what the components of $x\color{blue}{\otimes} y$ are. Well, the components of $x\color{blue}{\otimes} y$ are just the function values when inserting basis vectors and basis covectors, and when you do that and use the definition of the tensor product, you find that indeed, the components of the tensor product are the Kronecker product of the components of the factors.
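As a worked instance of that last step, assume $x\in T^1_1V$ and $y\in T^0_1V$ (a choice made here purely for illustration). Inserting basis covectors and basis vectors into the definition of the tensor product gives $$(x\color{blue}{\otimes} y)^{i}_{jk} = (x\color{blue}{\otimes} y)(\omega^i, e_j, e_k) = x(\omega^i, e_j)\cdot y(e_k) = x^i_j\, y_k$$ so each component of the product is the product of one component of each factor. Collecting all of these products into a single array is exactly what the Kronecker product of the two component arrays does.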
Also, it can be shown that $T^p_q V$ is a vector space in its own right, and therefore every $(p,q)$-tensor can be written as a linear combination of basis tensors, each of which is $1$ for exactly one combination of basis vectors and basis covectors and $0$ for all other combinations. It is then easily seen that each such basis tensor is just the tensor product of the corresponding basis vectors and dual basis covectors. Since furthermore the coefficients in that basis are just the components of the tensor as introduced before, we finally arrive at the formula $$T = \sum T^{i\ldots j}_{k\ldots l}\underbrace{e_i\color{blue}{\otimes}\dots\color{blue}{\otimes} e_j}_{p}\color{blue}{\otimes}\underbrace{\omega^k\color{blue}{\otimes}\dots\color{blue}{\otimes}\;\omega^l}_{q}$$