In order:
Yes, basis and dimension make sense for $L(V,W)$, just as they do for every other vector space.
When $V$ and $W$ are finite-dimensional, the dimension is $\dim(V)\cdot\dim(W)$.
Yes, you can show that $L(V,W)\cong M_{\dim(V),\dim(W)}(\Bbb F)$ in such a way that evaluation of these transformations corresponds to matrix multiplication.
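To see where the count $\dim(V)\cdot\dim(W)$ comes from, here is a sketch, assuming $V$ and $W$ are finite-dimensional with bases $\{v_1,\dots,v_n\}$ and $\{w_1,\dots,w_m\}$ (the names $v_i$, $w_j$, $E_{j,i}$ and $a_{j,i}$ are introduced here only for illustration). Define $E_{j,i} \in L(V,W)$ on the basis of $V$ by
$$
E_{j,i}(v_k) = \begin{cases} w_j, & k = i, \\ 0, & k \neq i, \end{cases}
$$
and extend linearly. If $T \in L(V,W)$ satisfies $Tv_i = \sum_{j=1}^{m} a_{j,i} w_j$, then
$$
T = \sum_{i=1}^{n}\sum_{j=1}^{m} a_{j,i} E_{j,i},
$$
and the coefficients $a_{j,i}$ are unique, so the $mn$ maps $E_{j,i}$ form a basis of $L(V,W)$.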
If you have a linear transformation $L : X \rightarrow Y$, where $X$ and $Y$ are finite-dimensional linear spaces, choose a basis $\{ x_{i} \}_{i=1}^{n}$ of $X$ and a basis $\{ y_{j} \}_{j=1}^{m}$ of $Y$, and for each $i = 1, \dots, n$ write
$$
Lx_{i} = \alpha_{1,i}y_{1}+\alpha_{2,i}y_{2}+\cdots+\alpha_{m,i}y_{m}.
$$
The constants $\alpha_{j,i}$ are unique. Every $x \in X$ can be written uniquely as
$$
x = \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n.
$$
By linearity
$$
\begin{align}
Lx & = \beta_1 Lx_1 + \beta_2 Lx_2 + \cdots + \beta_n Lx_n \\ \\
& = \beta_1 (\alpha_{1,1} y_1 + \alpha_{2,1}y_2 + \cdots + \alpha_{m,1}y_m) \\
& + \beta_2 (\alpha_{1,2} y_1 + \alpha_{2,2}y_2 + \cdots + \alpha_{m,2}y_m) \\
& + \cdots + \\
& + \beta_n (\alpha_{1,n} y_1 + \alpha_{2,n}y_2 + \cdots + \alpha_{m,n}y_m) \\ \\
& = (\alpha_{1,1}\beta_1+\alpha_{1,2}\beta_2+\cdots+\alpha_{1,n}\beta_{n})y_1 \\
& + (\alpha_{2,1}\beta_1+\alpha_{2,2}\beta_2+\cdots+\alpha_{2,n}\beta_{n})y_2 \\
& + \cdots + \\
& + (\alpha_{m,1}\beta_1+\alpha_{m,2}\beta_2+\cdots+\alpha_{m,n}\beta_{n})y_m
\end{align}
$$
So, the action of $L$ is uniquely determined by the matrix $[\alpha_{j,i}]$ as follows: start with $x \in X$, write $x = \sum_{i=1}^{n}\beta_{i}x_{i}$, then perform the matrix multiplication $[\alpha_{j,i}][\beta_{i}]$, which gives $[\gamma_{j}]$, and reconstruct $Lx = \gamma_1 y_1+\gamma_2 y_2 + \cdots + \gamma_m y_m$. Therefore, $L$ is completely determined by the $m\times n$ matrix $[\alpha_{j,i}]$ (row index $j$, column index $i$) defined above. Conversely, every such matrix determines a linear $L$ whose matrix representation is the given matrix.
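As a concrete check of the recipe above, here is a minimal Python sketch. The example is an assumption chosen for illustration, not part of the answer: $L$ is differentiation from polynomials of degree at most 2 to polynomials of degree at most 1, with bases $\{1, t, t^2\}$ and $\{1, t\}$. The names `apply_matrix` and `ALPHA` are hypothetical helpers.

```python
# Illustrative example: differentiation D : P_2 -> P_1,
# with basis {1, t, t^2} of P_2 and basis {1, t} of P_1.

def apply_matrix(alpha, beta):
    """Return gamma with gamma[j] = sum over i of alpha[j][i] * beta[i]."""
    return [sum(a_ji * b_i for a_ji, b_i in zip(row, beta)) for row in alpha]

# Column i holds the coordinates of D(x_i) in the basis {1, t}:
# D(1) = 0, D(t) = 1, D(t^2) = 2t.  The matrix is m x n = 2 x 3.
ALPHA = [
    [0, 1, 0],  # coefficients of y_1 = 1
    [0, 0, 2],  # coefficients of y_2 = t
]

beta = [3, 5, 4]                   # x = 3 + 5t + 4t^2
gamma = apply_matrix(ALPHA, beta)  # coordinates of Dx in {1, t}
print(gamma)                       # [5, 8], i.e. Dx = 5 + 8t
```

Each column of `ALPHA` is the image of one basis vector of $X$, which is why the matrix has $m$ rows and $n$ columns.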
Best Answer
The theorem says "functions," not "linear transformations," so in fact the theorem Apostol proves answers your question: yes, function composition is associative, no matter whether the functions involved are linear. You are right to see that the proof doesn't depend in any way on linearity -- because, indeed, linearity is not assumed, nor is it needed.
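A small Python sketch can make this tangible. The three functions below are arbitrary nonlinear choices made here for illustration (they do not come from Apostol), and `compose` is a hypothetical helper. Agreement on a few sample points is not a proof, only a sanity check of the statement.

```python
def compose(f, g):
    """Return the function x -> f(g(x))."""
    return lambda x: f(g(x))

# Three deliberately nonlinear functions (illustrative only).
def f(x):
    return x * x + 1

def g(x):
    return abs(x) - 3

def h(x):
    return 2 ** x

left = compose(compose(f, g), h)   # (f o g) o h
right = compose(f, compose(g, h))  # f o (g o h)

samples = [-2, -1, 0, 1, 2, 5]
print(all(left(x) == right(x) for x in samples))  # True
```

Both groupings evaluate `f(g(h(x)))` step by step, which is exactly why the proof never needs linearity.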