This is too long to be a comment, so I am making it an answer instead. It is not a solution, just remarks.
I think that you are on the right track, in the sense that this approach will lead to a proof. The observation that the other matrices $B$ need to keep the subspaces $W_i$, $i=1,2,3$, invariant is the key, and you can do the rest case by case. However, if I were grading your exam, I would want you to justify the following claim that you made:
> Now the dimension of the family of commuting operators on a 2-dimensional vector space over $\Bbb C$ is at most 2. Indeed each of the commuting operators on $W_2$ will have a repeated eigenvalue.
OK. Probably this was meant to be $W_3$, i.e. the 2-dimensional characteristic space of $A$. Your first claim is correct in that a commuting family of operators on $W_3$ has dimension at most $2$, but you haven't proven it yet! It may be a lemma or an example in your textbook, in which case you are fine. But to be on the safe side, I want you to consider the case that $A$ acts as the identity operator (or zero; any scalar will do!) on the subspace $W_3$. Then $A$ will commute with all the operators on $W_3$, and those form a 4-dimensional space. Granted, they don't commute with each other, and in the end the maximum dimension will be 2, but you haven't shown that yet.
My suspicion was raised by your second claim that each operator on $W_3$ will have a repeated eigenvalue. This is most certainly false! For example, the space $F$ could consist of the diagonal matrices, but you just happened to pick as $A$ the diagonal matrix with entries $1,2,3,3$. Then $W_3$ is the span of the last two basis vectors, but the diagonal matrix $B$ with entries $2,3,5,8$ does not have a repeated characteristic value on $W_3$.
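To make the counterexample fully explicit (writing the restrictions in the basis of the last two coordinates, which span $W_3$): $$A\big|_{W_3}=\begin{pmatrix}3&0\\0&3\end{pmatrix},\qquad B\big|_{W_3}=\begin{pmatrix}5&0\\0&8\end{pmatrix},$$ so $B$ restricted to $W_3$ has the two distinct characteristic values $5$ and $8$, with no repeated eigenvalue in sight.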
Now, on to a hint: there is no way to guarantee that you could pick $A$ in the right way, so another matrix $B$ may have several eigenvalues on a characteristic space of $A$. But then the relation $AB=BA$ means that $A$ has to keep those eigenspaces of $B$ in $W_3$ invariant! In other words, $A$ has to be a scalar on $W_3$ whenever such a $B$ exists in $F$. But we can use $B$ to split the space $W_3$ further into a direct sum of smaller components. By the preceding arguments, all the matrices in $F$ then need to respect this finer direct sum decomposition, and you are back to the case of a simultaneously diagonalizable set.
[Edit: Clarifying the hint.] So I would begin by showing that the 4-dimensional space can be split into a direct sum of subspaces $\bigoplus_i W_i$ in such a way that every element of $F$ keeps each summand invariant, and has only a single characteristic value on each summand.
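In matrix terms the goal looks like this (a sketch; the block sizes shown are just one possibility): after choosing a basis adapted to the decomposition, every $B\in F$ is block diagonal, $$B=\begin{pmatrix}B_1&0&0\\0&B_2&0\\0&0&B_3\end{pmatrix},$$ with each block $B_i$ having a single characteristic value.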
But another trick is needed when $A$ is not diagonalizable, and (consequently) no $B\in F$ has several characteristic values in a characteristic space of $A$. Hint: any $B\in F$ must map the null space of $(A-\lambda I)^k$ to itself. [Edit: In view of the general result cited by Pierre-Yves, this requires careful work in the case that the multiplicity of the characteristic value is three, as my hint only shows that all the matrices of $F$ are upper triangular w.r.t. the basis that puts $A$ into its Jordan form. The set of upper triangular $3\times3$ matrices with identical entries along the diagonal forms a 4-dimensional space, but it is not commutative!]
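A quick check of that last claim (a computation of my own, not part of the hint): a general element of the set is $$\begin{pmatrix}a&x&y\\0&a&z\\0&0&a\end{pmatrix},$$ which has the four free parameters $a,x,y,z$, so the set is indeed 4-dimensional. And taking $X=E_{12}$ and $Y=E_{23}$ (the matrix units with a single $1$ in positions $(1,2)$ and $(2,3)$ respectively, both with constant diagonal $0$) gives $XY=E_{13}$ but $YX=0$, so the set is not commutative.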
In order:
Yes, basis and dimension make sense for $L(V,W)$; they make sense for every other vector space, for that matter.
The dimension is $\dim(V)\cdot\dim(W)$.
Yes, you can show that $L(V,W)\cong M_{\dim(V),\dim(W)}(\Bbb F)$ in such a way that evaluation of these transformations corresponds to matrix multiplication.
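Sketching one way to see this (my notation, chosen so the sizes match the indexing above: $[v]_B$ is the row coordinate vector of $v$ in a basis $B=(v_1,\ldots,v_n)$ of $V$, and $C$ is a basis of $W$): let $[T]\in M_{\dim(V),\dim(W)}(\Bbb F)$ be the matrix whose $i$th row holds the $C$-coordinates of $T(v_i)$. Then by linearity $$[T(v)]_C=[v]_B\,[T],$$ so evaluating $T$ becomes multiplying by the matrix $[T]$, and $T\mapsto[T]$ is the desired isomorphism.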
Best Answer
The space $\mathscr{L}(V)$ is the space of linear operators, meaning the set of linear functions from $V$ to $V$. You can take powers of them (or indeed multiply them generally) by composition; the result still maps from $V$ to $V$. If you were to represent these linear operators as matrices, they would all be square.
They form a vector space in their own right, under addition and scalar multiplication pointwise. This takes some proving, but is very easy to do.
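Explicitly, "pointwise" means that sums and scalar multiples of operators are defined by what they do to each vector: $$(S+T)(v)=S(v)+T(v),\qquad(\lambda T)(v)=\lambda\,T(v)\quad\text{for all }v\in V.$$ Checking the vector space axioms from these formulas is routine.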
The minimal polynomial of an arbitrary element of $\mathscr{L}(V)$ doesn't tell you the dimension of $\mathscr{L}(V)$. Instead, one can fix a basis $$B = (v_1, \ldots, v_n)$$ of $V$, and for $i, j$ between $1$ and $n$, define $T_{ij}$ to map $v_i$ to $v_j$, and map every other vector in $B$ to $0$. In terms of matrices, these correspond to the matrices with the entry in the $i$th column and the $j$th row being $1$, and everywhere else being $0$. It's not too difficult to show that these $n^2$ maps are linearly independent and span, and hence form a basis.
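For instance (a small concrete case, $n=2$): the four maps are, as matrices, $$T_{11}=\begin{pmatrix}1&0\\0&0\end{pmatrix},\quad T_{12}=\begin{pmatrix}0&0\\1&0\end{pmatrix},\quad T_{21}=\begin{pmatrix}0&1\\0&0\end{pmatrix},\quad T_{22}=\begin{pmatrix}0&0\\0&1\end{pmatrix},$$ and every $2\times2$ matrix is a unique linear combination of these, which is exactly linear independence plus spanning.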
However, the fact that $\operatorname{dim} \mathscr{L}(V) = n^2$ is useful in proving the existence of a minimal polynomial. It means that any list of $n^2 + 1$ vectors from $\mathscr{L}(V)$ is necessarily linearly dependent. So, if we take such a list $$(I, L, L^2, \ldots, L^{n^2}),$$ then there is necessarily a non-trivial linear combination of these vectors that produces the $0$ vector. That is, there exist $a_0, a_1, \ldots, a_{n^2}$, not all equal to $0$, such that $$a_0 I + a_1 L + a_2 L^2 + \ldots + a_{n^2} L^{n^2} = 0.$$ The two sides are equal as operators, i.e. they agree at every vector. So, we have found a polynomial $$p(z) = a_0 + a_1 z + a_2 z^2 + \ldots + a_{n^2} z^{n^2}$$ such that $p(L) = 0$. One can then argue that, since such a polynomial exists, there is a unique monic polynomial of minimal degree with the same property (which indeed divides every polynomial with this property). Thus, the minimal polynomial has degree at most $n^2$.
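A tiny worked instance (my own choice of operator, purely for illustration): take $\dim V = 2$, so $n^2 = 4$, and let $L$ have matrix $$\begin{pmatrix}0&1\\0&0\end{pmatrix}.$$ Then $L^2 = 0$, so the list $(I, L, L^2, L^3, L^4)$ is visibly dependent, and $p(z) = z^2$ satisfies $p(L) = 0$. Since $L$ is not a scalar multiple of $I$, no monic polynomial of degree $1$ annihilates $L$, so the minimal polynomial is exactly $z^2$, comfortably within the bound $n^2 = 4$.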
Of course, you can get a tighter bound from the Cayley-Hamilton theorem: the characteristic polynomial of $L$ has degree $n$ and annihilates $L$, so the minimal polynomial in fact has degree at most $n$!