Some comments first. There are several serious confusions in what you write. For example, in the third paragraph, having seen that the entries of $AB$ are obtained by taking the dot product of the corresponding row of $A$ with the corresponding column of $B$, you write that you view $AB$ as a dot product of rows of $B$ and rows of $A$. It's not.
For another example, you talk about matrix multiplication "happening". Matrices aren't running wild in the hidden jungles of the Amazon, where things "happen" without human beings. Matrix multiplication is defined a certain way, and then the definition is why matrix multiplication is done the way it is done. You may very well ask why matrix multiplication is defined the way it is defined, and whether there are other ways of defining a "multiplication" on matrices (yes, there are; read further), but that's a completely separate question. "Why does matrix multiplication happen the way it does?" is pretty incoherent on its face.
Another example of confusion is that not every matrix corresponds to a "change in reference system". This is only true, viewed from the correct angle, for invertible matrices.
Standard matrix multiplication. Matrix multiplication is defined the way it is because it corresponds to composition of linear transformations. Though this is valid in very great generality, let's focus on linear transformations $T\colon \mathbb{R}^n\to\mathbb{R}^m$. Since linear transformations satisfy $T(\alpha\mathbf{x}+\beta\mathbf{y}) = \alpha T(\mathbf{x})+\beta T(\mathbf{y})$, if you know the value of $T$ at each of $\mathbf{e}_1,\ldots,\mathbf{e}_n$, where $\mathbf{e}_i$ is the (column) $n$-vector that has $0$s in each coordinate except the $i$th coordinate, where it has a $1$, then you know the value of $T$ at every single vector of $\mathbb{R}^n$.
So in order to describe $T$, I just need to tell you what $T(\mathbf{e}_i)$ is for each $i$. For example, we can take
$$T(\mathbf{e}_i) = \left(\begin{array}{c}a_{1i}\\a_{2i}\\ \vdots\\ a_{mi}\end{array}\right).$$
Then, since
$$\left(\begin{array}{c}k_1\\k_2\\ \vdots\\k_n\end{array}\right) = k_1\mathbf{e}_1 + \cdots +k_n\mathbf{e}_n,$$ we have
$$T\left(\begin{array}{c}k_1\\k_2\\ \vdots\\ k_n\end{array}\right) = k_1T(\mathbf{e}_1) + \cdots +k_nT(\mathbf{e}_n) = k_1\left(\begin{array}{c}a_{11}\\a_{21}\\ \vdots\\a_{m1}\end{array}\right) + \cdots + k_n\left(\begin{array}{c}a_{1n}\\a_{2n}\\ \vdots\\ a_{mn}\end{array}\right).$$
It is very fruitful, then, to keep track of the $a_{ij}$ in some way; given the expression above, we keep track of them in a matrix, which is just a rectangular array of real numbers. We then think of $T$ as being "given" by the matrix
$$\left(\begin{array}{cccc}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{array}\right).$$
If we want to keep track of $T$ this way, then for an arbitrary vector $\mathbf{x} = (x_1,\ldots,x_n)^t$ (the ${}^t$ means "transpose": turn every row into a column and every column into a row), we have that $T(\mathbf{x})$ corresponds to:
$$\left(\begin{array}{cccc}
a_{11} & a_{12} & \cdots & a_{1n}\\
a_{21} & a_{22} & \cdots & a_{2n}\\
\vdots & \vdots & \ddots & \vdots\\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{array}\right) \left(\begin{array}{c}
x_1\\x_2\\ \vdots\\ x_n\end{array}\right) = \left(\begin{array}{c}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n\\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n\\
\vdots\\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n
\end{array}\right).$$
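If it helps to see this with numbers, here is a tiny numpy sketch (the entries of $A$ and $\mathbf{x}$ are arbitrary, chosen only for illustration) checking that the matrix-vector product above is exactly the weighted sum $x_1T(\mathbf{e}_1)+\cdots+x_nT(\mathbf{e}_n)$ of the columns:

```python
import numpy as np

# A encodes a linear map T: R^3 -> R^2 by storing T(e_j) in column j.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])     # arbitrary illustrative entries
x = np.array([7.0, -1.0, 2.0])

# Applying T via the matrix-vector product...
Tx = A @ x

# ...agrees with the linear combination x_1*T(e_1) + ... + x_n*T(e_n),
# i.e. a weighted sum of the columns of A.
combo = sum(x[j] * A[:, j] for j in range(A.shape[1]))

assert np.allclose(Tx, combo)
print(Tx)    # [11. 35.]
```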
What happens when we have two linear transformations, $T\colon \mathbb{R}^n\to\mathbb{R}^m$ and $S\colon\mathbb{R}^p\to\mathbb{R}^n$? If $T$ corresponds as above to a certain $m\times n$ matrix, then $S$ will likewise correspond to a certain $n\times p$ matrix, say
$$\left(\begin{array}{cccc}
b_{11} & b_{12} & \cdots & b_{1p}\\
b_{21} & b_{22} & \cdots & b_{2p}\\
\vdots & \vdots & \ddots & \vdots\\
b_{n1} & b_{n2} & \cdots & b_{np}
\end{array}\right).$$
What is $T\circ S$? First, it is a linear transformation because composition of linear transformations yields a linear transformation. Second, it goes from $\mathbb{R}^p$ to $\mathbb{R}^m$, so it should correspond to an $m\times p$ matrix. Which matrix? If we let $\mathbf{f}_1,\ldots,\mathbf{f}_p$ be the (column) $p$-vectors given by letting $\mathbf{f}_j$ have $0$s everywhere and a $1$ in the $j$th entry, then the matrix above tells us that
$$S(\mathbf{f}_j) = \left(\begin{array}{c}b_{1j}\\b_{2j}\\ \vdots \\b_{nj}\end{array}\right) = b_{1j}\mathbf{e}_1+\cdots + b_{nj}\mathbf{e}_n.$$
So, what is $T\circ S(\mathbf{f}_j)$? This is what goes in the $j$th column of the matrix that corresponds to $T\circ S$. Evaluating, we have:
\begin{align*}
T\circ S(\mathbf{f}_j) &= T\Bigl( S(\mathbf{f}_j)\Bigr)\\
&= T\Bigl( b_{1j}\mathbf{e}_1 + \cdots + b_{nj}\mathbf{e}_n\Bigr)\\
&= b_{1j} T(\mathbf{e}_1) + \cdots + b_{nj}T(\mathbf{e}_n)\\
&= b_{1j}\left(\begin{array}{c}a_{11}\\a_{21}\\ \vdots\\ a_{m1}\end{array}\right) + \cdots + b_{nj}\left(\begin{array}{c}a_{1n}\\a_{2n}\\ \vdots\\ a_{mn}\end{array}\right)\\
&= \left(\begin{array}{c}
a_{11}b_{1j} + a_{12}b_{2j} + \cdots + a_{1n}b_{nj}\\
a_{21}b_{1j} + a_{22}b_{2j} + \cdots + a_{2n}b_{nj}\\
\vdots\\
a_{m1}b_{1j} + a_{m2}b_{2j} + \cdots + a_{mn}b_{nj}
\end{array}\right).
\end{align*}
So if we want to write down the matrix that corresponds to $T\circ S$, then the $(i,j)$th entry will be
$$a_{i1}b_{1j} + a_{i2}b_{2j} + \cdots + a_{in}b_{nj}.$$
So we define the "composition" or product of the matrix of $T$ with the matrix of $S$ to be precisely the matrix of $T\circ S$. We can make this definition without reference to the linear transformations that gave it birth: if the matrix of $T$ is $m\times n$ with entries $a_{ij}$ (let's call it $A$); and the matrix of $S$ is $n\times p$ with entries $b_{rs}$ (let's call it $B$), then the matrix of $T\circ S$ (let's call it $A\circ B$ or $AB$) is $m\times p$ and with entries $c_{k\ell}$, where
$$c_{k\ell} = a_{k1}b_{1\ell} + a_{k2}b_{2\ell} + \cdots + a_{kn}b_{n\ell}$$
by definition. Why? Because then the matrix of the composition of two functions is precisely the product of the matrices of the two functions. We can work with the matrices directly without having to think about the functions.
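If you like to check such things numerically, here is a small numpy sketch (random matrices, nothing specific to the discussion above) verifying both that the product matrix agrees with the composition $T\circ S$ on a vector, and that the entry formula above is exactly what the matrix product computes:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, p = 2, 3, 4
A = rng.standard_normal((m, n))   # the matrix of T: R^n -> R^m
B = rng.standard_normal((n, p))   # the matrix of S: R^p -> R^n

T = lambda v: A @ v
S = lambda v: B @ v

# Composing the functions, versus multiplying the matrices first,
# gives the same result on an arbitrary vector.
v = rng.standard_normal(p)
assert np.allclose(T(S(v)), (A @ B) @ v)

# The (k, l) entry of AB is a_{k1} b_{1l} + ... + a_{kn} b_{nl}.
C = np.array([[sum(A[k, i] * B[i, l] for i in range(n)) for l in range(p)]
              for k in range(m)])
assert np.allclose(C, A @ B)
```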
In point of fact, there is nothing about the dot product which is at play in this definition. It is essentially by happenstance that the $(i,j)$ entry can be obtained as a dot product of something. In fact, the $(i,j)$th entry is obtained as the matrix product of the $1\times n$ matrix consisting of the $i$th row of $A$, with the $n\times 1$ matrix consisting of the $j$th column of $B$. Only if you transpose this column can you try to interpret this as a dot product. (In fact, the modern view is the other way around: we define the dot product of two vectors as a special case of a more general inner product, called the Frobenius inner product, which is defined in terms of matrix multiplication, $\langle\mathbf{x},\mathbf{y}\rangle =\mathrm{trace}(\overline{\mathbf{y}^t}\mathbf{x})$).
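For what it's worth, that parenthetical remark is easy to check numerically; here is a sketch for real column vectors (over the reals the conjugation does nothing):

```python
import numpy as np

# Dot product as a special case of the Frobenius inner product
# <x, y> = trace(conj(y)^t x), with x and y as n x 1 matrices.
x = np.array([[1.0], [2.0], [3.0]])
y = np.array([[4.0], [0.0], [-1.0]])

frobenius = np.trace(y.T @ x)              # 1x3 times 3x1, then the trace of the 1x1 result
dot       = np.dot(x.ravel(), y.ravel())   # the ordinary dot product

assert np.isclose(frobenius, dot)          # both equal 1*4 + 2*0 + 3*(-1) = 1
```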
And because the product of matrices corresponds to the composition of linear transformations, all the nice properties that composition of linear functions has will automatically also be true for the product of matrices, because the product of matrices is nothing more than a book-keeping device for keeping track of the composition of linear transformations. So $(AB)C = A(BC)$, because composition of functions is associative. $A(B+C) = AB + AC$, because composition of linear transformations distributes over sums of linear transformations (sums of matrices are defined entry-by-entry because that agrees precisely with the sum of linear transformations). $A(\alpha B) = \alpha(AB) = (\alpha A)B$, because composition of linear transformations behaves that way with scalar multiplication (products of matrices by scalars are defined the way they are precisely so that they will correspond to the operation with linear transformations).
So we define the product of matrices explicitly so that it will match up with composition of linear transformations. There really is no deeper hidden reason. It seems a bit incongruous, perhaps, that such a simple reason results in such a complicated formula, but such is life.
Another reason why it is somewhat misguided to try to understand the matrix product in terms of the dot product is that the matrix product keeps track of all the information lying around about the two transformations being composed, but the dot product loses a lot of information about the two vectors in question. Knowing that $\mathbf{x}\cdot\mathbf{y}=0$ only tells you that $\mathbf{x}$ and $\mathbf{y}$ are perpendicular; it doesn't really tell you anything else. There is a lot of informational loss in the dot product, and trying to explain the matrix product in terms of the dot product requires that we "recover" all of this lost information in some way. In practice, it means keeping track of all the original information, which makes trying to shoehorn the dot product into the explanation unnecessary, because you will already have all the information needed to get the product directly.
Examples that are not just "changes in reference system". Note that any linear transformation corresponds to a matrix. But the only linear transformations that can be thought of as "changes in perspective" are the linear transformations that map $\mathbb{R}^n$ to itself, and which are one-to-one and onto. There are lots of linear transformations that aren't like that. For example, the linear transformation $T$ from $\mathbb{R}^3$ to $\mathbb{R}^2$ defined by
$$T\left(\begin{array}{c}
a\\b\\c\end{array}\right) = \left(\begin{array}{c}b\\2c\end{array}\right)$$
is not a "change in reference system" (because lots of nonzero vectors go to zero, but there is no way to just "change your perspective" and start seeing a nonzero vector as zero) but is a linear transformation nonetheless. The corresponding matrix is $2\times 3$, and is
$$\left(\begin{array}{ccc}
0 & 1 & 0\\
0 & 0 & 2
\end{array}\right).$$
Now consider the linear transformation $U\colon\mathbb{R}^2\to\mathbb{R}^2$ given by
$$U\left(\begin{array}{c}x\\y\end{array}\right) = \left(\begin{array}{c}3x+2y\\
9x + 6y\end{array}\right).$$
Again, this is not a "change in perspective", because the vector $\binom{2}{-3}$ is mapped to $\binom{0}{0}$. It has a matrix, $2\times 2$, which is
$$\left(\begin{array}{cc}
3 & 2\\
9 & 6
\end{array}\right).$$
So the composition $U\circ T$ has matrix:
$$\left(\begin{array}{cc}
3 & 2\\
9 & 6
\end{array}\right) \left(\begin{array}{ccc}
0 & 1 & 0\\
0 & 0 & 2
\end{array}\right) = \left(\begin{array}{ccc}
0 & 3 & 4\\
0 & 9 & 12
\end{array}\right),$$
which tells me that
$$U\circ T\left(\begin{array}{c}x\\y\\z\end{array}\right) = \left(\begin{array}{c} 3y + 4z\\ 9y+12z\end{array}\right).$$
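The same computation in numpy, if you want to verify it (these are just the matrices from this example):

```python
import numpy as np

T_mat = np.array([[0, 1, 0],
                  [0, 0, 2]])   # the matrix of T
U_mat = np.array([[3, 2],
                  [9, 6]])      # the matrix of U

UT = U_mat @ T_mat
print(UT)
# [[ 0  3  4]
#  [ 0  9 12]]

# Applying U after T to (x, y, z) agrees with applying the product matrix directly.
v = np.array([5, 1, 2])   # x = 5, y = 1, z = 2
assert np.array_equal(U_mat @ (T_mat @ v), UT @ v)
print(UT @ v)             # [3*1 + 4*2, 9*1 + 12*2] = [11 33]
```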
Other matrix products. Are there other ways to define the product of two matrices? Sure. There's the Hadamard product, which is the "obvious" thing to try: you can multiply two matrices of the same size (and only of the same size), and you do it entry by entry, just the same way that you add two matrices. This has some nice properties, but it has nothing to do with linear transformations. There's the Kronecker product, which takes an $m\times n$ matrix times a $p\times q$ matrix and gives an $mp\times nq$ matrix. This one is associated to the tensor product of linear transformations. They are defined differently because they are meant to model other operations that one does with matrices or vectors.
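Both of these other products happen to be built into numpy, if you want to compare them with the standard product (a minimal sketch with small arbitrary matrices):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 5],
              [6, 7]])

hadamard = A * B          # entrywise product; only defined for equal-sized matrices
kron     = np.kron(A, B)  # Kronecker product: (2x2) times (2x2) gives a 4x4 matrix
standard = A @ B          # the usual product, which models composition

print(hadamard)   # [[ 0 10]
                  #  [18 28]]
```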
In an affine space we can "forget" about the origin, in the sense that it is determined by an arbitrary choice of coordinates and so isn't a distinguished part of the space itself. This space has points, and between points we can draw arrows to describe direction. These "arrows" are vectors, and the set of all vectors forms a vector space: an algebraic structure where addition makes sense, as does scalar multiplication by elements of a given field (here, the real numbers). There is a vector which is the additive identity, zero. The vectors act on the points in the affine space by translating them from one location to another, according to the direction and magnitude of the vector. This should all be known already, but it is key that at first the affine space and the vector space are two different things.
The vector space has an origin distinguished by being the additive identity, but we can take a copy of this vector space and then interpret the vectors as points; the arrow between two points is then the original vector that needs to be added algebraically to go from one to the other, and we can keep the origin as part of a particular coordinate system. In this way we can view a space as both a vector space and an affine space simultaneously!
It gets a little tricky when we want to describe geometry though. Two vectors standing on an affine space are parallel if they point in the same direction, with no restrictions on their base point. On the other hand, if we want to view these parallel vectors in their vector space habitat as arrows they must be arrows pointing from the origin. The inner product is an operation on the vector space, so if we have two vectors in affine space we want to dot together we do have to "center" them in this way so that the angle-between-them interpretation remains valid.
We can translate vectors on the affine space (move them around without changing their direction) and they remain the same vector, just with a different base point. The operation of addition on the vector space however results in a new vector (when the summands are nonzero), and moreover adding two nonparallel vectors results in a vector that is not parallel with either of the original two.
What we can say instead is that if we have the zero vector $0$, a vector $v$, and a translation vector $w$, we can interpret $0$ and $v$ as points, and the arrow between them will of course be the vector $v$; if we translate the points $0$ and $v$ by the vector $w$ we will obtain the points $w$ and $v+w$ respectively (we must be careful about which we call vectors and which we call points!). The vector between these latter two points will again be $v$, which is obviously parallel to our original vector (because they are one and the same vector).
If $p$ is a vector we reinterpret as a point, and $v$ a vector in affine space with base point $p$, then the vector $v$ understood as an arrow will point specifically to the point $p+v$ (remember the addition takes place in the vector space, so to understand this we have to go back to the vector interpretation of $p$, add it to $v$, and then go forward again to the affine interpretation as a point). The point $p+v$ corresponds to the original vector $p+v$, so the "centering" process involves taking the point $p$ back to the origin (associated to the zero vector) as well as the point $p+v$ back to the point $v$, which is done by subtracting out the vector $p$. In other words, to center a vector existing in affine space, we take the point that it points to as an arrow, interpret it as a vector, and subtract out the vector associated to the original base point. This is conceptually a rather roundabout process, but it's what goes on.
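Here is a tiny numerical restatement of that centering step (the coordinates are arbitrary; this just mirrors the description above):

```python
import numpy as np

p = np.array([1.0, 2.0, 0.0])    # a vector reinterpreted as a base point
v = np.array([0.5, -1.0, 3.0])   # a vector based at p; as an arrow it points to p + v
w = np.array([2.0, 2.0, 2.0])    # a translation vector

tip = p + v                      # the point the arrow ends at

# "Centering": subtract out the base point, recovering the vector itself.
assert np.allclose(tip - p, v)

# Translating both endpoints by w leaves the vector between them unchanged,
# which is the sense in which the translated arrow is parallel to the original.
assert np.allclose((tip + w) - (p + w), v)
```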
Moreover, there is nothing special about the vector $1_n:=(1,\cdots,1)$ when it comes to centering; it does shift every component by $1$ when added to a vector but generally this doesn't center anything at all. Translating a point in affine space just moves it in some specific direction, and indeed there is nothing inherently special about this direction; if we change our coordinate system the component form of this vector could be almost anything we want it to be.
What does it mean when the sum of the components of a vector is zero? (First, keep in mind this sum depends on the choice of coordinate system, so it is not intrinsically a function of just the vector space. This is because which vector "$1_n$" specifies depends on coordinates.) It means the dot product of $v$ with $1_n$ is zero, so they are orthogonal, a.k.a. perpendicular. Thinking of matrices as linear transformations of a vector space (given coordinates) then allows us to use this information to characterize the matrices in question (those whose eigenvectors have entries summing to zero) in a geometric way: such eigenvectors are orthogonal to $1_n$.
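Concretely, that equivalence is a one-line check (a small numpy sketch with an arbitrary vector whose entries sum to zero):

```python
import numpy as np

n = 4
ones = np.ones(n)                      # the vector 1_n in the chosen coordinates
v = np.array([3.0, -1.0, -1.0, -1.0])  # entries sum to zero

# The sum of the components is exactly the dot product with 1_n,
# so "components sum to zero" means "orthogonal to 1_n".
assert np.isclose(v.sum(), v @ ones)
assert np.isclose(v @ ones, 0.0)
```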
I think learning a bit of Abstract Algebra (here's an introduction) will answer some of these questions. An operation (addition, multiplication, etc.) and the type of objects it acts on (integers, reals, vectors, etc.) come together: you need to define what a multiplication is for that type of object.
You can come up with any mathematical operation that does the craziest things, but most such operations will be useless. E.g. they won't help you find the unknown $x$ in the equation $a*x=b$, where $*$ is your new mathematical operation and $a, x, b$ are your new type of objects (say they are circles: you multiply circles). Existing, real math creates these structures and operations in a way that makes them useful for some purpose.
The dot product is not a multiplication in the typical sense. E.g. you can't solve $a\cdot x=b$. This operation has a completely different purpose; it's useful for different reasons. E.g. it can be used to measure the lengths of vectors and the angles between them (in particular, to test whether they are perpendicular).
That's why dot products are useful: not because they can, e.g., solve equations. Incidentally, all this math can be generalized. Notice that functions like polynomials also have properties similar to geometric vectors: you can multiply them by a scalar, and you can add them. But then, what would be the length of such a "vector"? What is a dot product? To answer these questions, a list of requirements is introduced for such products, taken from what we just saw, such as the length of the unit vector $\frac{a}{|a|}$ being $1$, among others. So if we can come up with a new operation that obeys these properties, we can use it the way we used dot products.
Such products are now called Inner Products, and the Dot Product is just one of the implementations. For functions, for instance, you could define a different implementation, like $\langle f,g\rangle = \int f(x)g(x)\,dx$, so that $\int f(x)f(x)\,dx$ plays the role of $a \cdot a$. It's possible to define all sorts of implementations of an Inner Product, but they still have to obey those rules. And then we can define what length is, and what it means to be perpendicular, for functions and other entities that obey similar rules. And we can re-use all the math from vectors with these entities; when used in such a way, these entities and their respective operations are called Abstract Vectors.
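A rough numerical sketch of that idea (functions on $[0,1]$, with the integral approximated by an average over a uniform grid; the particular $f$ and $g$ are arbitrary):

```python
import numpy as np

# Inner product of functions on [0, 1]: <f, g> = integral of f(x) g(x) dx,
# approximated here by averaging over a fine uniform grid.
x = np.linspace(0.0, 1.0, 100_001)

def inner(f, g):
    return np.mean(f(x) * g(x))

f = lambda t: t          # f(t) = t
g = lambda t: 1.0 - t    # g(t) = 1 - t

length_f  = np.sqrt(inner(f, f))                                # about 1/sqrt(3)
cos_angle = inner(f, g) / (length_f * np.sqrt(inner(g, g)))     # about 1/2

print(length_f, cos_angle)
```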
Now, units are about giving physical sense to numbers. Can you multiply oranges? Mathematically, the requirement is that you have the right entities (e.g. integers) and an appropriate operation (e.g. multiplication). But whether this has physical sense is not up to math to decide. If $\text{orange}^2$ (the area of a field of oranges) makes sense, then go ahead and do the operation.
If you use oranges as one of the components of your vector, then $|o|=\sqrt{o \cdot o}$ is still the length of a vector, so it gives you the number of oranges in it (which you knew from the vector's component already).