To add a remark related to Jim Belk's answer and the OP's comments on that answer:
In many naturally occurring situations, including some of those where group theory is particularly useful, endomorphisms are automatically automorphisms.
For example, if $E/F$ is a finite extension of fields, any endomorphism of $E$ which is the identity on $F$ is automatically an automorphism of $E$.
As another example, if $C$ is a Riemann surface of genus at least $2$, then
any (nonconstant) endomorphism of $C$ is necessarily an automorphism.
Any endomorphism of a Euclidean space which preserves lengths is necessarily an automorphism.
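A sketch of the common mechanism behind the first and third of these examples (the genus $\ge 2$ case lies deeper): an injective linear endomorphism of a finite-dimensional vector space is automatically surjective. In the field case, an endomorphism $\sigma \colon E \to E$ fixing $F$ is $F$-linear and injective (every field homomorphism is injective), so rank--nullity gives
$$\dim_F \ker\sigma + \dim_F \operatorname{im}\sigma = [E:F], \qquad \ker\sigma = 0 \implies \operatorname{im}\sigma = E;$$
in the Euclidean case (taking the endomorphism to be linear), preserving lengths forces trivial kernel, and the same dimension count applies.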
Another point to bear in mind is that the groups that arise in practice in geometry are often Lie groups (i.e. they carry a compatible topological, indeed smooth manifold, structure). One can define a more general notion of Lie semigroup,
but if your Lie semigroup has an identity (so is a Lie monoid) and the semigroup
structure is non-degenerate in some neighborhood of the identity, then the Lie semigroup
will automatically be a Lie group (at least in a neighborhood of the identity). A related remark: in the definition of a formal group, there is no need to include an
explicit axiom about the existence of inverses.
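To spell out the formal group remark: a (one-dimensional) formal group law over a ring $R$ is a power series $F(X,Y) \in R[[X,Y]]$ satisfying
$$F(X,0) = X, \qquad F(0,Y) = Y, \qquad F(F(X,Y),Z) = F(X,F(Y,Z)).$$
These axioms force $F(X,Y) = X + Y + (\text{higher-order terms})$, and one can then solve $F(X,\iota(X)) = 0$ for $\iota(X) \in R[[X]]$ coefficient by coefficient, so inverses come for free.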
To make a point related to Qiaochu Yuan's answer: in some contexts semigroups
do appear naturally.
For example, studying the ring of endomorphisms of an object is a very common technique in lots of areas of mathematics. (E.g., just
to make a connection to my first point, for genus $1$ Riemann surfaces there can be endomorphisms that aren't automorphisms; but genus $1$ Riemann surfaces can also be naturally made into abelian groups --- so-called elliptic curves --- and there is a whole theory, the theory of complex multiplication, devoted to studying their endomorphism rings.)
As another example, any commutative ring of char. $p > 0$ has a Frobenius endomorphism, which is not an automorphism in general; but the semigroup of endomorphisms
that it generates is typically an important thing to consider in char. $p$ algebra and geometry. (Of course, this semigroup is just a quotient of $\mathbb N$.)
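To make the Frobenius example concrete: on $R = \mathbb F_p[t]$, the Frobenius endomorphism is
$$\varphi(x) = x^p,$$
which is a ring homomorphism because $(x+y)^p = x^p + y^p$ in char. $p$ (the binomial coefficients $\binom{p}{i}$ for $0 < i < p$ are all divisible by $p$), but is not surjective: $t$ has no $p$-th root in $\mathbb F_p[t]$. The semigroup it generates consists of the iterates $\varphi^n(x) = x^{p^n}$.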
One thing to bear in mind is what you hope to achieve by considering the group/semigroup of automorphisms/endomorphisms.
A typical advantage of groups is that they admit a surprisingly rigid theory (e.g. semisimple Lie groups can be completely classified; finite simple groups can be completely classified), and so if you discover a group lurking in your particular mathematical context, it might be an already well-known object, or at least there might be a lot of known theory that you can apply to it to obtain greater insight into your particular situation.
Semigroups are much less rigid, and there is often correspondingly less that can be leveraged out of discovering a semigroup lurking in your particular context. But this is not always true; rings are certainly well-studied, and the appearance of a given ring in some context can often be leveraged to much advantage.
A dynamical system involving just one process can be thought of as an action of the semigroup $\mathbb N$. Here there is not that much to be obtained from
the general theory of semigroups, but this is a frequently studied context. (Just to give a perhaps non-standard example, the Frobenius endomorphism of a char. $p$ ring is such a dynamical system.)
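Concretely, if the process is an endomorphism $f \colon X \to X$, the corresponding action of $\mathbb N$ is
$$n \cdot x = f^{\circ n}(x),$$
and the semigroup law $(m+n) \cdot x = m \cdot (n \cdot x)$ is just the identity $f^{\circ(m+n)} = f^{\circ m} \circ f^{\circ n}$; for Frobenius this reads $\varphi^n(x) = x^{p^n}$.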
But, in such contexts, precisely because general semigroup theory doesn't help much, the tools used will be different.
E.g. in topology, the Lefschetz fixed point theorem is a typical tool used to study an endomorphism of (i.e. a discrete dynamical system on) a
topological space. Interestingly, the same formula is used to study the action of Frobenius in char. $p$ geometry (see the Weil conjectures). So even in contexts
such as an action of the semigroup $\mathbb N$, there is some coherent philosophy that can be discerned --- it is just that it is supplied by topology rather than algebra, since the algebra doesn't have all that much to say.
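For reference, the Lefschetz number of a continuous map $f \colon X \to X$ (for $X$, say, compact and triangulable) is
$$\Lambda(f) = \sum_{i \ge 0} (-1)^i \operatorname{tr}\bigl(f_* \mid H_i(X;\mathbb Q)\bigr),$$
and the theorem says that $\Lambda(f) \neq 0$ forces $f$ to have a fixed point. The Weil-conjecture analogue replaces singular homology by étale cohomology and counts the fixed points of Frobenius, i.e. the points of a variety over a finite field.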
I think the conclusion to be drawn is not to be too doctrinaire, and to be sensitive to the actual mathematical contexts in which and from which the various notions of group, semigroup, automorphism, and endomorphism arise and have arisen.
The core of studying matrices is to study linear transformations between vector spaces. These can be realized as matrix multiplication on the left (or right) of column (or row) vectors.
If we are in this setup: $x\mapsto Ax$ for a column vector $x$ and appropriate matrix $A$, then the image of the linear transformation will be spanned by the columns of $A$.
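Explicitly, if $c_1, \dots, c_n$ are the columns of $A$, then
$$Ax = x_1 c_1 + x_2 c_2 + \cdots + x_n c_n,$$
so $Ax$ ranges exactly over the linear combinations of the columns as $x$ varies: the image is the column space.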
The kernel of the transformation (the nullspace), i.e. the set of all $x$ such that $Ax=0$, is important for understanding the solutions to some matrix equations. You probably have already learned that if $x_0$ is a solution to $Ax=b$, then every other solution is given by $x_0+k$ where $k$ is in the nullspace.
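A small worked example (with made-up numbers): take
$$A = \begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}, \qquad b = \begin{pmatrix} 3 \\ 6 \end{pmatrix}.$$
Here the nullspace of $A$ is spanned by $k = \begin{pmatrix} -2 \\ 1 \end{pmatrix}$, and $x_0 = \begin{pmatrix} 3 \\ 0 \end{pmatrix}$ is one solution of $Ax=b$, so the full solution set is $\{x_0 + tk : t \in \mathbb R\}$.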
This all has an analogous explanation on the other side. If we are in this setup: $x\mapsto xA$ for a row vector $x$, then the image of the linear transformation is now spanned by the rows of $A$.
Talking about the nullspace of $A^T$ is just a fancy way of dressing up the "left nullspace" of $A$: since $(xA)^T = A^T x^T$, we have $xA=0$ iff $A^T x^T=0$. The nullspace is now the set of all $x$ such that $xA=0$, and you can draw the same conclusions about solutions to $xA=b$.
In short, these four spaces (really just two kinds of space, each with a left and a right version) carry all the information about the image and kernel of the linear transformation that $A$ effects, whether you are using it on the right or on the left.
One way of seeing the importance of an additive identity is that it allows basic algebraic manipulations to go through. Suppose you have a simple matrix equation, $A+B=C$, and you want to solve for $A$. The steps, in detail, are these:
First, you note that there exists a matrix $-B$, which is the additive inverse of the matrix $B$. If you add this to both sides, you obtain $A+B+(-B)=C+(-B)$. Next, we use the property of additive inverses to simplify $B+(-B)$ into the zero matrix. This gives us $A+O=C+(-B)$. Finally, because $O$ is the additive identity, we can replace $A+O$ with just $A$, yielding $A=C+(-B)$, or in the more usual way of writing it, $A=C-B$.
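With concrete (made-up) matrices, the same steps look like this: if
$$B = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \qquad C = \begin{pmatrix} 5 & 5 \\ 5 & 5 \end{pmatrix},$$
then $A = C + (-B) = \begin{pmatrix} 4 & 3 \\ 2 & 1 \end{pmatrix}$, and one checks directly that $A+B=C$.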
If you were working in some system where you didn't have an additive identity, then you couldn't perform operations as simple as canceling something with its opposite. We do it so naturally and frequently that we don't always think about it, but an additive identity does a lot of work for us, algebraically.