1.) What are poles and zeros of linear system {A,B,C,D} exactly? What does it mean for a system to have a pole at a certain value, or a zero at certain value?
Intuitively, I do not know exactly what poles or zeros are. All I know is that the poles are the roots of the denominator of the transfer function or, for a minimal realization, the eigenvalues of the $A$ matrix, like the one in your question. Poles show up explicitly in the solutions of ordinary differential equations, and an example of this can be seen here:
http://www.math.oregonstate.edu/home/programs/undergrad/CalculusQuestStudyGuides/ode/laplace/solve/solve.html
So what kinds of questions can we answer using information about poles?
i) Is the system stable?
ii) If it is stable, is the response of the system oscillatory, or does it behave like a rigid body?
iii) If it is unstable, is it possible to stabilize this system using output feedback? (you need information about zeros here)
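Questions i) and ii) can be checked numerically. The following is only a minimal sketch: the matrix `A` is a hypothetical lightly damped second-order example (not the system from your question), and I am assuming a minimal realization so that the eigenvalues of $A$ really are the poles.

```python
import numpy as np

# Hypothetical example (an assumption, not from the question):
# x'' + 0.4 x' + 4 x = u written in state-space form.
A = np.array([[0.0, 1.0],
              [-4.0, -0.4]])

poles = np.linalg.eigvals(A)      # poles as eigenvalues of A
char_poly = np.poly(A)            # coefficients of det(sI - A)
roots = np.roots(char_poly)       # the same poles as roots of the denominator

# i) stable if every pole lies in the open left half-plane
is_stable = bool(np.all(poles.real < 0))
# ii) a complex-conjugate pair means the response oscillates
is_oscillatory = bool(np.any(np.abs(poles.imag) > 1e-12))

print("poles:", poles)
print("stable:", is_stable, "oscillatory:", is_oscillatory)
```

The real parts of the poles set how fast the response decays, and the imaginary parts set the oscillation frequency.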
Now, let's talk about zeros. Zeros show up in the literature because they have an effect on the behavior of control systems.
i) They impose fundamental limitations on the performance of control systems.
ii) In adaptive control systems, zeros can cause your adaptive controller to go unstable.
iii) They tell you about the "internal stability" of a control system.
As far as I can tell, zeros are more subtle than poles. I cannot say I fully understand them.
2.) The author writes about 'poles of a transfer function matrix H(s)'. What is a transfer function matrix? The only thing I know is how to compute it and that it describes some relation between input/output of the system. But why do we need transfer function matrices?
Taking the Laplace transform of a differential equation that has a single input and a single output yields a transfer function. An example of this is in the link above. A transfer function describes the relationship between a single output and a single input. So if you have a system of differential equations that has, say, 2 inputs and 3 outputs, then the transfer matrix is a $3 \times 2$ matrix of transfer functions with 6 elements. Each individual element describes the relationship between one of the inputs and one of the outputs. (The superposition principle plays a big role here.)
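For a system given as {A,B,C,D}, the transfer matrix is $H(s) = C(sI-A)^{-1}B + D$. Here is a minimal sketch that evaluates it at a single frequency. The matrices are a hypothetical example with 2 inputs and 3 outputs, and the helper name `transfer_matrix` is my own.

```python
import numpy as np

def transfer_matrix(A, B, C, D, s):
    """Evaluate H(s) = C (sI - A)^{-1} B + D at one complex frequency s."""
    n = A.shape[0]
    return C @ np.linalg.solve(s * np.eye(n) - A, B) + D

# Hypothetical system (an assumption): 2 states, 2 inputs, 3 outputs.
A = np.array([[-1.0, 0.0],
              [0.0, -2.0]])
B = np.array([[1.0, 0.0],
              [0.0, 1.0]])
C = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
D = np.zeros((3, 2))

s = 1.0j
H = transfer_matrix(A, B, C, D, s)
print(H.shape)  # (3, 2): one entry per (output, input) pair
```

Entry `H[i, j]` is the scalar transfer function from input `j` to output `i`, evaluated at `s`. For this example `H[0, 0]` equals $1/(s+1)$ and `H[0, 1]` is zero, because the second input never reaches the first output.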
But why would one want a transfer matrix? I believe it is because calculating zeros for a multi-input multi-output system is not easy. Here is an article that talks about all the different kinds of zeros and why they are important:
http://www.smp.uq.edu.au/people/YoniNazarathy/Control4406/resources/HoaggBernsteinNonMinimumPhaseZero.pdf
3.) To calculate the poles and zeros, the author says that we need the Smith and Smith-McMillan Forms. These are matrices that have only diagonal entries. What is exactly the algorithm to calculate the Smith-(McMillan)-form of a transfer matrix?
Sorry. I don't have much on this one.
4.) What is the relation between the poles of a system and the controllability, observability, stability and stabilizability ? The same for a zero ?
For me, poles and zeros are important to transfer functions, which describe the relationship between inputs and outputs, and they can tell you about stabilizability and stability. However, concepts like controllability and observability are state space concepts (at least for me). If you write a transfer function in state space form, as you have written in your question, then there is a very simple test for controllability and observability. You can find more about this in almost any course, for example in Stephen Boyd's introductory control course at Stanford.edu.
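One common test of this kind is the Kalman rank condition, and I am assuming that is the test meant here. A minimal sketch with a hypothetical example system (the helper names are my own):

```python
import numpy as np

def controllability_matrix(A, B):
    """Stack [B, AB, ..., A^(n-1) B] side by side."""
    blocks = [B]
    for _ in range(A.shape[0] - 1):
        blocks.append(A @ blocks[-1])
    return np.hstack(blocks)

def observability_matrix(A, C):
    """Stack [C; CA; ...; C A^(n-1)] on top of each other."""
    blocks = [C]
    for _ in range(A.shape[0] - 1):
        blocks.append(blocks[-1] @ A)
    return np.vstack(blocks)

# Hypothetical example (an assumption): H(s) = (s + 1) / ((s + 1)(s + 2)).
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
B = np.array([[0.0],
              [1.0]])
C = np.array([[1.0, 1.0]])

n = A.shape[0]
rank_ctrb = np.linalg.matrix_rank(controllability_matrix(A, B))
rank_obsv = np.linalg.matrix_rank(observability_matrix(A, C))

print("controllable:", rank_ctrb == n)  # full rank means controllable
print("observable:", rank_obsv == n)    # full rank means observable
```

In this example the zero at $s=-1$ cancels the pole at $s=-1$ in the transfer function, and the state space model turns out to be controllable but not observable. That is one concrete link between poles, zeros and these state space properties.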
5.) What is an invariant zero polynomial of the system {A,B,C,D} ?
A SISO system just has one kind of zero. A MIMO system has many kinds of zeros, one of which is an invariant zero. The roots of the invariant zero polynomial give you the invariant zeros. It makes me kind of sad that I do not know very much about zeros of MIMO systems.
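As far as I understand it, an invariant zero is a value of $s$ at which the system matrix $\begin{bmatrix} sI-A & -B \\ C & D \end{bmatrix}$ loses rank. Here is a minimal sketch on a hypothetical SISO example, where the invariant zero is just the ordinary zero of the transfer function. The helper name `system_matrix` is my own.

```python
import numpy as np

def system_matrix(A, B, C, D, s):
    """Build [[sI - A, -B], [C, D]] for a given value of s."""
    n = A.shape[0]
    return np.block([[s * np.eye(n) - A, -B],
                     [C, D]])

# Hypothetical example (an assumption): H(s) = (s + 3) / (s^2 + 3 s + 2).
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
B = np.array([[0.0],
              [1.0]])
C = np.array([[3.0, 1.0]])
D = np.array([[0.0]])

rank_generic = np.linalg.matrix_rank(system_matrix(A, B, C, D, 0.5))
rank_at_zero = np.linalg.matrix_rank(system_matrix(A, B, C, D, -3.0))

print("rank at s = 0.5:", rank_generic)  # full rank for a generic s
print("rank at s = -3:", rank_at_zero)   # rank drops at the invariant zero
```

For this square example the determinant of the system matrix is $s+3$, so its only root, $s=-3$, is the invariant zero.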
6.) What is 'a realization of a system'?
Let's say you start off with a differential equation. Then you take its Laplace transform, and obtain a transfer function. Then, for this transfer function, there are an infinite number of state space representations. That is, there are an infinite number of matrices $A, B, C, D$ that yield the same input-output relationship as the original transfer function. These representations are called realizations. We can go from one realization to another using "Similarity Transformations".
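To illustrate, here is a minimal sketch showing that a similarity transformation $(A,B,C,D) \to (TAT^{-1}, TB, CT^{-1}, D)$ leaves the transfer function unchanged. The system and the matrix `T` are hypothetical, and any invertible `T` would do.

```python
import numpy as np

def transfer_matrix(A, B, C, D, s):
    """Evaluate H(s) = C (sI - A)^{-1} B + D at one complex frequency s."""
    n = A.shape[0]
    return C @ np.linalg.solve(s * np.eye(n) - A, B) + D

# Hypothetical realization (an assumption, not from the question).
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
B = np.array([[0.0],
              [1.0]])
C = np.array([[3.0, 1.0]])
D = np.array([[0.0]])

# New state z = T x gives a second realization of the same system.
T = np.array([[1.0, 2.0],
              [0.0, 1.0]])
T_inv = np.linalg.inv(T)
A2, B2, C2, D2 = T @ A @ T_inv, T @ B, C @ T_inv, D

test_points = [1.0j, 0.5 + 2.0j, 3.0]
same_io = all(
    np.allclose(transfer_matrix(A, B, C, D, s),
                transfer_matrix(A2, B2, C2, D2, s))
    for s in test_points
)
print("same input-output behaviour:", same_io)
```

The matrices `A` and `A2` look different, but they have the same eigenvalues, so both realizations have the same poles.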
7.) Where can I find more good information about this subject?
If you are a mathematician, then you should probably look for a more mathematical text on control systems. Most engineers use a classical control book (like the one by Ogata) in undergrad, which is mostly about transfer functions, zeros, poles, and various stability tests. Then, in grad school, engineers take a course called "Linear Systems Theory", where they learn about the state space theory of control systems. The book I used was by Chen, but I did not like it very much.
Best Answer
A general PID controller has the transfer function
$$K(s) = \underbrace{K_\text{P}}_{\text{proportional term}} + \underbrace{K_\text{I}\dfrac{1}{s}}_{\text{integral term}}+\underbrace{K_\text{D}s}_{\text{derivative term}}.$$
The proportional term scales the error $e$ between the desired output $y_\text{d}$ and the actual output $y$ by the constant $K_\text{P}$ to generate an input to the plant. You can see this as the term that reacts to the present value of the error. As a rule of thumb, higher $K_\text{P}$ values make the controller react faster to deviations from the desired output, which might lead to unstable system behaviour. Imagine you are driving a car and, as soon as you see that you are deviating from the middle of the road, you react by steering very violently (high $K_\text{P}$) in the opposite direction. After a short time you will be deviating from the middle of the road on the other side, and you again react very violently in the other direction. Such an aggressive proportional reaction can lead to undesired dynamics of the car.
The integral term accumulates the error by integrating it over its past values. It generates an increasing input to the plant as long as the error does not vanish. Hence, the integral term can be viewed as the term that takes the history of the error into account. An example is standing under the shower: you open the valve for warm water, and if the water is still not warm enough after a while (error integrated over a time period), you open the valve even further. The two main characteristics of the integral term are that it reduces the steady-state error but can, at the same time, lead to instability and increased oscillations. There is also the problem of integrator windup, which is the reason why you should not use a standard PID controller (with the integral term) without an anti-windup scheme.
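To make the effect of the integral term concrete, here is a minimal simulation sketch. The plant $\dot y = -y + u$, the gains and the actuator limits are all hypothetical, and the anti-windup shown (conditional integration, which stops integrating while the actuator is saturated) is just one simple scheme among several.

```python
def simulate(kp, ki, r=1.0, dt=0.01, t_end=30.0, u_min=-10.0, u_max=10.0):
    """Forward-Euler simulation of a P/PI loop around the plant y' = -y + u.

    Returns the output y at time t_end.
    """
    y, integral = 0.0, 0.0
    for _ in range(int(t_end / dt)):
        e = r - y                            # present error
        u_unsat = kp * e + ki * integral     # P term + I term
        u = min(max(u_unsat, u_min), u_max)  # actuator saturation
        # Anti-windup by conditional integration: accumulate the error
        # only while the actuator is not saturated.
        if u == u_unsat:
            integral += e * dt
        y += dt * (-y + u)                   # plant update
    return y

y_p = simulate(kp=2.0, ki=0.0)                    # proportional only
y_pi = simulate(kp=2.0, ki=1.0)                   # proportional + integral
y_pi_limited = simulate(kp=2.0, ki=1.0, u_max=1.5)  # with a tight actuator limit

print("P only :", y_p)
print("PI     :", y_pi)
print("PI, saturating actuator:", y_pi_limited)
```

With the proportional term alone the output settles at $K_\text{P}/(1+K_\text{P})$ of the reference, so a steady-state error remains. Adding the integral term drives the output to the reference.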
The derivative term reacts to changes of the error. It can be used to damp oscillations caused by the integral term, and it can be viewed as the term that reacts to the predicted future of the error. In practice, the PID controller is not implemented as I have written it because it is not a proper transfer function. The derivative term should not be used if it is possible to achieve good performance without it, because it strongly amplifies noise if your measurements are very noisy.
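One common way to make the controller proper is to filter the derivative term with a first-order low-pass. As a sketch, with the filter time constant $T_\text{f} > 0$ being a symbol I am introducing here (it is not part of the formula above):

$$K(s) = K_\text{P} + K_\text{I}\dfrac{1}{s} + K_\text{D}\dfrac{s}{T_\text{f}s+1}.$$

For $T_\text{f} \to 0$ this recovers the ideal PID controller, while for any $T_\text{f} > 0$ the high-frequency gain of the derivative term is bounded by $K_\text{D}/T_\text{f}$, which limits the amplification of measurement noise.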
A PI controller is simply obtained if you do not use the derivative term, i.e. $K_\text{D}=0$.
The reason why we are interested in the transfer functions that you have written is that each of them describes a different input-to-output relationship of the closed loop.
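See the following control loop (adapted from [1]). In it, the error is $e = r - y$, the controller output is $u = K(s)e$, and the plant output is $y = G(s)(u + d)$.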
The first transfer function is the transfer function from the reference $r$ to the error $e$. It is also called the sensitivity function
$$S(s) = \dfrac{e(s)}{r(s)}=\dfrac{1}{1+K(s)G(s)}$$
The fourth function is called the complementary sensitivity function (I think it is also called the transmissibility function)
$$T(s) = 1-S(s)=\dfrac{y(s)}{r(s)}=\dfrac{K(s)G(s)}{1+K(s)G(s)}$$
in which $y$ is the output.
The second transfer function is the transfer function from an additive disturbance $d$, entering between the controller and the plant, to the output $y$. It is called the disturbance transfer function
$$\dfrac{y(s)}{d(s)}=\dfrac{G(s)}{1+K(s)G(s)}.$$
The third transfer function relates the reference $r$ to the controller output $u$. I am not sure what it is called in English, but it can be translated from German as the actuation transfer function
$$\dfrac{u(s)}{r(s)}=\dfrac{K(s)}{1+K(s)G(s)}.$$
As you can see, the stability of the closed-loop system is always dictated by $1+K(s)G(s)$, which appears in the denominator of all four transfer functions. Studying these transfer functions gives you additional information, for example how an additive disturbance between controller and plant manifests itself in the output ($y(s)/d(s)$), or how the actuation / control effort $u$ depends on the reference $r$ ($u(s)/r(s)$).
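As a minimal numerical sketch of the four transfer functions, assume a hypothetical plant $G(s) = 1/(s+1)$ and a PI controller $K(s) = K_\text{P} + K_\text{I}/s$ with made-up gains (neither is from your question):

```python
import numpy as np

# Hypothetical plant and PI controller as polynomial coefficients:
#   G(s) = 1 / (s + 1),  K(s) = (K_P s + K_I) / s
K_P, K_I = 2.0, 1.0
num_G, den_G = np.array([1.0]), np.array([1.0, 1.0])
num_K, den_K = np.array([K_P, K_I]), np.array([1.0, 0.0])

# Open loop K(s) G(s); the numerator of 1 + K(s) G(s) is den + num.
num_L = np.polymul(num_K, num_G)
den_L = np.polymul(den_K, den_G)
char_poly = np.polyadd(den_L, num_L)

# Closed-loop poles: roots of the numerator of 1 + K(s) G(s).
closed_loop_poles = np.roots(char_poly)
print("closed-loop poles:", closed_loop_poles)

def evaluate(num, den, s):
    """Evaluate the rational function num(s) / den(s) at s."""
    return np.polyval(num, s) / np.polyval(den, s)

s = 1.0j  # one test frequency
K = evaluate(num_K, den_K, s)
G = evaluate(num_G, den_G, s)

S = 1 / (1 + K * G)       # e/r, sensitivity
T = K * G / (1 + K * G)   # y/r, complementary sensitivity
G_d = G / (1 + K * G)     # y/d, disturbance transfer function
G_u = K / (1 + K * G)     # u/r, actuation transfer function

print("S + T =", S + T)
```

All four functions share the factor $1/(1+K(s)G(s))$, so one set of closed-loop poles decides the stability of all of them, and $S + T = 1$ holds at every frequency.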