Indices of the Minkowski metric and Lorentz transformation

Tags: matrix-elements, metric-tensor, special-relativity

I am currently studying special relativity and I keep stumbling over a concept I can't make consistent for myself: which index of a Lorentz transformation and of the Minkowski metric denotes a column, and which one denotes a row.

My thoughts so far:
If I look at the matrix multiplication $\mathbf{x}^{'\nu}=\mathbf{\Lambda}^\nu\,_\mu\,\mathbf{x}^\mu$, then the upper index $\mu$ must indicate a row, since $\mathbf{x}$ is a column vector and its index therefore specifies the entry (row). The same holds for the upper index $\nu$ of $\mathbf{x}^{'\nu}$. Additionally, I know that the product describes a matrix multiplied with a vector and that, by the Einstein summation convention, the expression is summed over $\mu$. So if I am interested in the first entry of $\mathbf{x}^{'}$, namely $\mathbf{x}^{'0}$, I have to multiply the first row of $\mathbf{\Lambda}$, namely $\mathbf{\Lambda}^0\,_\mu$, with the column vector $\mathbf{x}$. Therefore the upper index $\nu$ of the Lorentz transformation labels the row of the matrix and the lower index $\mu$ the column. So far so clear.
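To make this concrete for myself, here is a minimal numpy sketch (the boost matrix and the four-vector are just made-up illustrative values): storing $\mathbf{\Lambda}^\nu\,_\mu$ with $\nu$ as the row and $\mu$ as the column makes the index sum exactly an ordinary matrix-vector product.

```python
import numpy as np

# A hypothetical boost along x with beta = 0.5, just for illustration
beta = 0.5
gamma = 1.0 / np.sqrt(1.0 - beta**2)

# Lam[nu, mu]: the upper index nu labels the row, the lower index mu the column
Lam = np.array([
    [ gamma,       -gamma*beta, 0.0, 0.0],
    [-gamma*beta,   gamma,      0.0, 0.0],
    [ 0.0,          0.0,        1.0, 0.0],
    [ 0.0,          0.0,        0.0, 1.0],
])

x = np.array([1.0, 2.0, 3.0, 4.0])   # contravariant components x^mu

# x'^nu = Lambda^nu_mu x^mu: summing over mu is exactly a matrix-vector product
x_prime = np.einsum('nm,m->n', Lam, x)
assert np.allclose(x_prime, Lam @ x)
```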

Now I encounter some difficulties. For example, our professor writes the following: $x_\nu=\eta_{\mu\nu}x^\mu$. (First question: strictly speaking, and as also written here https://en.wikipedia.org/wiki/Raising_and_lowering_indices (example in Minkowski spacetime), is $x_\nu$ a row vector?) The same reasoning as above cannot be used here, since $\eta$ carries no upper/lower index pair to distinguish. But following the same logic, $\mu$ in $\mathbf{x}^\mu$ labels a row, and therefore in $\eta_{\mu\nu}$ the index $\mu$ must label a column and $\nu$ a row. So now the latter index is the row index (as I understand it). Now the inconsistencies start: in the book here, equations (5.12) and (5.13), the authors say that $\Lambda^\mu\,_\alpha \eta_{\mu\nu}\Lambda^\nu\,_\beta$ is not a matrix multiplication, because the index $\mu$ is a column index in the first Lorentz transformation as well as in the Minkowski metric.
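Again only as a sketch (same illustrative boost as above, and assuming the signature $(-,+,+,+)$ for $\eta$), I tried to spell the two expressions out numerically. Under the convention that the first index labels the row, the contraction $\Lambda^\mu\,_\alpha\eta_{\mu\nu}\Lambda^\nu\,_\beta$ agrees with the matrix product $\Lambda^T\eta\Lambda$, i.e. a transpose appears on the first factor:

```python
import numpy as np

eta = np.diag([-1.0, 1.0, 1.0, 1.0])   # Minkowski metric, signature (-,+,+,+) assumed

beta = 0.5
gamma = 1.0 / np.sqrt(1.0 - beta**2)
Lam = np.array([
    [ gamma,       -gamma*beta, 0.0, 0.0],
    [-gamma*beta,   gamma,      0.0, 0.0],
    [ 0.0,          0.0,        1.0, 0.0],
    [ 0.0,          0.0,        0.0, 1.0],
])

x = np.array([1.0, 2.0, 3.0, 4.0])

# x_nu = eta_{mu nu} x^mu: einsum only matches index labels, it never asks
# which axis of eta is the "row"
x_lower = np.einsum('mn,m->n', eta, x)

# Lambda^mu_alpha eta_{mu nu} Lambda^nu_beta: the summed index mu sits in the
# first slot of both Lam and eta, so as a matrix product this is Lam.T @ eta @ Lam
lhs = np.einsum('ma,mn,nb->ab', Lam, eta, Lam)
assert np.allclose(lhs, Lam.T @ eta @ Lam)
assert np.allclose(lhs, eta)   # a Lorentz transformation preserves the metric
```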

I also know (very little, to be honest) about tensors and that they play an important role in the raising and lowering of indices, but there has to be a self-consistent answer to my index problem somewhere. I have yet to find a satisfying answer and would be grateful for any help you can provide.

Edit: I have found a post in which one answer references another question whose author says the left-most index indicates the row. That would at least support my claim about the index labels of the Lorentz transformation, yet the problem with the Minkowski metric remains.

Best Answer

When tensor notation is first introduced, it can help the beginner student to show matrix equations that do the same thing, so that they can see it is just a linear transformation. A common convention is to have column matrices represent contravariant vectors (the ones with upper indices), and to write a chain of matrix multiplications in tensor notation as $x^\alpha=A^\alpha_\beta B^\beta_\gamma C^\gamma_\delta\dots X^\mu_\nu x^\nu$, sorted into an order where the lower index of each term is the same as the upper index of the following term. But matrices can only handle a handful of cases: those where you have vectors or 1-forms, written $x^\mu$ or $x_\mu$ respectively, and $\left(\begin{smallmatrix}1\\1\end{smallmatrix}\right)$-tensors $X^\mu_\nu$. For any other sort of tensor, the analogy breaks. And with matrices you have to get the order right, or again it breaks. A row vector times a column vector is not the same thing as a column vector times a row vector.
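To see that last point concretely, here is a throwaway numpy sketch with made-up numbers; the same two arrays give a scalar in one order and a full matrix in the other:

```python
import numpy as np

u = np.array([[1.0, 2.0, 3.0]])      # a 1x3 row vector
v = np.array([[4.0], [5.0], [6.0]])  # a 3x1 column vector

print(u @ v)   # 1x1: the inner product, [[32.]]
print(v @ u)   # 3x3: the outer product, a completely different object
```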

So the best thing to do, once the idea has been introduced, is to emphasise to the student that tensors are not the same thing as matrices. After the first week, don't try to interpret everything as matrices or row/column vectors, because in a lot of cases it doesn't work, and it's likely to mislead you.

A tensor, in coordinate form, is better thought of as an $n$-dimensional array of numbers, without worrying about what direction each axis is in. The dimensions are better thought of as first, second, third, ... (upper or lower) index rather than row, column, rising-up-out-of-the-page, ... so we don't get stuck once we get beyond two or maybe three dimensions. The rule for combining them is the Einstein summation convention, which says you sum over each repeated pair of upper/lower indices. That means you can write them in any order, because the index labels will tell you which dimensions to combine. It means you can have arrays of three or four or five dimensions, and not have to worry about how to extend the convention "rows in the first matrix are multiplied by columns in the second matrix" to something that won't fit into a flat 2D page. It means you can combine dimensions in tensors that are not sat next to one another. It's far more powerful and general, and in some ways simpler.
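That label-driven bookkeeping is exactly what numpy's einsum implements, which gives a quick way to convince yourself that the order of the factors doesn't matter and that higher-rank arrays pose no problem. The arrays below are stand-ins filled with made-up values, purely to illustrate the convention:

```python
import numpy as np

rng = np.random.default_rng(0)
Lam = rng.normal(size=(4, 4))         # stand-in for Lambda^mu_nu
eta = np.diag([-1.0, 1.0, 1.0, 1.0])  # stand-in for eta_{mu nu}
T = rng.normal(size=(4, 4, 4))        # a rank-3 array T^{mu nu rho}, no matrix analogue

# The index labels, not the order of the arguments, decide which axes are summed:
a = np.einsum('ma,mn,nb->ab', Lam, eta, Lam)
b = np.einsum('mn,nb,ma->ab', eta, Lam, Lam)
assert np.allclose(a, b)

# Contracting the second index of T with eta -- something with no neat
# row/column picture -- is just another einsum:
T_lowered = np.einsum('abc,bn->anc', T, eta)
```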
