Electromagnetism – How to Derive Hamilton's Equations for a Charged Particle in an Electromagnetic Field

classical-mechanics, electromagnetism, hamiltonian, hamiltonian-formalism, phase-space

For a relativistic charged particle in an EM field, the Hamiltonian is
$$H\left( {\vec r,\vec P,t} \right) = c\sqrt {{m^2}{c^2} + {p^2}} + e\phi = c\sqrt {{m^2}{c^2} + {{\left( {\vec P - e\vec A\left( {\vec r,t} \right)} \right)}^2}} + e\phi. $$

Then Hamilton's equations of motion can be written as

$$\frac{{d\vec r}}{{dt}} = \frac{{\partial H}}{{\partial \vec P}} = \frac{{c\vec p}}{{\sqrt {{m^2}{c^2} + {p^2}} }}$$

and

$$\frac{{d\vec P}}{{dt}} = - \frac{{\partial H}}{{\partial \vec r}} = \vec ve\frac{{\partial \vec A}}{{\partial \vec r}} - e\frac{{\partial \phi }}{{\partial \vec r}}$$

where ${\vec P}$ is the generalized momentum. I don't understand why the generalized momentum is used in the last equation instead of the ordinary momentum.

Furthermore, the equation of motion for the ordinary momentum ${\vec p}$ is obtained in the form

$$\frac{{d\vec p}}{{dt}} = - e\frac{{\partial \vec A}}{{\partial t}} - e\frac{{\partial \phi }}{{\partial \vec r}} + e\left( {\vec v\frac{{\partial \vec A}}{{\partial \vec r}} - \frac{{\partial \vec A}}{{\partial \vec r}}\vec v} \right)$$

It is not clear to me why the last term in this equation, $e\left( {\vec v\frac{{\partial \vec A}}{{\partial \vec r}} - \frac{{\partial \vec A}}{{\partial \vec r}}\vec v} \right)$, is not zero, or what ${\frac{{\partial \vec A}}{{\partial \vec r}}}$ means. If ${\vec A}$ were a scalar it would just be a gradient, but the vector quantity confuses me.

Best Answer

Short answer

I don't understand why the generalized momentum is used in the last equation instead of the ordinary momentum.

The generalized momentum $\vec P$ is used because Hamilton's equations of motion relate the time-derivative of the generalized momentum $d \vec P/dt$ -- not the time-derivative of the kinetic momentum $d \vec p/dt$ -- to the negative partial derivatives of the Hamiltonian with respect to the generalized position $-\partial H/\partial \vec r$.
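To connect the two momenta explicitly, a short worked step may help. It uses only the relation $\vec P = \vec p + e\vec A(\vec r, t)$ implied by the Hamiltonian in the question and the chain rule for $\vec A$ evaluated along the particle's trajectory. The component reading $\left(\frac{\partial \vec A}{\partial \vec r}\vec v\right)_i = \frac{\partial A_i}{\partial x_j} v_j$ is an assumption chosen to match the question's notation.

$$\begin{align} \frac{d\vec p}{dt} & = \frac{d\vec P}{dt} - e\frac{d\vec A}{dt} \\ & = \frac{d\vec P}{dt} - e\left(\frac{\partial \vec A}{\partial t} + \frac{\partial \vec A}{\partial \vec r}\vec v\right) \\ & = - e\frac{\partial \vec A}{\partial t} - e\frac{\partial \phi}{\partial \vec r} + e\left(\vec v\frac{\partial \vec A}{\partial \vec r} - \frac{\partial \vec A}{\partial \vec r}\vec v\right) \end{align}$$

The last line substitutes Hamilton's equation for $d\vec P/dt$. In this reading, the equation for $d\vec p/dt$ is not a separate Hamilton equation; it follows from the one for $d\vec P/dt$.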

And it is not clear for me why the last term in this equation $e\left( {\vec v\frac{{\partial \vec A}}{{\partial \vec r}} - \frac{{\partial \vec A}}{{\partial \vec r}}\vec v} \right)$ is not zero and what ${\frac{{\partial \vec A}}{{\partial \vec r}}}$ means? If ${\vec A}$ was a scalar it would be just a gradient value but the vector quantity confuses me

Correct. If $\vec A$ were instead a scalar field, that term would denote the gradient of a scalar field. It turns out that we can apply the concept of a gradient not only to scalars but also to vectors and, more generally, tensors, a class of geometric objects to which scalars, or zeroth-order tensors, and vectors, or first-order tensors, belong. As the gradient of a zeroth-order tensor yields a first-order tensor, you might guess that the gradient of a first-order tensor yields a second-order tensor, a geometric object which can be represented using an $n \times n$ matrix; you'd be correct, and while it is not as clear in the notation you have chosen, the fact that $\partial \vec A/\partial \vec r$ -- the gradient of the vector field $\vec A$ -- is a second-order tensor is exactly the reason why $\vec v (\partial \vec A/\partial \vec r) - (\partial \vec A/\partial \vec r)\vec v \neq 0$.

In fact, just by inspection, you should be able to at least convince yourself that since

$$- e\frac{\partial \vec A}{\partial t} - e\frac{\partial \phi}{\partial \vec r} = e \left(-\frac{\partial \vec A}{\partial t} - \frac{\partial \phi}{\partial \vec r}\right) = e \vec E$$

it must be that $\vec v (\partial \vec A/\partial \vec r) - (\partial \vec A/\partial \vec r)\vec v = \vec v \times \vec B$, since the right-hand side of the equation for $d \vec p/dt$ should yield the correct expression for the Lorentz force.
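A concrete case may make this plausible before the general proof. The example below is a sketch under stated assumptions: a uniform field of magnitude $B_0$ along $z$, described by the illustrative potential $\vec A = \tfrac{1}{2}B_0(-y, x, 0)$, with the component conventions $\left(\vec v\,\partial \vec A/\partial \vec r\right)_i = v_j\,\partial A_j/\partial x_i$ and $\left((\partial \vec A/\partial \vec r)\,\vec v\right)_i = (\partial A_i/\partial x_j)\,v_j$.

$$\frac{\partial \vec A}{\partial \vec r} = \left(\begin{matrix} 0 & -B_0/2 & 0 \\ B_0/2 & 0 & 0 \\ 0 & 0 & 0 \end{matrix}\right), \qquad \vec v\frac{\partial \vec A}{\partial \vec r} = \frac{B_0}{2}\left(\begin{matrix} v_y & -v_x & 0 \end{matrix}\right), \qquad \frac{\partial \vec A}{\partial \vec r}\vec v = \frac{B_0}{2}\left(\begin{matrix} -v_y & v_x & 0 \end{matrix}\right)$$

$$\vec v\frac{\partial \vec A}{\partial \vec r} - \frac{\partial \vec A}{\partial \vec r}\vec v = B_0\left(\begin{matrix} v_y & -v_x & 0 \end{matrix}\right) = \vec v \times \left(B_0\, \hat z\right)$$

The two products differ because the matrix is not symmetric: multiplying by $\vec v$ from the left uses its columns, multiplying from the right uses its rows.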

Long answer

Now let's prove our conviction.

Tensors in $\mathbb R^3$

Consider a vector basis $\{\mathbf e_i\}$, where the index $i$ ranges from 1 to 3, and for simplicity, assume that this basis is orthonormal (Euclidean); in other words, the inner product of the basis vectors $\mathbf e_i$ and $\mathbf e_j$ gives,

$$\mathbf e_i \cdot \mathbf e_j = \delta_{ij} \tag{1}$$

where

$$\left(\delta_{ij}\right) = \left(\begin{matrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{matrix}\right) \tag{2}$$

is the identity matrix.

A vector $\mathbf f$ may be expressed in this coordinate basis as,

$$\mathbf f = \sum_{i=1}^3 f_i\, \mathbf e_i = \left(\begin{matrix} f_1 & f_2 & f_3 \end{matrix}\right)_{\{\mathbf e_i\}} \tag{3}$$

while a second-order tensor $\mathbf F$ may be expressed as,

$$\mathbf F = \sum_{i=1}^3 \sum_{j=1}^3 F_{ij}\, \mathbf e_i \otimes \mathbf e_j = \left(\begin{matrix} F_{11} & F_{12} & F_{13} \\ F_{21} & F_{22} & F_{23} \\ F_{31} & F_{32} & F_{33} \end{matrix}\right)_{\{\mathbf e_i \otimes \mathbf e_j\}} \tag{4}$$

where $\mathbf e_i \otimes \mathbf e_j$ is called the outer product of $\mathbf e_i$ and $\mathbf e_j$. The outer product is defined such that, given vectors $\mathbf a$, $\mathbf b$, $\mathbf c$, and $\mathbf d$,

$$(\mathbf a \otimes \mathbf b) \cdot (\mathbf c \otimes \mathbf d) = (\mathbf b \cdot \mathbf c)(\mathbf a \otimes \mathbf d) \tag{5}$$

or, equivalently,

$$(\mathbf a \otimes \mathbf b) \cdot \mathbf c = (\mathbf b \cdot \mathbf c)\mathbf a$$ $$\mathbf c \cdot (\mathbf a \otimes \mathbf b) = (\mathbf c \cdot \mathbf a)\mathbf b \tag{6}$$

The transpose of $\mathbf a \otimes \mathbf b$, denoted as $(\mathbf a \otimes \mathbf b)^T$, is defined as,

$$(\mathbf a \otimes \mathbf b)^T = \mathbf b \otimes \mathbf a \tag{7}$$

and so (6) may be rewritten as,

$$(\mathbf a \otimes \mathbf b) \cdot \mathbf c = \mathbf c \cdot (\mathbf a \otimes \mathbf b)^T = (\mathbf b \cdot \mathbf c)\mathbf a$$ $$\mathbf c \cdot (\mathbf a \otimes \mathbf b) = (\mathbf a \otimes \mathbf b)^T \cdot \mathbf c = (\mathbf c \cdot \mathbf a)\mathbf b \tag{8}$$

Additionally, applying (7) to (4), the transpose of $\mathbf F$, denoted $\mathbf F^T$, is,

$$\mathbf F^T = \sum_{i=1}^3 \sum_{j=1}^3 F_{ij}\, \mathbf e_j \otimes \mathbf e_i = \sum_{i=1}^3 \sum_{j=1}^3 F_{ji}\, \mathbf e_i \otimes \mathbf e_j = \left(\begin{matrix} F_{11} & F_{21} & F_{31} \\ F_{12} & F_{22} & F_{32} \\ F_{13} & F_{23} & F_{33} \end{matrix}\right)_{\{\mathbf e_i \otimes \mathbf e_j\}} \tag{9}$$

If the second-order tensor $\mathbf F^T = \mathbf F$, then $\mathbf F$ is said to be symmetric; if $\mathbf F^T = -\mathbf F$, then $\mathbf F$ is said to be antisymmetric.

A third-order tensor $\mathbf \Phi$ may be expressed as,

$$\mathbf \Phi = \sum_{i=1}^3 \sum_{j=1}^3 \sum_{k=1}^3 \Phi_{ijk}\, \mathbf e_i \otimes \mathbf e_j \otimes \mathbf e_k \tag{10}$$

One commonly-encountered third-order tensor known as the alternating tensor, denoted as $\mathbf \epsilon$, is defined as,

$$\mathbf \epsilon = \sum_{i=1}^3 \sum_{j=1}^3 \sum_{k=1}^3 \epsilon_{ijk}\, \mathbf e_i \otimes \mathbf e_j \otimes \mathbf e_k \tag{11}$$

such that $\mathbf \epsilon$ is antisymmetric under an exchange of any two indices (e.g. $\epsilon_{jik} = -\epsilon_{ijk}$) and $\epsilon_{123} = 1$. Note that this implies that any component $\epsilon_{ijk}$ that has two or more indices set to the same value is equal to zero (e.g. $\epsilon_{112} = 0$).
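The defining properties of $\epsilon_{ijk}$ can be checked with a few lines of code. This is a minimal sketch, not part of the original answer: the helper name `levi_civita` is hypothetical, and indices run over 0, 1, 2 as stand-ins for 1, 2, 3.

```python
from itertools import product

def levi_civita(i, j, k):
    """Component epsilon_ijk for indices in {0, 1, 2} (stand-ins for 1, 2, 3)."""
    # (i - j)(j - k)(k - i) / 2 is +1 for even permutations of (0, 1, 2),
    # -1 for odd permutations, and 0 whenever two indices coincide.
    return (i - j) * (j - k) * (k - i) // 2

# Tabulate all 27 components and keep the non-zero ones.
eps = {idx: levi_civita(*idx) for idx in product(range(3), repeat=3)}
nonzero = {idx: val for idx, val in eps.items() if val != 0}

for idx, val in sorted(nonzero.items()):
    print(idx, val)
```

Only the six permutations of the three distinct indices survive, three with value $+1$ and three with value $-1$.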

A common practice in many publications is to omit the summation symbols found in the expressions for the tensors given above; this is known as the Einstein summation convention, and this convention will be used from this point onward. The rules of the Einstein summation convention are as follows:

  1. In a term which involves only the product of tensor components, if an index $i$ only appears once in the term, then $i$ is referred to as a free index, and a summation on that index is not implied.
  2. In a term which involves only the product of tensor components, if an index $i$ appears twice in the term, then $i$ is referred to as a summation index, and a summation on that index is implied.
  3. For any index $i$, you may denote it using any other letter that is not already being used as an index in the same term.
  4. In order to avoid ambiguity, no index can appear more than twice in the same term.

Relevant Tensor Operations

The scalar product of two vectors $\mathbf a$ and $\mathbf b$, denoted $\mathbf a \cdot \mathbf b$, is given by,

$$\mathbf a \cdot \mathbf b = a_i b_j\, \mathbf e_i \cdot \mathbf e_j = a_i b_j \delta_{ij} = a_i b_i \tag{12}$$

Similarly, the inner product for a vector $\mathbf a$ and a second-order tensor $\mathbf B$ is given by,

$$(\mathbf B \cdot \mathbf a)_i = B_{ij} \delta_{jk} a_k = B_{ij}a_j $$ $$(\mathbf a \cdot \mathbf B)_i = a_k \delta_{kj} B_{ji} = B_{ji}a_j \tag{13}$$

where $(\mathbf B \cdot \mathbf a)_i$ and $(\mathbf a \cdot \mathbf B)_i$ are, respectively, the $i$-th components of the vectors $\mathbf B \cdot \mathbf a$ and $\mathbf a \cdot \mathbf B$.
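The summation convention and the products in (12) and (13) can be spelled out as explicit loops. The sketch below uses arbitrary illustrative numbers (not from the original answer) and plain Python lists, so that every implied sum is visible.

```python
# Illustrative components in an orthonormal basis (values are arbitrary).
a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
B = [[1.0, 2.0, 0.0],
     [0.0, 1.0, 3.0],
     [4.0, 0.0, 1.0]]
R = range(3)

# Eq. (12): a_i b_i. The index i appears twice, so it is summed; the result is a scalar.
dot = sum(a[i] * b[i] for i in R)

# Eq. (13), first line: (B . a)_i = B_ij a_j. Here j is summed and i is free.
Ba = [sum(B[i][j] * a[j] for j in R) for i in R]

# Eq. (13), second line: (a . B)_i = B_ji a_j. Same sum, but over the first index of B.
aB = [sum(B[j][i] * a[j] for j in R) for i in R]

print(dot)  # 32.0
print(Ba)   # [5.0, 11.0, 7.0]
print(aB)   # [13.0, 4.0, 9.0]
```

The two results `Ba` and `aB` differ because `B` is not symmetric, which is the same reason the two orderings in the Lorentz-force term do not cancel.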

The vector product of two vectors $\mathbf a$ and $\mathbf b$, denoted $\mathbf a \times \mathbf b$, is given by,

$$(\mathbf a \times \mathbf b)_i = \epsilon_{ijk} a_j b_k \tag{14}$$

where $(\mathbf a \times \mathbf b)_i$ is the $i$-th component of $\mathbf a \times \mathbf b$. Similarly, the curl of a vector $\mathbf a$ is given by,

$$(\nabla \times \mathbf a)_i = \epsilon_{ijk} \frac{\partial a_k}{\partial x_j} \tag{15}$$

where $x_j$ is the $j$-th component of the position vector $\mathbf r$ relative to the origin of our coordinate basis.

The gradient of a vector $\mathbf a$, a second-order tensor denoted $\nabla \mathbf a$, is defined as,

$$(\nabla \mathbf a)_{ij} = \frac{\partial a_i}{\partial x_j} \tag{16}$$

where $(\nabla \mathbf a)_{ij}$ is the component of $\nabla \mathbf a$ associated with the outer product $\mathbf e_i \otimes \mathbf e_j$.

A Note About Antisymmetric Second-Order Tensors

If a second-order tensor $\mathbf F$ is antisymmetric (i.e. $F_{ji} = -F_{ij}$), then there exist scalars $f_k$ such that,

$$F_{ij} = \epsilon_{ijk} f_k \tag{17}$$

or, equivalently,

$$f_k = \frac{1}{2}\epsilon_{ijk}F_{ij} \tag{18}$$

and the vector $\mathbf f$ for which $f_k$ is the $k$-th component is called the axial vector of $\mathbf F$. Note that the definition for the vector product in (14) also involves the components of the alternating tensor. This is no accident, as the vector product in $\mathbb R^3$ between two vectors $\mathbf a$ and $\mathbf f$ is actually the result of an inner product between an antisymmetric tensor $\mathbf F$ and a vector $\mathbf a$,

$$F_{ij}a_j = \epsilon_{ijk} a_j f_k \tag{19}$$
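Relations (17) to (19) can be verified numerically. The following is a small sketch with arbitrary illustrative vectors; the helper names `eps` and `cross` are hypothetical and indices are 0-based.

```python
def eps(i, j, k):
    # Alternating symbol for 0-based indices: +1, -1, or 0.
    return (i - j) * (j - k) * (k - i) // 2

def cross(a, b):
    # Ordinary vector product a x b in three dimensions.
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

R = range(3)
f = [2.0, -1.0, 3.0]   # axial vector (arbitrary values)
a = [1.0, 4.0, -2.0]   # vector to be contracted with F (arbitrary values)

# Eq. (17): build the antisymmetric tensor F_ij = eps_ijk f_k.
F = [[sum(eps(i, j, k) * f[k] for k in R) for j in R] for i in R]

# Eq. (18): recover the axial vector, f_k = (1/2) eps_ijk F_ij.
f_back = [0.5 * sum(eps(i, j, k) * F[i][j] for i in R for j in R) for k in R]

# Eq. (19): F_ij a_j should equal (a x f)_i.
Fa = [sum(F[i][j] * a[j] for j in R) for i in R]

print(f_back)             # [2.0, -1.0, 3.0]
print(Fa, cross(a, f))    # both [10.0, -7.0, -9.0]
```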

The Lorentz Force

Now we are in a position to consider the equation you have written for $d \vec p/dt$ using clearer notation,

$$\begin{align} \frac{d\mathbf p}{dt} & = e\left[\left(-\frac{\partial \mathbf A}{\partial t} - \nabla \phi \right) + \mathbf v \cdot \nabla \mathbf A - \nabla \mathbf A \cdot \mathbf v\right] \\ & = e\left(\mathbf E + \mathbf v \cdot \nabla \mathbf A - \nabla \mathbf A \cdot \mathbf v\right) \end{align} \tag{20}$$

Let's rewrite this in summation notation:

$$\begin{align} \frac{dp_i}{dt} & = e\left(E_i + v_l \delta_{lj}\frac{\partial A_j}{\partial x_i} - \frac{\partial A_i}{\partial x_j} \delta_{jl} v_l\right) \\ & = e\left(E_i + \left(\frac{\partial A_j}{\partial x_i} - \frac{\partial A_i}{\partial x_j}\right) v_j\right) \end{align} \tag{21}$$

The term $(\partial A_j/\partial x_i) - (\partial A_i/\partial x_j)$ is clearly the component $F_{ij}$ of an antisymmetric second-order tensor $\mathbf F$, and so the components of its corresponding axial vector $f_k$ are,

$$\begin{align} f_k & = \frac{1}{2} \epsilon_{ijk} \left(\frac{\partial A_j}{\partial x_i} - \frac{\partial A_i}{\partial x_j}\right) \\ & = \frac{1}{2} \left(\epsilon_{ijk}\frac{\partial A_j}{\partial x_i} - \epsilon_{ijk}\frac{\partial A_i}{\partial x_j}\right) \\ & = \frac{1}{2} \left(\epsilon_{kij}\frac{\partial A_j}{\partial x_i} + \epsilon_{kji}\frac{\partial A_i}{\partial x_j}\right) \\ & = \frac{1}{2} \left[\left(\nabla \times \mathbf A\right)_k + \left(\nabla \times \mathbf A\right)_k\right] \\ & = \left(\nabla \times \mathbf A\right)_k \\ & = B_k \end{align} \tag{22}$$

Thus,

$$\begin{align} \frac{dp_i}{dt} & = e\left(E_i + \left(\frac{\partial A_j}{\partial x_i} - \frac{\partial A_i}{\partial x_j}\right) v_j\right) \\ & = e\left(E_i + \epsilon_{ijk} B_k v_j\right) \\ & = e\left(E_i + \left(\mathbf v \times \mathbf B\right)_i \right) \end{align} \tag{23}$$
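The step from (21) to (23) can also be checked numerically for an arbitrary gradient matrix. In this sketch, `J[i][j]` stands for $\partial A_i/\partial x_j$ at one point, following (16); the numbers are made up for illustration and do not come from any particular field.

```python
def cross(a, b):
    # Ordinary vector product a x b in three dimensions.
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

R = range(3)

# J[i][j] = dA_i/dx_j at some point (arbitrary illustrative values).
J = [[0.5, -1.0, 2.0],
     [3.0, 0.25, -0.5],
     [1.5, 4.0, -2.0]]
v = [1.0, -2.0, 0.5]

# Left-hand side of the identity, as in (21):
# (v . grad A - grad A . v)_i = (dA_j/dx_i - dA_i/dx_j) v_j
lhs = [sum((J[j][i] - J[i][j]) * v[j] for j in R) for i in R]

# Magnetic field from (15): B = curl A, written out component by component.
B = [J[2][1] - J[1][2],
     J[0][2] - J[2][0],
     J[1][0] - J[0][1]]

rhs = cross(v, B)

print(lhs)  # [-8.25, -1.75, 9.5]
print(rhs)  # [-8.25, -1.75, 9.5]
```

Only the antisymmetric part of `J` contributes to `lhs`; replacing `J` with a symmetric matrix makes both sides vanish.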

A Final Note on the Tensor Formulation of Electromagnetism

When studying electromagnetic phenomena in the context of relativity, due to the non-Euclidean nature of the geometry of spacetime, we cannot simplify our mathematical analysis by working in a 3-dimensional Euclidean basis; however, since tensors are geometric objects that exist independently of any coordinate basis used to describe them, a similar analysis can be performed to yield the correct result, and in Minkowski spacetime you will end up constructing a $4 \times 4$ antisymmetric tensor of the form,

$$F_{\mu \nu} = \partial_{\mu}A_{\nu} - \partial_{\nu}A_{\mu} = \left(\begin{matrix} 0 & -E_x/c & -E_y/c & -E_z/c \\ E_x/c & 0 & B_z & -B_y \\ E_y/c & -B_z & 0 & B_x \\ E_z/c & B_y & -B_x & 0 \\\end{matrix}\right) \tag{24}$$

more commonly known as the electromagnetic field tensor (the signs shown correspond to the metric signature $(-,+,+,+)$), and the Lorentz force law for a charge $q$ takes the form

$$\frac{dp_\mu}{d \tau} = qF_{\mu\nu}U^{\nu} \tag{25}$$

where $p_\mu$ are the covariant components of the charge's four-momentum, $\tau$ is the proper time experienced by the charge, and $U^{\nu}$ are the contravariant components of the charge's four-velocity.
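As a consistency check, the spatial components of (25) reduce to (23). The sketch below assumes the metric signature $(-,+,+,+)$, for which the matrix in (24) gives $F_{i0} = E_i/c$ and $F_{ij} = \epsilon_{ijk}B_k$, and it assumes the four-velocity $U^{\nu} = \gamma(c, \mathbf v)$ with $d\tau = dt/\gamma$; these conventions are not stated in the original answer.

$$\begin{align} \frac{dp_i}{d\tau} & = q\left(F_{i0}U^{0} + F_{ij}U^{j}\right) \\ & = q\left(\frac{E_i}{c}\,\gamma c + \epsilon_{ijk}B_k\,\gamma v_j\right) \\ & = \gamma q\left(E_i + \left(\mathbf v \times \mathbf B\right)_i\right) \end{align}$$

Dividing by $\gamma$ turns $d/d\tau$ into $d/dt$ and recovers (23) with $e = q$.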
