You'd find out how this works by running your code, but let's run through the rules anyway. `array[a][b]` means `(array[a])[b]`, i.e. `array` lists the individual `array[a]`s. So depending on the value of $a$, `array[a]` is either $\{0,\,2\}$, $\{1,\,3\}$ or undefined, whence e.g. `array[1][0]` means `{1, 3}[0]`, i.e. $1$.
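As a quick sanity check, here's that indexing spelled out (in Python, though jagged arrays behave the same way in Java):

```python
# The 2x2 array from the example: row 0 is [0, 2], row 1 is [1, 3].
array = [[0, 2], [1, 3]]

# array[1] picks out the inner list...
assert array[1] == [1, 3]
# ...and (array[1])[0] is that list's first element.
assert array[1][0] == 1
```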
My guess is that any implementation of $x_i^{(j)}$ you'll use from a library, or are expected to write with this book's guidance, treats $x$ as a list of the $x^{(j)}$s, so you'd need `x[j][i]`. But check with an example when you get there. Similarly, $x_{i,\,j}^{(k)}$ would be `x[k][i][j]`.
Having said all that, arrays may not be the right approach anyway, if you want very efficient calculations. I'm not an expert on the Java implications (but see here), so I'll talk about more general issues.
In practice, machine learning often relies on a data type other than standard arrays, so we can do calculations faster. The language used for machine learning may therefore be determined by the availability of suitable types. Python is slower than Java ceteris paribus, due to being an interpreted language, but is popular in machine learning because of NumPy arrays, which are the basis of SciPy, scikit-learn, TensorFlow etc. Not that you need Python to take advantage of such techniques: Java has equivalents.
If you ever make use of such software, there are indexing complications. You'll be allowed to rewrite `x[j][i]` as `x[j, i]` and `x[k][i][j]` as `x[k][i, j]`, and I expect you'd be allowed to use `x[k, i, j]` too. But more importantly, the most efficient way to do operations such as matrix multiplication wouldn't be the usual sum-over-a-loop syntax.
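A NumPy sketch of both points (the library calls are real NumPy; the shapes are just made up for illustration):

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)   # a 3-index array

# The chained and comma-separated forms address the same element:
assert x[1][2][3] == x[1, 2, 3] == x[1][2, 3]

# Matrix multiplication: the vectorised form replaces the explicit
# sum-over-a-loop and delegates to optimised native code.
a = np.arange(6).reshape(2, 3)
b = np.arange(12).reshape(3, 4)
slow = np.array([[sum(a[i, k] * b[k, j] for k in range(3))
                  for j in range(4)] for i in range(2)])
fast = a @ b
assert (slow == fast).all()
```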
Response to a comment, which is too long to be a comment
As for the case $x_{l,\,u}^{(j)}$, see Chapter 6 for enlightenment on that.
In Sec. 6.1's inline equation $a_{l,\,u}=\mathbf{w}_{l,\,u}\mathbf{z}+b_{l,\,u}$, we can rewrite the first term as $\sum_kw_{l,\,u,\,k}z_k$ or $\sum_kw_{k,\,l,\,u}z_k$, so we expect index lists containing `l, u` to place them together. But do we sum over a leftmost or rightmost $k$ index? Comparing two inline expressions in Sec. 6.2.2 answers that. The $t$th item in $\mathbf{X}$, with indexing starting at $1$ instead of $0$, is denoted $\mathbf{x}^t$, and later we see $h_{l,\,u}^t$. It seems we want to place `l, u` last, as per your first guess and my answer.
In the notation $x_{l,\,u}^{(j)}$, this is the usual Western down-and-right reading order. To relate this to the phrase "input feature $j$ of unit $u$ in layer $l$", I can only recommend watching the prepositions: in layer $l$ there is a unit $u$ which has a feature $j$, so `l, u, j` is a more natural ordering than your alternative suggestion of `l, j, u`, but for efficient dot-product calculations it's been changed by a cyclic permutation to `j, l, u`.
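To make the `j, l, u` ordering concrete, a NumPy sketch (the array shapes and names are my own, not from the book):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_layers, n_units = 5, 3, 4

# x[j, l, u] = feature of unit u in layer l for sample j: the j, l, u ordering.
x = rng.standard_normal((n_samples, n_layers, n_units))

# Fixing sample j and layer l yields the whole unit vector in one slice,
# so the dot product over units is a single vectorised operation.
j, l = 2, 1
w = rng.standard_normal(n_units)
a = x[j, l] @ w   # sums over the trailing unit index u
assert np.isclose(a, sum(x[j, l, u] * w[u] for u in range(n_units)))
```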
When you speak about writing this “more formally”, I get the impression you want predicate logic.
Your example of `Line(A, B) → True` is not terribly compelling: a predicate that is always true does not add value to the system. Let's start at the beginning, and focus on your section titles. The first major title is axioms of incidence. So what is incidence? It's a point lying on a line, so you probably want to express this in terms of at least these three predicates:
- `Point(P)` is a predicate indicating that `P` is a point.
- `Line(l)` is likewise a predicate, indicating that `l` is a line.
- `Incidence(P, l)` is a binary predicate, indicating that point `P` lies on line `l`.
Depending on your formal system, you might prefer to use set notation, e.g. write `P ∈ Points` or some such. Or you might want to add some type (or sort) assumptions:
- `¬ ∃x (Point(x) ∧ Line(x))`: no `x` is both point and line at the same time.
- `∀P ∀l (Incidence(P, l) → (Point(P) ∧ Line(l)))`: if there is an incidence, then the first argument is a point and the second is a line.
Now you can start expressing your axioms.
- Two distinct points always determine a line.
Better formulation: For every pair of distinct points, there exists a line incident with both of them.
∀A ∀B ((Point(A) ∧ Point(B) ∧ (A ≠ B)) → ∃l (Incidence(A, l) ∧ Incidence(B, l)))
- Any two distinct points of a line determine this line uniquely.
If two distinct points are incident with two lines, those two lines must in fact be one and the same. Concluding identity is a standard way of expressing uniqueness in such systems.
∀A ∀B ∀g ∀h ((Incidence(A, g) ∧ Incidence(A, h) ∧ Incidence(B, g) ∧ Incidence(B, h) ∧ (A ≠ B)) → (g = h))
- Every line has at least 2 points.
For every line there exist at least two distinct points incident with it.
∀l (Line(l) → ∃A ∃B (Incidence(A, l) ∧ Incidence(B, l) ∧ (A ≠ B)))
- There are at least 3 points not lying on the same line.
There are three points for which there exists no line incident with all three of them.
∃A ∃B ∃C (Point(A) ∧ Point(B) ∧ Point(C) ∧ ¬ ∃l (Incidence(A, l) ∧ Incidence(B, l) ∧ Incidence(C, l)))
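If you want a sanity check that these four axioms are consistent, you can test them against a small model. A Python sketch (the predicate names mirror the ones above and are my own choice; the model is the minimal geometry of three non-collinear points):

```python
from itertools import combinations

# Minimal model: three points, and one two-point line through each pair.
points = {"A", "B", "C"}
lines = {frozenset(pair) for pair in combinations(points, 2)}

def incidence(P, l):
    return P in l

# Two distinct points always determine a line.
assert all(any(incidence(A, l) and incidence(B, l) for l in lines)
           for A, B in combinations(points, 2))

# Any two distinct points of a line determine this line uniquely.
for A, B in combinations(points, 2):
    through = [l for l in lines if incidence(A, l) and incidence(B, l)]
    assert len(through) == 1

# Every line has at least 2 points.
assert all(sum(incidence(P, l) for P in points) >= 2 for l in lines)

# There are at least 3 points not lying on the same line.
assert any(not any(incidence(A, l) and incidence(B, l) and incidence(C, l)
                   for l in lines)
           for A, B, C in combinations(points, 3))
```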
For the second section you probably want to introduce a ternary predicate for order (or “between-ness”). Three points being ordered implies that they are incident with a single line, which helps tie this to the incidence axioms. So I'd go with
`Order(A, B, C)` is a ternary predicate indicating that `B` lies between `A` and `C`.
`∀A ∀B ∀C (Order(A, B, C) → ∃l (Incidence(A, l) ∧ Incidence(B, l) ∧ Incidence(C, l)))` is an axiom asserting that three ordered points are collinear.
- Of any three points situated on a line, there is always one and only one which lies between the two others.
Picking this specific example, I'd recommend writing this out in fairly verbose text which you can then turn into formulas. Without any “exactly one” notation as part of typical predicate logic notation, what you do instead is to write down all the cases, each time noting that one of them holds but the others don't.
For any three distinct points A, B, C, if there exists a line incident with all of them, then (A is between B and C and B is not between A and C and C is not between A and B) or (B is between A and C and ...) or (...).
∀A ∀B ∀C (((A ≠ B) ∧ (A ≠ C) ∧ (B ≠ C) ∧
∃l (Incidence(A, l) ∧ Incidence(B, l) ∧ Incidence(C, l)))
→ (( Order(B, A, C) ∧ ¬Order(A, B, C) ∧ ¬Order(A, C, B)) ∨
(¬Order(B, A, C) ∧ Order(A, B, C) ∧ ¬Order(A, C, B)) ∨
(¬Order(B, A, C) ∧ ¬Order(A, B, C) ∧ Order(A, C, B))))
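As a sanity check of that "exactly one" disjunction, here's a sketch over a model where points are distinct reals on a line and `Order` is strict betweenness (realising the points as numbers is my own choice, not forced by the axioms):

```python
from itertools import permutations

def order(A, B, C):
    # B lies strictly between A and C on the real line.
    return A < B < C or C < B < A

# Three distinct collinear points, assigned to A, B, C in every way:
for A, B, C in permutations([-1.5, 0.0, 2.0]):
    cases = [order(B, A, C),   # A is between B and C
             order(A, B, C),   # B is between A and C
             order(A, C, B)]   # C is between A and B
    # Exactly one disjunct of the axiom holds.
    assert sum(cases) == 1
```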
The third section introduces a lot of new terms: congruence of distances, congruence of angles, incidence between a point and a segment, half-plane, ray. At least the first two make sense to introduce as predicates of your logic. The others you might avoid by using the existing terminology, for example point incident with segment is the same as point lying between the endpoints of the segment. You might still introduce some of them as abbreviations.
Note that the axioms you quoted are common to both Euclidean and hyperbolic geometry. The differences would only come later, in the part that you omitted. So you can probably look at existing formalizations of Euclidean geometry (which should be more common) for inspiration, and then tweak them once you reach the point where things become different.
A few thoughts on each. I'm not a professional mathematician, just a learner who did a mathematically heavy subject at university, but these are the main considerations that come to my mind:
Prime marks:
Subscripts:
Letters:
There are probably also considerations about ease of typesetting if the material is for publication—I've no experience of that, but the standard reference (in the UK) on copy-editing includes a long section on editing mathematics.