You'd find out how this works by running your code, but let's run through the rules anyway. `array[a][b]` means `(array[a])[b]`, i.e. `array` lists the individual `array[a]`s. So depending on the value of $a$, `array[a]` is either $\{0,\,2\}$, $\{1,\,3\}$ or undefined, whence e.g. `array[1][0]` means `{1, 3}[0]`, i.e. $1$.
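As a quick sanity check, here's that indexing spelled out (in Python, though jagged arrays behave the same way in Java):

```python
# The 2x2 array from the example: row 0 is [0, 2], row 1 is [1, 3].
array = [[0, 2], [1, 3]]

# array[1] picks out the inner list...
assert array[1] == [1, 3]
# ...and (array[1])[0] is that list's first element.
assert array[1][0] == 1
```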
My guess is that any implementation of $x_i^{(j)}$ you'll use from a library, or are expected to write with this book's guidance, treats $x$ as a list of the $x^{(j)}$s, so you'd need `x[j][i]`. But check with an example when you get there. Similarly, $x_{i,\,j}^{(k)}$ would be `x[k][i][j]`.
Having said all that, arrays may not be the right approach anyway, if you want very efficient calculations. I'm not an expert on the Java implications (but see here), so I'll talk about more general issues.
In practice, machine learning often relies on a data type other than standard arrays, so we can do calculations faster. The language used for machine learning may therefore be determined by the availability of suitable types. Python is slower than Java ceteris paribus, due to being an interpreted language, but is popular in machine learning because of NumPy arrays, which are the basis of SciPy, scikit-learn, TensorFlow etc. Not that you need Python to take advantage of such techniques: Java has equivalents.
If you ever make use of such software, there are indexing complications. You'll be allowed to rewrite `x[j][i]` as `x[j, i]` and `x[k][i][j]` as `x[k][i, j]`, and I expect you'd be allowed to use `x[k, i, j]` too. But more importantly, the most efficient way to do operations such as matrix multiplication wouldn't be the usual sum-over-a-loop syntax.
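A NumPy sketch of both points (the library calls are real NumPy; the shapes are just made up for illustration):

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)   # a 3-index array

# The chained and comma-separated forms address the same element:
assert x[1][2][3] == x[1, 2, 3] == x[1][2, 3]

# Matrix multiplication: the vectorised form replaces the explicit
# sum-over-a-loop and delegates to optimised native code.
a = np.arange(6).reshape(2, 3)
b = np.arange(12).reshape(3, 4)
slow = np.array([[sum(a[i, k] * b[k, j] for k in range(3))
                  for j in range(4)] for i in range(2)])
fast = a @ b
assert (slow == fast).all()
```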
Response to a comment, which is too long to be a comment
As for the case $x_{l,\,u}^{(j)}$, see Chapter 6 for enlightenment on that.
In Sec. 6.1's inline equation $a_{l,\,u}=\mathbf{w}_{l,\,u}\mathbf{z}+b_{l,\,u}$, we can rewrite the first term as $\sum_kw_{l,\,u,\,k}z_k$ or $\sum_kw_{k,\,l,\,u}z_k$, so we expect index lists containing `l, u` to place them together. But do we sum over a leftmost or rightmost $k$ index? Comparing two inline expressions in Sec. 6.2.2 answers that. The $t$th item in $\mathbf{X}$, with indexing starting at $1$ instead of $0$, is denoted $\mathbf{x}^t$, and later we see $h_{l,\,u}^t$. It seems we want to place `l, u` last, as per your first guess and my answer.
In the notation $x_{l,\,u}^{(j)}$, this is the usual Western down-and-right reading order. To relate this to the phrase "input feature $j$ of unit $u$ in layer $l$", I can only recommend watching the prepositions: in layer $l$ there is a unit $u$ which has a feature $j$, so `l, u, j` is a more natural ordering than your alternative suggestion of `l, j, u`, but for efficient dot-product calculations it's been changed by a cyclic permutation to `j, l, u`.
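To make the `j, l, u` ordering concrete, a NumPy sketch (the array shapes and names are my own, not from the book):

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_layers, n_units = 5, 3, 4

# x[j, l, u] = feature of unit u in layer l for sample j: the j, l, u ordering.
x = rng.standard_normal((n_samples, n_layers, n_units))

# Fixing sample j and layer l yields the whole unit vector in one slice,
# so the dot product over units is a single vectorised operation.
j, l = 2, 1
w = rng.standard_normal(n_units)
a = x[j, l] @ w   # sums over the trailing unit index u
assert np.isclose(a, sum(x[j, l, u] * w[u] for u in range(n_units)))
```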
When you speak about writing this “more formally”, I get the impression you want predicate logic.
Your example of `Line(A, B) → True` is not terribly compelling: a predicate that is always true does not add value to the system. Let's start at the beginning, and focus on your section titles. The first major title is axioms of incidence. So what is incidence? It's a point lying on a line, so you probably want to express this in terms of at least these three predicates:
- `Point(P)` is a predicate indicating that `P` is a point.
- `Line(l)` is likewise a predicate, indicating that `l` is a line.
- `Incidence(P, l)` is a binary predicate, indicating that point `P` lies on line `l`.
Depending on your formal system, you might prefer to use set notation, e.g. write `P ∈ Points` or some such. Or you might want to add some type (or sort) assumptions:
- `¬ ∃x (Point(x) ∧ Line(x))`: no `x` is both point and line at the same time.
- `∀P ∀l (Incidence(P, l) → (Point(P) ∧ Line(l)))`: if there is an incidence, then the first argument is a point and the second is a line.
Now you can start expressing your axioms.
- Two distinct points always determine a line.
Better formulation: For every pair of distinct points, there exists a line incident with both of them.
∀A ∀B ((Point(A) ∧ Point(B) ∧ (A ≠ B)) → ∃l (Incidence(A, l) ∧ Incidence(B, l)))
- Any two distinct points of a line determine this line uniquely.
If two distinct points are incident with two lines, those two lines must in fact be one and the same. Concluding identity is a standard way of expressing uniqueness in such systems.
∀A ∀B ∀g ∀h ((Incidence(A, g) ∧ Incidence(A, h) ∧ Incidence(B, g) ∧ Incidence(B, h) ∧ (A ≠ B)) → (g = h))
- Every line has at least 2 points.
For every line there exist at least two distinct points incident with it.
∀l (Line(l) → ∃A ∃B (Incidence(A, l) ∧ Incidence(B, l) ∧ (A ≠ B)))
- There are at least 3 points not lying on the same line.
There are three points for which there exists no line incident with all three of them.
∃A ∃B ∃C (Point(A) ∧ Point(B) ∧ Point(C) ∧ ¬ ∃l (Incidence(A, l) ∧ Incidence(B, l) ∧ Incidence(C, l)))
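If you want a sanity check that these four axioms are consistent, you can test them against a small model. A Python sketch (the predicate names mirror the ones above and are my own choice; the model is the minimal geometry of three non-collinear points):

```python
from itertools import combinations

# Minimal model: three points, and one two-point line through each pair.
points = {"A", "B", "C"}
lines = {frozenset(pair) for pair in combinations(points, 2)}

def incidence(P, l):
    return P in l

# Two distinct points always determine a line.
assert all(any(incidence(A, l) and incidence(B, l) for l in lines)
           for A, B in combinations(points, 2))

# Any two distinct points of a line determine this line uniquely.
for A, B in combinations(points, 2):
    through = [l for l in lines if incidence(A, l) and incidence(B, l)]
    assert len(through) == 1

# Every line has at least 2 points.
assert all(sum(incidence(P, l) for P in points) >= 2 for l in lines)

# There are at least 3 points not lying on the same line.
assert any(not any(incidence(A, l) and incidence(B, l) and incidence(C, l)
                   for l in lines)
           for A, B, C in combinations(points, 3))
```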
For the second section you probably want to introduce a ternary predicate for order (or “between-ness”). Three points being ordered implies that they are incident with a single line, which helps tie this to the incidence axioms. So I'd go with
`Order(A, B, C)` is a ternary predicate indicating that `B` lies between `A` and `C`.
`∀A ∀B ∀C (Order(A, B, C) → ∃l (Incidence(A, l) ∧ Incidence(B, l) ∧ Incidence(C, l)))` is an axiom asserting that three ordered points are collinear.
- Of any three points situated on a line, there is always one and only one which lies between the two others.
Picking this specific example, I'd recommend writing this out in fairly verbose text which you can then turn into formulas. Without any “exactly one” notation as part of typical predicate logic notation, what you do instead is to write down all the cases, each time noting that one of them holds but the others don't.
For any three distinct points A, B, C, if there exists a line incident with all of them, then (A is between B and C and B is not between A and C and C is not between A and B) or (B is between A and C and ...) or (...).
∀A ∀B ∀C (((A ≠ B) ∧ (A ≠ C) ∧ (B ≠ C) ∧
∃l (Incidence(A, l) ∧ Incidence(B, l) ∧ Incidence(C, l)))
→ (( Order(B, A, C) ∧ ¬Order(A, B, C) ∧ ¬Order(A, C, B)) ∨
(¬Order(B, A, C) ∧ Order(A, B, C) ∧ ¬Order(A, C, B)) ∨
(¬Order(B, A, C) ∧ ¬Order(A, B, C) ∧ Order(A, C, B))))
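As a sanity check of that "exactly one" disjunction, here's a sketch over a model where points are distinct reals on a line and `Order` is strict betweenness (realising the points as numbers is my own choice, not forced by the axioms):

```python
from itertools import permutations

def order(A, B, C):
    # B lies strictly between A and C on the real line.
    return A < B < C or C < B < A

# Three distinct collinear points, assigned to A, B, C in every way:
for A, B, C in permutations([-1.5, 0.0, 2.0]):
    cases = [order(B, A, C),   # A is between B and C
             order(A, B, C),   # B is between A and C
             order(A, C, B)]   # C is between A and B
    # Exactly one disjunct of the axiom holds.
    assert sum(cases) == 1
```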
The third section introduces a lot of new terms: congruence of distances, congruence of angles, incidence between a point and a segment, half-plane, ray. At least the first two make sense to introduce as predicates of your logic. The others you might avoid by using the existing terminology, for example point incident with segment is the same as point lying between the endpoints of the segment. You might still introduce some of them as abbreviations.
Note that the axioms you quoted are common to both Euclidean and hyperbolic geometry. The differences would only come later, in the part that you omitted. So you can probably look at existing formalizations of Euclidean geometry (which should be more common) for inspiration, and then tweak them once you reach the point where things become different.
A few thoughts on each. I'm not a professional mathematician, just a learner who did a mathematically heavy subject at university, but these are the main considerations that come to my mind:
Prime marks:
Subscripts:
Letters:
There are probably also considerations about ease of typesetting if the material is for publication—I've no experience of that, but the standard reference (in the UK) on copy-editing includes a long section on editing mathematics.