Array Indexing Notation Trouble


Currently I am reading "The Hundred-Page Machine Learning Book" and ran into some notation issues. The book states the following:

A variable can have two or more indices, like this: $x^{(j)}_i$ or
like this $x^{(k)}_{i,j}$. For example, in neural networks, we denote
as $x^{(j)}_{l,u}$ the input feature j of unit u in layer l.

Though, I am having issues understanding this index notation. I can understand $x^{(j)}$, where x is the array and j is the single index: if x = [0, 2], then $x^{(0)}$ (where j = 0) returns the value 0 and $x^{(1)}$ returns 2.

To illustrate my understanding of $x^{(j)}$, I can write it in code:

int[] array = {0, 2};
System.out.println(array[0]);

It is easy to see how $x^{(j)}$ works. However, the multiple indices trip me up ($x^{(j)}_i$ and $x^{(j)}_{l,u}$, for example). I'm not sure which index comes first, which second, which third, and so on.

int[][] array = {{0, 2}, {1, 3}};
System.out.println(array[0][1]);

This is an example of $x^{(j)}_i$. Though, I'm not sure whether the first index is j or i.

Using the notation $x^{(j)}_i$, is this correct?

System.out.println(array[i][j]);

Or, is this correct?

System.out.println(array[j][i]);

And what about when there are 3 or more indices? The same question applies there too.

I suppose I don't know the name of this notation either — the book didn't specify. I'm relatively new to notation, so please forgive me if this is a simple solution or complicated & convoluted question. To summarize, what is the name of this notation? And how do I use this notation properly? How do I know which letter maps to which index?

Some thoughts:
Using my previous knowledge of machine learning, I understand that the array $x^{(j)}_{l,u}$ should be indexed layer first, feature second, unit third (something like array[layer][feature][unit]). The notation, however, appears to place these indices in random spots, yet it is crucial to know the order. Without this prior knowledge it would be impossible to work out, and if I encounter the notation in the future, which is inevitable, I will have no clue what to do.

Best Answer

You'd find out how this works by running your code, but let's run through the rules anyway.

array[a][b] means (array[a])[b], i.e. array lists the individual array[a]s. So depending on the value of $a$, array[a] is either $\{0,\,2\}$, $\{1,\,3\}$ or undefined, whence e.g. array[1][0] means {1, 3}[0], i.e. $1$.
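To check this evaluation order, here is a short Java snippet using the same array as the question; both lines print the same value because array[1][0] is just (array[1])[0]:

```java
public class EvaluationOrder {
    public static void main(String[] args) {
        int[][] array = {{0, 2}, {1, 3}};
        int[] row = array[1];            // array[1] is the inner array {1, 3}
        System.out.println(row[0]);      // prints 1
        System.out.println(array[1][0]); // also prints 1: array[1][0] == (array[1])[0]
    }
}
```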

My guess is that any implementation of $x_i^{(j)}$ you'll use from a library, or are expected to write with this book's guidance, treats $x$ as a list of the $x^{(j)}$s, so you'd need x[j][i]. But check with an example when you get there. Similarly, $x_{i,\,j}^{(k)}$ would be x[k][i][j].
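A minimal Java sketch of that convention (the values here are invented purely for illustration): store $x$ as an array whose j-th entry is the vector $x^{(j)}$, so that $x_i^{(j)}$ reads x[j][i]:

```java
public class SuperscriptFirst {
    public static void main(String[] args) {
        // x[j] holds the vector x^{(j)}; x[j][i] is its i-th component x_i^{(j)}.
        double[][] x = {
            {1.0, 2.0}, // x^{(0)}
            {3.0, 4.0}, // x^{(1)}
        };
        System.out.println(x[1][0]); // x_0^{(1)}, prints 3.0
        // By the same convention, x_{i,j}^{(k)} would read x[k][i][j] in a double[][][].
    }
}
```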

Having said all that, arrays may not be the right approach anyway, if you want very efficient calculations. I'm not an expert on the Java implications (but see here), so I'll talk about more general issues.

In practice, machine learning often relies on a data type other than standard arrays, so we can do calculations faster. The language used for machine learning may therefore be determined by the availability of suitable types. Python is slower than Java ceteris paribus due to being an interpreted language, but is popular in machine learning because of "numpy arrays", which are the basis of scipy, scikit, tensorflow etc. Not that you need Python to take advantage of such techniques: Java has equivalents.

If you ever make use of such software, there are indexing complications. You'll be allowed to rewrite x[j][i] as x[j, i] and x[k][i][j] as x[k][i, j], and I expect you'd be allowed to use x[k, i, j] too. But more importantly, the most efficient way to do operations such as matrix multiplication wouldn't be the usual sum-over-a-loop syntax.
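For reference, this is the plain sum-over-a-loop matrix multiplication that the last sentence refers to, sketched in Java; vectorized numerical libraries exist precisely to replace this triple loop with faster routines:

```java
public class MatMul {
    // Plain triple-loop matrix product: c[i][j] = sum over k of a[i][k] * b[k][j].
    static double[][] multiply(double[][] a, double[][] b) {
        int n = a.length, m = b[0].length, p = b.length;
        double[][] c = new double[n][m];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < m; j++)
                for (int k = 0; k < p; k++)
                    c[i][j] += a[i][k] * b[k][j];
        return c;
    }

    public static void main(String[] args) {
        double[][] a = {{1, 2}, {3, 4}};
        double[][] b = {{5, 6}, {7, 8}};
        double[][] c = multiply(a, b);
        System.out.println(c[0][0] + " " + c[0][1]); // prints 19.0 22.0
    }
}
```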

Response to a comment, which is too long to be a comment

As for the case $x_{l,\,u}^{(j)}$, see Chapter 6 for enlightenment on that.

In Sec 6.1's inline equation $a_{l,\,u}=\mathbf{w}_{l,\,u}\mathbf{z}+b_{l,\,u}$, we can rewrite the first term as $\sum_kw_{l,\,u,\,k}z_k$ or $\sum_kw_{k,\,l,\,u}z_k$, so we expect index lists containing l, u to place them together. But do we sum over a leftmost or rightmost $k$ index? Comparing two inline expressions in Sec. 6.2.2 answers that. The $t$th item in $\mathbf{X}$, with indexing starting at $1$ instead of $0$, is denoted $\mathbf{x}^t$, and later we see $h_{l,\,u}^t$. It seems we want to place l, u last, as per your first guess and my answer.
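As a hedged Java sketch of that layout (the shapes and weight values here are invented, not taken from the book), placing the summed index $k$ leftmost and keeping l, u together at the end computes $a_{l,\,u}=\sum_kw_{k,\,l,\,u}z_k+b_{l,\,u}$:

```java
public class LayerUnit {
    public static void main(String[] args) {
        // Invented tiny example: 2 inputs (k), 1 layer (l), 2 units (u).
        double[][][] w = {      // w[k][l][u]: summed index k leftmost, l and u together last
            {{0.1, 0.2}},       // k = 0
            {{0.3, 0.4}},       // k = 1
        };
        double[] z = {1.0, 2.0};
        double[][] b = {{0.5, 0.5}};

        int l = 0, u = 1;
        double a = b[l][u];
        for (int k = 0; k < z.length; k++)
            a += w[k][l][u] * z[k]; // 0.2 * 1.0 + 0.4 * 2.0 + 0.5, about 1.5
        System.out.println(a);
    }
}
```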

The notation $x_{l,\,u}^{(j)}$ follows the usual Western down-and-right reading order. To relate this to the phrase "input feature $j$ of unit $u$ in layer $l$", I can only recommend watching the prepositions: in layer $l$ there is a unit $u$ which has a feature $j$, so l, u, j is a more natural ordering than your alternative suggestion of l, j, u, but for efficient dot-product calculations it's been changed by a cyclic permutation to j, l, u.
