I'll see if I can't help with the following...
> I am really lost at rearranging the matrices in the form given in their examples
Here are some tips for going between matrices and Python `list`s and NumPy `array`s.
$$
v =
\begin{pmatrix}
9 \\
8 \\
7
\end{pmatrix}
$$
Where I usually see $v_{(1)} = 9$ and $v_{(3)} = 7$, that would be written a bit like...
```python
import numpy as np

vList = [9, 8, 7]
vArray = np.array(vList)

vList[0]   # -> 9
vArray[2]  # -> 7
```
... and in most programming languages the index starts at 0, but it looks like you've got that.
Things get interesting with nested lists and multidimensional numpy arrays...
$$
p =
\begin{pmatrix}
0.3 & 0.6 & 0.9 \\
0.4 & 0.7 & 0.8 \\
0.5 & 0.8 & 0.7
\end{pmatrix}
$$
Which could be represented as a `numpy.array` as shown...
```python
p = np.array([
    [0.3, 0.6, 0.9],
    [0.4, 0.7, 0.8],
    [0.5, 0.8, 0.7]
])
```
Accessing rows could then look like...
```python
p[0]
# -> array([0.3, 0.6, 0.9])
p[2]
# -> array([0.5, 0.8, 0.7])
```
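Columns can be pulled out the same way with a slice in the row position — a quick sketch continuing with the same `p`:

```python
import numpy as np

p = np.array([
    [0.3, 0.6, 0.9],
    [0.4, 0.7, 0.8],
    [0.5, 0.8, 0.7]
])

# a slice (:) in the row position selects every row, leaving a column
first_column = p[:, 0]  # -> array([0.3, 0.4, 0.5])
last_column = p[:, 2]   # -> array([0.9, 0.8, 0.7])
```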
... but accessing cells is likely to frustrate those who want precision at a cellular level...
```python
p[2, 0]
# -> 0.5
# ... above looks okay...

p[0, 0]
# -> 0.29999999999999999
# ... but that was `0.3`...

p[1, 1]
# -> 0.69999999999999996
# ... and that should have been `0.7`
```
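Those long decimals are not bugs: `0.3` and `0.7` cannot be stored exactly in binary floating point, so the nearest 64-bit value is stored and (in older NumPy versions) printed in full. When comparisons or display matter, `np.isclose` and rounding are the usual remedies — a small sketch using the `p` from above:

```python
import numpy as np

p = np.array([
    [0.3, 0.6, 0.9],
    [0.4, 0.7, 0.8],
    [0.5, 0.8, 0.7]
])

# isclose tolerates tiny floating-point error; safer habit than ==
print(np.isclose(p[1, 1], 0.7))  # -> True

# rounding cleans up the display without changing what is stored
print(round(float(p[1, 1]), 1))  # -> 0.7
```

`np.set_printoptions(precision=...)` similarly controls how whole arrays are printed without touching the stored values.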
Even funkier than that...
```python
p * 5
# -> array([[1.5, 3. , 4.5],
#           [2. , 3.5, 4. ],
#           [2.5, 4. , 3.5]])
```
Hopefully this gave ya some traction on translating your problem into something that a computer will accept, as well as some pointers on how not to misuse NumPy. It's fantastic, but misusing it can lead to anger, and anger leads to... well, let's not even consider the paths divergent from the light ;-)
You are correct: mathematically speaking, adding a $1\times 3$ vector to a $3\times 1$ vector does not make much sense in terms of vector spaces, but it can be useful when doing calculations.
Matlab is doing something called "automatic broadcasting", which it adopted from Octave a few versions ago. You can read more about it here:
https://blogs.mathworks.com/loren/2016/10/24/matlab-arithmetic-expands-in-r2016b/
What happens with broadcasting is that the missing dimensions are replicated:
$$[a,b,c] + \begin{bmatrix}x\\y\\z\end{bmatrix} \to \begin{bmatrix}a&b&c\\a&b&c\\a&b&c\end{bmatrix} + \begin{bmatrix}x&x&x\\y&y&y\\z&z&z\end{bmatrix}$$
So the row of the first vector gets copied as many times as needed to fit the 3 rows of the second vector and vice versa.
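NumPy follows the same broadcasting rule, so the expansion above can be checked directly (the values here are just placeholders for $a,b,c$ and $x,y,z$):

```python
import numpy as np

row = np.array([[1, 2, 3]])         # shape (1, 3), plays the role of [a, b, c]
col = np.array([[10], [20], [30]])  # shape (3, 1), plays the role of [x, y, z]^T

# each operand is stretched along its size-1 axis to shape (3, 3), then added
result = row + col
# -> array([[11, 12, 13],
#           [21, 22, 23],
#           [31, 32, 33]])
```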
Edit: Mathematically speaking, what actually happens is that instead of calculating $${\bf v}^T+\bf w$$
The software finds $N,M\in \mathbb Z$ so that the following expression makes sense:
$$({\bf v}^T \otimes {\bf 1}_N) + ({{\bf 1}_M}^T\otimes \bf w)$$
where $\otimes$ is a Kronecker product and then calculates it.
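The Kronecker form can be verified numerically with `np.kron` — a sketch assuming $N = M = 3$ and arbitrary example values for $\bf v$ and $\bf w$:

```python
import numpy as np

v = np.array([[1, 2, 3]])         # v^T, a 1x3 row
w = np.array([[10], [20], [30]])  # w, a 3x1 column
ones_row = np.ones((1, 3))        # 1_M^T
ones_col = np.ones((3, 1))        # 1_N

# (v^T ⊗ 1_N) copies the row down; (1_M^T ⊗ w) copies the column across
lhs = np.kron(v, ones_col) + np.kron(ones_row, w)

# plain broadcasting produces the same 3x3 matrix
rhs = v + w
```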
This is a case of the less conventional notation used in deep learning: we allow the addition of a matrix and a vector, yielding another matrix $C = A + b$, where $C_{i,j} = A_{i,j} + b_j$. See http://www.deeplearningbook.org/contents/linear_algebra.html (page 32).
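In NumPy this convention falls out of broadcasting for free: adding a 1-D vector `b` to a 2-D matrix `A` adds `b` to every row, exactly $C_{i,j} = A_{i,j} + b_j$. A quick sketch with made-up values:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.array([10, 20, 30])

# b is broadcast across the rows of A: C[i, j] = A[i, j] + b[j]
C = A + b
# -> array([[11, 22, 33],
#           [14, 25, 36]])
```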