With the distributions on our random vectors:
$\mathbf x_i \mid \mu \sim N(\mu, \mathbf \Sigma)$
$\mu \sim N(\mu_0, \mathbf \Sigma_0)$
By Bayes's rule the posterior distribution looks like:
$p(\mu| \{\mathbf x_i\}) \propto p(\mu) \prod_{i=1}^N p(\mathbf x_i | \mu)$
So:
$\ln p(\mu| \{\mathbf x_i\}) = -\frac{1}{2}\sum_{i=1}^N(\mathbf x_i - \mu)'\mathbf \Sigma^{-1}(\mathbf x_i - \mu) -\frac{1}{2}(\mu - \mu_0)'\mathbf \Sigma_0^{-1}(\mu - \mu_0) + const$
$ = -\frac{1}{2} N \mu' \mathbf \Sigma^{-1} \mu + \sum_{i=1}^N \mu' \mathbf \Sigma^{-1} \mathbf x_i -\frac{1}{2} \mu' \mathbf \Sigma_0^{-1} \mu + \mu' \mathbf \Sigma_0^{-1} \mu_0 + const$
$ = -\frac{1}{2} \mu' (N \mathbf \Sigma^{-1} + \mathbf \Sigma_0^{-1}) \mu + \mu' (\mathbf \Sigma_0^{-1} \mu_0 + \mathbf \Sigma^{-1} \sum_{i=1}^N \mathbf x_i) + const$
$= -\frac{1}{2}(\mu - (N \mathbf \Sigma^{-1} + \mathbf \Sigma_0^{-1})^{-1}(\mathbf \Sigma_0^{-1} \mu_0 + \mathbf \Sigma^{-1} \sum_{i=1}^N \mathbf x_i))' (N \mathbf \Sigma^{-1} + \mathbf \Sigma_0^{-1}) (\mu - (N \mathbf \Sigma^{-1} + \mathbf \Sigma_0^{-1})^{-1}(\mathbf \Sigma_0^{-1} \mu_0 + \mathbf \Sigma^{-1} \sum_{i=1}^N \mathbf x_i)) + const$
Which is the log density of a Gaussian:
$\mu| \{\mathbf x_i\} \sim N((N \mathbf \Sigma^{-1} + \mathbf \Sigma_0^{-1})^{-1}(\mathbf \Sigma_0^{-1} \mu_0 + \mathbf \Sigma^{-1} \sum_{i=1}^N \mathbf x_i), (N \mathbf \Sigma^{-1} + \mathbf \Sigma_0^{-1})^{-1})$
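As a sanity check, the posterior parameters can be computed directly from this formula. This is a minimal sketch with made-up values for $\mu_0$, $\mathbf \Sigma_0$, and $\mathbf \Sigma$ (the dimensions and parameter values are illustrative, not from the question):

```python
import numpy as np

rng = np.random.default_rng(0)
d, N = 3, 50

# Illustrative prior and likelihood parameters.
mu0 = np.zeros(d)
Sigma0 = np.eye(d)
Sigma = 0.5 * np.eye(d) + 0.1 * np.ones((d, d))

# Simulated data x_i ~ N(mu, Sigma) for some true mu.
x = rng.multivariate_normal(rng.normal(size=d), Sigma, size=N)

# Posterior precision, covariance, and mean, straight from the derivation.
post_prec = N * np.linalg.inv(Sigma) + np.linalg.inv(Sigma0)
post_cov = np.linalg.inv(post_prec)
post_mean = post_cov @ (np.linalg.inv(Sigma0) @ mu0
                        + np.linalg.inv(Sigma) @ x.sum(axis=0))
```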
Using the Woodbury identity on our expression for the covariance matrix:
$(N \mathbf \Sigma^{-1} + \mathbf \Sigma_0^{-1})^{-1} = \mathbf \Sigma(\frac{1}{N} \mathbf \Sigma + \mathbf \Sigma_0)^{-1} \frac{1}{N} \mathbf \Sigma_0$
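This identity is easy to verify numerically. A quick sketch, using random symmetric positive definite matrices as stand-ins for $\mathbf \Sigma$ and $\mathbf \Sigma_0$:

```python
import numpy as np

rng = np.random.default_rng(1)
d, N = 4, 7

# Random SPD matrices standing in for Sigma and Sigma0.
A = rng.normal(size=(d, d)); Sigma = A @ A.T + d * np.eye(d)
B = rng.normal(size=(d, d)); Sigma0 = B @ B.T + d * np.eye(d)

# Left side: direct inverse of the posterior precision.
lhs = np.linalg.inv(N * np.linalg.inv(Sigma) + np.linalg.inv(Sigma0))
# Right side: the Woodbury rearrangement from above.
rhs = Sigma @ np.linalg.inv(Sigma / N + Sigma0) @ (Sigma0 / N)
assert np.allclose(lhs, rhs)
```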
Which provides the covariance matrix in the form the OP wanted. Using this expression (and its symmetry) further in the expression for the mean we have:
$\mathbf \Sigma(\frac{1}{N} \mathbf \Sigma + \mathbf \Sigma_0)^{-1} \frac{1}{N} \mathbf \Sigma_0 \mathbf \Sigma_0^{-1} \mu_0 + \frac{1}{N} \mathbf \Sigma_0(\frac{1}{N} \mathbf \Sigma + \mathbf \Sigma_0)^{-1} \mathbf \Sigma \mathbf \Sigma^{-1} \sum_{i=1}^N \mathbf x_i$
$= \mathbf \Sigma(\frac{1}{N} \mathbf \Sigma + \mathbf \Sigma_0)^{-1} \frac{1}{N} \mu_0 + \mathbf \Sigma_0(\frac{1}{N} \mathbf \Sigma + \mathbf \Sigma_0)^{-1} \sum_{i=1}^N (\frac{1}{N} \mathbf x_i)$
Which is the form required by the OP for the mean.
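The equivalence of the two expressions for the mean can also be checked numerically. A short sketch with random SPD matrices and random data (all values illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
d, N = 3, 10
A = rng.normal(size=(d, d)); Sigma = A @ A.T + d * np.eye(d)
B = rng.normal(size=(d, d)); Sigma0 = B @ B.T + d * np.eye(d)
mu0 = rng.normal(size=d)
x = rng.normal(size=(N, d))

# Direct form of the posterior mean.
direct = np.linalg.inv(N * np.linalg.inv(Sigma) + np.linalg.inv(Sigma0)) @ (
    np.linalg.inv(Sigma0) @ mu0 + np.linalg.inv(Sigma) @ x.sum(axis=0))

# Rearranged form: Sigma M (mu0 / N) + Sigma0 M x_bar, with M the shared inverse.
M = np.linalg.inv(Sigma / N + Sigma0)
rearranged = Sigma @ M @ (mu0 / N) + Sigma0 @ M @ x.mean(axis=0)
assert np.allclose(direct, rearranged)
```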
Even though neural networks are theoretically capable of learning any function if given arbitrarily many units, in practice they have limited flexibility. And in the case of this paper, I believe the network is only as deep as the tree being parsed, which is to say not very deep at all, so its ability to approximate arbitrary functions is even more limited.
Given this, we want to make the job as easy as possible. If the network needs to approximate a quadratic function, we should give it quadratic units rather than only linear ones, so that it has a better chance of succeeding.
Why should we expect quadratic interactions between input variables to help? The argument is that the meanings of words often interact in multiplicative ways. For example, suppose "sad" has a meaning represented by the scalar $-3.1$. Then we might expect the word "very" in "very sad" to have some sort of multiplicative effect on the value of "sad". For example, "very sad" might be $-6.2$, computed as the product of the multiplier of "very", $2$, and the value of "sad", $-3.1$. Similar reasoning applies to words like "not", "slightly", and "because".
The composition layer of the RNTN computes many of these quadratic functions, each returning a scalar, and then concatenates all the results into a vector.
How does this compare to the usual concatenation and nonlinearity? In the usual case, the output is $$f \left( W \begin{bmatrix} a \\ b \end{bmatrix} \right) = f(W_a a + W_bb)$$
We take a linear transformation of each input, add them, and then apply a nonlinearity. Everything is essentially additive, so it would be quite difficult to express the kind of multiplicative transformations I described before.
It is true that this tensor model has many more parameters and is much more vulnerable to overfitting, but I suppose with adequate regularization it is not a huge problem.
Suppose you have a set of $d$ nodes with linear activation and weights $b_{ij}$ for input $j$ to node $i$. If you can impose the constraints $\sum_j b_{ij}^2=1$ and $\sum_{j} b_{ij}b_{kj}=0$ for $i\neq k$, then the mapping from input to output is multiplication by an orthonormal matrix.
The orthonormal matrices form two connected sets: the rotations (determinant $=1$) and the rotations with reflection (determinant $=-1$). If your learning rate isn't too high, the transformation won't be able to jump between these components, so if you start your weights off as a rotation they'll stay a rotation.
This assumes you want rotations around the origin. To get rotations around some other point, you need non-zero intercept (bias) terms chosen to move that point to the origin, rotate, and then move it back.
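One practical way to keep the constraints satisfied, sketched below, is to project the weight matrix onto the nearest rotation after each update via an SVD, with a sign correction to stay on the determinant $+1$ component. This projection step is my own suggestion, not something prescribed in the answer above; the shift-rotate-shift trick for rotating about a point $c$ is also shown:

```python
import numpy as np

def nearest_rotation(B):
    # Project B onto the nearest rotation matrix (det = +1) via SVD.
    U, _, Vt = np.linalg.svd(B)
    R = U @ Vt
    if np.linalg.det(R) < 0:
        # Flip the least-significant singular direction to avoid a reflection.
        U[:, -1] *= -1
        R = U @ Vt
    return R

rng = np.random.default_rng(4)
R = nearest_rotation(rng.normal(size=(3, 3)))

# Rotation about a point c: move c to the origin, rotate, move it back.
c = np.array([1.0, 2.0, 3.0])
def rotate_about(x, R, c):
    return R @ (x - c) + c
```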