The difficulty here is non-uniqueness.
Consider the two shear matrices (I'm going to use $2 \times 2$ to make typing easier; the translation part's easy to deal with in general, and then we just have the upper-left $2 \times 2$ anyhow):
$$
A = \begin{bmatrix}
1 & 1 \\
0 & 1
\end{bmatrix},
B = \begin{bmatrix}
1 & 0 \\
-0.5 & 1
\end{bmatrix}
$$
Their product is
$$
AB = \begin{bmatrix}
0.5 & 1 \\
-0.5 & 1
\end{bmatrix}
$$
That's exactly the same thing as scaling by $\frac{\sqrt{2}}{2}$ in $x$ and by $\sqrt{2}$ in $y$, and then rotating by 45 degrees clockwise.
So if I gave you the matrix
$$
\begin{bmatrix}
0.5 & 1 \\
-0.5 & 1
\end{bmatrix},
$$
which answer would you want? The two shears, or the scale and rotation?
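A quick numerical check of the two readings, as a minimal sketch in plain Python. The helper `matmul` is my own, and I'm assuming column vectors with counter-clockwise angles counted as positive. The scale factors and the angle are computed from the columns of the matrix, which happen to be orthogonal here.

```python
import math

def matmul(P, Q):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Reading 1: the two shears.
A = [[1.0, 1.0], [0.0, 1.0]]
B = [[1.0, 0.0], [-0.5, 1.0]]
shear_product = matmul(A, B)

# Reading 2: a non-uniform scale followed by a rotation.
# The columns of the matrix are orthogonal, so their lengths are the
# scale factors and the direction of the first column is the angle.
sx = math.hypot(0.5, -0.5)       # length of the first column
sy = math.hypot(1.0, 1.0)        # length of the second column
theta = math.atan2(-0.5, 0.5)    # negative, i.e. a clockwise rotation
R = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]
S = [[sx, 0.0], [0.0, sy]]
scale_rotate_product = matmul(R, S)   # S acts first, then R

same = all(
    math.isclose(shear_product[i][j], scale_rotate_product[i][j], abs_tol=1e-12)
    for i in range(2) for j in range(2)
)
print(shear_product, same)
```

Both products land on the same four numbers, so nothing in the matrix itself can tell you which description was "intended".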
Why does this happen? Because you have five free parameters (rotation, 2 scales, 2 shears) and a four-dimensional set of matrices (all possible $2 \times 2$ matrices in the upper-left corner of your transformation). A continuous map from the first onto the second will necessarily be many-to-one.
In short: I think you need to ask a different question.
One possibility is to say "scale has to be the same in $x$ and $y$", or, perhaps worded better, "we only allow uniform scaling". Then there's generally a unique solution, although there are some bad cases: rotation-by-180 degrees and scaling-by-negative-one yield the same matrix, for instance.
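With uniform scaling only, the upper-left block has the form $s\,R(\theta)$, and the two parameters can be read off directly. Here is a rough sketch; the function name `decompose_uniform` and the convention $s > 0$ are my own assumptions (forcing $s > 0$ is one way to settle the 180-degree ambiguity, not the only one).

```python
import math

def decompose_uniform(M, tol=1e-9):
    """Split a 2x2 matrix of the form s * R(theta) into (s, theta).

    Assumes s >= 0, so that -I is reported as a half turn rather than
    as a scale by -1. Raises ValueError if M does not have that form.
    """
    (a, c), (b, d) = M               # M = [[a, c], [b, d]]
    if abs(a - d) > tol or abs(b + c) > tol:
        raise ValueError("not a uniform scale times a rotation")
    s = math.hypot(a, b)             # length of the first column
    theta = math.atan2(b, a)         # angle of the first column
    return s, theta

# The ambiguous case from the text: -I comes back as s = 1, theta = pi.
s, theta = decompose_uniform([[-1.0, 0.0], [0.0, -1.0]])
print(s, theta)
```

A matrix like the shear product above fails the check and raises, which is the point: once you restrict the allowed operations, not every matrix has an answer.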
I am not an expert and have just started thinking about this myself. I am intrigued by how many different ways there are to think about transforms / degrees of freedom.
I think the simplest way to see that an Affine transform has 6 degrees of freedom is that there are 6 variables in the matrix:
$$
\begin{bmatrix}
m_{00} & m_{01} & m_{02} \\
m_{10} & m_{11} & m_{12} \\
0 & 0 & 1 \\
\end{bmatrix}
$$
No matter what values we choose for those variables, it is a valid Affine transform. Although the Similarity transform can also be represented by a 6-variable multiplication matrix, it is more constrained - if we picked 4 of the variables at random, we would have to choose the other 2 carefully in order for it to be a valid Similarity transform. So it has fewer degrees of freedom even though it can still be written as a matrix with 6 variables. Similarly, we can use an Affine transform to describe a simple translation, as long as we set the upper-left $2 \times 2$ block to the identity matrix and only change the two translation variables.
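To make the "constrained" part concrete, here is a small sketch. I'm assuming a similarity means uniform scale plus rotation plus translation (no reflection), and the function names are made up for illustration. In that case the constraints are just $m_{00} = m_{11}$ and $m_{01} = -m_{10}$.

```python
import math

def similarity_matrix(s, theta, tx, ty):
    """3x3 matrix for uniform scale s, rotation theta, translation (tx, ty)."""
    c, n = s * math.cos(theta), s * math.sin(theta)
    return [[c, -n, tx],
            [n,  c, ty],
            [0.0, 0.0, 1.0]]

def is_similarity(M, tol=1e-9):
    """True if the upper-left 2x2 block is a uniform scale times a rotation."""
    return abs(M[0][0] - M[1][1]) <= tol and abs(M[0][1] + M[1][0]) <= tol

# Four free numbers (s, theta, tx, ty) always give a similarity...
print(is_similarity(similarity_matrix(2.0, 0.3, 5.0, -1.0)))
# ...but six arbitrary numbers generally do not (this one is a shear).
print(is_similarity([[1.0, 1.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]))
```

Two equations on six numbers leave four free, which matches the count of 4 degrees of freedom.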
The purest mathematical idea of an Affine transform is these 6 numbers and the way you multiply them with a vector to get a new vector. What this transform actually does can be described in a variety of ways - as 6 operations that you are doing one after the other (translate x, translate y, scale x, scale y, rotate, shear), or one thing you are doing all at once. If you think of them in terms of these operations, you might be confused by this matrix:
$$
\begin{bmatrix}
-1 & 0 & 0 \\
0 & -1 & 0 \\
0 & 0 & 1 \\
\end{bmatrix}
$$
This matrix can be thought of as a rotation by 180 degrees about the origin, as scaling x by -1 and y by -1, or as reflecting every point through the origin. All of these descriptions are equivalent, and this single matrix is the only one that represents them.
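If you want to convince yourself, here is a tiny sketch that builds the matrix both ways and compares them entry by entry (helper names are my own; the tolerance is there because `sin(pi)` is not exactly zero in floating point).

```python
import math

def rotation(theta):
    """3x3 homogeneous rotation about the origin."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def scale(sx, sy):
    """3x3 homogeneous axis-aligned scale."""
    return [[sx, 0.0, 0.0], [0.0, sy, 0.0], [0.0, 0.0, 1.0]]

def close(P, Q, tol=1e-12):
    """Entry-wise comparison of two 3x3 matrices."""
    return all(abs(P[i][j] - Q[i][j]) <= tol
               for i in range(3) for j in range(3))

half_turn = rotation(math.pi)
flip_both = scale(-1.0, -1.0)
print(close(half_turn, flip_both))
```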
Another way we could think about degrees of freedom is with how many fingers you would need to describe this transform by dragging points.
A translation I can describe with one finger - by dragging a single point to its new location. Open Google Maps on your phone and try it. Each finger counts for two degrees of freedom since you can move it horizontally, and vertically.
A Euclidean transform has 3 DOF - you need one finger to translate the shape, then the second finger you can use to rotate it, but this finger only has one degree of freedom. This one is better illustrated not in Google Maps, but with a credit card on a desk - one finger moves the card, the other rotates it, but the second finger is less free since it always has to follow the first finger around somewhat. Moving the second finger arbitrarily would try to stretch the card, which is impossible. So, the first finger has 2 DOF, the second finger has one more.
A similarity transform has four degrees of freedom - Google Maps works for this one again. Drag two fingers on your phone on Google Maps at the same time. No matter where you drag your two fingers, the app is able to find a similarity transform for you - one that keeps the map the same shape, but translates, rotates, and scales it.
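The two-finger case can be sketched quite compactly if you treat points as complex numbers, since a similarity (without reflection) is then $z \mapsto a z + b$ with two complex unknowns, i.e. 4 real degrees of freedom. This is only my guess at the kind of computation involved, not how any particular app does it; the function name is hypothetical.

```python
import cmath

def similarity_from_two_points(p0, p1, q0, q1):
    """Find the similarity z -> a*z + b with p0 -> q0 and p1 -> q1.

    Points are complex numbers x + 1j*y. Returns (scale, angle, b),
    where a = scale * exp(1j * angle) and b is the translation.
    """
    if p0 == p1:
        raise ValueError("the two start points must be distinct")
    a = (q1 - q0) / (p1 - p0)    # rotation and scale in one complex number
    b = q0 - a * p0              # whatever is left over is the translation
    return abs(a), cmath.phase(a), b

# Fingers start at (0, 0) and (1, 0) and end at (1, 1) and (1, 3).
scale_factor, angle, shift = similarity_from_two_points(0j, 1 + 0j, 1 + 1j, 1 + 3j)
print(scale_factor, angle, shift)
```

Any two distinct start points and any two end points give exactly one answer, which is the "no matter where you drag" part.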
You would need to drag three fingers to do an Affine transform - Google Maps doesn't support this, since it would skew the map and you wouldn't be able to navigate with it anymore - but you can kind of pretend using a handkerchief: two of the fingers can translate and rotate it (pretend they can scale it too), and then the third finger can skew it this way and that. Almost any drag
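For the three-finger case, the six unknowns of the affine matrix can be solved from three point pairs. A rough sketch using Cramer's rule, with my own function names and an arbitrary tolerance for the collinearity check:

```python
def det3(M):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))

def affine_from_three_points(src, dst):
    """src, dst: three (x, y) pairs each. Returns the 3x3 affine matrix."""
    P = [[x, y, 1.0] for x, y in src]
    d = det3(P)
    if abs(d) < 1e-12:               # assumed tolerance, not scale-aware
        raise ValueError("source points are collinear")
    rows = []
    for k in range(2):               # k = 0 solves the x' row, k = 1 the y' row
        rhs = [q[k] for q in dst]
        row = []
        for col in range(3):         # Cramer's rule, one unknown at a time
            Pc = [r[:] for r in P]
            for i in range(3):
                Pc[i][col] = rhs[i]
            row.append(det3(Pc) / d)
        rows.append(row)
    rows.append([0.0, 0.0, 1.0])
    return rows

# Three corners of a unit square, sheared and moved to (2, 3).
M = affine_from_three_points([(0, 0), (1, 0), (0, 1)],
                             [(2, 3), (3, 3), (3, 4)])
print(M)
```

Three points times two coordinates each gives the 6 degrees of freedom.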
And dragging four fingers would let you do a 2D projective transformation (a homography). You can try this in the photo editing program GIMP. It's under Tools > Transform Tools > Perspective, and it lets you drag four different points around - so it counts for 8 degrees of freedom.
Note that not every possible position of the 4 points is necessarily a valid transform, but it still counts as 8 degrees of freedom since the points can still freely move in 2 dimensions - there's just certain values they can't take.
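The four-point case can be sketched the same way: fix the bottom-right entry of the $3 \times 3$ matrix to 1 and solve for the remaining 8 numbers from 4 point pairs. This is a minimal version with a hand-rolled linear solver; all names are my own, and it assumes the bottom-right entry really can be normalised to 1. The pivot check only catches configurations where the linear system itself is singular.

```python
def solve_linear(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system for these points")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def homography_from_four_points(src, dst):
    """src, dst: four (x, y) pairs each. Returns a 3x3 matrix with H[2][2] = 1."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Two linear equations per point pair, from u = X/W and v = Y/W.
        A.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        b.append(u)
        A.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        b.append(v)
    h = solve_linear(A, b)
    return [h[0:3], h[3:6], [h[6], h[7], 1.0]]

def apply_homography(H, p):
    """Map the point p = (x, y) through H, dividing by the third coordinate."""
    x, y = p
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return ((H[0][0] * x + H[0][1] * y + H[0][2]) / w,
            (H[1][0] * x + H[1][1] * y + H[1][2]) / w)

# Unit square dragged into a trapezoid.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
trapezoid = [(0.0, 0.0), (2.0, 0.0), (1.5, 1.0), (0.5, 1.0)]
H = homography_from_four_points(square, trapezoid)
print([apply_homography(H, p) for p in square])
```

Four points with two coordinates each give 8 equations for 8 unknowns, which is where the 8 degrees of freedom come from.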
Who knows, I hope this helps!
Best Answer
What do you mean by "improper rotations"? An orthogonal transformation with a reflection (i.e. determinant -1)? This would appear to be a non-uniqueness problem and could be fixed by flipping the sign of an arbitrary (but same index) column of $U$ and $V$, which would still be a valid SVD.
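A small hand-built illustration of the column flip, assuming $A = U \Sigma V^T$ with both $U$ and $V$ improper (determinant $-1$). The matrices are made up for the example and no SVD routine is called. Note that if $\det A < 0$, exactly one of $U$, $V$ must be improper, and flipping a matching pair of columns only moves the reflection from one factor to the other.

```python
def matmul(P, Q):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(P):
    return [[P[j][i] for j in range(2)] for i in range(2)]

def det2(P):
    return P[0][0] * P[1][1] - P[0][1] * P[1][0]

def flip_column(P, j):
    """Negate column j of P."""
    return [[-v if k == j else v for k, v in enumerate(row)] for row in P]

U = [[0.0, 1.0], [1.0, 0.0]]     # orthogonal, det = -1
S = [[3.0, 0.0], [0.0, 2.0]]     # singular values
V = [[1.0, 0.0], [0.0, -1.0]]    # orthogonal, det = -1
A = matmul(matmul(U, S), transpose(V))

# Flip the same column index in both factors.
U2, V2 = flip_column(U, 1), flip_column(V, 1)
A2 = matmul(matmul(U2, S), transpose(V2))

print(det2(U), det2(V), det2(U2), det2(V2), A == A2)
```

The product is unchanged while both determinants switch sign, so the new factors are still a valid SVD.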