MATLAB: Speed optimization of partial inner product (norm)

For a real row vector P, the Euclidean norm can be written as
sqrt(sum(P.^2))
or
sqrt(P*P')
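Here is a minimal check that the two row-vector forms agree, assuming P is real. For complex P they differ: ' is the conjugate transpose, so sqrt(P*P') is still the norm, while sqrt(sum(P.^2)) is not (it would need abs(P).^2). The length 70 is arbitrary and only chosen to match the last dimension of the array below.

```matlab
% Two ways to compute the Euclidean norm of a real row vector.
P = rand(1, 70);          % arbitrary real-valued row vector (assumption)

n_sum = sqrt(sum(P.^2));  % square elementwise, then sum
n_dot = sqrt(P*P');       % 1x70 times 70x1 inner product (a matrix product)
n_ref = norm(P);          % built-in 2-norm for comparison

% The three values agree up to floating-point rounding.
fprintf('%.15g  %.15g  %.15g\n', n_sum, n_dot, n_ref);
```

The speed difference comes from P*P' being handled as a single matrix product, which MATLAB typically dispatches to an optimized BLAS routine, whereas the first form builds a temporary array for P.^2 before summing.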
The latter is about twice as fast. Now I have a 4-D array of size [100,100,100,70] and would like to take the norm along the last dimension, yielding an array of size [100,100,100]. This works:
sqrt(sum(P.^2,4))
but it is too slow. Does anyone know a way to speed this up, perhaps in a similar way to the 1D case?
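One way to mirror the 1D trick, sketched under the assumption that P is a real double array: because MATLAB stores arrays in column-major order, reshape(P, [], 70) turns P into a 1e6-by-70 matrix whose rows are exactly the vectors along dimension 4, without reordering any elements. The row-wise sum of squares can then be written as a matrix-vector product with a ones vector, which plays the role of P*P'. The built-in vecnorm(P, 2, 4) (MATLAB R2017b or later) is a third option. Whether either beats sqrt(sum(P.^2,4)) depends on the MATLAB version and hardware, so time all three on the real data; the sizes below are reduced only so the correctness check runs quickly.

```matlab
% Norm along the 4th dimension of a 4-D array, computed three ways.
% Sizes are reduced from [100,100,100,70] so the check runs quickly.
sz = [10, 10, 10, 7];
P  = rand(sz);                          % real-valued test data (assumption)

% 1) Reference: elementwise square, then sum along dim 4.
N_ref = sqrt(sum(P.^2, 4));

% 2) Reshape so each row holds one vector along dim 4, then let a
%    matrix-vector product with a ones vector do the summation.
%    Column-major order keeps dims 1-3 together, so row r of Q is
%    P(i,j,k,:) for the matching (i,j,k).
Q    = reshape(P, [], sz(4));           % (10*10*10)-by-7 matrix
N_mv = reshape(sqrt((Q.^2) * ones(sz(4), 1)), sz(1:3));

% 3) Built-in vector norm along dim 4 (requires MATLAB R2017b or later).
N_vn = vecnorm(P, 2, 4);

% All three agree up to floating-point rounding.
fprintf('max |mv - ref| = %g\n', max(abs(N_mv(:) - N_ref(:))));
fprintf('max |vn - ref| = %g\n', max(abs(N_vn(:) - N_ref(:))));
```

For timing at full size, wrapping each expression in an anonymous function and passing it to timeit gives more stable numbers than a single tic/toc. Note that both P.^2 and Q.^2 still allocate a temporary the size of P (about 560 MB for 100*100*100*70 doubles), which is often a large part of the cost.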