I have seen the min-max normalization formula but that normalizes values between 0 and 1. How would I normalize my data between -1 and 1? I have both negative and positive values in my data matrix.
Solved – How to normalize data between -1 and 1
Tags: dataset, normalization
Related Solutions
Store the mean and standard deviation of the training dataset features. When the test data is received, normalize each feature by subtracting its corresponding training mean and dividing by the corresponding training standard deviation.
Normalization by min/max is usually a bad idea, since it scales your entire dataset according to two particular observations (the extremes). This lets your scaling be dominated by noise. Mean/std is the standard procedure, and you can even experiment with more robust measures (e.g. median/MAD).
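The two answers above can be sketched as follows. This is a minimal NumPy illustration (the function names and data are mine, not from the original answers): statistics are computed on the training set only and then reused on the test set, with an optional median/MAD variant.

```python
import numpy as np

def fit_standardizer(X_train, robust=False):
    """Compute per-feature location/scale on the training data only."""
    if robust:
        # Robust variant: median and MAD (median absolute deviation).
        loc = np.median(X_train, axis=0)
        scale = np.median(np.abs(X_train - loc), axis=0)
    else:
        loc = X_train.mean(axis=0)
        scale = X_train.std(axis=0)
    return loc, scale

def apply_standardizer(X, loc, scale):
    """Normalize any dataset with the stored training statistics."""
    return (X - loc) / scale

X_train = np.array([[1.0, 200.0], [2.0, 220.0], [3.0, 240.0]])
X_test = np.array([[2.5, 230.0]])

loc, scale = fit_standardizer(X_train)
print(apply_standardizer(X_test, loc, scale))
```

Because the test point is scaled with the training statistics, a shift in the test data shows up as a shifted feature value rather than silently redefining the scale.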
Why scale/normalize? Because of the way the SVM optimization problem is defined, features with higher variance have greater effect on the margin. Usually this doesn't make sense - we'd like our classifier to be 'unit invariant' (e.g. a classifier that combines patients' weight and height shouldn't be affected by the choice of units - kgs or grams, centimeters or meters).
However, I guess there might be cases in which all of the features are given in the same units, and the differences in their variance indeed reflect differences in importance. In such a case, I'd try skipping scaling/normalization and see what it does to the performance.
I know you've got your answer but I want to clarify something...
- This is a case of reversed-scale min-max normalization.
That means the best value is 21.07 and the worst value is 100 (in your case).
Here you should use: $$ x_{normalized} = \frac{max(x)-x_i}{max(x)-min(x)} $$ Example:
If you're normalizing $x_i = 99$, the result should be closer to 0. $$ x_{normalized} = \frac{100-x_i}{100-21.07}=\frac{100-99}{78.93}=0.013 $$ - In most cases, the formula used for min-max scaling is:
$$
x_{normalized} = \frac{x_i-min(x)}{max(x)-min(x)}
$$
Example:
If you're normalizing $x_i = 99$, the result should be closer to 1.
Big values produce big results. $$ x_{normalized} = \frac{x_i-21.07}{100-21.07}=\frac{99-21.07}{78.93}=0.987 $$
In both cases:
max(x) - the maximum value of the entire population (measurements)
min(x) - the minimum value of the entire population (measurements)
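Both formulas above are one-liners in NumPy. A small sketch (function names are mine) using the example values from this answer:

```python
import numpy as np

def min_max(x):
    """Standard min-max scaling: big values map near 1."""
    return (x - x.min()) / (x.max() - x.min())

def min_max_reversed(x):
    """Reversed scale: big values map near 0 (best = smallest)."""
    return (x.max() - x) / (x.max() - x.min())

x = np.array([21.07, 30.0, 99.0, 100.0])
print(min_max(x))           # 99 maps near 1 (~0.987)
print(min_max_reversed(x))  # 99 maps near 0 (~0.013)
```

Note the two results always sum to 1 element-wise, which is exactly the "known range" reversal trick described below.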
Sidenote - a more generalized approach
There is a common issue in MOGA (Multi-Objective Genetic Algorithm) optimization, where the algorithm can only minimize the objective function f(x,y). Sometimes we want to switch from minimization to maximization of the objective function. We can do that in two ways:
Reverse scale with unknown range. If you don't know the range of values, just slap on a (-1) multiplication: $$ f(x,y) = (-1)*f(x,y) $$ Example using our normalization formula: $$ (-1)*x_{normalized} = (-1)*0.987 = -0.987 $$ The values are negative (and harder for humans to interpret) but are in the correct order of importance.
Big values like $x_i = 99$ get more negative (smaller) results: -0.987
Small values like $x_i = 30$ get more positive (bigger) results: -0.113
Reversed scale, just like you want.
Easy to interpret for computers: $$x_{normalized}(99)<x_{normalized}(30)$$ $$-0.987 < -0.113$$
Reverse scale with known range. If you know the range of your function values is 0-1, then $max_{range} = 1$ (normalized range).
Example using our normalization formula: $$ max_{range}-x_{normalized} = 1-x_{normalized} = 1-0.987 = 0.013 $$ The advantage of this formula is that you keep the 0-1 range.
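The two reversal tricks can be compared side by side. A quick sketch using the normalized values from the examples above (variable names are mine):

```python
import numpy as np

# Normalized objective values from the worked examples: x_i = 99 and x_i = 30.
x_norm = np.array([0.987, 0.113])

# Way 1: negate when the range is unknown.
# The ordering reverses, but the values go negative.
neg = -x_norm           # the big value now ranks lowest

# Way 2: subtract from the known range maximum (here 1).
# The ordering reverses AND the result stays in [0, 1].
flipped = 1.0 - x_norm

print(neg)      # [-0.987 -0.113]
print(flipped)  # [0.013 0.887]
```

Both give the same ranking; the second is just shifted back into the original range, which is why it's easier for humans to read.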
Kudos to whuber for providing the shortest answer with a useful formula. I hope my answer sheds some light on why to use one or the other, and how it works, for future users facing these problems.
All the best from Ro
Best Answer
With: $$ x' = \frac{x - \min(x)}{\max(x) - \min(x)} $$ you normalize your feature $x$ into $[0,1]$.
To normalize into $[-1,1]$ you can use:
$$ x'' = 2\frac{x - \min(x)}{\max(x) - \min(x)} - 1 $$
In general, you can always get a new variable $x'''$ in $[a,b]$:
$$ x''' = (b-a)\frac{x - \min(x)}{\max(x) - \min(x)} + a $$
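The general $[a,b]$ formula covers all three cases, so a single helper suffices. A minimal sketch in NumPy (the function name is mine):

```python
import numpy as np

def rescale(x, a=-1.0, b=1.0):
    """Linearly map x into [a, b]: x''' = (b - a) * (x - min) / (max - min) + a."""
    return (b - a) * (x - x.min()) / (x.max() - x.min()) + a

x = np.array([-5.0, 0.0, 10.0])
print(rescale(x))            # into [-1, 1]: min maps to -1, max to 1
print(rescale(x, 0.0, 1.0))  # into [0, 1]: the plain min-max formula
```

With the defaults $a=-1$, $b=1$ this reduces to the $x''$ formula above; with $a=0$, $b=1$ it reduces to plain min-max scaling.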