I want to fit an unordered multinomial model. The dependent variable has 3 categories (success, partial success, fail), and I have one binary predictor and two continuous predictors (raw measures of length and speed). I have thousands of observations from a hunting videogame in which players may kill the prey, hurt it, or fail, and the prey varies in speed and in the size of its vital body parts. However, for me it would be most helpful to represent the predictors as ratios (length / total length, or speed / maximum speed) rather than as raw data. I had understood that using proportions is not good practice in linear regression, but I could not find any guidance for an unordered multinomial model like mine.
The only thing I need to know is whether or not I can use proportions instead of raw values as predictors in an unordered multinomial model, and why.
Here is a reproducible example:
Data <- data.frame(
  X = sample(1:100),
  D = sample(1:100),
  Y = sample(c("yes", "no"), 10, replace = TRUE),
  Z = sample(c("body", "tail", "fail"), 10, replace = TRUE)
)
require(nnet)
test <- multinom(Z ~ Y + X + D + X:Y + D:X + D:Y, data = Data)
summary(test)
z <- summary(test)$coefficients / summary(test)$standard.errors; z  # t values
relativize <- function(x) x / max(x)
Data$X <- relativize(Data$X)
Data$D <- relativize(Data$D)
test1 <- multinom(Z ~ Y + X + D + X:Y + D:X + D:Y, data = Data)
z1 <- summary(test1)$coefficients / summary(test1)$standard.errors; z1  # t values
z   # t values for unscaled data
z1  # t values for scaled data
(1 - pnorm(abs(z), 0, 1)) * 2   # z-test p values (unscaled)
(1 - pnorm(abs(z1), 0, 1)) * 2  # z-test p values (scaled)
AIC(test, test1)
confint(test)
confint(test1)
As you can see, I get nearly identical AIC values, but the effects and significance of the terms in the model change hugely! That behavior does not come from my specific dataset; it can be reproduced with any other. Which model is the correct one?
In the original dataset, when using raw variables I get several confidence intervals that do not pass through zero (corresponding to the significant terms in the model). Some of these intervals are very narrow and come very close to zero; the most extreme example is (0.001, 7.16E-03). When using scaled data, all of those narrow, near-zero intervals become non-significant (i.e. they pass through zero). The question remains: which one is correct? I am tempted to think that when coefficients are so close to zero, the change in outcome odds generated by the associated terms probably has minor biological importance. However, I am unsure whether that impression simply comes from the units used for my model's terms (i.e. number of pixels).
Best Answer
Even after the additional information in the comments (now included in the EDIT), the question is not very clear. At first I read it as if by "proportions" you meant the count response vector expressed as percentages, but now I see that you write "two other continuous predictors (raw measures for length and speed) ... for me it would be most helpful to use proportions (length / total length, or speed / maximum speed) rather than using raw data ... using proportions is not good practice for linear regressions, but ...", so maybe you are really asking whether you can express length and speed as ratios (a better word here than proportions) relative to the maximum observed in the data. If that reading is correct, there is no problem with using them as predictors. You say you heard this is not good practice in linear regression; I think that must be a misunderstanding. Using ratios as response variables is often a bad idea, but there is no problem with using them as predictors.
Moreover, I read you as dividing, say, $\text{length}$ by $\text{length}_\text{max}$, where the maximum is taken over the complete sample. That is effectively just a linear transformation of the predictor, and it only changes the model by multiplying the corresponding coefficient by a constant (both for linear and for multinomial regression); it is certainly not a problem. Ratios become problematic as response variables when both numerator and denominator vary over the sample. Please comment if I have understood you correctly now, and please edit your OP to clarify.
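To see the rescaling argument in action, here is a minimal sketch on simulated data (the variable names and the data-generating process are purely illustrative, not the OP's actual data):

```r
library(nnet)

set.seed(1)
n   <- 500
len <- runif(n, 10, 200)                                    # raw "length" predictor
grp <- factor(sample(c("body", "tail", "fail"), n, replace = TRUE))

fit_raw    <- multinom(grp ~ len, trace = FALSE)
fit_scaled <- multinom(grp ~ I(len / max(len)), trace = FALSE)

# Dividing the predictor by max(len) multiplies its slope by max(len);
# the intercepts and the fitted probabilities are unchanged.
coef(fit_raw)[, "len"] * max(len)  # approximately equal to the scaled slopes
coef(fit_scaled)[, 2]
```

Up to the optimizer's convergence tolerance, the two fits are the same model in a different parameterization, which is exactly why rescaling a predictor by a constant is harmless.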
This is to answer what you say in the last comment. But first, a look at the R code above. Note that the Y and Z values repeat: you drew independent samples of size 10, and data.frame recycles them ten times to match the length-100 X and D, so each block of 10 (Y, Z) pairs recurs. You should really explain why; that gives data far removed from the independence assumption behind the multinomial model you are fitting. Also, for the variables X and D you are only permuting 1:100 at random, not sampling from it. Why? Then, you fit the multinomial model with the multinom function from the nnet package. That uses a neural-net fitting algorithm with random starts, which does not (in principle) guarantee identical solutions when called multiple times. For the multinomial likelihood it seems to work well in practice, but there can be numerical issues, as the help page warns.
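For comparison, here is a version of the simulated data in which every one of the 100 rows is drawn independently (same variable names as in the question; a sketch, not the OP's actual data-generating process):

```r
set.seed(42)
n <- 100
Data <- data.frame(
  X = sample(1:100, n, replace = TRUE),           # sampled with replacement, not permuted
  D = sample(1:100, n, replace = TRUE),
  Y = sample(c("yes", "no"), n, replace = TRUE),  # length n: no recycling
  Z = sample(c("body", "tail", "fail"), n, replace = TRUE)
)
head(Data)
```

All four vectors have length n, so data.frame does not recycle anything and the rows are genuinely independent draws.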
So the differences between your two model fits, before and after scaling, are small and entirely due to numerical issues. Since predictor variables between 0 and 1 are numerically preferable, you should probably trust the fit after scaling more.
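If one is worried about the optimizer rather than the parameterization, multinom passes control arguments such as maxit and reltol through to nnet, so one can tighten them and check that the fit does not move (a sketch on simulated data; the names X and Z here are illustrative):

```r
library(nnet)

set.seed(7)
n <- 300
X <- runif(n)
Z <- factor(sample(c("body", "tail", "fail"), n, replace = TRUE))

fit_default <- multinom(Z ~ X, trace = FALSE)
fit_strict  <- multinom(Z ~ X, trace = FALSE, maxit = 1000, reltol = 1e-12)

# If the two sets of coefficients agree closely, convergence is not the issue
max(abs(coef(fit_default) - coef(fit_strict)))
```

If the coefficients change appreciably under stricter settings, the default fit had not fully converged and its standard errors should not be trusted.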
There are newer, and maybe better, implementations of multinomial regression in R today; see https://cran.r-project.org/web/packages/mnlogit/vignettes/mnlogit.pdf
EDIT
This is a (partial) answer to the question in the comments, after adding confint to the code. (Note that confint.multinom uses standard Wald-type confidence intervals, with variances obtained by a call to vcov on the model object; it does not use likelihood profiling.) See my code below, but first note that all your confidence intervals contain zero, so you should be very careful when interpreting a model in which no coefficients are significant!
To locate where the differences are, compare the two sets of confidence intervals directly.
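For instance, a self-contained sketch re-creating two fits like those in the question (independent draws here, to keep it runnable on its own): confint on a multinom object with a 3-level response returns a coefficients × bounds × outcomes array, so one can tabulate, per outcome equation, which intervals contain zero in each fit:

```r
library(nnet)

set.seed(1)
n <- 100
Data <- data.frame(
  X = sample(1:100, n, replace = TRUE),
  D = sample(1:100, n, replace = TRUE),
  Y = sample(c("yes", "no"), n, replace = TRUE),
  Z = sample(c("body", "tail", "fail"), n, replace = TRUE)
)

form <- Z ~ Y + X + D + X:Y + D:X + D:Y
test <- multinom(form, data = Data, trace = FALSE)     # raw predictors

Data$X <- Data$X / max(Data$X)                         # rescale as in the question
Data$D <- Data$D / max(Data$D)
test1 <- multinom(form, data = Data, trace = FALSE)    # scaled predictors

# Which coefficients have an interval containing zero, per outcome equation?
crosses_zero <- function(fit) {
  ci <- confint(fit)               # array: coefficient x (2.5 %, 97.5 %) x outcome
  apply(ci, 3, function(m) m[, 1] < 0 & m[, 2] > 0)
}
cbind(raw = crosses_zero(test), scaled = crosses_zero(test1))
```

Rows where the raw and scaled columns disagree are exactly the places where the significance conclusions change between the two parameterizations.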
Observe that the noticeable differences appear from the line with D onward. But even where the differences are largest, both confidence intervals contain zero, so the interpretation is the same. You should be careful when interpreting non-significant coefficients! The only caveat is that these asymptotic confidence intervals may be imprecise; to check, one could also construct profile-likelihood confidence intervals in this case. That is for later (or for you ...)