Solved – How to perform multiple logistic regression for a continuous dependent variable with values bounded between 0 and 1

logistic, r, regression

I'd like to model the response of three species functional groups (each expressed as a proportion of total abundance) to different environmental gradients. I thought multiple linear regression would work well, but I have since heard that I should use multiple logistic regression, because my response variable (proportion of total counts) is bounded between 0 and 1. However, when I try to fit a logistic regression in R with my data, I get the following error message:

In eval(family$initialize): non-integer #successes in a binomial glm!

More specifically, here is my very simple code:

test_clusters <- read.table("clusters.txt")

head(test_clusters, 3)  # checking data

    Cluster1   Cluster2  Cluster3  PC1_soil  PC2_soil precip  disturb
P2 0.8297214 0.01857585 0.1517028  2.200434 0.5114511    647 51.98126
P4 0.3196347 0.04109589 0.6392694 -1.016489 1.9255986    591 16.47774
P7 0.7352941 0.03361344 0.2310924  2.479751 0.6501704    516 20.30064

## test_clusters[,1:3] are the proportional abundances of each cluster, while test_clusters[,4:7] are the predictor (environmental) variables

## Trying to perform multiple logistic regression to test the response of each cluster to the environmental gradients

model <- glm(Cluster1 ~ PC1_soil + PC2_soil + precip + disturb,
             data = test_clusters, family = binomial(link = "logit"))

Then I get the error message mentioned above:

In eval(family$initialize): non-integer #successes in a binomial glm!

Does anyone know what the problem is? Any other suggestions about a more appropriate test for this kind of data would be valuable.

Best Answer

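You get the warning (not an error) because you fitted glm with the binomial family and a one-dimensional outcome variable in the $(0,1)$ range, but did not supply the weights argument. Without weights, glm treats each row as a single trial, so a fractional outcome looks like a non-integer number of successes.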
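To see where the message comes from, here is a minimal sketch that reproduces it. The vectors y and x are made-up values, not the asker's data; the only point is that fractional outcomes with no weights trigger the warning while the model is still fitted.

```r
# Made-up proportions and a made-up predictor (illustration only)
set.seed(42)
y <- runif(20, min = 0.05, max = 0.95)
x <- rnorm(20)

# Capture the first warning raised by glm() instead of printing it
msg <- tryCatch(
  {
    glm(y ~ x, family = binomial(link = "logit"))
    "no warning"
  },
  warning = function(w) conditionMessage(w)
)
msg
```

Because it is a warning, glm still returns a fitted model when called normally; the standard errors, however, are computed as if each row were one trial.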

Do you know the total population (the total count that each Cluster1 fraction was computed from)? If so, pass it as the weights argument. I may have misunderstood your problem, though.
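As a rough sketch of that suggestion, assuming the totals are available: the data frame dat and its columns total and count1 below are hypothetical and simulated, since the question does not show the raw counts. The first call passes the proportion as the response with the totals as weights; the second uses the equivalent two-column (successes, failures) response.

```r
# Hypothetical data: 'total' (individuals counted per plot) and 'count1'
# are NOT in the original question; they are simulated for illustration.
set.seed(1)
n <- 30
dat <- data.frame(
  PC1_soil = rnorm(n),
  precip   = round(runif(n, 500, 700)),
  total    = sample(50:400, n, replace = TRUE)
)

# Simulate Cluster1 counts, then convert them to proportions
eta <- 0.5 + 0.8 * dat$PC1_soil - 0.002 * (dat$precip - 600)
dat$count1   <- rbinom(n, size = dat$total, prob = plogis(eta))
dat$Cluster1 <- dat$count1 / dat$total

# Proportion as response, totals supplied through the weights argument
fit_w <- glm(Cluster1 ~ PC1_soil + precip, data = dat,
             family = binomial(link = "logit"), weights = total)

# Equivalent form: two-column response of successes and failures
fit_c <- glm(cbind(count1, total - count1) ~ PC1_soil + precip, data = dat,
             family = binomial(link = "logit"))

# The two parameterisations should give the same coefficients
all.equal(coef(fit_w), coef(fit_c))
```

With the real data, the same pattern would use the full predictor set from the question (PC1_soil, PC2_soil, precip, disturb) and whatever column holds the per-plot totals.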