Solved – Determining a variable’s contribution to the variation in another

variance

I have data that look like this, with a tonnage measure and density measure for each carriage in a bunch of trains.

Train Carriage_No Tonnes  Density
A     1           105.5   2.12
A     2           104.9   2.28
A     3           101.2   2.30
A     4           108.7   2.41
B     1           112.3   2.51
B     2           109.7   2.34

etc.

How do I determine density's contribution to the variation in tonnes? What is this kind of analysis called? Some R code would be most helpful!

Our goal is to fit more tonnes into each carriage, but the variation around the mean is such that we're overloading an unacceptable percentage of carriages. Therefore we need to tighten the variation to allow us to increase the mean (i.e. more total tonnes) without overloading.

There are a few things we can do to reduce tonnage variability, but controlling density isn't one of them (reacting to changes in density may be possible). So I'd like to understand how much of the variation in tonnes is accounted for by density.

Best Answer

You should probably start by getting a general idea about the relationship between Tonnes and Density:

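# Scatterplot of tonnage against density (assumes both vectors are in the workspace or attached)
plot(Tonnes ~ Density)
# Overlay a kernel-smoothed estimate of mean Tonnes at each Density
lines(ksmooth(Density, Tonnes, bandwidth = 0.5))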

and playing with the bandwidth parameter to see what form the mean of Tonnes conditional on Density takes. If it looks roughly linear, you should be fine with regression/ANOVA techniques: for example, the R-squared reported by summary(lm(Tonnes~Density)) is the proportion of the variance in Tonnes that is explained by a linear function of Density. Look up the details here: http://cran.r-project.org/doc/contrib/Faraway-PRA.pdf
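As a rough sketch of the linear case, the R-squared can be pulled out of the fitted model directly and cross-checked against the ANOVA sums of squares. The data frame name `trains` is an assumption made for illustration, and it holds only the six rows shown in the question, so the numbers it produces say nothing reliable about the real data.

```r
# Hypothetical data frame holding the six example rows from the question;
# the name `trains` is an assumption, and the real data would be far larger.
trains <- data.frame(
  Train       = c("A", "A", "A", "A", "B", "B"),
  Carriage_No = c(1, 2, 3, 4, 1, 2),
  Tonnes      = c(105.5, 104.9, 101.2, 108.7, 112.3, 109.7),
  Density     = c(2.12, 2.28, 2.30, 2.41, 2.51, 2.34)
)

# Simple linear regression of tonnage on density
fit <- lm(Tonnes ~ Density, data = trains)

# Proportion of the variance in Tonnes explained by the linear fit
r2 <- summary(fit)$r.squared

# The same quantity from the ANOVA table: SS(Density) / (SS(Density) + SS(Residuals))
ss <- anova(fit)[["Sum Sq"]]
r2_from_ss <- ss[1] / sum(ss)

# Residual standard error: the spread in Tonnes left over after accounting for Density
resid_sd <- summary(fit)$sigma

print(c(r_squared = r2, r_squared_from_ss = r2_from_ss, residual_sd = resid_sd))
```

Comparing `resid_sd` with `sd(trains$Tonnes)` gives a feel for how much tighter the tonnage spread would be if the density effect could be fully compensated for.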
If it is nonlinear, a transformation of the variables or a nonlinear regression model might be in order. It is hard to specify a general approach here, so it may be best to come back with a plot.
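One possible way to check for curvature numerically, offered only as an illustration and not as the approach the answer prescribes, is to compare the straight-line fit against one with a quadratic term. The data frame `trains` is again a hypothetical name holding the six rows from the question, which is far too few points to draw real conclusions from.

```r
# Hypothetical data frame with the six example rows from the question
trains <- data.frame(
  Tonnes  = c(105.5, 104.9, 101.2, 108.7, 112.3, 109.7),
  Density = c(2.12, 2.28, 2.30, 2.41, 2.51, 2.34)
)

# Straight-line fit and a fit that also allows a quadratic term in Density
fit_lin  <- lm(Tonnes ~ Density, data = trains)
fit_quad <- lm(Tonnes ~ poly(Density, 2), data = trains)

# F-test for whether the quadratic term improves on the straight line
cmp <- anova(fit_lin, fit_quad)

# Adjusted R-squared penalises the extra term, so it is the fairer comparison
adj <- c(
  linear    = summary(fit_lin)$adj.r.squared,
  quadratic = summary(fit_quad)$adj.r.squared
)

print(cmp)
print(adj)
```

If the quadratic term adds little, the linear R-squared is a reasonable summary; if it adds a lot, the plot is the better guide to which transformation or model to try.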
