Here is some code that is hopefully self-explanatory:
set.seed(20987) # for reproducability
N = 200
# variables
days_since = rpois(N, lambda=60)
site = factor(sample(c("site1", "site2"), N, replace=T), c("site1", "site2"))
age = factor(sample(c("juv", "adult"), N, replace=T), c("juv", "adult"))
year = factor(sample(c("2012", "2013"), N, replace=T), c("2012", "2013"))
PC1 = rnorm(N, mean=100, sd=25)
arrival_date = sample.int(365, N, replace=T)
# betas
B0 = 13
Bds = 74
Bs = 114
Ba = 160
By = 191
Bpc = 59
Bad = 11
# response variable
weight = B0 + Bds*days_since + Bs*(site=="site2") + Ba*(age=="adult") +
By*(year=="2013") + Bpc*PC1 + Bad*arrival_date + rnorm(N, mean=0, sd=10)
model = lm(weight~days_since+site+age+year+PC1+arrival_date)
# predicted values for plot
ds = seq(min(days_since), max(days_since))
ds1j2 = predict(model, data.frame(days_since=ds, site="site1", age="juv",
year="2012", PC1=mean(PC1), arrival_date=mean(arrival_date)))
ds1j3 = predict(model, data.frame(days_since=ds, site="site1", age="juv",
year="2013", PC1=mean(PC1), arrival_date=mean(arrival_date)))
ds1a2 = predict(model, data.frame(days_since=ds, site="site1", age="adult",
year="2012", PC1=mean(PC1), arrival_date=mean(arrival_date)))
ds1a3 = predict(model, data.frame(days_since=ds, site="site1", age="adult",
year="2013", PC1=mean(PC1), arrival_date=mean(arrival_date)))
ds2j2 = predict(model, data.frame(days_since=ds, site="site2", age="juv",
year="2012", PC1=mean(PC1), arrival_date=mean(arrival_date)))
ds2j3 = predict(model, data.frame(days_since=ds, site="site2", age="juv",
year="2013", PC1=mean(PC1), arrival_date=mean(arrival_date)))
ds2a2 = predict(model, data.frame(days_since=ds, site="site2", age="adult",
year="2012", PC1=mean(PC1), arrival_date=mean(arrival_date)))
ds2a3 = predict(model, data.frame(days_since=ds, site="site2", age="adult",
year="2013", PC1=mean(PC1), arrival_date=mean(arrival_date)))
# plot
windows()
plot(x=ds, y=ds1j2, ylim=c(11000, 14500), type="l", lty=1,
ylab="predicted weight", xlab="days since 1st Sept")
points(range(ds), range(ds1j2), pch=5)
lines(x=ds, y=ds1j3, lty=1); points(range(ds), range(ds1j3), pch=18)
lines(x=ds, y=ds1a2, lty=2); points(range(ds), range(ds1a2), pch=5)
lines(x=ds, y=ds1a3, lty=2); points(range(ds), range(ds1a3), pch=18)
lines(x=ds, y=ds2j2, lty=1); points(range(ds), range(ds2j2), pch=1)
lines(x=ds, y=ds2j3, lty=1); points(range(ds), range(ds2j3), pch=16)
lines(x=ds, y=ds2a2, lty=2); points(range(ds), range(ds2a2), pch=1)
lines(x=ds, y=ds2a3, lty=2); points(range(ds), range(ds2a3), pch=16)
legend("bottomright", lty=rep(1:2, 4), pch=c(5,18,5,18,1,16,1,16),
legend=c("site 1, juveniles, 2012", "site 1, juveniles, 2013",
"site 1, adults, 2012", "site 1, adults, 2013",
"site 2, juveniles, 2012", "site 2, juveniles, 2013",
"site 2, adults, 2012", "site 2, adults, 2013"))
You can write code that's much shorter by writing functions that will read in a list and do all of this for you rather than copying and pasting the same thing eight times in a row, but this should be easier to follow. Here is the plot:
This kind of plot is more interesting / useful when there are interactions (the lines aren't parallel). In this case, we just have a set of eight lines that are shifted vertically relative to each other.
Best Answer
The different ranges of your data is no problem, since e.g. scaling (both or just one of them) to mean zero and unit variance does not change the correlation between them. However, if you want to correlate the two data vectors you have, the need to have the same length, i.e. if your PMDI vector has more data points than your other vector, then you need to find a way (e.g. taking the mean over some period) to summarise your PMDI vector in less data points. Calculating correlation in Python: See e.g. https://stackoverflow.com/questions/19428029/how-to-get-correlation-of-two-vectors-in-python