Solved – Plotting gam model output – not component smooth functions

Tags: data visualization, generalized-additive-model, r

I want to model temperature variation at two locations in America. Example below:

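set.seed(10)
RandData <- runif(8760 * 2)
America <- rep(c('NewYork', 'Miami'), each = 8760)

# Hourly time stamps for 1991 (8760 values)
Date <- seq(from = as.POSIXct("1991-01-01 00:00"),
            to = as.POSIXct("1991-12-31 23:00"), length.out = 8760)

# Store Loc as a factor explicitly: since R 4.0 data.frame() keeps strings
# as character, and levels(DatNew$Loc) is used later.
# Doy and Tod are recycled once, so both locations share the same time stamps.
DatNew <- data.frame(Loc = factor(America),
                     Doy = as.numeric(format(Date, format = "%j")),
                     Tod = as.numeric(format(Date, format = "%H")),
                     Temp = RandData)
library(mgcv)
mod1 <- gam(Temp ~ Loc + s(Doy) + s(Doy, by = Loc) +
              s(Tod) + s(Tod, by = Loc), data = DatNew)

plot(mod1, pages = 1, scale = 0)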

Instead of a plot of the component smooth functions that make up the gam, I would like to plot the model output on the original x and y axes, i.e. with temperature on the y axis. When modelling one location I would use something along the lines of:

pred <- data.frame(Doy = DatNew$Doy)
pred <- transform(pred, yhat = predict(mod1, newdata = pred))

However, I do not know how to do this with several locations, i.e. when the model depends on the location and not solely on the day of year and time of day.

How can this be achieved?

Best Answer

@Kate when I run your code, I get the following error from the last line, which is the key to the solution:

Error in eval(expr, envir, enclos) : object 'Loc' not found
In addition: Warning message:
In predict.gam(mod1, newdata = pred) :
  not all required variables have been supplied in newdata!

If you want to predict from a model (any model, not just gam()), the data frame you pass as the newdata argument must contain every predictor variable used in the model.
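As a small illustration of that general point, here is a sketch that uses lm() on the built-in mtcars data rather than the question's model. The names fit, needed, incomplete, complete and missing_vars are made up for this example. It lists the predictors a fitted model needs and compares them with the columns of a candidate newdata frame.

```r
# Illustration on a built-in data set (mtcars), not the question's data.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Predictors the fitted model needs at prediction time.
needed <- all.vars(delete.response(terms(fit)))

# A newdata frame that leaves one of them out.
incomplete <- data.frame(wt = 3)
missing_vars <- setdiff(needed, names(incomplete))
print(missing_vars)

# Supplying every predictor lets predict() run.
complete <- data.frame(wt = 3, hp = median(mtcars$hp))
print(predict(fit, newdata = complete))
```

The same reasoning applies to the gam() model in the question, which needs Doy, Tod and Loc: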

pred <- data.frame(Doy = 1:365,  Tod=median(DatNew$Tod),
    Loc=factor('NewYork',levels=levels(DatNew$Loc)))
predict(mod1, newdata = pred)

And this works, giving a predicted temperature for each day of the year in New York, assuming measurements are taken at the median time of day. I changed your Doy values to run simply from 1 to 365, as presumably you only want one prediction per day of the year. The predictions are almost completely flat because you simulated the data with no effect of Doy on Temp; they sit almost exactly at 0.5, the expected value of a uniform(0, 1) distribution, which is how you generated your random temperatures.
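To get both locations on the original temperature scale, one option is to build the prediction grid with expand.grid() so that it has one row per day of year and location, then draw one line per location. The following is a minimal sketch, not the only way to do it. It rebuilds the question's simulated data with Loc as a factor and a fixed UTC time zone (both assumptions made here so the block runs on its own), and the column name yhat and the colours are arbitrary choices.

```r
library(mgcv)

# Rebuild the question's simulated data, with Loc stored as a factor.
set.seed(10)
Date <- seq(from = as.POSIXct("1991-01-01 00:00", tz = "UTC"),
            to = as.POSIXct("1991-12-31 23:00", tz = "UTC"),
            length.out = 8760)
DatNew <- data.frame(Loc = factor(rep(c("NewYork", "Miami"), each = 8760)),
                     Doy = as.numeric(format(Date, "%j")),
                     Tod = as.numeric(format(Date, "%H")),
                     Temp = runif(8760 * 2))
mod1 <- gam(Temp ~ Loc + s(Doy) + s(Doy, by = Loc) +
              s(Tod) + s(Tod, by = Loc), data = DatNew)

# One row per day of year and location, holding Tod at its median.
pred <- expand.grid(Doy = 1:365,
                    Tod = median(DatNew$Tod),
                    Loc = levels(DatNew$Loc))
# Make sure the factor levels match those used when fitting the model.
pred$Loc <- factor(pred$Loc, levels = levels(DatNew$Loc))
pred$yhat <- as.numeric(predict(mod1, newdata = pred))

# Plot predicted temperature against day of year, one line per location.
cols <- c(Miami = "red", NewYork = "blue")
plot(pred$Doy, pred$yhat, type = "n",
     xlab = "Day of year", ylab = "Predicted Temp")
for (loc in levels(pred$Loc)) {
  sub <- pred[pred$Loc == loc, ]
  lines(sub$Doy, sub$yhat, col = cols[[loc]])
}
legend("topright", legend = names(cols), col = cols, lty = 1)
```

To plot against time of day instead, swap the roles of the two variables: let Tod run over 0:23 and hold Doy at a fixed value such as its median.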