How to select the number of basis functions and smoothing parameter in functional data analysis

functional-data-analysis, r

I am confused about how to select the number of basis functions and the smoothing parameter for a particular data set in functional data analysis. I have two data sets:

(A) Daily temperature data for England covering 228 years, 1780-2007 (here)
(B) Global temperature anomalies 1850-2012 (here)

I want to convert the data into functional data, using an optimal number of basis functions and an optimal smoothing parameter.

My first approach, for the global data, is:

library(fda.usc)
df <- read.csv("global_1850_2012.csv")
df <- df[, -1]                        # drop the first column (years)
n <- 163                              # number of years (one curve per year)
g <- 12                               # observations per curve (monthly)
t <- seq(0, 1, length.out = g)        # rescale the months to [0, 1]
dataf <- fdata(df, argvals = t, rangeval = c(0, 1))   # create the functional data object
test <- min.basis(dataf, lambda = seq(-1, 1, by = 0.01), numbasis = 4:12)

This gives
test$lambda.opt = 0.01 and test$numbasis.opt = 8,

which matches well with Buddhananda Banerjee (2015), "On existence of a change in mean of functional data".

But when I applied the same approach to the England temperature data:

library(fda.usc)
df <- read.csv("england228.csv")
df <- df[, -1]                        # drop the first column (years)
n <- 228                              # number of years (one curve per year)
g <- 365                              # observations per curve (daily)
t <- seq(0, 1, length.out = g)        # rescale the days to [0, 1]
dataf <- fdata(df, argvals = t, rangeval = c(0, 1))   # create the functional data object
test <- min.basis(dataf, lambda = seq(-1, 1, by = 0.01), numbasis = 5:300)

After a long computation, this produces entirely different results and suggests a large number of basis functions, whereas 12 basis functions are used for the same data in Berkes (2009), "Detecting changes in the mean of functional observations".

A simple criterion based on minimizing least squares is also mentioned, and it produces different results again. In a nutshell, my question is: what is the appropriate way to select the number of basis functions and the smoothing parameter? I am a beginner in functional data analysis, so kindly bear with my trivial question.

Best Answer

To set the number of basis functions, you can follow the rule for a B-spline basis of order norder with a knot at every time point:

nbasis = length(time.points) + norder - 2

To choose an optimized smoothing parameter, I usually apply the generalized cross-validation (GCV) criterion.
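As a rough illustration of the two steps in this answer, here is a minimal sketch using the fda package (not fda.usc, which the question uses). It runs on a synthetic curve rather than the temperature data, and the noise level, the choice of cubic B-splines with a second-derivative penalty, the lambda grid, and the variable names other than nbasis, norder and time.points are all assumptions made for the example.

```r
library(fda)

# Synthetic stand-in for a single observed curve (NOT the temperature data)
set.seed(1)
time.points <- seq(0, 1, length.out = 101)
y <- sin(2 * pi * time.points) + rnorm(length(time.points), sd = 0.2)

# Rule from the answer: one knot per time point, B-splines of order norder
norder <- 4                                   # cubic B-splines (assumed choice)
nbasis <- length(time.points) + norder - 2
basis <- create.bspline.basis(rangeval = range(time.points),
                              nbasis = nbasis,
                              norder = norder,
                              breaks = time.points)

# Candidate smoothing parameters on a log10 scale (assumed grid)
log10.lambda <- seq(-8, 0, by = 0.5)

# For each lambda, smooth with a second-derivative roughness penalty
# and record the GCV score reported by smooth.basis()
gcv <- sapply(log10.lambda, function(ll) {
  fdpar <- fdPar(basis, Lfdobj = 2, lambda = 10^ll)
  fit <- smooth.basis(time.points, y, fdpar)
  sum(fit$gcv)                                # summed over curves (one here)
})

# Keep the lambda with the smallest GCV score
lambda.opt <- 10^log10.lambda[which.min(gcv)]
```

With several curves, y would be a matrix with one column per curve, and summing fit$gcv gives one overall score per lambda. If the minimum falls at either end of the grid, it is worth widening the grid before trusting lambda.opt.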