Solved – Different number of time-points in functional data analysis

functional-data-analysis

I have a data set that I wish to analyze using functional data analysis methods. The data consist of repeated measurements of some characteristic on a number of individuals. I have the times of the measurements, and these differ between individuals. So far, I see no problem in analyzing this, e.g. by using ready-to-use R packages. However, the number of measurements per individual differs as well in my data. How can I handle this? Can anyone point me to literature on this set-up or to any packages that can handle it?

To start with, I want to fit something like a functional linear model (functional response and scalar predictors).

I'm aware that I could smooth the data using a very light smoothing procedure (many basis functions and no or a very small penalty parameter) and use these smoothed curves for my analysis. But is this the best way to approach the problem? It doesn't take into account that some curves are determined with more precision than others. An additional problem with this approach is that some of the individuals only have measurements in a subinterval (say, [200, 1000]) of the full interval covered by the majority of the curves (say, [0, 1000]). Thus, for those individuals the smooth would be very poorly estimated outside the observed subinterval (here, on [0, 200)). Naturally, I could restrict attention to the interval [200, 1000], but this seems like a waste of data.

To sum up my question, what would you do in this case? Any thoughts and hints will be much appreciated.

Best Answer

The Matlab package PACE can deal with data observed at different time points for each individual (irregular data), which is exactly your situation. In your example, as long as the data pooled across all individuals are dense in [0, 1000], the FPCA function in the package can estimate an individual curve on the full interval even if its measurements lie only in [200, 1000].

The method used by FPCA to deal with irregular data is Principal components analysis by Conditional Expectation (PACE, see Yao, Mueller, and Wang 2005). An individual curve can be represented by its functional principal components (FPCs, for which Wikipedia has an entry), i.e. its scores in the eigenbasis. The method aims to estimate the FPCs even if individual curves are observed sparsely, i.e. with only a few observations per curve. The estimated individual curves can then be recovered from the FPCs.
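The conditional-expectation step can be sketched in a few lines. Under the Gaussian assumptions of Yao, Mueller, and Wang (2005), the score of curve $i$ on the $k$-th eigenfunction is estimated as $\hat\xi_{ik} = \hat\lambda_k \hat\phi_{ik}^\top \hat\Sigma_{Y_i}^{-1}(Y_i - \hat\mu_i)$, where every quantity is evaluated at that curve's own time points. The Python/NumPy sketch below is not the PACE package's code: the function name `pace_scores`, the cosine eigenfunctions, and all numbers are assumptions made for illustration, time is rescaled from [0, 1000] to [0, 1], and the mean, eigenfunctions, eigenvalues, and error variance are taken as known, whereas PACE estimates them by smoothing the pooled data.

```python
import numpy as np

def pace_scores(y, mu, phi, lam, sigma2):
    """Conditional-expectation estimates of the FPC scores of one curve.

    y      : (n,)   observations at this curve's own time points
    mu     : (n,)   mean function at those time points
    phi    : (n, K) eigenfunctions at those time points
    lam    : (K,)   eigenvalues
    sigma2 : float  measurement-error variance
    """
    # Covariance of the observed vector: sum_k lam_k phi_k phi_k' + sigma2 * I
    cov_y = (phi * lam) @ phi.T + sigma2 * np.eye(len(y))
    # xi_k = lam_k * phi_k' * cov_y^{-1} * (y - mu), for all k at once
    return lam * (phi.T @ np.linalg.solve(cov_y, y - mu))

# Illustrative (assumed) model components on rescaled time t in [0, 1].
def mean_fn(t):
    return 10.0 + 2.0 * t

def eigen_fns(t):
    # Two orthonormal cosine functions on [0, 1], one column each.
    return np.sqrt(2.0) * np.column_stack([np.cos(np.pi * t), np.cos(2 * np.pi * t)])

lam = np.array([4.0, 1.0])
sigma2 = 0.01
true_scores = np.array([1.5, -0.5])

# A curve measured at only five time points, all inside [0.2, 1].
t_obs = np.linspace(0.2, 1.0, 5)
y_obs = mean_fn(t_obs) + eigen_fns(t_obs) @ true_scores  # noise-free for clarity

est_scores = pace_scores(y_obs, mean_fn(t_obs), eigen_fns(t_obs), lam, sigma2)

# Recover the whole curve on [0, 1], including the unobserved part [0, 0.2).
grid = np.linspace(0.0, 1.0, 101)
curve_hat = mean_fn(grid) + eigen_fns(grid) @ est_scores
curve_true = mean_fn(grid) + eigen_fns(grid) @ true_scores
print(est_scores, np.max(np.abs(curve_hat - curve_true)))
```

The estimate is shrunk slightly toward zero, by an amount governed by the error variance relative to the eigenvalues, so it is close to the true scores but not exactly equal to them even without noise.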

One choice for fitting a functional linear model with functional response and scalar predictors is to obtain the FPCs of the response first, and then reduce the problem to a linear regression model with multivariate response. You can consider using the FPCreg function in PACE, although it requires a functional predictor (you can turn a scalar predictor into a constant function).
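A minimal sketch of this two-step idea, with the scalar predictors entering an ordinary least-squares fit directly rather than through FPCreg: regress each estimated FPC score on the predictors, then map the coefficients back through the eigenfunctions to get coefficient functions $\beta_j(t) = \sum_k b_{jk}\phi_k(t)$. The function names, the cosine eigenfunctions, and the simulated scores below are assumptions for illustration and are not part of PACE.

```python
import numpy as np

def fit_score_regression(scores, Z):
    """Least-squares fit of every FPC score on the scalar predictors.

    scores : (n, K) FPC scores, one row per individual
    Z      : (n, p) scalar predictors
    Returns a (1 + p, K) coefficient matrix; row 0 is the intercept.
    """
    design = np.column_stack([np.ones(len(Z)), Z])
    coef, *_ = np.linalg.lstsq(design, scores, rcond=None)
    return coef

def coefficient_functions(coef, phi_grid):
    """Evaluate beta_j(t) = sum_k coef[j, k] * phi_k(t) on a time grid.

    phi_grid : (m, K) eigenfunctions evaluated on the grid
    Returns a (1 + p, m) array, one coefficient function per row.
    """
    return coef @ phi_grid.T

# Simulated (assumed) inputs: 50 individuals, 2 predictors, 2 components.
rng = np.random.default_rng(0)
n, p = 50, 2
Z = rng.normal(size=(n, p))
coef_true = np.array([[0.0, 0.0],    # intercepts (scores are centered)
                      [1.0, -0.5],   # effect of predictor 1 on each score
                      [0.3, 0.8]])   # effect of predictor 2 on each score
design = np.column_stack([np.ones(n), Z])
scores = design @ coef_true + 0.05 * rng.normal(size=(n, 2))

coef_hat = fit_score_regression(scores, Z)

# Map the score-level coefficients back to functions of (rescaled) time.
grid = np.linspace(0.0, 1.0, 101)
phi_grid = np.sqrt(2.0) * np.column_stack([np.cos(np.pi * grid), np.cos(2 * np.pi * grid)])
beta_hat = coefficient_functions(coef_hat, phi_grid)
print(coef_hat.round(2))
print(beta_hat.shape)
```

Row 0 of `beta_hat` is the intercept's contribution on top of the estimated mean function; the remaining rows are the effects of the two predictors over time. With sparse data the scores are themselves estimates, so standard errors from this second step would understate the uncertainty.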

If you prefer manipulating the data in R while using Matlab for FPCA, you can try the R package R.matlab, which lets you pass some objects between R and Matlab.
