The documentation for BSTS says the following about coefficients:
If the object contains a regression component, then the output contains a matrix with rows corresponding to coefficients and columns corresponding to:
The posterior probability the variable is included.
The posterior probability that the variable is positive.
The conditional expectation of the coefficient, given inclusion.
The conditional standard deviation of the coefficient, given inclusion.
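For concreteness, those four summaries can be recovered from the matrix of MCMC draws of the regression coefficients (rows are iterations, columns are predictors). The sketch below is only an illustration of that relationship, not the package's own summary code; the function name, the burn-in handling, and the treatment of never-included variables are my own choices.

```r
## Sketch: compute the four documented summaries from a draws matrix
## (rows = MCMC iterations, columns = predictors), e.g. model$coefficients.
coef_summary <- function(draws, burn = 0) {
  if (burn > 0) draws <- draws[-seq_len(burn), , drop = FALSE]
  included <- draws != 0                                    # spike-and-slab: zero draw = excluded
  inc_prob <- colMeans(included)                            # P(variable included)
  prob_pos <- colMeans(draws > 0)                           # P(coefficient is positive)
  mean_inc <- colSums(draws) / pmax(colSums(included), 1)   # E[beta | included]
  sd_inc   <- sapply(seq_len(ncol(draws)), function(j) {
    x <- draws[included[, j], j]
    if (length(x) > 1) sd(x) else NA_real_                  # SD[beta | included]
  })
  cbind(inc.prob = inc_prob, prob.positive = prob_pos,
        mean.given.inclusion = mean_inc, sd.given.inclusion = sd_inc)
}
## usage (assuming a fitted bsts object called model):
## coef_summary(model$coefficients, burn = 100)
```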
There are several examples online of researchers using BSTS and showing a plot similar to the one below. I am not sure how to interpret the y-axis. I read through the documentation, but it is still not clear to me. Does anyone use this package?
Solved – How to interpret the coefficients in the BSTS package in R
bayesian, bsts, r, state-space-models, time series
Related Solutions
The idea behind the differences that you are finding in your modelling is the following.
Suppose you have a left-hand-side variable $y$ and a regressor $x$. Suppose also that the variables are stationary (for simplicity). Then you have the following equations:
(1a) $y_t=\alpha_0+\alpha_1x_t+u_t$
(1b) $u_t=\psi_1u_{t-1}+...+\psi_pu_{t-p}+v_t+\zeta_1v_{t-1}+...+\zeta_qv_{t-q}$
There are two ways you can proceed.
- Case A: you estimate equation (1a) first, obtain the residuals, then use them to estimate equation (1b).
- Case B: you estimate both equations simultaneously.
In case A, in the first stage you implicitly make an assumption that the residuals in equation (1a) are independent across time. Then in the second stage you revoke this assumption and model the residuals from equation (1a) as an ARMA process given by equation (1b). (So there is an intrinsic contradiction between the assumptions in stage 1 and stage 2 that shows the weakness of this approach.)
In case B, you assume that the residuals from equation (1a) follow an ARMA process as given in equation (1b) and use this information when estimating equation (1a).
Thus there is a difference between the two cases; no wonder you get different results in your modelling where I think you are indeed comparing case A and case B.
I think you incorrectly assume that "the intercept shown to come from the regression using xreg= as the X variables, before any arima model is done on those residuals". This is very close to the truth, but not exactly true; you miss the fact that the function arima() uses the assumption given by equation (1b) when estimating equation (1a).
Thus the difference that you are finding in your modelling is supposed to be there.
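To make the comparison concrete, here is a minimal R sketch on simulated data (the AR(1) error structure and the coefficient values are invented for illustration). Case A fits the regression first and then an ARMA model on its residuals; case B uses arima() with xreg, which fits the regression with ARMA errors, i.e. equations (1a) and (1b) jointly.

```r
set.seed(1)
n <- 200
x <- rnorm(n)
u <- arima.sim(model = list(ar = 0.7), n = n)   # errors following (1b) with p = 1, q = 0
y <- 1 + 2 * x + u                              # equation (1a) with alpha_0 = 1, alpha_1 = 2

## Case A: two stages; stage 1 implicitly assumes iid errors in (1a)
stage1 <- lm(y ~ x)
stage2 <- arima(residuals(stage1), order = c(1, 0, 0), include.mean = FALSE)

## Case B: joint estimation of (1a) and (1b)
joint <- arima(y, order = c(1, 0, 0), xreg = x)

coef(stage1)   # alpha estimates from case A
coef(joint)    # alpha and psi estimates from case B (generally slightly different)
```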
There is also a case C:
(2) $(y_t-\mu_y)=\phi_0+\phi_1(y_{t-1}-\mu_y)+...+\phi_p(y_{t-p}-\mu_y)+\epsilon_t+\theta_1\epsilon_{t-1}+...+\theta_q\epsilon_{t-q}+\beta_1x_t$
where $\mu_y$ is the mean of $y$. Not sure, but perhaps one should also subtract the mean of $x$ to make the last term $\beta_1(x_t-\mu_x)$ instead of just $\beta_1x_t$.
Actually, case C seems more natural to me than case B given how we specify the arguments in arima(). Of course, this is a subjective judgement. However, the interpretation of the coefficient $\beta_1$ in case C is much more difficult than in case B.
A nice related piece by Prof. Hyndman can be found here. I borrowed some ideas from it when answering this post, so I can only recommend reading the original piece!
Disclaimer: I am not an expert on this topic, but I hope I got the idea right. If not, corrections are most welcome!
Q1: What is the connection between PC time series and "maximum variance"?
The data that they are analyzing are $\hat t$ data points for each of the $n$ neurons, so one can think about that as $\hat t$ data points in the $n$-dimensional space $\mathbb R^n$. It is "a cloud of points", so performing PCA amounts to finding directions of maximal variance, as you are well aware. I prefer to call these directions (which are eigenvectors of the covariance matrix) "principal axes", and the projections of the data onto these directions "principal components".
When analyzing time series, the only addition to this picture is that the points are meaningfully ordered, or numbered (from $1$ to $\hat t$), as opposed to being simply an unordered collection of points. This means that if we take the firing rate of a single neuron (which is one coordinate in $\mathbb R^n$), its values can be plotted as a function of time. Similarly, if we take one PC (which is a projection from $\mathbb R^n$ onto some line), it also has $\hat t$ values and can be plotted as a function of time. So if the original features are time series, then the PCs are also time series.
I agree with @Nestor's interpretation above: each original feature can then be seen as a linear combination of PCs, and as PCs are uncorrelated with each other, one can think of them as basis functions that the original features are decomposed into. It's a little bit like Fourier analysis, but instead of taking a fixed basis of sines and cosines, we find the "most appropriate" basis for this particular dataset, in the sense that the first PC accounts for the most variance, etc.
"Accounting for most variance" here means that if you only take one basis function (time series) and try to approximate all your features with it, then the first PC will do the best job. So the basic intuition here is that the first PC is a basis function time series that fits all the available time series the best, etc.
Why is this passage in Freeman et al. so confusing?
Freeman et al. analyze the data matrix $\hat{\mathbf Y}$ with variables (i.e. neurons) in rows (!), not in columns. Note that they subtract row means, which makes sense as variables are usually centred prior to PCA. Then they perform SVD: $$\hat {\mathbf Y} = \mathbf{USV}^\top.$$ Using the terminology I advocate above, the columns of $\mathbf U$ are principal axes (directions in $\mathbb R^n$) and the columns of $\mathbf{VS}$ are principal components (time series of length $\hat t$).
The sentence that you quoted from Freeman et al. is quite confusing indeed:
The principal components (the columns of $\mathbf V$) are vectors of length $\hat t$, and the scores (the columns of $\mathbf U$) are vectors of length $n$ (number of voxels), describing the projection of each voxel on the direction given by the corresponding component, forming projections on the volume, i.e. whole-brain maps.
First, columns of $\mathbf V$ are not PCs, but PCs scaled to unit norm. Second, columns of $\mathbf U$ are NOT scores, because "scores" usually means PCs. Third, "direction given by the corresponding component" is a cryptic notion. I think that they flip the picture here and suggest thinking about $n$ points in $\hat t$-dimensional space, so that now each neuron is a data point (and not a variable). Conceptually it sounds like a huge change, but mathematically it makes almost no difference, the only change being that principal axes and [unit-norm] principal components swap places. In this case, my PCs from above ($\hat t$-long time series) become principal axes, i.e. directions, and $\mathbf U$ can be thought of as normalized projections onto these directions (normalized scores?).
I find this very confusing, so I suggest ignoring their choice of words and looking only at the formulas. From this point on I will keep using the terms as I like them, not as Freeman et al. use them.
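To keep the bookkeeping straight, here is a small R check of the decomposition with variables in rows, as in $\hat{\mathbf Y}$, on random data: the columns of $\mathbf U$ match prcomp()'s principal axes and the columns of $\mathbf{VS}$ match the principal-component time series (both only up to sign flips).

```r
set.seed(2)
n <- 10; t_hat <- 100
Y  <- matrix(rnorm(n * t_hat), nrow = n)      # n neurons (rows) x t_hat time points (columns)
Yc <- Y - rowMeans(Y)                         # subtract row means, as in the paper

s    <- svd(Yc)                               # Yc = U S V'
axes <- s$u                                   # columns: principal axes, directions in R^n
pcs  <- s$v %*% diag(s$d)                     # columns: PC time series of length t_hat

## cross-check against prcomp() on the transposed (time x neuron) matrix
p <- prcomp(t(Y), center = TRUE, scale. = FALSE)
max(abs(abs(axes) - abs(p$rotation)))         # ~0: U holds the principal axes
max(abs(abs(pcs)  - abs(p$x)))                # ~0: VS holds the PC time series
```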
Q2: What are the state space trajectories?
They take single-trial data and project it onto the first two principal axes (i.e. the first two columns of $\mathbf U$). If you did this with the original data $\hat{\mathbf Y}$, you would get the first two principal components back. Again, the projection onto one principal axis is one principal component, i.e. a $\hat t$-long time series.
If you do it with some single-trial data $\mathbf Y$, you again get two $\hat t$-long time series. In the movie, each single line corresponds to such a projection: the x-coordinate evolves according to PC1 and the y-coordinate according to PC2. This is what is called "state space": PC1 plotted against PC2. Time goes by as the dot moves around.
Each line in the movie is obtained with a different single trial $\mathbf Y$.
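As a rough illustration (everything simulated here; in the paper the axes come from the trial-averaged data $\hat{\mathbf Y}$ and each single trial $\mathbf Y$ is projected onto them), such a trajectory can be drawn like this:

```r
set.seed(2)
n <- 10; t_hat <- 100
Y_trial <- matrix(rnorm(n * t_hat), nrow = n)   # one made-up trial: n neurons x t_hat time points
Yc      <- Y_trial - rowMeans(Y_trial)
axes    <- svd(Yc)$u                            # stand-in for the principal axes of the averaged data

traj <- t(axes[, 1:2]) %*% Yc                   # 2 x t_hat: coordinates along PC1 and PC2 over time
plot(traj[1, ], traj[2, ], type = "l",
     xlab = "PC1", ylab = "PC2", main = "State-space trajectory of one trial")
```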
Best Answer
If I understand what you are plotting correctly, the y-axis corresponds to the expected value of the coefficient's posterior, for the regression component of the model.
In the model$coefficients matrix, the columns correspond to the predictors and the rows to the MCMC draws.
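So, assuming model is the fitted bsts object and the plot shows posterior means, the y-axis values could be reproduced roughly as below; the burn-in of 100 draws is an arbitrary choice on my part.

```r
draws <- model$coefficients                      # rows = MCMC draws, columns = predictors
draws <- draws[-seq_len(100), , drop = FALSE]    # drop assumed burn-in draws

colMeans(draws)                                  # posterior mean of each coefficient
colSums(draws) / pmax(colSums(draws != 0), 1)    # posterior mean given inclusion
```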