Q1: What is the connection between PC time series and "maximum variance"?
The data they are analyzing consist of $\hat t$ observations of each of the $n$ neurons, so one can think of them as $\hat t$ data points in the $n$-dimensional space $\mathbb R^n$. This is "a cloud of points", so performing PCA amounts to finding the directions of maximal variance, as you are well aware. I prefer to call these directions (which are eigenvectors of the covariance matrix) "principal axes", and the projections of the data onto these directions "principal components".
When analyzing time series, the only addition to this picture is that the points are meaningfully ordered, or numbered (from $1$ to $\hat t$), as opposed to being simply an unordered collection of points. This means that if we take the firing rate of a single neuron (which is one coordinate in $\mathbb R^n$), its values can be plotted as a function of time. Similarly, if we take one PC (which is a projection of $\mathbb R^n$ onto some line), it also has $\hat t$ values and can be plotted as a function of time. So if the original features are time series, then the PCs are time series too.
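A minimal numpy sketch of this picture (the data and all names below are toy values of my own, not from any actual recording):

```python
import numpy as np

rng = np.random.default_rng(0)
t_hat, n = 500, 20                      # toy numbers of time points and neurons
time = np.arange(t_hat)

# Toy firing rates: every neuron follows a shared slow oscillation plus noise,
# stored as t_hat points in R^n (rows = time points, columns = neurons).
shared = np.sin(2 * np.pi * time / 100)
X = np.outer(shared, rng.normal(1.0, 0.3, n)) + 0.2 * rng.normal(size=(t_hat, n))

Xc = X - X.mean(axis=0)                 # centre each neuron (each coordinate of R^n)
cov = Xc.T @ Xc / (t_hat - 1)           # n x n covariance matrix
eigval, eigvec = np.linalg.eigh(cov)    # its eigenvectors are the principal axes
axis1 = eigvec[:, -1]                   # direction of maximal variance in R^n

pc1 = Xc @ axis1                        # projection: one value per time point,
print(pc1.shape)                        # (500,) -- the first PC is itself a time series
```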
I agree with @Nestor's interpretation above: each original feature can then be seen as a linear combination of PCs, and as the PCs are uncorrelated with each other, one can think of them as basis functions that the original features are decomposed into. It is a little bit like Fourier analysis, but instead of taking a fixed basis of sines and cosines, we find the "most appropriate" basis for this particular dataset, in the sense that the first PC accounts for the most variance, etc.
"Accounting for most variance" here means that if you only take one basis function (time series) and try to approximate all your features with it, then the first PC will do the best job. So the basic intuition here is that the first PC is a basis function time series that fits all the available time series the best, etc.
Why is this passage in Freeman et al. so confusing?
Freeman et al. analyze the data matrix $\hat{\mathbf Y}$ with variables (i.e. neurons) in rows (!), not in columns. Note that they subtract row means, which makes sense, as variables are usually centred prior to PCA. Then they perform SVD: $$\hat {\mathbf Y} = \mathbf{USV}^\top.$$ Using the terminology I advocate above, columns of $\mathbf U$ are principal axes (directions in $\mathbb R^n$) and rows of $\mathbf{SV}^\top$ (equivalently, columns of $\mathbf{VS}$) are principal components (time series of length $\hat t$).
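A small numpy sketch of this decomposition, just to make the shapes and identities explicit (toy data; the dimensions are arbitrary, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n, t_hat = 20, 500
Y = rng.normal(size=(n, t_hat))             # neurons in rows, as in Freeman et al.
Y_hat = Y - Y.mean(axis=1, keepdims=True)   # subtract row means

U, s, Vt = np.linalg.svd(Y_hat, full_matrices=False)

# Columns of U: principal axes, i.e. directions in R^n (one weight per neuron).
# Rows of S @ Vt (equivalently columns of V @ S): principal components,
# i.e. time series of length t_hat.
pcs = np.diag(s) @ Vt
print(U.shape, pcs.shape)               # (20, 20)  (20, 500)

# Projecting the data onto the principal axes recovers exactly these PCs:
print(np.allclose(U.T @ Y_hat, pcs))    # True
```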
The sentence that you quoted from Freeman et al. is quite confusing indeed:
The principal components (the columns of $\mathbf V$) are vectors of length $\hat t$, and the scores (the columns of $\mathbf U$) are vectors of length $n$ (number of voxels), describing the projection of each voxel on the direction given by the corresponding component, forming projections on the volume, i.e. whole-brain maps.
First, columns of $\mathbf V$ are not PCs, but PCs scaled to unit norm. Second, columns of $\mathbf U$ are NOT scores, because "scores" usually means the PCs themselves. Third, "direction given by the corresponding component" is a cryptic notion. I think they flip the picture here and suggest thinking of $n$ points in a $\hat t$-dimensional space, so that now each neuron is a data point (and not a variable). Conceptually this sounds like a huge change, but mathematically it makes almost no difference, the only change being that principal axes and [unit-norm] principal components swap places. In this case, my PCs from above (the $\hat t$-long time series) become principal axes, i.e. directions, and $\mathbf U$ can be thought of as normalized projections onto these directions (normalized scores?).
I find this very confusing, so I suggest ignoring their choice of words and looking only at the formulas. From this point on I will keep using the terms as I defined them above, not as Freeman et al. use them.
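For concreteness, the first point (columns of $\mathbf V$ being unit-norm PCs) can be checked on the SVD sketch above:

```python
# Each row of Vt (i.e. column of V) is the corresponding PC rescaled to unit norm:
print(np.allclose(Vt[0], pcs[0] / np.linalg.norm(pcs[0])))   # True
print(np.allclose(np.linalg.norm(Vt, axis=1), 1.0))          # all rows of Vt have norm 1
```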
Q2: What are the state space trajectories?
They take single-trial data and project it onto the first two principal axes (i.e. the first two columns of $\mathbf U$). If you did this with the original data $\hat{\mathbf Y}$, you would get the first two principal components back. Again, the projection onto one principal axis is one principal component, i.e. a $\hat t$-long time series.
If you do it with some single-trial data $\mathbf Y$, you again get two $\hat t$-long time series. In the movie, each line corresponds to one such projection: the x-coordinate evolves according to PC1 and the y-coordinate according to PC2. This is what is called "state space": PC1 plotted against PC2. Time goes by as the dot moves around.
Each line in the movie is obtained with a different single trial $\mathbf Y$.
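A rough sketch of such a projection, reusing the SVD toy example above (the "single trial" here is just simulated, not their data):

```python
# A stand-in for single-trial data Y (same n x t_hat shape; toy noise on top of Y_hat):
Y_trial = Y_hat + 0.5 * rng.normal(size=(n, t_hat))

# Project onto the first two principal axes (the first two columns of U):
traj = U[:, :2].T @ Y_trial             # shape (2, t_hat): the PC1 and PC2 time series

# The state-space trajectory is PC1 plotted against PC2, with time as the parameter;
# e.g. plt.plot(traj[0], traj[1]) would draw one line of the movie.
```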
Best Answer
It isn't meaningful to run PCA on a univariate time series (or, more generally, a single vector). To run PCA on time series data, you'd need to have either a multivariate time series, or multiple univariate time series. There are ways to transform a univariate time series into a multivariate one (e.g. wavelet or time-frequency transforms, time delay embeddings, etc.). For example, the spectrogram of a univariate time series gives you the power at each frequency, for each moment in time.
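As an illustration of one such transform, here is a sketch using scipy.signal.spectrogram (the sampling rate and signal below are arbitrary toy choices):

```python
import numpy as np
from scipy.signal import spectrogram

rng = np.random.default_rng(0)
fs = 250.0                                    # toy sampling rate
t = np.arange(0, 10, 1 / fs)
x = np.sin(2 * np.pi * 12 * t) + 0.5 * rng.normal(size=t.size)   # univariate series

# Time-frequency transform: one power value per (frequency, time window) pair,
# turning the single series into a multivariate one (rows = frequencies).
f, t_spec, Sxx = spectrogram(x, fs=fs, nperseg=128)
print(Sxx.shape)                              # (n_frequencies, n_time_windows)
```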
Say we have a multivariate time series with $p$ dimensions/variables. Or, we might have a set of $p$ univariate time series, where each time point has some common meaning across time series (e.g. time relative to some event). In both cases, there are $n$ time points. There are a couple of ways to run PCA (both are sketched in code after the list):
1. Consider each time point to be an observation. Dimensions correspond to variables of the multivariate time series, or to the different univariate time series. So, there are $n$ points in a $p$-dimensional space. In this case, eigenvectors correspond to instantaneous patterns across the dimensions/time series. At each moment in time, we represent the amplitude across dimensions/time series as a linear combination of these patterns.
2. Consider each variable of the multivariate time series (or each univariate time series) to be an observation. Dimensions correspond to time points. So, there are $p$ points in an $n$-dimensional space. In this case, the eigenvectors correspond to temporal basis functions, and we're representing each time series as a linear combination of these basis functions.
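A small numpy sketch contrasting the two orientations (toy data; the variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
n_time, p = 200, 6
X = rng.normal(size=(n_time, p))        # p time series observed at n_time common time points

# Orientation 1: time points are observations (n_time points in a p-dimensional space).
# Eigenvectors are instantaneous patterns across the p series.
X1 = X - X.mean(axis=0)
patterns = np.linalg.eigh(np.cov(X1, rowvar=False))[1]    # p x p, columns = patterns

# Orientation 2: each time series is an observation (p points in an n_time-dimensional space).
# Eigenvectors are temporal basis functions of length n_time.
X2 = X.T - X.T.mean(axis=0)
basis_funcs = np.linalg.svd(X2, full_matrices=False)[2]   # rows = temporal basis functions

print(patterns.shape, basis_funcs.shape)                  # (6, 6) (6, 200)
```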
Given the above, it's apparent why PCA doesn't make sense for a single univariate time series. Either you have $n$ observations and 1 dimension (in which case there's nothing for PCA to do), or you have a single observation with $n$ dimensions (in which case the problem is completely underdetermined and all solutions are equivalent).