If the observations of a stochastic process are irregularly spaced, the most natural approach is to model them as discrete time observations from a continuous time process.
What a model specification generally needs to provide is the joint distribution of the observations $X_{1}, \ldots, X_n$ observed at times $t_1 < t_2 < \ldots < t_n$, and this can, for instance, be broken down into the conditional distributions of $X_{i}$ given $X_{i-1}, \ldots, X_1$. If the process is a Markov process, this conditional distribution depends only on $X_{i-1}$ (not on $X_{i-2}, \ldots, X_1$) and on the time points $t_i$ and $t_{i-1}$. If the process is time-homogeneous, the dependence on the time points is only through the difference $t_i - t_{i-1}$.
We see from this that with equidistant observations (with $t_i - t_{i-1} = 1$, say) from a time-homogeneous Markov process we only need to specify a single conditional probability distribution, $P^1$, to specify a model. Otherwise we need to specify a whole collection $P^{t_{i}-t_{i-1}}$ of conditional probability distributions indexed by the time differences between observations. The latter is, in fact, most easily done by specifying a family $P^t$ of continuous time conditional probability distributions.
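As a concrete illustration (not from the original answer), the Ornstein-Uhlenbeck process $dX_t = -\theta X_t \, dt + \sigma \, dB_t$ is one of the few models where the family $P^t$ is available in closed form: the conditional distribution of $X_{t_i}$ given $X_{t_{i-1}} = x$ is Gaussian with mean $x e^{-\theta \Delta_i}$ and variance $\frac{\sigma^2}{2\theta}(1 - e^{-2\theta \Delta_i})$, where $\Delta_i = t_i - t_{i-1}$. A minimal sketch of the resulting likelihood for irregularly spaced observations (the function and parameter names are illustrative):

```python
import numpy as np
from scipy.stats import norm

def ou_loglik(theta, sigma, x, t):
    """Exact log-likelihood (conditional on the first observation) of
    irregularly spaced observations x at times t (numpy arrays) from the
    OU process dX_t = -theta * X_t dt + sigma dB_t."""
    dt = np.diff(t)
    cond_mean = x[:-1] * np.exp(-theta * dt)
    cond_var = sigma**2 / (2.0 * theta) * (1.0 - np.exp(-2.0 * theta * dt))
    return norm.logpdf(x[1:], loc=cond_mean, scale=np.sqrt(cond_var)).sum()
```

This can then be maximised numerically (for instance with scipy.optimize.minimize) to estimate $\theta$ and $\sigma$ from irregularly spaced data.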
A common way to obtain a continuous time model specification is through a stochastic differential equation (SDE)
$$dX_t = a(X_t) dt + b(X_t) dB_t.$$
A good place to get started with doing statistics for SDE models is Simulation and Inference for Stochastic Differential Equations by Stefano Iacus. Many methods and results are described for equidistant observations, but this is typically just a convenience of presentation and not essential for the application. One main obstacle is that the SDE specification rarely allows for an explicit likelihood when you have discrete observations, but there are well developed estimating equation alternatives.
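To connect the SDE specification to discretely observed data, here is a minimal Euler-Maruyama sketch (my own illustration, not taken from the book) that simulates the process on a fine internal grid and records it at the irregular observation times; the step size and all names are illustrative choices:

```python
import numpy as np

def simulate_sde_at(times, x0, a, b, dt=1e-3, rng=None):
    """Euler-Maruyama simulation of dX_t = a(X_t) dt + b(X_t) dB_t,
    recorded at the (irregular) observation times."""
    rng = np.random.default_rng() if rng is None else rng
    x, t = float(x0), times[0]
    out = [x]
    for t_next in times[1:]:
        while t < t_next:
            h = min(dt, t_next - t)          # never step past the next observation time
            x += a(x) * h + b(x) * np.sqrt(h) * rng.standard_normal()
            t += h
        out.append(x)
    return np.array(out)

# illustrative use: Ornstein-Uhlenbeck coefficients observed at 20 irregular times
obs_times = np.sort(np.random.default_rng(1).uniform(0.0, 10.0, size=20))
obs = simulate_sde_at(obs_times, x0=0.0, a=lambda x: -0.5 * x, b=lambda x: 1.0)
```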
If you want to go beyond Markov processes, stochastic volatility models are, like (G)ARCH models, attempts to model a heterogeneous variance (volatility). One can also consider delay equations such as
$$dX_t = \left(\int_0^t a(s)(X_t-X_s) \, ds\right) dt + \sigma \, dB_t$$
that are continuous time analogs of AR$(p)$-processes.
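A rough simulation sketch for a delay equation of this form (my own illustration, under the assumption that the kernel $a$ is given as a function): an Euler scheme on a regular grid where the drift integral over the past path is approximated by a Riemann sum; the kernel choice and all names are illustrative.

```python
import numpy as np

def simulate_delay_sde(T, n, a, sigma, x0=0.0, rng=None):
    """Euler scheme for dX_t = (integral_0^t a(s) * (X_t - X_s) ds) dt + sigma dB_t
    on a regular grid of n steps over [0, T]; the drift integral over the past
    path is approximated by a left Riemann sum."""
    rng = np.random.default_rng() if rng is None else rng
    dt = T / n
    t = np.linspace(0.0, T, n + 1)
    x = np.empty(n + 1)
    x[0] = x0
    for i in range(n):
        drift = np.sum(a(t[:i]) * (x[i] - x[:i])) * dt   # Riemann sum over the past path
        x[i + 1] = x[i] + drift * dt + sigma * np.sqrt(dt) * rng.standard_normal()
    return t, x

# illustrative use with an exponentially decaying kernel
t, x = simulate_delay_sde(T=10.0, n=2000, a=lambda s: -np.exp(-s), sigma=0.5)
```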
I think it is fair to say that the common practice when dealing with observations at irregular time points is to build a continuous time stochastic model.
The answer will depend on your study design (e.g., cross-sectional time series? cohort time series? serial cohorts time series?). Honaker and King have developed an approach that is useful for cross-sectional time series (and possibly for serial cohorts time series, depending on your assumptions), including the R package Amelia II for imputing such data. Meanwhile, Spratt et al. have described a different approach that can be used in some cohort time series designs, but it is sparse on software implementations.
A cross-sectional time series design (aka panel study design) is one in which a population(s) is (are) repeatedly sampled (e.g., every year), using the same study protocol (e.g., same variables, instruments, etc.). If the sampling strategy is representative, these kinds of data produce an annual picture (one measurement per participant or subject) of the distributions of those variables for each population in the study.
A cohort time series design (aka repeated cohorts study design, longitudinal study design, also sometimes called a panel study design) is one in which individual units of analysis are sampled once and followed over a long period of time. The individuals may be sampled in a representative fashion from one or more populations. However, a representative cohort time series sample will become an increasingly poor representative of the target population (at least in human populations) as time passes, because of people being born or aging into the target population, and dying or aging out of it, along with immigration and emigration.
A serial cohorts time series design (aka repeated, multi-, and multiple cohorts, or panel study design) is one in which a population(s) is (are) repeatedly sampled (e.g., every year), using the same study protocol (e.g., same variables, instruments, etc.), which measures individual units of analysis within a population at two points of time during the period (e.g., during the year) in order to create measures of rate of change. If the sampling strategy is representative, these kinds of data produce an annual picture of the rates of change in those variables for each population in the study.
References
Honaker, J. and King, G. (2010). What to do about missing values in time-series cross-section data. American Journal of Political Science, 54(2):561–581.
Spratt, M., Carpenter, J., Sterne, J. A. C., Carlin, J. B., Heron, J., Henderson, J., and Tilling, K. (2010). Strategies for multiple imputation in longitudinal studies. American Journal of Epidemiology, 172(4):478–487.
Best Answer
Strictly speaking, variance is a property of the distribution of your data points, and all you can do is estimate it using a variance estimator. The latter is normalised to the number of samples and thus independent of the window you apply it to, assuming that you are using the unbiased estimator and do not try to fill any gaps.
However, all of this is implicitly based on the assumption that each of your data points is an independent sample from the same distribution, which may not even be a good approximation for real data (in which case variance may not be a good measure anymore anyway). As a pathological example, suppose that your data points just depend linearly on time. In this case, increasing the temporal width of the window increases the variance. The same holds whenever your data is temporally correlated.
Taking another point of view: if assuming that your data points are independent samples from the same distribution is actually appropriate, the time at which a data point was sampled does not matter for estimating that distribution's variance, and there is no difference between sampling equidistantly and at random points. However, this assumption often does not hold in real applications, and a variance estimator may serve other purposes than estimating a distribution's variance.
This problem becomes less severe if your gaps are short and essentially random in position. Or, from another point of view: as the number of data points goes towards infinity and it is random which data points are missing, the gaps have no effect.
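To make this concrete, a small illustrative sketch (using numpy; not from the original answer): the unbiased sample variance computed from the available points only, together with the pathological linear-trend case where widening the window inflates the estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

def window_variance(values, observed):
    """Unbiased sample variance computed from the observed points only
    (missing points are excluded, not filled)."""
    return np.var(values[observed], ddof=1)

# i.i.d. data: the estimate is essentially unchanged when ~20% of points are missing at random
y = rng.normal(size=1000)
observed = rng.random(1000) > 0.2
print(window_variance(y, observed), np.var(y, ddof=1))

# linearly trending data: a wider window gives a larger variance estimate
trend = 0.01 * np.arange(1000)
print(np.var(trend[:100], ddof=1), np.var(trend, ddof=1))
```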
The estimator of the autocorrelation function is based on variance estimators and averages, which are both normalised to the sample size. Thus, if you calculate these ignoring the missing points, there is no effect in the limit of infinitely many data points, provided the missing data points are random.
However, if a lot of data points are missing or if there is no rhythm in your sampling times, you will hardly ever find a pair of points for a given time lag, and thus you cannot estimate the autocorrelation directly anymore.
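A minimal sketch of such an estimator for evenly sampled data with NaN gaps (the function name and details are my illustrative choices): each lag uses only the pairs where both points are observed, normalised by the variance of the available points.

```python
import numpy as np

def acf_with_gaps(y, max_lag):
    """Lag-k autocorrelation for evenly spaced data with NaN gaps:
    each lag uses only the pairs where both points are observed,
    normalised by the variance of the available points."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    ok = ~np.isnan(y)
    mean, var = y[ok].mean(), y[ok].var()
    acf = np.full(max_lag + 1, np.nan)
    for k in range(max_lag + 1):
        pair = ok[: n - k] & ok[k:]        # both ends of the pair observed
        if pair.any():
            acf[k] = np.mean((y[: n - k][pair] - mean) * (y[k:][pair] - mean)) / var
    return acf
```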
Variance: Don’t fill the gaps. You may not have a problem in the first place, and if you do, filling the gaps won’t fix it. Do not increase the temporal size of your window; this may introduce a bias if there is any correlation in your data.
Autocorrelation: If you have a few data points missing in an unbiased way from otherwise evenly sampled data, the above applies. You can estimate the components of the autocorrelation estimator ignoring the missing points.
If you have a lot of missing data points or there is no even sampling in the first place, I would try to first obtain an estimate of the frequency spectrum using Lomb–Scargle periodograms and then estimate the autocorrelation function from this using the Wiener–Khinchin theorem. I am no expert on these methods, so there might be problems with this approach. I suggest testing it with artificial data first or finding literature about it.
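For what it is worth, a rough sketch of that route (an illustrative assumption on my part, not a vetted implementation): estimate the spectrum with scipy.signal.lombscargle on an angular-frequency grid, then transform back to the lag domain with a discrete cosine sum (Wiener–Khinchin) and normalise so the lag-zero value is one.

```python
import numpy as np
from scipy.signal import lombscargle

def acf_via_lomb_scargle(t, y, lags, n_freq=2000):
    """Autocorrelation estimate for unevenly sampled data: Lomb-Scargle
    periodogram, then a discrete cosine sum back to the lag domain
    (Wiener-Khinchin), normalised so that the value at lag 0 is 1."""
    y = y - y.mean()                           # lombscargle expects roughly zero-mean data
    # angular-frequency grid: strictly positive, up to a heuristic pseudo-Nyquist cut-off
    w_max = np.pi / np.median(np.diff(np.sort(t)))
    w = np.linspace(1e-3, w_max, n_freq)
    power = lombscargle(t, y, w)
    acf = np.array([np.sum(power * np.cos(w * lag)) for lag in lags])
    return acf / acf[0]                        # requires lags[0] == 0

# illustrative use
rng = np.random.default_rng(2)
t = np.sort(rng.uniform(0.0, 100.0, size=500))
y = np.sin(0.3 * t) + 0.5 * rng.standard_normal(500)
lags = np.linspace(0.0, 20.0, 41)
rho = acf_via_lomb_scargle(t, y, lags)
```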
In neither case do I see a reason to fully ignore windows with missing data points.