Solved – Confidence interval based on time series

confidence interval, independence, time series

I have a time series of data that was gathered by driving a car around randomly.
One data point was gathered every minute and each data point is either "Yes" or "No". Yes = temperature is above a threshold, No = it is below the threshold. Assume the car never stops.

I'm interested in the proportion of Yeses and related confidence intervals.

I'm inclined to use binomial confidence intervals except that it seems to me that the data points are not independent.
Would the fact that the data was gathered by driving around randomly buy me independence? How could I derive meaningful confidence intervals?

Best Answer

First, it's almost impossible to drive a car "randomly." Did you periodically consult a random number generator to determine what direction to head in next? I don't think so. This calls into question the use of any statistical procedure that assumes randomness (even a procedure whose randomization is not simple and may induce dependence among the observations). "Arbitrariness" and "randomness" are fundamentally different: it's important not to confuse the two.

Second, a confidence interval applies to inferences about a population or process. In this case it seems to have something to do with temperature, but temperature where? When? Unless this is clearly defined you can develop no valid information relating your data to a target of inference.

Third, independence is not absolutely necessary. This problem sounds similar to standard environmental and ecological sampling plans where something is observed along a transect: animals are counted from an airplane, water temperature is logged from a boat, and so on. When randomness is incorporated in the sampling plan ("design based"), you can use various probability sampling estimators and their confidence intervals. When it is not, you need to use geostatistical methods ("model based"). These view the temperatures as samples of a spatially continuous field of temperatures. They attempt to deduce some characteristics of this field from the sampled temperatures, such as the type, extent, and direction of spatial correlation. From these characteristics, various inferences can be made about temperatures at unsampled locations. Geostatistical methods exist even for binary (thresholded) data like these, where they are known as "indicator Kriging." Such analyses intimately involve the relative geometries of the sample locations and the field to be estimated. Thus, in this case, an accurate record of the car's location at each sample is needed.
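
To make the "model based" route a little more concrete, here is a minimal sketch of what is usually an early step in an indicator analysis: estimating an empirical semivariogram of the Yes/No indicator from the recorded locations. It is not indicator Kriging itself, only a look at how quickly pairs of readings stop resembling each other as the distance between them grows. The function name `indicator_semivariogram`, the coordinate layout, the lag bins, and the four-point data set are all made up for illustration; real use would require the car's actual positions.

```python
import numpy as np


def indicator_semivariogram(coords, indicator, bin_edges):
    """Empirical semivariogram of a 0/1 indicator (illustrative helper).

    coords    : (n, 2) array of sample locations (assumed planar x, y)
    indicator : (n,) array of 0/1 values (1 = "Yes", above the threshold)
    bin_edges : increasing distances that delimit the lag bins
    Returns the mean pair distance, semivariance, and pair count per bin.
    """
    coords = np.asarray(coords, dtype=float)
    z = np.asarray(indicator, dtype=float)

    # Visit every unordered pair of samples exactly once.
    i, j = np.triu_indices(len(z), k=1)
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    half_sq_diff = 0.5 * (z[i] - z[j]) ** 2

    # Assign each pair to a lag bin; pairs outside the edges match no bin.
    which = np.digitize(dist, bin_edges) - 1
    n_bins = len(bin_edges) - 1

    lag = np.full(n_bins, np.nan)
    gamma = np.full(n_bins, np.nan)
    count = np.zeros(n_bins, dtype=int)
    for b in range(n_bins):
        mask = which == b
        count[b] = mask.sum()
        if count[b] > 0:
            lag[b] = dist[mask].mean()
            gamma[b] = half_sq_diff[mask].mean()
    return lag, gamma, count


# Made-up data: four readings, one unit apart along a straight line.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]])
yes_no = np.array([1, 1, 0, 0])
lag, gamma, count = indicator_semivariogram(
    coords, yes_no, bin_edges=[0.5, 1.5, 2.5, 3.5]
)
```

If the semivariance is small at short lags and rises with distance, nearby readings are correlated, which is exactly the situation in which a plain binomial interval would be too narrow. A fitted variogram model would then feed into the Kriging step, which this sketch does not attempt.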
