First up, the requirements for compression and analysis/presentation are not necessarily the same -- indeed, for analysis you might want to keep all the raw data and have the ability to slice and dice it in various ways. And what works best for you will depend very much on what you want to get out of it. But there are a number of standard tricks that you could try:
- Use differences rather than raw data
- Use thresholding to remove low-level noise. (Combine with differencing to ignore small changes.)
- Use variance over some time window rather than average, to capture activity level rather than movement
- Change the time base from fixed intervals to variable length runs and accumulate into a single data point sequences of changes for which some criterion holds (eg, differences in same direction, up to some threshold)
- Transform data from real values to ordinal (eg low, medium, high); you could also do this on time bins rather than individual samples -- eg, activity level for each 5 minute stretch
- Use an appropriate convolution kernel* to smooth more subtly than your moving average or pick out features of interest such as sharp changes.
- Use an FFT library to calculate a power spectrum
The last may be a bit expensive for your purposes, but would probably give you some very useful presentation options, in terms of "sleep rhythms" and such. (I know next to nothing about Android, but it's conceivable that some/many/all handsets might have built in DSP hardware that you can take advantage of.)
* Given how central convolution is to digital signal processing, it's surprisingly difficult to find an accessible intro online. Or at least in 3 minutes of googling. Suggestions welcome!
Based upon the comments that you've offered to the responses, you need to be aware of spurious causation. Any variable with a time trend is going to be correlated with another variable that also has a time trend. For example, my weight from birth to age 27 is going to be highly correlated with your weight from birth to age 27. Obviously, my weight isn't caused by your weight. If it was, I'd ask that you go to the gym more frequently, please.
As you are familiar with cross-section data, I'll give you an omitted variables explanation. Let my weight be $x_t$ and your weight be $y_t$, where
$$\begin{align*}x_t &= \alpha_0 + \alpha_1 t + \epsilon_t \text{ and} \\ y_t &= \beta_0 + \beta_1 t + \eta_t.\end{align*}$$
Then the regression
$$\begin{equation*}y_t = \gamma_0 + \gamma_1 x_t + \nu_t\end{equation*}$$
has an omitted variable---the time trend---that is correlated with the included variable, $x_t$. Hence, the coefficient $\gamma_1$ will be biased (in this case, it will be positive, as our weights grow over time).
When you are performing time series analysis, you need to be sure that your variables are stationary or you'll get these spurious causation results. An exception would be integrated series, but I'd refer you to time series texts to hear more about that.
Best Answer
To start with I will list 7 forecasting methods in time series.
The first 4 methods try to make the rough edges of time series data smooth so as to correctly forecast the data.
In the above image blue color shows the trend with true time series data while the red color shows the smoothed series.
The last 3 methods try to break down time series data into its various component such as
trend
,season
,cycle
andresidual
(remnant)In the above image, I show what to expect in the decomposition of time series data