Question #1:
The problem is that in the MLE case, both the Python (statsmodels) and R procedures use state space models to evaluate the likelihood. In the SARIMAX class, the state dimension grows linearly (or worse) with the seasonal period, because the state space form incorporates all of the intermediate lags too: if you have a lag at 3600, the state space form also carries all 3599 intermediate lags.
So you now have a couple of issues. First, you're multiplying matrices of dimension 3600+ by each other at every step, which is slow. Even worse, state space models need to be initialized, and by default they use a stationary initialization that requires solving a linear system whose size blows up with the state dimension. When I tested a seasonal order of 3600, it wasn't even getting past this initialization step.
The R arima function accepts method='CSS', which minimizes the conditional sum of squares (conditional MLE rather than full MLE) and so sidesteps the state space machinery. Depending on how the arima function implements it, this could be much better in your case.
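To illustrate why CSS is cheap, here is a minimal sketch of conditional least squares for a pure seasonal AR(1), y_t = phi * y_{t-s} + e_t. This is an illustration of the idea, not R's implementation, and the function name is my own:

```python
import numpy as np

def css_seasonal_ar1(y, s):
    """Conditional least squares for y_t = phi * y_{t-s} + e_t.

    Conditions on the first s observations and solves an ordinary
    least-squares problem, so no Kalman filtering or expensive
    stationary initialization is needed.
    """
    X, Y = y[:-s], y[s:]
    phi = (X @ Y) / (X @ X)          # OLS slope, no intercept
    resid = Y - phi * X
    sigma2 = resid @ resid / len(Y)  # conditional MLE error variance
    return phi, sigma2
```

Even with s = 3600 this is just one dot-product regression, which is why conditional methods scale so much better here.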
In Python, there aren't many good options. The SARIMAX class accepts a conserve_memory option, but if you use it, you can't forecast. To work around the initialization problem, you can call the initialize_approximate_diffuse method, which avoids solving the large linear system. However, even then you'll still be multiplying 3600 x 3600 matrices together, which will be quite slow. I would like to update the SARIMAX class to work with sparse matrices (which would solve this problem), but that's probably quite a ways in the future. I don't know of any non-commercial program that implements state space models using sparse matrices.
Question #5:
This was a bug in the statsmodels code. It has been fixed in the repository (see https://github.com/ChadFulton/statsmodels/issues/2)
Best Answer
Both linear returns (percentage changes) and log returns are used in financial applications. Two arguments for using log returns for time series modelling are