This is some further advice/discussion I was given:
AIC RIW can only be calculated from a balanced candidate model set. If you have 3 variables (e.g. repro, time & WR), then the balanced set (without interactions) is:
repro
time
WR
repro + time
repro + WR
time + WR
repro + time + WR
intercept only
The number of models in the set is 2 to the power of the number of explanatory variables (in this case 2^3 = 8).
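The balanced set described above is just the power set of the explanatory variables (every subset, including the empty set for the intercept-only model). A minimal sketch in Python, using the three variable names from the example:

```python
from itertools import combinations

variables = ["repro", "time", "WR"]

# All subsets of the variables; the empty tuple is the intercept-only model
candidate_set = [combo
                 for k in range(len(variables) + 1)
                 for combo in combinations(variables, k)]

for model in candidate_set:
    print(" + ".join(model) if model else "intercept only")
# 2**3 = 8 models in total, matching the list above
```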
With 2-way interactions, your candidate model set ALSO includes the following (i.e. in addition to those above):
repro + time + repro*time
repro + WR + repro*WR
time + WR + time*WR
repro + time + WR + repro*time
repro + time + WR + repro*WR
repro + time + WR + time*WR
If you want the 3-way interaction, then you would ALSO add this to all of the models described above.
Each variable's relative importance weight is then the SUM of ALL AIC-weights from models that contain that variable. Because AIC-weights are standardized to sum to one within a candidate model set, the RIW for each variable can range from 0 to 1.
Do not divide the result by the number of models it is contained in – it is the total sum. I would only use these for balanced candidate model sets; I wouldn’t use RIW for a smaller number of models.
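The whole procedure can be sketched in a few lines of Python. The AIC values below are made up purely for illustration; the variable names follow the example above. Each model's Akaike weight is exp(-Δᵢ/2) normalized over the set, where Δᵢ is the model's AIC minus the minimum AIC, and each variable's RIW is the plain sum (not the average) of the weights of the models containing it:

```python
import math

# Hypothetical AIC values for the balanced 3-variable set (invented for illustration)
aics = {
    frozenset(): 120.0,                        # intercept only
    frozenset({"repro"}): 112.0,
    frozenset({"time"}): 118.5,
    frozenset({"WR"}): 119.0,
    frozenset({"repro", "time"}): 110.0,
    frozenset({"repro", "WR"}): 111.5,
    frozenset({"time", "WR"}): 117.8,
    frozenset({"repro", "time", "WR"}): 111.0,
}

best = min(aics.values())
# Akaike weight: exp(-delta_i / 2), normalized so the weights sum to one
raw = {m: math.exp(-(a - best) / 2) for m, a in aics.items()}
total = sum(raw.values())
weights = {m: r / total for m, r in raw.items()}

# RIW = total sum of weights of every model containing the variable (no division)
riw = {v: sum(w for m, w in weights.items() if v in m)
       for v in ("repro", "time", "WR")}
print(riw)
```

Because the weights sum to one across the whole set, each RIW necessarily falls between 0 and 1.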
NOTE that if you include interactions, then you can only compare the RIWs of main effects with each other, and you can only compare the RIWs of interactions with each other. You cannot compare main effect RIWs with interaction RIWs (because main effects are present in more models than interactions).
FYI: a strong explanatory variable will have a RIW of around 0.9, moderate effects of around 0.6-0.9, very weak effects of around 0.5-0.6 and below that, forget about it. For interactions, a strong effect could be >0.7, moderate >0.5.
If you’re not using RIWs then simply look at your model table and see if you get consistent improvements in AIC when you add specific variables, and by how much. Strong effects will often give you improvements in AIC of >5, moderate 2-5 and weak 0-2. If you don’t get an improvement at all, then it isn’t explaining anything.
If you don't have a balanced candidate set, but DO have the AIC weights (which it appears you do), then you can simply use the ratios of these to determine the strength of support for one model over another. E.g. if you have model 1 with an AIC-weight of 0.7 and model 2 with an AIC-weight of 0.15, then model 1 has roughly 4.7 times more support from the data than model 2 (0.7/0.15). You can use this to assess the relative strength of variables as they go in and out of models. But you don't NEED to do these calculations – you can simply refer the reader to the table. Especially if you have a dominant model, or a series of models at the top that all contain a particular variable. Then it is simply obvious to everyone that it is important.
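The evidence ratio from the example above is a single division of the two AIC-weights (the 0.7 and 0.15 are the illustrative values from the text, not real results):

```python
# Evidence ratio: relative support for model 1 over model 2
w1, w2 = 0.70, 0.15
ratio = w1 / w2
print(round(ratio, 2))  # roughly 4.7
```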
The two models treat initial values differently. For example, after differencing, an ARIMA model is computed on fewer observations, whereas an ETS model is always computed on the full set of data. Even when the models are equivalent (e.g., an ARIMA(0,1,1) and an ETS(A,N,N)), the AIC values will be different.
Effectively, the likelihood of an ETS model is conditional on the initial state vector, whereas the likelihood of a non-stationary ARIMA model is conditional on the first few observations, even when a diffuse prior is used for the nonstationary components.
Best Answer
You can basically only compare AIC scores when you use the same dataset. Even within the same dataset you can't compare AIC scores if, say, one model is fitted on 70 records and the other model on 69 records (because of a missing value in one of the variables).
Can't you combine all the datasets into one single large dataset? Then you can compare your different models without any problems while using AIC scores.