I'm currently performing variable selection for a neural network. My procedure is derived from forward selection. I have two main questions about it:
1- I started by fitting one model for each of the available independent variables (IVs), 39 in total, and picked the IV with the highest R2. As the next step, I fitted a model for each of the 38 remaining IVs, using each of them together with the one picked in the previous step.
At first I was using adjusted R2, but I got stuck with only one variable and awful performance. So I switched to plain R2, on the assumption that R2 either increases when a new IV is added or stays the same (when the model sets the coefficients for the new IV to zero). I would stop adding IVs when R2 either kept its previous value or failed to increase by at least an arbitrary threshold.
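For reference, the two criteria are usually defined as follows, with $n$ observations, $p$ predictors, $\hat{y}_i$ the model prediction and $\bar{y}$ the mean of the observed output (the symbols are notation chosen here, not taken from any library):

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
R^2_{\text{adj}} = 1 - (1 - R^2)\,\frac{n - 1}{n - p - 1}
```

The adjusted version penalizes every extra predictor through the factor $(n-1)/(n-p-1)$, so it can fall when an added IV improves the fit only slightly.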
My dataset is very small (120×30), so I had to use cross-validation to give the future model some robustness. I made sure to use the same folds at every new try with one more predictor, so that variability from the training data is ruled out.
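A minimal sketch of how fixed folds could be built once and reused for every candidate model. It assumes 120 rows and 5 folds (the 5 is inferred from the division by 5 in the code further down), and `make_folds` is a hypothetical helper name, not part of Keras or scikit-learn:

```python
import numpy as np

def make_folds(n_rows, n_folds=5, seed=0):
    """Return a list of (train_idx, test_idx) pairs built once and reused."""
    rng = np.random.default_rng(seed)            # fixed seed: same split on every call
    shuffled = rng.permutation(n_rows)           # shuffle the row positions once
    chunks = np.array_split(shuffled, n_folds)   # n_folds nearly equal validation sets
    folds = []
    for k, test_idx in enumerate(chunks):
        # everything outside chunk k is used for training
        train_idx = np.concatenate([c for j, c in enumerate(chunks) if j != k])
        folds.append((train_idx, test_idx))
    return folds

folds = make_folds(n_rows=120)
print([(len(tr), len(te)) for tr, te in folds])
```

Each element of `folds` has the `(i[0], i[1])` shape used in the loops below. The indices are row positions, so with a pandas object whose index is not the default 0..n-1 they would go through `.iloc` rather than `.loc`.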
The problem is that when I moved from 4 IVs to 5, the R2 decreased. That got me wondering whether the statement that R2 always increases or stays the same when new IVs are added holds true for neural networks as well.
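As a point of comparison, the non-decreasing property is normally stated for least-squares fits scored on their own training data. The sketch below reproduces that setting with synthetic data and nested linear models; the data, the split and the helper `r2` are all made up for illustration, and it says nothing by itself about a network scored on held-out folds:

```python
import numpy as np

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
n, p = 120, 6
X = rng.normal(size=(n, p))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=n)  # only two columns matter

train, test = np.arange(0, 96), np.arange(96, 120)  # one fixed split

train_r2, test_r2 = [], []
for k in range(1, p + 1):
    # nested models: intercept plus the first k columns
    A_train = np.column_stack([np.ones(len(train)), X[train, :k]])
    A_test = np.column_stack([np.ones(len(test)), X[test, :k]])
    coef, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
    train_r2.append(r2(y[train], A_train @ coef))
    test_r2.append(r2(y[test], A_test @ coef))

print("train R2:", np.round(train_r2, 4))
print("held-out R2:", np.round(test_r2, 4))
```

The training R2 cannot go down here, because each larger least-squares model contains the smaller one as a special case. No such guarantee applies to the held-out values.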
2 – When I repeated the procedure above, keeping the training data and all other parameters untouched but tracking the usefulness of new predictors with MSE, I got roughly the same results, but the order of some predictors changed. Neural networks are non-deterministic processes, but given that everything else was fixed, shouldn't I get the same results?
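One source of run-to-run variation that fixed data and fixed initial weights do not remove is the order in which mini-batches are visited; Keras `fit` has a `shuffle` argument that is on by default. The NumPy sketch below is a stand-in for that effect, not the Keras training loop: `sgd_linear` is a hypothetical helper that trains a linear model with mini-batch SGD, with separate seeds for the initial weights and for the shuffling.

```python
import numpy as np

def sgd_linear(X, y, init_seed, shuffle_seed, epochs=3, batch=8, lr=0.05):
    """Mini-batch SGD on squared error for a linear model without bias."""
    w = np.random.default_rng(init_seed).normal(scale=0.05, size=X.shape[1])
    order_rng = np.random.default_rng(shuffle_seed)
    for _ in range(epochs):
        order = order_rng.permutation(len(y))      # new batch order every epoch
        for start in range(0, len(y), batch):
            idx = order[start:start + batch]
            grad = 2.0 * X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
            w -= lr * grad
    return w

data_rng = np.random.default_rng(42)
X = data_rng.normal(size=(120, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.5 * data_rng.normal(size=120)

w_a = sgd_linear(X, y, init_seed=0, shuffle_seed=1)
w_b = sgd_linear(X, y, init_seed=0, shuffle_seed=1)  # same init, same batch order
w_c = sgd_linear(X, y, init_seed=0, shuffle_seed=2)  # same init, different batch order

print("identical with same shuffle seed:", np.array_equal(w_a, w_b))
print("max weight gap with different shuffle seed:", np.max(np.abs(w_a - w_c)))
```

Under these assumptions, seeding only the initializers pins down the starting point but not the path taken from it, so two runs on the same folds can still end at different weights.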
Many thanks in advance and sorry for the giant question
Additional info: Keras on Windows 10, running in a Jupyter notebook.
Code used for both questions:
# Imports assumed by the snippets below
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

# Instantiate a fresh model for every new predictor
def simple_model(hidden_layer, input_size):
    model = keras.Sequential([
        layers.Dense(hidden_layer, activation=tf.nn.sigmoid, input_shape=(input_size,),
                     kernel_initializer=keras.initializers.RandomNormal(seed=0),
                     bias_initializer=keras.initializers.RandomNormal(seed=0)),
        layers.Dense(1,
                     kernel_initializer=keras.initializers.RandomNormal(seed=0),
                     bias_initializer=keras.initializers.RandomNormal(seed=0))])
    model.compile(loss='mean_squared_error',
                  optimizer='sgd',
                  metrics=['mean_absolute_error', 'mean_squared_error'])
    return model
# Track the addition of a new IV by R2
# `available` is a dict of the form: predictor : random_string_to_be_replaced_by_the_model
from sklearn.metrics import r2_score

log_r2_predictor = []
for preditor in predictors:
    print("Instantiating model {}".format(preditor))
    r2t, r2val = 0, 0
    for i in folds:  # i[0]: training indices, i[1]: validation indices
        available[preditor] = simple_model(5, 1)
        available[preditor].fit(input.loc[i[0], preditor], output[i[0]], epochs=2000, validation_split=0, verbose=0)
        s_treino = available[preditor].predict(input.loc[i[0], preditor])
        s_teste = available[preditor].predict(input.loc[i[1], preditor])
        r2t += r2_score(output[i[0]], s_treino)
        r2val += r2_score(output[i[1]], s_teste)
    # average the training and validation R2 over the 5 folds
    log_r2_predictor.append([preditor, r2t/5, r2val/5])
When tracking with MSE, I simply replaced r2t and r2val with mae_treino, mse_treino, mae_val and mse_val, and changed the last 5 lines to:
        # unpack into fold-local names so the running totals are not overwritten
        _, fold_mae_treino, fold_mse_treino = available[preditor].evaluate(input.loc[i[0], preditor], output.loc[i[0]])
        mae_treino += fold_mae_treino
        mse_treino += fold_mse_treino
        _, fold_mae_val, fold_mse_val = available[preditor].evaluate(input.loc[i[1], preditor], output.loc[i[1]])
        mae_val += fold_mae_val
        mse_val += fold_mse_val
    # average over the 5 folds
    log_mae_predictor.append([preditor, mae_treino/5, mse_treino/5, mae_val/5, mse_val/5])
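One thing worth checking when comparing the two logs is whether a fold-averaged R2 and a fold-averaged MSE have to rank candidates identically. The sketch below uses made-up per-fold numbers (every value is hypothetical) and the identity that, on a single fold, R2 equals 1 - MSE / Var(y) with the variance taken on that fold:

```python
def mean(values):
    return sum(values) / len(values)

# Hypothetical variance of the target on two validation folds.
fold_var = [1.0, 100.0]

# Hypothetical per-fold validation MSE for two candidate predictor sets.
mse_a = [0.9, 10.0]
mse_b = [0.1, 30.0]

def fold_r2(mse, var):
    # R2 on one fold is 1 - MSE / Var(y), using that fold's own variance.
    return [1.0 - m / v for m, v in zip(mse, var)]

mean_r2_a = mean(fold_r2(mse_a, fold_var))
mean_r2_b = mean(fold_r2(mse_b, fold_var))
mean_mse_a = mean(mse_a)
mean_mse_b = mean(mse_b)

best_by_r2 = "a" if mean_r2_a > mean_r2_b else "b"     # higher R2 is better
best_by_mse = "a" if mean_mse_a < mean_mse_b else "b"  # lower MSE is better
print(best_by_r2, best_by_mse)  # prints: b a
```

Because each fold's R2 is scaled by that fold's own target variance, the two averages weight the folds differently, so a change in predictor order between the two criteria is possible even with identical predictions.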
Best Answer