Solved – Using TensorFlow Neural Network with Sklearn’s Adaboost

Tags: boosting, classification, ensemble learning, scikit-learn, tensorflow

I'm fairly new to ensemble methods, but since bagging, pasting and voting gave me some improvement in the accuracy of my feed-forward neural network (implemented in TensorFlow with multiple hidden layers), I thought I'd give boosting a try.

Since I had already used Sklearn's BaggingClassifier and VotingClassifier, I thought the easiest way to do boosting would be Sklearn's AdaBoostClassifier. My TensorFlow model already inherits from Sklearn's BaseEstimator and implements a predict_proba(X) function.

The fit function of my NN takes a sample_weight argument, which Sklearn's AdaBoostClassifier passes in during training. But here is my problem: I have no clue how to actually use the sample weights to give more importance to the samples with higher weights.

I found this question, but the author of the answer didn't explain the second step, which is exactly my problem. The only things I can think of are initializing the network weights in some other way, or maybe multiplying each sample's weight into the cost calculated for that sample. But I don't know whether that makes any sense at all.
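
For clarity, this is roughly what I mean by multiplying the weights into the cost (just a rough sketch with placeholder names, not something I actually have working in my model):

per_sample_cost = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=labels)  # per-example loss, shape [batch_size]
weights_ph = tf.placeholder(tf.float32, shape=[None])  # would be fed the batch's sample_weight values
cost = tf.reduce_mean(per_sample_cost * weights_ph)    # scale each sample's loss by its weight, then average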

I'm grateful for any hint that points me in the right direction. Thanks!

EDIT
Since I've been asked to, I'll provide some snippets of the code, which might help to explain my problem better.

Snippet of the pipeline class that calls the boosting classifier:

mlp = MLP(n_classes=4, batch_size=200, hm_epochs=1, keep_prob_const=1.0, optimizer='adam',
          learning_rate=0.001, step_decay_LR=True, weight_init='sqrt_n', bias_init=0.01,
          hidden_layers=(10, 10, 10), activation_function='relu6')
clf = AdaBoostClassifier(mlp, n_estimators=3, algorithm="SAMME.R", learning_rate=0.5)
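
Calling fit on this classifier is what ends up passing sample_weight into my MLP's fit below (X_train, y_train and X_test here are just placeholder arrays):

clf.fit(X_train, y_train)     # each boosting round refits the MLP with an updated sample_weight vector
y_pred = clf.predict(X_test)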

Snippets of the multilayer perceptron, which I simplified quite a bit, so there might be some mistakes in it (partly taken from a tutorial at https://pythonprogramming.net):

from sklearn.base import BaseEstimator
class MLP(BaseEstimator):


def __init__(self, n_classes=4, batch_size=200, hm_epochs=15, keep_prob_const=1.0,
             optimizer='adam', learning_rate=0.001, step_decay_LR=False, bias_init=0.0,
             weight_init='xavier', hidden_layers=(600, 600, 600), activation_function='relu'):
     # init arguments...

def fit(self, X_train, y_train, sample_weight=None):
    self.sample_weight = sample_weight # <= how am I gonna use this?

    with self.graph.as_default():
        self.prediction = self.neural_network_model(X_train) # hidden layers with ReLu, output softmax, init weight and biases
        cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=self.prediction, labels=self.y))
        optimizer = tf.train.AdamOptimizer(learning_rate=self.learning_rate_tensor).minimize(cost)

    with tf.Session(graph=self.graph) as sess:
        sess.run(tf.initialize_all_variables())

        for epoch in range(self.hm_epochs):
            epoch_loss = 0
            i = 0
            while i < len(X_train):
                start = i
                end = i + self.batch_size
                batch_x = np.array(X_train[start:end])
                batch_y = np.array(y_train_conv[start:end])  # y_train_conv: one-hot encoded y_train (conversion omitted in this snippet)

                _, c = sess.run([optimizer, cost], feed_dict={self.x: batch_x,
                                                              self.y: batch_y,
                                                              self.keep_prob: self.keep_prob_const})
                epoch_loss += c
                i += self.batch_size

    return self

def predict(self, X):
    with tf.Session(graph=self.graph) as sess:
        return sess.run(tf.argmax(self.prediction, 1), feed_dict={self.x: X, self.keep_prob: self.keep_prob_const})

def predict_proba(self, X):
    with tf.Session(graph=self.graph) as sess:
        return sess.run(self.prediction, feed_dict={self.x: X, self.keep_prob: self.keep_prob_const})

Best Answer

Ok, so if anyone is interested in this:

I followed the experiments of Schwenk and Bengio and implemented what they call version R (resampling the training set with replacement according to the sample weights):

def resample_with_replacement(self, X_train, y_train, sample_weight):

    # normalize sample_weights if not already
    sample_weight = sample_weight / sample_weight.sum(dtype=np.float64)

    X_train_resampled = np.zeros((len(X_train), len(X_train[0])), dtype=np.float32)
    y_train_resampled = np.zeros((len(y_train)), dtype=int)
    for i in range(len(X_train)):
        # draw a number from 0 to len(X_train)-1
        draw = np.random.choice(np.arange(len(X_train)), p=sample_weight)

        # place the X and y at the drawn number into the resampled X and y
        X_train_resampled[i] = X_train[draw]
        y_train_resampled[i] = y_train[draw]

    return X_train_resampled, y_train_resampled


def fit(self, X_train, y_train, sample_weight=None):
    if sample_weight is not None:
        X_train, y_train = self.resample_with_replacement(X_train, y_train, sample_weight)
    ...
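
Side note: because the weighting happens entirely through the resampled training set, the cost function itself doesn't need to change. The per-sample loop above could also be collapsed into one vectorized draw that does the same thing (a sketch, assuming X_train and y_train are NumPy arrays; the function name is just for illustration):

def resample_with_replacement_vectorized(self, X_train, y_train, sample_weight):
    # normalize the weights, draw len(X_train) indices according to them,
    # then index into X and y in a single step
    sample_weight = sample_weight / sample_weight.sum(dtype=np.float64)
    draws = np.random.choice(len(X_train), size=len(X_train), p=sample_weight)
    return X_train[draws], y_train[draws]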