Solved – Using Lime on a binary classification neural network

keras, lime, neural-networks

I would like to use Lime to interpret a neural network model.
For the sake of this question, I made a simple Dense model using this dataset:

https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.data.csv

To make this dataset similar to the one I'm using, I added a header row to the .csv file, and cut and pasted the labels (y) into a new .csv file.
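For reproducibility, that preparation step can also be scripted instead of done by hand; here is a minimal sketch with pandas, assuming the raw file has no header and the column names shown in the printout further down:

```python
import pandas as pd

def split_pima_csv(raw_path, x_path, y_path):
    """Add a header to the raw Pima CSV and split features/labels into two files."""
    # The raw file has no header row: eight feature columns followed by the label.
    cols = ["preg", "pl_gl", "bl_pr", "tr_sk", "ins", "bmi", "dpf", "age", "label"]
    df = pd.read_csv(raw_path, header=None, names=cols)
    df.drop(columns=["label"]).to_csv(x_path, index=False)  # x: 8 feature columns
    df[["label"]].to_csv(y_path, index=False)               # y: label column only

# split_pima_csv("pima-indians-diabetes.data.csv",
#                "pima-indians-diabetes.csv", "pima-indians-diabetes_label.csv")
```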

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn import preprocessing

from keras.models import Sequential
from keras.layers import Dense, Dropout
from keras import regularizers

import lime.lime_tabular

x = pd.read_csv("pima-indians-diabetes.csv")
y = pd.read_csv("pima-indians-diabetes_label.csv")
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.1, random_state=0)
min_max_scaler = preprocessing.MinMaxScaler()
x_train = min_max_scaler.fit_transform(x_train)
x_test = min_max_scaler.transform(x_test)  # only transform: reuse the scaler fitted on the training data

model = Sequential()
model.add(Dense(16, activation='relu', input_shape=(8,), kernel_regularizer=regularizers.l2(0.01)))
model.add(Dropout(0.2))
model.add(Dense(64, activation='relu', kernel_regularizer=regularizers.l2(0.01)))
model.add(Dropout(0.2))
model.add(Dense(64, activation='relu', kernel_regularizer=regularizers.l2(0.01)))
model.add(Dropout(0.2))
model.add(Dense(128, activation='relu', kernel_regularizer=regularizers.l2(0.01)))
model.add(Dropout(0.2))
model.add(Dense(1, activation='sigmoid'))
print(model.summary())
model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.fit(x_train, y_train, epochs=50, validation_data=(x_test, y_test), batch_size=32)

print(x)
print(y)

explainer = lime.lime_tabular.LimeTabularExplainer(x, feature_names=list(x), class_names=[0, 1], mode='classification')
exp = explainer.explain_instance(x_test[0], model.predict, num_features=8)
exp.show_in_notebook(show_table=True, show_all=False)

Here are the two printed pandas DataFrames, along with the error I'm getting:

     preg  pl_gl  bl_pr  tr_sk  ins   bmi   dpf  age
0       6    148     72     35    0  33.6   627   50
1       1     85     66     29    0  26.6   351   31
2       8    183     64      0    0  23.3   672   32
3       1     89     66     23   94  28.1   167   21
4       0    137     40     35  168  43.1  2288   33
..    ...    ...    ...    ...  ...   ...   ...  ...
763    10    101     76     48  180  32.9   171   63
764     2    122     70     27    0  36.8   340   27
765     5    121     72     23  112  26.2   245   30
766     1    126     60      0    0  30.1   349   47
767     1     93     70     31    0  30.4   315   23

[768 rows x 8 columns]
     label
0        1
1        0
2        1
3        0
4        1
..     ...
763      0
764      0
765      0
766      1
767      0

[768 rows x 1 columns]
Traceback (most recent call last):
  File "/home/Liz/src/programming/predicting_quality.py", line 346, in <module>
    explainer = lime.lime_tabular.LimeTabularExplainer(x, feature_names=list(x), class_names=[0, 1], mode='classification')
  File "/home/Liz/src/programming/nova/lib/python3.6/site-packages/lime/lime_tabular.py", line 218, in __init__
    random_state=self.random_state)
  File "/home/Liz/src/programming/nova/lib/python3.6/site-packages/lime/discretize.py", line 180, in __init__
    random_state=random_state)
  File "/home/Liz/src/programming/nova/lib/python3.6/site-packages/lime/discretize.py", line 51, in __init__
    bins = self.bins(data, labels)
  File "/home/Liz/src/programming/nova/lib/python3.6/site-packages/lime/discretize.py", line 185, in bins
    qts = np.array(np.percentile(data[:, feature], [25, 50, 75]))
  File "/home/Liz/src/programming/nova/lib/python3.6/site-packages/pandas/core/frame.py", line 2995, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/home/Liz/src/programming/nova/lib/python3.6/site-packages/pandas/core/indexes/base.py", line 2897, in get_loc
    return self._engine.get_loc(key)
  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 109, in pandas._libs.index.IndexEngine.get_loc
TypeError: '(slice(None, None, None), 0)' is an invalid key

I could only find examples of Lime used with decision forests for this type of binary classification [1], or with neural networks for image classification [2].

[1] https://github.com/marcotcr/lime/blob/master/doc/notebooks/Tutorial%20-%20continuous%20and%20categorical%20features.ipynb

[2] https://medium.com/applied-data-science/a-case-for-interpretable-data-science-using-lime-to-reduce-bias-e44f48a95f75

Is it possible to use Lime with this type of neural network?
If so, what mistake did I make? (I suppose it would be in the last three lines.)

Best Answer

I found the error. For anyone having the same problem: LimeTabularExplainer expects a NumPy array, but x is still a pandas DataFrame (which is why the `data[:, feature]` indexing in the traceback fails), while x_train has already been converted to an array by the MinMaxScaler. I had to change this to get it to work:

# changed x to x_train
explainer = lime.lime_tabular.LimeTabularExplainer(x_train, feature_names=list(x), class_names=[0, 1], mode='classification')
# added top_labels=1
exp = explainer.explain_instance(x_test[2], model.predict, num_features=8, top_labels=1)
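One caveat worth adding (my own assumption, not part of the fix above): in classification mode Lime expects one probability column per class, while a model ending in a single sigmoid unit returns predictions of shape (n, 1). If the explanation weights look off, a small wrapper can expand the sigmoid output into two columns before handing it to Lime:

```python
import numpy as np

def make_predict_proba(predict):
    """Wrap a single-sigmoid predict function: (n, 1) -> (n, 2) class probabilities."""
    def predict_proba(data):
        p = np.asarray(predict(data)).reshape(-1, 1)  # column of P(class 1)
        return np.hstack([1.0 - p, p])                # [P(class 0), P(class 1)]
    return predict_proba

# exp = explainer.explain_instance(x_test[2], make_predict_proba(model.predict),
#                                  num_features=8, top_labels=1)
```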