I just read these notes from a Stanford course. They say "you would see that this classifier [Nearest Neighbor] only achieves 38.6% on CIFAR-10". I did my own implementation, but I only get 24.9% accuracy with the L1 distance and 25.3% with L2 on the test set. This is the code:
import numpy as np

class NearestNeighbor(object):
    def __init__(self):
        pass

    def train(self, X, y):
        # Nearest Neighbor "training" just memorizes the data.
        self.X = X
        self.y = y

    def predict(self, X, l1=True):
        num_pred = X.shape[0]
        pred = np.zeros(num_pred, dtype=self.y.dtype)
        for i in range(num_pred):
            if l1:  # L1 (Manhattan) distance
                distances = np.sum(np.abs(self.X - X[i, :]), axis=1)
            else:  # L2 (Euclidean) distance
                distances = np.sqrt(np.sum(np.square(self.X - X[i, :]), axis=1))
            min_index = np.argmin(distances)  # index of the closest training example
            pred[i] = self.y[min_index]
        return pred
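One thing worth checking (an assumption on my part, since the dtypes aren't shown above): CIFAR-10 pixel data comes out of the pickled batches as `uint8`, and NumPy subtraction on unsigned integers wraps around instead of going negative, which corrupts the distances. A tiny sketch of the wrap-around, and of casting to a signed type before subtracting:

    import numpy as np

    # Two "pixel" values stored as unsigned 8-bit integers,
    # the same dtype the raw CIFAR-10 arrays use.
    a = np.array([10], dtype=np.uint8)
    b = np.array([250], dtype=np.uint8)

    # uint8 subtraction wraps modulo 256: 10 - 250 becomes 16, not -240.
    wrapped = np.abs(a - b)                                     # 16
    # Casting to a signed type first gives the true L1 distance.
    correct = np.abs(a.astype(np.int32) - b.astype(np.int32))   # 240

    print(wrapped[0], correct[0])  # 16 240

If the arrays are still `uint8` when `predict` runs, casting them (e.g. `X.astype(np.float32)`) before computing distances may account for part of the gap to 38.6%.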
This is how I'm reading the dataset:
import pickle
import numpy as np

def load_pickle(f):
    with open(f, 'rb') as fo:
        # latin1 encoding is needed to read the Python-2 pickles CIFAR-10 ships with
        return pickle.load(fo, encoding='latin1')

def load_data(directory='./cifar-10-batches-py/'):
    # Reading the training batches
    train_batches = []
    for i in range(1, 6):
        train_batch_file = directory + 'data_batch_' + str(i)
        train_batches.append(load_pickle(train_batch_file))
    X_train = np.concatenate([batch['data'] for batch in train_batches], 0)
    y_train = np.concatenate([batch['labels'] for batch in train_batches], 0)
    # Reading the test batch
    test_batch_file = directory + 'test_batch'
    test_batch = load_pickle(test_batch_file)
    X_test = test_batch['data']
    y_test = test_batch['labels']
    return X_train, y_train, X_test, y_test
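For anyone without the dataset at hand, here is a self-contained sketch of the batch format the loader expects (a fake batch written and read back the same way; the keys, shapes, and `uint8` dtype match the real CIFAR-10 files, but the values here are made up):

    import os
    import pickle
    import tempfile

    import numpy as np

    # Build a tiny fake batch in CIFAR-10's on-disk format:
    # 'data' is an N x 3072 uint8 array, 'labels' is a list of ints.
    tmp = tempfile.mkdtemp()
    batch = {'data': np.zeros((2, 3072), dtype=np.uint8), 'labels': [3, 7]}
    path = os.path.join(tmp, 'data_batch_1')
    with open(path, 'wb') as f:
        pickle.dump(batch, f)

    # Load it back the same way load_pickle does.
    with open(path, 'rb') as f:
        loaded = pickle.load(f, encoding='latin1')

    print(loaded['data'].shape, loaded['data'].dtype, loaded['labels'])
    # (2, 3072) uint8 [3, 7]

Note that `'data'` round-trips as `uint8`, so any arithmetic on it inherits unsigned-integer wrap-around unless it is cast first.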
I don't know whether I'm doing something wrong. If I'm not, is it possible to reach 38.6% accuracy with plain Nearest Neighbor? Under which conditions?
Best Answer
I get 31% by using only the first training batch (`range(1, 2)`) because I have limited memory. You should get the best result with this code by using all five batches, `range(1, 6)`:
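As a sanity check before running on the full dataset, here is a minimal self-contained version of the classifier from the question, evaluated on synthetic data (the real CIFAR-10 files aren't needed). The `astype(np.float32)` casts are my addition, to avoid the `uint8` wrap-around mentioned above; everything else mirrors the question's code:

    import numpy as np

    class NearestNeighbor(object):
        def train(self, X, y):
            # Cast so later subtractions don't wrap around on uint8 input.
            self.X = X.astype(np.float32)
            self.y = y

        def predict(self, X, l1=True):
            X = X.astype(np.float32)
            pred = np.zeros(X.shape[0], dtype=self.y.dtype)
            for i in range(X.shape[0]):
                if l1:  # L1 distance
                    distances = np.sum(np.abs(self.X - X[i, :]), axis=1)
                else:   # L2 distance
                    distances = np.sqrt(np.sum(np.square(self.X - X[i, :]), axis=1))
                pred[i] = self.y[np.argmin(distances)]
            return pred

    # Synthetic stand-in for CIFAR-10: 100 "images" of 3072 uint8 values, 10 classes.
    rng = np.random.default_rng(0)
    X_train = rng.integers(0, 256, size=(100, 3072), dtype=np.uint8)
    y_train = rng.integers(0, 10, size=100)

    nn = NearestNeighbor()
    nn.train(X_train, y_train)

    # Each training point is its own nearest neighbor (distance 0),
    # so accuracy on the training points themselves should be perfect.
    pred = nn.predict(X_train[:5])
    accuracy = np.mean(pred == y_train[:5])
    print(accuracy)  # 1.0

If this sanity check passes but the real test accuracy is still far below 38.6%, the remaining gap likely comes from using fewer training batches or from dtype issues in the loading code.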