Solved – Why are neural networks easily fooled?

adversarial-example, machine-learning, neural-networks

I've read some papers about manually contriving images to "fool" a neural network (see below).

Is this because the networks only model the conditional probability $p(y|x)$?
If a network could model the joint probability $p(y,x)$ instead, would such cases still occur?

My guess is that such artificially generated images are different from the training data, so they have low probability $p(x)$. Hence $p(y,x)$ should be low even if $p(y|x)$ is high for such images.
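To make that guess explicit: by the product rule the joint factorizes as $p(y,x) = p(y \mid x)\,p(x)$, so a confident conditional can coexist with a negligible joint:

$$
p(y, x) = p(y \mid x)\, p(x), \qquad p(y \mid x) \approx 1 \ \text{and}\ p(x) \approx 0 \ \Longrightarrow\ p(y, x) \approx 0 .
$$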

Update

I've tried some generative models, but it turned out not to help, so I guess this is probably a consequence of maximum likelihood estimation (MLE)?

I mean that when the KL divergence is used as the loss function, the value of $p_{\theta}(x)$ in regions where $p_{data}(x)$ is small barely affects the loss. So for a contrived image that doesn't match $p_{data}$, the value of $p_{\theta}$ can be almost arbitrary.
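In symbols (just restating the argument above): fitting $p_{\theta}$ by maximum likelihood is equivalent to minimizing the forward KL divergence, whose expectation is taken under $p_{data}$, so points where $p_{data}(x) \approx 0$ contribute essentially nothing to the objective:

$$
D_{\mathrm{KL}}\!\left(p_{data} \,\|\, p_{\theta}\right)
= \mathbb{E}_{x \sim p_{data}}\!\big[\log p_{data}(x) - \log p_{\theta}(x)\big],
$$

so for a contrived $x'$ with $p_{data}(x') \approx 0$, the term $\log p_{\theta}(x')$ is (almost) never sampled, and $p_{\theta}(x')$ is left essentially unconstrained.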

Update

I found a blog post by Andrej Karpathy which says:

These results are not specific to images, ConvNets, and they are also
not a “flaw” in Deep Learning.

[Image from "Explaining and Harnessing Adversarial Examples"]
[Image from "Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images"]
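For concreteness, the method in the first paper perturbs an input along the sign of the loss gradient (the fast gradient sign method). Below is a minimal sketch, assuming a differentiable PyTorch classifier `model` and a batch of correctly classified images `x` with labels `y`; the function name and the default $\epsilon$ are illustrative, not taken from the papers' code.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, epsilon=0.007):
    """One-step fast gradient sign attack:
    x_adv = x + epsilon * sign(grad_x loss(model(x), y))."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Move each pixel by +/- epsilon in the direction that increases the loss.
    x_adv = x + epsilon * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()  # keep pixels in a valid image range
```

The perturbation is bounded by $\epsilon$ per pixel, yet it is typically enough to flip the prediction with high confidence, which is exactly the effect shown in the figures above.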

Best Answer

The sort of models you are referring to are called 'generative' models, as opposed to discriminative ones, and they do not really scale up to high-dimensional data. Part of the success of NNs in language tasks is the move from a generative model (the HMM) to a 'more' discriminative model (e.g. the MEMM uses logistic regression, which allows contextual data to be used effectively: https://en.wikipedia.org/wiki/Hidden_Markov_model#Extensions).
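To make the distinction concrete: a discriminative model fits $p(y \mid x)$ directly, whereas a generative classifier fits the joint and predicts through Bayes' rule,

$$
p(y \mid x) = \frac{p(x \mid y)\,p(y)}{\sum_{y'} p(x \mid y')\,p(y')},
$$

which requires modelling the full input density $p(x \mid y)$, and that is exactly what becomes hard in high dimensions.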

I would argue that the reason they are fooled is a more general problem: the current dominance of 'shallow' ML-driven AI over more sophisticated methods. [In many of the papers it is mentioned that other ML models are also easily fooled - http://www.kdnuggets.com/2015/07/deep-learning-adversarial-examples-misconceptions.html - Ian Goodfellow.]
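The linear case already shows the mechanism (this is the argument from "Explaining and Harnessing Adversarial Examples", restated roughly): for a score $w^{\top}x$, a perturbation of at most $\epsilon$ per coordinate can shift the score by an amount that grows with the input dimension,

$$
w^{\top}(x + \eta) = w^{\top}x + w^{\top}\eta, \qquad
\eta = \epsilon\,\operatorname{sign}(w) \ \Longrightarrow\ w^{\top}\eta = \epsilon\,\lVert w \rVert_{1},
$$

so in high dimensions an imperceptible $\lVert \eta \rVert_{\infty} = \epsilon$ can still change the output drastically.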

The most effective 'language model' for many tasks is 'bag of words'. No one would claim that this represents a meaningful model of human language, and it's not hard to imagine that these sorts of models are also easily fooled.
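As a toy illustration (hypothetical, not from the original answer): a bag-of-words representation discards word order entirely, so sentences with opposite meanings become indistinguishable to the model.

```python
from collections import Counter

def bag_of_words(sentence):
    """Represent a sentence by its word counts, discarding word order."""
    return Counter(sentence.lower().split())

# Opposite meanings, identical bag-of-words features.
print(bag_of_words("man bites dog") == bag_of_words("dog bites man"))  # True
```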

Similarly, computer vision tasks such as object recognition were revolutionised by 'visual bag of words', which blew away the more computationally intensive methods (which couldn't be applied to massive data sets).

CNNs are, I would argue, a better 'visual bag of words': as you show in your images, the mistakes are made at the pixel level / with low-level features; despite all the hyperbole, there is no high-level representation in the hidden layers. (Everyone makes mistakes; the point is that a person would make 'mistakes' due to higher-level features and would, e.g., recognise a cartoon of a cat, which I don't believe an NN would.)

An example of a more sophisticated computer vision model (which performs worse than NNs) is, e.g., the 'deformable parts' model.
