I'll first try to share some intuition behind CNNs and then comment on the particular topics you listed.
The convolution and sub-sampling layers in a CNN are not different from the hidden layers in a common MLP, i.e. their function is to extract features from their input. These features are then given to the next hidden layer to extract still more complex features, or are directly given to a standard classifier to output the final prediction (usually a softmax, but an SVM or any other classifier can be used as well). In the context of image recognition, these features are image traits, like stroke patterns in the lower layers and object parts in the upper layers.
In natural images these features tend to be the same at all locations. Recognizing a certain stroke pattern in the middle of the image is as useful as recognizing it close to the borders. So why not replicate the hidden layers and connect multiple copies of them to all regions of the input image, so that the same features can be detected anywhere? That is exactly what a CNN does, but in an efficient way. After the replication (the "convolution" step) we add a sub-sampling step, which can be implemented in many ways, but is nothing more than a sub-sampling. In theory this step could even be removed, but in practice it is essential to keep the problem tractable.
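To see why the weight sharing makes this efficient, here is a back-of-the-envelope comparison (the image and filter sizes below are arbitrary, chosen just for illustration):

```python
# Rough parameter count for one feature detector on a 256x256 image (toy numbers).
image_h = image_w = 256
filter_h = filter_w = 7

# Fully connected: every output unit has its own weight for every input pixel.
out_units = (image_h - filter_h + 1) * (image_w - filter_w + 1)
fully_connected_weights = out_units * image_h * image_w

# Convolutional: the same 7x7 weights are shared across all locations.
shared_weights = filter_h * filter_w

print(fully_connected_weights)  # about 4.1 billion weights
print(shared_weights)           # 49 weights
```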
Thus:
- Correct.
- As explained above, the hidden layers of a CNN are feature extractors, just as in a regular MLP. The alternated convolution and sub-sampling steps are done during both training and classification, so they are not something done "before" the actual processing. I wouldn't call them "pre-processing", in the same way that the hidden layers of an MLP are not called that.
- Correct.
A good image that helps to understand the convolution step is on the CNN page of the UFLDL tutorial. Think of a hidden layer with a single neuron that is trained to extract features from $3 \times 3$ patches. If we convolve this single learned feature over a $5 \times 5$ image, the process can be represented by the animated gif in that tutorial.
In this example we were using a single neuron in our feature-extraction layer, and we generated $9$ convolved features. With a larger number of units in the hidden layer, each unit produces its own map of convolved features, which makes it clear why the sub-sampling step that follows is required to keep the dimensionality under control.
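As a concrete illustration (a minimal sketch assuming NumPy; the $5 \times 5$ image, the $3 \times 3$ filter values, and the pooling choice are made up for the example, and bias and nonlinearity are omitted):

```python
import numpy as np

# Toy 5x5 input image and one learned 3x3 feature (values are arbitrary).
image = np.arange(25, dtype=float).reshape(5, 5)
feature = np.array([[1.0, 0.0, -1.0],
                    [1.0, 0.0, -1.0],
                    [1.0, 0.0, -1.0]])

# Convolve: slide the same 3x3 weights over every 3x3 patch of the image.
# A 5x5 image yields a 3x3 map, i.e. 9 convolved features.
conv = np.zeros((3, 3))
for i in range(3):
    for j in range(3):
        patch = image[i:i + 3, j:j + 3]
        conv[i, j] = np.sum(patch * feature)

# Sub-sample: here, mean-pooling over the whole 3x3 map (one possible choice).
pooled = conv.mean()

print(conv)    # 3x3 map of convolved features
print(pooled)  # single pooled value for this feature map
```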
The subsequent convolution and sub-sampling steps are based on the same principle, but are computed over the features extracted in the previous layer instead of the raw pixels of the original image.
Wagner et al. describe what they do in section 5c. They perform PCA on 7x7-pixel image patches (not whole images), treating patches as points and pixels as dimensions. This gives 49 principal components, each with a 49-element weight vector (an eigenvector of the covariance matrix). Reshaping each PCA weight vector into a 7x7 matrix gives a basis image patch (i.e. each original image patch can be expressed as a linear combination of the basis patches). Each basis patch is used as an initial filter kernel in the convolutional network, yielding 49 filters.
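A rough sketch of that procedure (assuming NumPy; the patch extraction loop and the random "images" are illustrative, not the authors' exact code):

```python
import numpy as np

def pca_filters(images, patch_size=7):
    """Compute PCA basis patches from all patch_size x patch_size patches."""
    patches = []
    for img in images:
        h, w = img.shape
        for i in range(h - patch_size + 1):
            for j in range(w - patch_size + 1):
                patches.append(img[i:i + patch_size, j:j + patch_size].ravel())
    patches = np.array(patches)                  # shape (n_patches, 49)

    # Covariance over the 49 pixel dimensions; eigenvectors are the principal components.
    cov = np.cov(patches, rowvar=False)          # 49 x 49 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)

    order = np.argsort(eigvals)[::-1]            # sort by decreasing variance
    # Reshape each 49-element eigenvector into a 7x7 basis patch / filter kernel.
    return [eigvecs[:, k].reshape(patch_size, patch_size) for k in order]

# Example with random "images"; in practice these would be the training images.
rng = np.random.default_rng(0)
filters = pca_filters([rng.standard_normal((32, 32)) for _ in range(10)])
print(len(filters), filters[0].shape)            # 49 filters of shape (7, 7)
```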
There are different problems that you can tackle using a graph representation. A graph is defined as a pair $\langle V, E \rangle$ (nodes, links).
Think of two examples:
- Atoms and their bonds, which together make a molecule. In this case $V$ = atoms, $E$ = chemical bonds, and the graph is the molecule.
- Users in a social network, connected if they are friends. In this case $V$ = users, $E$ = friendships, and the graph is the social network (see the sketch right after these examples).
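As a minimal illustration (plain Python; the atoms, bonds, and user names are made up for the example), the two graphs could be written down like this:

```python
# Molecule: nodes are atoms, edges are chemical bonds (ethanol as a toy example).
molecule_V = ["C", "C", "O", "H", "H", "H", "H", "H", "H"]
molecule_E = [(0, 1), (1, 2), (0, 3), (0, 4), (0, 5), (1, 6), (1, 7), (2, 8)]

# Social network: nodes are users, edges are friendships.
social_V = ["alice", "bob", "carol", "dave"]
social_E = [("alice", "bob"), ("bob", "carol"), ("carol", "dave")]
```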
- For node classification: given many valid molecules (training data) and a new, incomplete one (a molecule with unknown atoms in it), ask the model to tell you which atom of the periodic table each unknown node is. This can be seen as a property of the node.
- For link classification: in example 2, given examples of pairs of users that are and are not friends, ask the network whether two given users are friends.
- For graph classification: imagine now that you want to label whole molecules as safe or not safe for humans to consume (the sketch below shows what the labels for each task could look like).
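Again assuming plain Python and toy data like the graphs above, the training targets for the three tasks simply have different shapes:

```python
# Node classification: one label per node (e.g. which chemical element it is).
node_labels = ["C", "C", "O", "H", "H", "H", "H", "H", "H"]   # one entry per atom

# Link classification: one label per pair of users ("friends" / "not friends").
link_labels = {("alice", "bob"): True, ("alice", "dave"): False}

# Graph classification: one label for the whole molecule.
graph_label = "safe_to_consume"   # or "not_safe_to_consume"
```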