The add operation sends the gradient back, unchanged, to both of its inputs. You can convince yourself of this by running the following in TensorFlow (1.x graph-mode API):
import tensorflow as tf  # TensorFlow 1.x graph-mode API

graph = tf.Graph()
with graph.as_default():
    x1_tf = tf.Variable(1.5, name='x1')
    x2_tf = tf.Variable(3.5, name='x2')
    out_tf = x1_tf + x2_tf
    # gradients of the sum with respect to each of its two inputs
    grads_tf = tf.gradients(ys=[out_tf], xs=[x1_tf, x2_tf])
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        fd = {
            out_tf: 10.0
        }
        print(sess.run(grads_tf, feed_dict=fd))
Output:
[1.0, 1.0]
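The same check in eager TensorFlow 2, where `tf.GradientTape` replaces the session and `tf.gradients` machinery (a minimal sketch, assuming TF 2.x):

```python
import tensorflow as tf  # TensorFlow 2.x

x1 = tf.Variable(1.5, name='x1')
x2 = tf.Variable(3.5, name='x2')

with tf.GradientTape() as tape:
    out = x1 + x2  # the add at the point where the skip connection rejoins

# d(out)/d(x1) and d(out)/d(x2) are both 1.0: the add passes the gradient back unchanged
grads = tape.gradient(out, [x1, x2])
print([g.numpy() for g in grads])  # [1.0, 1.0]
```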
So, the gradient will be:
- passed back to previous layers, unchanged, via the skip-layer connection, and also
- passed to the block with weights, and used to update those weights
Edit: there is a question: "what is the operation at the point where the highway connection and the neural net block join back together again, at the bottom of Figure 2?"
The answer is: they are summed. You can see this from Figure 2's formula:
$$
\mathbf{\text{output}} \leftarrow \mathcal{F}(\mathbf{x}) + \mathbf{x}
$$
What this says is that:
- the values in the bus ($\mathbf{x}$)
- are added to the results of passing the bus values, $\mathbf{x}$, through the network, i.e. $\mathcal{F}(\mathbf{x})$
- to give the output from the residual block, which I've labelled here as $\mathbf{\text{output}}$
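For a concrete (made-up) example: if the bus carries $\mathbf{x} = (1.0, 2.0)$ and the block computes $\mathcal{F}(\mathbf{x}) = (0.1, -0.3)$, then the block's output is $(1.0 + 0.1,\ 2.0 - 0.3) = (1.1, 1.7)$; the block only nudges the values that are already on the bus.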
Edit 2:
Rewriting in slightly different words:
- in the forwards direction, the input data flows down the bus
- at points along the bus, residual blocks can learn to add/remove values to the bus vector
- in the backwards direction, the gradients flow back down the bus
- along the way, the gradients update the residual blocks they move past
- the residual blocks will themselves modify the gradients slightly too
The residual blocks do modify the gradients flowing backwards, but there are no 'squashing' or 'activation' functions on the bus itself for the gradients to flow through. Squashing/activation functions are what cause the exploding/vanishing gradient problem, so by keeping them off the bus we mitigate this problem considerably.
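A minimal NumPy sketch of that backward step, using a made-up linear block $\mathcal{F}(\mathbf{x}) = W\mathbf{x}$ so the numbers are easy to follow: the incoming gradient is copied unchanged onto the skip path, and the block's own (modified) contribution is added to it.

```python
import numpy as np

# made-up residual block F(x) = W @ x (purely for illustration)
W = np.array([[0.1, 0.0],
              [0.0, 0.2]])

def backward(grad_out):
    grad_skip = grad_out            # skip path: gradient passed back unchanged
    grad_block = W.T @ grad_out     # block path: gradient modified by the block's Jacobian
    return grad_skip + grad_block   # total gradient w.r.t. the value on the bus

print(backward(np.array([1.0, 1.0])))  # [1.1 1.2] -- close to the incoming [1.0, 1.0]
```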
Edit 3: Personally, I imagine a resnet in my head as a diagram that is topologically identical to figure 2, but which shows more clearly how the bus just flows straight through the network, whilst the residual blocks just tap the values from it, and add/remove some small vector against the bus.
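In code form, that picture amounts to something like the following minimal sketch (the block internals here are arbitrary stand-ins):

```python
import numpy as np

def residual_block(x, W):
    # arbitrary stand-in for F(x); real blocks have their own weights, norms, activations
    return np.tanh(W @ x)

bus = np.array([1.0, 2.0])                                # the bus: flows straight through
blocks = [np.random.randn(2, 2) * 0.1 for _ in range(4)]

for W in blocks:
    # each block taps the bus, computes a small correction, and adds it back onto the bus
    bus = bus + residual_block(bus, W)

print(bus)
```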
In survival regression, a common metric is the "concordance index", which measures how well a regressor ranks pairs of samples. This concordance index is equivalent to the area under the ROC curve; see "Why is ROC AUC equivalent to the probability that two randomly-selected samples are correctly ranked?"
Note that the concordance index does not care about the absolute values of your predictions. The only things that matter are the relative values compared to the other samples in the dataset. A good concordance index therefore does not mean that the regressor is also well calibrated.
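A minimal sketch of that pairwise definition (plain Python, ignoring censoring and tied predictions for simplicity; survival libraries such as lifelines provide a full, censoring-aware implementation):

```python
from itertools import combinations

def concordance_index(y_true, y_pred):
    """Fraction of comparable pairs that the model ranks in the same order as the targets.
    Simplified: assumes no censoring and no tied target values."""
    concordant, total = 0, 0
    for i, j in combinations(range(len(y_true)), 2):
        if y_true[i] == y_true[j]:
            continue  # skip tied targets in this simplified version
        total += 1
        # a pair is concordant if the predictions order it the same way as the targets
        if (y_true[i] < y_true[j]) == (y_pred[i] < y_pred[j]):
            concordant += 1
    return concordant / total

# rescaling the predictions does not change the index: only relative order matters
print(concordance_index([1, 2, 3, 4], [10, 30, 20, 40]))      # 0.833...
print(concordance_index([1, 2, 3, 4], [0.1, 0.3, 0.2, 0.4]))  # same value
```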
Best Answer
Just to make it clear, a regression problem is one whose target is continuous rather than discrete. In this sense you can make any Neural Network that is primarily used for classification into a regressor, with minimal changes. Namely, it needs to end with $1$ neuron, no activation function, and a proper loss function (e.g. mean squared error). For example, object detection is at its core a regression problem, because you are trying to predict coordinates. Any ResNet could be used for these problems.
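As a minimal Keras sketch of that change (the feature-extracting body here is a toy stand-in; any classification backbone would do in its place):

```python
from tensorflow import keras

# any feature-extracting body you would normally use for classification
backbone = keras.Sequential([
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(64, activation='relu'),
])

model = keras.Sequential([
    backbone,
    keras.layers.Dense(1),  # regression head: a single neuron, no activation (linear output)
])

# a proper regression loss, e.g. mean squared error
model.compile(optimizer='adam', loss='mse')
```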
I'm going to guess, however, that by "regression" you mean regression on a structured (tabular) dataset like "boston housing". This gets trickier, because here it comes down to how you define ResNet.
If by a ResNet architecture you mean a neural network with skip connections then yes, it can be used for any structured regression problem.
If you mean the specific type of CNN that is used for image classification, then no. That network is built with 2D convolution layers, which require their input to be 2D (image-like) as well. Structured datasets won't work with this model.
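For the first interpretation, a minimal sketch of a skip-connection network for tabular regression might look like this (layer sizes and the number of blocks are arbitrary):

```python
from tensorflow import keras

def dense_residual_block(x, units):
    # two dense layers, then add the block's input back onto its output (the skip connection)
    h = keras.layers.Dense(units, activation='relu')(x)
    h = keras.layers.Dense(units)(h)
    return keras.layers.Activation('relu')(keras.layers.Add()([x, h]))

n_features = 13                      # e.g. the 13 columns of "boston housing"
inputs = keras.Input(shape=(n_features,))
x = keras.layers.Dense(64, activation='relu')(inputs)
x = dense_residual_block(x, 64)
x = dense_residual_block(x, 64)
outputs = keras.layers.Dense(1)(x)   # single linear neuron, as above

model = keras.Model(inputs, outputs)
model.compile(optimizer='adam', loss='mse')
```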