Machine Learning – Are Models Using Satellite Image Inputs Well-Posed?

machine learning

I am using machine learning to build a land use regression model. My inputs are geographic coordinates, which I use to extract 80×80 meter satellite images or maps to feed to the model.

Let's take the example of a neural network:

Existence: The solution will always exist for any finite input. It may overflow if the inputs and/or weights are unreasonably large, but the solution exists in principle.

Uniqueness: The solution will be unique by definition. Of course, the training data may have non-unique outputs for the same inputs due to measurement noise, but the resulting model will only ever produce one output for a given input.

Continuity: I am uncertain how to adapt Hadamard's last criterion to land use regression. While neural networks are compositions of continuous functions and are therefore continuous themselves, the input space is not; the borders between various objects, like houses and streets, form discontinuities.

Sure, if I deform what's inside the image continuously, there is no issue, but how can I allow, let's say, a new street to continuously enter the image? If I scan across a street as a function of time, there will be a time $t_n$ at which the image $I(t_n)$ does not contain the street but after which $I(t_n + \delta)$ will include it for all $\delta > 0$ (as long as $\delta$ is not so large that the street has left through the opposite side of the image). This is a discontinuity that would lead to a discontinuous output as well, and thus contradict Hadamard's final property.
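To make the discontinuity concrete, here is a minimal 1D sketch (assuming NumPy; the "world" array, the street width, and the window size are all made-up numbers) showing that a hard, pixel-aligned crop jumps the instant the window reaches the street:

```python
# Minimal 1D illustration (NumPy; numbers are arbitrary): with sharp object
# boundaries and a hard, pixel-aligned crop, the content of the window jumps
# the moment the "street" enters it.
import numpy as np

world = np.zeros(200)
world[120:125] = 1.0                     # a 5-pixel-wide "street"

def crop(t: int, width: int = 80) -> np.ndarray:
    """Window of the world starting at integer offset t (think: time step)."""
    return world[t:t + width]

before = crop(40)                        # covers pixels 40..119: no street yet
after = crop(41)                         # covers pixels 41..120: street has entered
print(before.sum(), after.sum())         # 0.0 -> 1.0: a jump, not a gradual change
```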

What is the correct way to think about this if I want the problem to be well-posed? I mean, I guess I could consider my input as a local statistic of sorts while my actual input is the entire globe… But how would one go about making land use regression well-posed? Or, for that matter, any problem where the input is a rectangular subset of Earth as seen from space?

Best Answer

You could imagine a "sliding view function" $f(I, x) = V$ that takes some large input image $I$ and a position $x$ and returns a cropped view $V$ of the image. This function could be implemented with linear interpolation*, which allows $x$ to be any continuous value -- in particular, this lets you shift the view by an arbitrarily small $\epsilon$. Since $f$ is continuous/differentiable wrt $x$, $\text{NN}(f(I,x))$ is also continuous wrt $x$.

*In one dimension, linear interpolation says that if pixel $i$ in the output image is centered on a non-integer position $x$ in the input image, then letting $u = \lfloor x \rfloor, v = \lceil x \rceil$, the value of pixel $i$ should be $I(u)(v-x) + I(v)(x-u)$ (and simply $I(x)$ when $x$ is an integer). So each pixel value is continuous wrt $x$.
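Here is a rough 1D sketch of what I mean (assuming NumPy; the function name `sliding_view` and the example numbers are just for illustration): every output pixel is linearly interpolated between its two neighbouring input pixels, so nudging $x$ by a tiny $\epsilon$ only changes the view by an amount of the same order.

```python
# Sketch of the sliding view f(I, x) in 1D (NumPy; names/numbers illustrative):
# each output pixel is a linear interpolation of its two input neighbours,
# so the view changes continuously as the (possibly fractional) offset x moves.
import numpy as np

def sliding_view(I: np.ndarray, x: float, width: int) -> np.ndarray:
    """Return `width` samples of I starting at fractional position x."""
    positions = x + np.arange(width)              # non-integer sample positions
    u = np.floor(positions).astype(int)           # left neighbours
    frac = positions - u                          # fractional part in [0, 1)
    return I[u] * (1.0 - frac) + I[u + 1] * frac  # linear interpolation

I = np.zeros(100)
I[60:] = 1.0                                      # a perfectly sharp "street" edge

eps = 1e-3
v0 = sliding_view(I, 55.0, 10)
v1 = sliding_view(I, 55.0 + eps, 10)
print(np.max(np.abs(v1 - v0)))                    # ~1e-3: shrinks with eps, no unit jump
```

Feeding such views to the network then gives an output that is continuous in $x$, since a composition of continuous maps is continuous.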

You might think it doesn't make sense that the entirety of Earth fits in some image $I$, or that there's no reason to map the Earth onto some discretized equirectangular grid. In that case, imagine $g(x)$ to be "the image captured if I put a camera at location $x$". Let's examine some arbitrary fixed pixel in this image: if we had an ideal pinhole camera, and objects in real life had sharp boundaries, then yes, the value of this pixel could be discontinuous wrt $x$. In practice, there is a point spread function, which says that even a perfectly sharp point source of light gets blurred out by an imaging system. So I claim $g(x)$ will be continuous.
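As a rough illustration of that claim (assuming NumPy and SciPy's `gaussian_filter1d`; the edge position and blur width are made up), blurring an ideal step edge with a Gaussian point spread function turns the hard jump into a smooth ramp, so the value any fixed pixel reads varies gradually as the viewpoint slides across the edge:

```python
# Sketch (NumPy/SciPy; numbers hypothetical): a perfectly sharp edge in the
# scene, once blurred by a Gaussian point spread function, produces pixel
# values that change gradually as the viewpoint moves across the edge.
import numpy as np
from scipy.ndimage import gaussian_filter1d

scene = np.zeros(1000)
scene[500:] = 1.0                               # ideal, perfectly sharp edge

observed = gaussian_filter1d(scene, sigma=5.0)  # optics/sensor blur (the PSF)

print(observed[490:511].round(3))               # smooth 0 -> 1 ramp, no step
```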