My advice, in short, would be to try a Kalman filter.
The longer version is this. To restate your problem: at every time step $t$ you have a noisy sensory estimate $(\hat{x}_t,\hat{y}_t)$ of the robot's position, and you want to infer the robot's true position $(x_t,y_t)$.
Given only the data from a single time step, I don't think there's much you can do. Unless there is some consistent bias in the sensory estimates, your best guess of the robot's position given the current sensory data is simply $(\hat{x}_t,\hat{y}_t)$. However, the robot's position is presumably highly correlated from one time step to the next, and you can use this to your advantage. To put it probabilistically, you can make use of the transition distribution $p(x_t,y_t|x_{t-1},y_{t-1})$ in the following manner:
$$p(x_t,y_t|\hat{x}_{t:t-N},\hat{y}_{t:t-N})\propto p(\hat{x}_t,\hat{y}_t|x_t,y_t)\int p(x_t,y_t|x_{t-1},y_{t-1})\,p(x_{t-1}, y_{t-1}|\hat{x}_{t-1:t-N},\hat{y}_{t-1:t-N})\,dx_{t-1}\,dy_{t-1}$$
Let's break this down. In words, the idea is that your sensory information from all previous time points provides information about the robot's position at time $t-1$. This information is quantified by the distribution $p(x_{t-1}, y_{t-1}|\hat{x}_{t-1:t-N},\hat{y}_{t-1:t-N})$. Now the robot's position is correlated over time, and this link is described by $p(x_t,y_t|x_{t-1},y_{t-1})$ (i.e. given that the robot was at this location before, where is it likely to have moved to?). Thus, the rightmost part of the equation means that you look at every possible location that the robot could have been in before given your history of sensory data, and this gives you a prediction of all the positions (and their probabilities) that the robot could be now, based only on this historical data.
In other words, the history of sensory data constrains the range of positions where the robot could be right now. Finally, you update this belief by the information gained by observing the sensory data at the current time.
Note that this expression can be computed recursively: the distribution summarizing the sensory history up to $t-1$ can itself be decomposed into terms depending on $t-1$ and a term for the history up to $t-2$, which gives a formula of the same form as the one above. Thus, in practice you would start with the first two time points, compute the left-hand side of the equation, and then continue with the next time point. The inference at each time point thus depends only on the sensory data at that time and on the running estimate based on the sensory history up to $t-1$. (In other words, the problem can effectively be cast as a Markov chain.)
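As a minimal sketch of that recursion, suppose both distributions are Gaussian: the robot follows a random walk with step variance `q`, and each sensor reading is the true coordinate plus noise of variance `r`. These are my assumptions for illustration, not something given in the question. Under them the integral has a closed form, and the recursion reduces to a scalar Kalman filter applied to each coordinate independently. The names `kalman_step`, `filter_track`, `q` and `r` are made up for this example.

```python
import random

def kalman_step(mean, var, z, q, r):
    """One predict/update cycle for a single coordinate.

    mean, var: belief about the position at t-1 given readings up to t-1
    z: noisy reading at time t
    q: assumed variance of the random-walk step (transition model)
    r: assumed variance of the sensor noise (observation model)
    """
    # Predict: a random walk keeps the mean and widens the variance
    pred_mean, pred_var = mean, var + q
    # Update: blend prediction and reading, weighted by the Kalman gain
    gain = pred_var / (pred_var + r)
    new_mean = pred_mean + gain * (z - pred_mean)
    new_var = (1.0 - gain) * pred_var
    return new_mean, new_var

def filter_track(readings, q=0.01, r=1.0):
    """Filter a list of (x_hat, y_hat) readings; returns a list of (x, y) estimates."""
    # Initialise the belief at the first reading, with the sensor variance
    (mx, my), vx, vy = readings[0], r, r
    estimates = [(mx, my)]
    for zx, zy in readings[1:]:
        mx, vx = kalman_step(mx, vx, zx, q, r)
        my, vy = kalman_step(my, vy, zy, q, r)
        estimates.append((mx, my))
    return estimates

if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated robot drifting slowly, observed with unit-variance noise
    truth, x, y = [], 0.0, 0.0
    for _ in range(200):
        x += rng.gauss(0.0, 0.1)
        y += rng.gauss(0.0, 0.1)
        truth.append((x, y))
    readings = [(tx + rng.gauss(0.0, 1.0), ty + rng.gauss(0.0, 1.0)) for tx, ty in truth]
    estimates = filter_track(readings)
    print("last estimate:", estimates[-1], "true position:", truth[-1])
```

Note how each step uses only the previous mean and variance plus the current reading, which is the Markov property described above. A real robot would usually call for a richer state (e.g. including velocity) and a full covariance matrix.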
Where does machine learning come into this? Well, you need to know two things: (1) a transition function that gives you $p(x_t,y_t|x_{t-1},y_{t-1})$, i.e. the way a robot can change its location from one time point to the next, and (2) a generative model $p(\hat{x}_t,\hat{y}_t|x_t,y_t)$, i.e. a function that describes the probability of observing a certain sensory position reading given the robot's true position. Both functions may be known to you by construction (e.g. you may know the transition function if you know how the robot is programmed to behave, and you may know the generative model of your sensory readings from the manufacturer's specifications). If this information is not known a priori, you'd have to learn it from a set of training data. This does not necessarily have a plug-and-play solution; you'd have to look at the data, consider what you know about the problem, and then figure out how best to model it.
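To make the "learn it from training data" step concrete, here is one very simple possibility. It assumes you have a training run in which the true positions are known (say, from an external tracking system), that the transition is a zero-mean random walk, and that the sensor is unbiased. Both models then reduce to a single variance per coordinate, which can be estimated as a mean squared value. The function name `estimate_noise` and the data are hypothetical.

```python
from statistics import fmean

def estimate_noise(truth, readings):
    """Estimate (q, r) for one coordinate from paired training sequences.

    truth: true positions over time (assumed available for training)
    readings: sensor readings at the same time steps
    """
    # Step-to-step changes in the true position inform the transition model
    steps = [b - a for a, b in zip(truth, truth[1:])]
    # Differences between reading and truth inform the observation model
    errors = [z - x for x, z in zip(truth, readings)]
    # Mean squared values, assuming zero-mean steps and an unbiased sensor
    q = fmean([s * s for s in steps])
    r = fmean([e * e for e in errors])
    return q, r

if __name__ == "__main__":
    truth = [0.0, 0.1, 0.15, 0.3, 0.28]
    readings = [0.4, -0.2, 0.5, 0.1, 0.6]
    print(estimate_noise(truth, readings))
```

If the data suggest a biased sensor or non-Gaussian noise, these two numbers will not be enough and the models themselves need rethinking.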
"P.S.": I wrote all this and then found this question which might be more concise and to the point for your needs. But what the heck, I'll just leave this here as an explanation of the assumptions behind a Kalman filter.
In my understanding (though I wouldn't be surprised to be challenged on this), machine learning and statistics tackle partially similar problems, but machine learning focuses on the specific problem of prediction, and machine learning methods are most often not based on a model of the data-generating process. Secondly, statistics focuses on data-generating processes that involve randomness, while "Chaos Theory" a.k.a. nonlinear dynamics focuses on deterministic processes. Therefore, machine learning is two steps removed from nonlinear dynamics or chaos.
There is a weak relationship between machine learning and the phenomenon of chaos since both are about prediction, or rather predictability in the second case. However, chaos is about limits of predictability due to insufficient knowledge of initial conditions even though there is a perfect model of the process, while machine learning is about the practical problem of actually predicting without caring or knowing much about the underlying process.
There is also a link between chaos and statistics insofar as it can be shown that specific chaotic systems can be mapped onto random processes. The basic idea is that chaotic dynamics amplifies differences in states, which means that over time more and more details of the initial conditions come to matter. If the not-infinitely-precise knowledge of initial conditions is conceptualized as random, the large-scale output of a chaotic system can be considered random. For more details see e.g. here. However, nonlinear dynamics tends to focus on low-dimensional dynamical systems and their even lower-dimensional attractors, while the randomness handled by statistics and machine learning in many real-world situations has less to do with not knowing the 100th digit of a few initial conditions than with knowing nothing about the state of very high-dimensional influences.
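The amplification of small differences is easy to see numerically. The sketch below uses the logistic map $x \mapsto 4x(1-x)$, a standard textbook example of a chaotic system (my choice for illustration, not one mentioned above), and runs it from two starting points that differ only in the tenth decimal place.

```python
def logistic_orbit(x0, steps, r=4.0):
    """Iterate x -> r * x * (1 - x); at r = 4 the map is chaotic on [0, 1]."""
    orbit = [x0]
    for _ in range(steps):
        orbit.append(r * orbit[-1] * (1.0 - orbit[-1]))
    return orbit

a = logistic_orbit(0.2, 60)
b = logistic_orbit(0.2 + 1e-10, 60)  # same start, perturbed in the 10th decimal
gaps = [abs(u - v) for u, v in zip(a, b)]

# Print the gap between the two orbits at a few time steps
for t in (0, 10, 20, 30, 40, 50):
    print(t, gaps[t])
```

The gap grows roughly exponentially until it is as large as the attractor itself, after which the two orbits are effectively unrelated, even though the model is perfectly known and fully deterministic.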
I hope this helps clarify matters.
Best Answer
The answer is that all of the methods can be used for the above problem.
Well, two things should be noted in these kinds of simple problems.
Is it a classification or a regression problem? You might have already guessed that it is a classification problem.
Are there any categorical values in the input features? If yes, does the chosen algorithm work with categorical variables?
The examiner may expect the answer that neural networks, SVMs etc. don't work with categorical variables. But in fact you can encode a categorical variable as a series of binary variables. For example, if the variable age group takes values {child, young, old}, then you may change this single variable into three binary variables: is_child, is_young and is_old. This way you can use an SVM or a neural network.
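A minimal sketch of that encoding in plain Python, using the age-group example. The helper name `one_hot` and the sample values are made up for illustration; libraries such as pandas or scikit-learn offer their own equivalents.

```python
def one_hot(values, categories):
    """Encode each categorical value as a list of 0/1 indicators, one per category."""
    return [[1 if v == c else 0 for c in categories] for v in values]

categories = ["child", "young", "old"]  # columns: is_child, is_young, is_old
ages = ["young", "old", "child", "young"]
encoded = one_hot(ages, categories)
print(encoded)  # [[0, 1, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]
```

Each row now contains only numbers, so it can be fed to an algorithm that expects numeric inputs.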
Again, linear regression looks like an unlikely candidate for a classification problem, but it can be used for classification as well. You shouldn't expect any notable performance, though.
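To illustrate how that can work, here is a sketch that fits an ordinary least-squares line to class labels coded as 0 and 1 and then thresholds the fitted value at 0.5. The single feature, the toy data and the function names are all assumptions for the sake of the example.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x with one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def classify(x, intercept, slope, threshold=0.5):
    """Predict class 1 when the fitted value reaches the threshold, else class 0."""
    return 1 if intercept + slope * x >= threshold else 0

# Toy data: small feature values belong to class 0, large ones to class 1
xs = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
ys = [0, 0, 0, 1, 1, 1]

intercept, slope = fit_line(xs, ys)
predictions = [classify(x, intercept, slope) for x in xs]
print(predictions)
```

The fitted values are not constrained to lie between 0 and 1 and so cannot be read as probabilities, which is one reason methods designed for classification, such as logistic regression, are usually preferred.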