Short answers:
1) As you said, the two differ only in their spatial structure.
2) A lot of people are working to find an equivalent mathematical formulation of the two, especially in the Bayesian framework. See, for example, the work of Rue or the paper of Lindgren: http://www.math.ntnu.no/inla/r-inla.org/papers/spde-jrssb-revised.pdf
3) I do not see how. With the spatial autoregressive model you predict the same value for every point in an area, whereas with kriging the mean over the entire area is the same but the value of the process is not: you predict a different value for each point.
Long answers:
Obviously, since the two processes are defined by the type of spatial interaction/structure, they are profoundly different. In the spatial autoregressive model, two spatial units $Y_1$ and $Y_2$ are dependent if they are close in some sense. As an example, think of two states: what happens in state $Y_2$ can depend directly on what happens in $Y_1$ only if they share a border. Generally, to specify a spatial autoregressive model you also have to specify a proximity (or neighborhood) matrix. From a computational point of view the spatial autoregressive model is convenient because the precision matrix (the inverse of the covariance matrix) of the observations is sparse, so the computations that involve inverting the covariance can be done efficiently. On the other hand, with this kind of model we cannot predict the value of the process at non-observed locations, because we cannot estimate the correlation between them and the observed ones. This kind of model (the auto-model) should be used only for processes that are spatially discrete, i.e. whose realizations can be observed only at specific locations, although in practice it is also used for continuous processes.
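To make the proximity matrix and the sparsity argument concrete, here is a minimal sketch in Python. It assumes a 5 x 5 grid of areas with "shares an edge" neighbors and a proper CAR-type precision $Q = D - \rho W$; the grid size, the value of `rho` and the helper `rook_adjacency` are my own illustrative choices, not something taken from a particular package.

```python
import numpy as np

def rook_adjacency(nrows, ncols):
    """Binary proximity matrix W: W[i, j] = 1 if cells i and j share an edge."""
    n = nrows * ncols
    W = np.zeros((n, n))
    for r in range(nrows):
        for c in range(ncols):
            i = r * ncols + c
            if c + 1 < ncols:                      # neighbor to the right
                W[i, i + 1] = W[i + 1, i] = 1
            if r + 1 < nrows:                      # neighbor below
                W[i, i + ncols] = W[i + ncols, i] = 1
    return W

W = rook_adjacency(5, 5)
D = np.diag(W.sum(axis=1))        # number of neighbors of each area
rho = 0.9                         # assumed dependence parameter, |rho| < 1
Q = D - rho * W                   # precision matrix (illustrative CAR form)
Sigma = np.linalg.inv(Q)          # implied covariance matrix

frac_nonzero_Q = np.mean(Q != 0)                      # mostly zeros
frac_nonzero_Sigma = np.mean(np.abs(Sigma) > 1e-12)   # essentially dense
print(frac_nonzero_Q, frac_nonzero_Sigma)
```

The precision matrix only has nonzero entries on the diagonal and for pairs of neighbors, while its inverse has no zero entries at all: non-neighboring areas are still correlated, just not directly linked.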
In the kriging model we suppose that the dependence is continuous: the correlation between $Y_1$ and $Y_2$ generally depends on the distance between the observations, and the closer they are, the higher the correlation. With this model we can make predictions at a new location, since we know the correlation between the process at the new location and the observed process (it depends on the distance). On the other hand, in this case the covariance matrix is not sparse unless you use a correlation function that goes to zero beyond a certain distance, so its inversion is computationally intensive.
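As a rough illustration of how the distance-based correlation gives a prediction at a new location, here is a sketch of simple kriging (known mean) in Python. The exponential covariance, the parameters `sigma2` and `phi`, the toy data and the function names are all assumptions made for the example.

```python
import numpy as np

def exp_cov(a, b, sigma2=1.0, phi=2.0):
    """Exponential covariance sigma2 * exp(-distance / phi) between two point sets."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    return sigma2 * np.exp(-d / phi)

def simple_krige(coords, y, new_coords, mean=0.0, **cov_args):
    """Simple kriging predictor and variance, assuming a known constant mean."""
    C = exp_cov(coords, coords, **cov_args)        # dense covariance of the data
    c0 = exp_cov(coords, new_coords, **cov_args)   # covariance data vs. new points
    weights = np.linalg.solve(C, c0)               # kriging weights, one column per new point
    pred = mean + weights.T @ (y - mean)
    var = cov_args.get("sigma2", 1.0) - np.sum(c0 * weights, axis=0)
    return pred, var

# Toy observations (made up for the example)
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [3.0, 3.0]])
y = np.array([1.0, 1.5, 0.5, -1.0])

# A point between the data, an observed point, and a point far from everything
new_coords = np.array([[0.5, 0.5], [1.0, 0.0], [100.0, 100.0]])
pred, var = simple_krige(coords, y, new_coords, mean=0.0, sigma2=1.0, phi=2.0)
print(pred, var)
```

At an observed location the predictor reproduces the observation with zero variance, and far from all the data it falls back to the mean with variance `sigma2`. The `np.linalg.solve` call on the dense matrix `C` is the expensive step when there are many observations.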
Because of the computational advantage of the spatial autoregressive model (or the auto-model in general), some people started to think about approximating a continuous process with a discrete one; the paper of Lindgren linked above is one of the best results in this field.
If you need a book where the autoregressive model and kriging are well explained, I suggest Hierarchical Modeling and Analysis for Spatial Data (http://www.amazon.com/Hierarchical-Modeling-Monographs-Statistics-Probability/dp/158488410X).
Best Answer
These terms probably do not have a universally accepted technical definition, but their meanings are reasonably clear: they refer to second order and first order variation of a spatial process, respectively. Let's take them by order after first introducing some standard concepts.
A spatial process or spatial stochastic process can be thought of as a collection of random variables indexed by points in a space. (The variables have to satisfy some natural technical consistency conditions in order to qualify as a process: see the Kolmogorov Extension Theorem.)
Note that a spatial process is a model. It is valid to use multiple different (conflicting) models to analyze and describe the same data. For instance, models of naturally occurring concentrations of metals in soils may be purely stochastic for small regions (such as a hectare or less) whereas over large regions (extending many kilometers) it's usually important to describe underlying regional trends deterministically--that is, as a form of spatial heterogeneity.
Spatial heterogeneity is a property of a spatial process whose mean (or "intensity") varies from point to point.
The mean is a first order property of a random variable (that is, related to its first moment), whence spatial heterogeneity can be considered a first order property of a process.
Spatial dependence is a property of a spatial stochastic process in which the outcomes at different locations may be dependent.
Often we can measure dependence in terms of the covariance (second moment) or correlation of the random variables: in this sense, dependence can be thought of as a second-order property. (Sticklers will be quick to point out that correlation and independence are not the same, so equating dependence with second order properties, although intuitively helpful, is not generally valid.)
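The parenthetical caveat can be checked with a tiny exact calculation. This is only a sketch with a made-up pair of variables (think of them as the values at two locations): $X$ is uniform on $\{-1, 0, 1\}$ and $Y = X^2$, so $Y$ is completely determined by $X$ and yet their covariance is zero.

```python
from fractions import Fraction

xs = [-1, 0, 1]
p = Fraction(1, 3)                       # X is uniform on {-1, 0, 1}; Y = X**2

E_x = sum(p * x for x in xs)
E_y = sum(p * x**2 for x in xs)
E_xy = sum(p * x * x**2 for x in xs)
cov = E_xy - E_x * E_y                   # second-order summary of the pair

# Dependence shows up in the conditional distribution instead
p_y1 = sum(p for x in xs if x**2 == 1)                           # P(Y = 1)
p_y1_given_x0 = sum(p for x in xs if x == 0 and x**2 == 1) / p   # P(Y = 1 | X = 0)

print(cov, p_y1, p_y1_given_x0)
```

The covariance is exactly zero, while knowing $X = 0$ changes the probability that $Y = 1$ from $2/3$ to $0$: dependent but uncorrelated.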
When you see patterns in spatial data, you can usually describe them either as heterogeneity or dependence (or both), depending on the purpose of the analysis, prior information, and the amount of data.
Some simple, well-studied examples illustrate these ideas.
In this figure, the square demarcates an area of higher spatial intensity. All point locations, however, are independent: the clustering and gaps in points are typical of independent randomly chosen locations.
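A pattern of this kind can be simulated with a short sketch. It assumes a unit-square study region, a central square $[0.25, 0.75]^2$ with four times the background intensity, and an inhomogeneous Poisson process generated by thinning; the intensities and the seed are arbitrary choices for illustration, not the ones behind the figure.

```python
import numpy as np

rng = np.random.default_rng(0)

lam_background = 100.0   # assumed intensity outside the square (points per unit area)
lam_square = 400.0       # assumed higher intensity inside the square

def intensity(xy):
    """Piecewise-constant intensity: higher inside [0.25, 0.75]^2."""
    inside = np.all((xy >= 0.25) & (xy <= 0.75), axis=1)
    return np.where(inside, lam_square, lam_background)

# Thinning: simulate a homogeneous process at the maximum intensity on the unit
# square, then keep each point independently with probability intensity / max.
n_candidates = rng.poisson(lam_square)
candidates = rng.uniform(size=(n_candidates, 2))
keep = rng.uniform(size=n_candidates) < intensity(candidates) / lam_square
points = candidates[keep]

in_square = np.all((points >= 0.25) & (points <= 0.75), axis=1)
density_inside = in_square.sum() / 0.25       # area of the square is 0.25
density_outside = (~in_square).sum() / 0.75   # remaining area is 0.75
print(density_inside, density_outside)
```

Every point is placed independently of all the others; only the mean number of points per unit area changes between the square and its surroundings, which is heterogeneity without dependence.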
The spatial dependence in this Gaussian process is apparent through the patterns of ridges and valleys. They are homogeneous, though: there is no trend overall. Note, however, that if we were to focus on a small part of this area, we might elect to treat it as an inhomogeneous process (that is, with a trend) instead. This illustrates how scale can influence the model we choose.
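For readers who want to generate a surface of this type, here is a minimal sketch. It assumes a zero-mean stationary Gaussian process on a 20 x 20 grid with an exponential covariance of range 5 grid cells, simulated through a Cholesky factor; none of these settings are taken from the figure.

```python
import numpy as np

rng = np.random.default_rng(1)

n = 20
gx, gy = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
coords = np.column_stack([gx.ravel(), gy.ravel()]).astype(float)

# Covariance depends only on the distance between locations (stationarity)
d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
C = np.exp(-d / 5.0)

# Small jitter on the diagonal keeps the factorization numerically stable
L = np.linalg.cholesky(C + 1e-10 * np.eye(n * n))
field = (L @ rng.standard_normal(n * n)).reshape(n, n)   # constant mean: no trend

# Empirical correlation between horizontally adjacent cells
lag1 = np.corrcoef(field[:, :-1].ravel(), field[:, 1:].ravel())[0, 1]
print(lag1)
```

The mean is the same everywhere, so there is no heterogeneity in the model; the ridges and valleys come entirely from the covariance. Adding a deterministic function of the coordinates to `field` would give a process with both a trend and dependence.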
This image shows a different realization of the random component of this process than used for the previous illustration, so the patterns of small undulations will not be exactly the same as before--but they will have the same statistical properties.