An urn contains two red balls and one green ball.
One ball was drawn yesterday, one ball was drawn today, and the final ball
will be drawn tomorrow. All of the draws are "without replacement".
Suppose you know that today's ball was red, but you have no information
about yesterday's ball. The chance that tomorrow's ball will be red
is 1/2: the only two remaining outcomes for this random experiment are "r,r,g" and "g,r,r", and they are equally likely.
On the other hand, if you know that both today's and yesterday's balls were red, then you are guaranteed to get a green ball tomorrow.
This discrepancy shows that the probability distribution for tomorrow's color depends not only on the present value but also on information about the past, so this process of observed colors doesn't have the Markov property.
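A quick Monte Carlo check (a sketch of my own, not from the original answer; the helper name `draw_sequence` is made up) confirms both conditional probabilities:

```python
import random

def draw_sequence():
    """Shuffle the urn and draw all three balls without replacement."""
    balls = ["r", "r", "g"]
    random.shuffle(balls)
    return balls  # (yesterday, today, tomorrow)

trials = [draw_sequence() for _ in range(100_000)]

# Condition on today's ball being red.
today_red = [t for t in trials if t[1] == "r"]
p_red_given_today = sum(t[2] == "r" for t in today_red) / len(today_red)

# Condition on both yesterday's and today's balls being red.
both_red = [t for t in trials if t[0] == "r" and t[1] == "r"]
p_red_given_both = sum(t[2] == "r" for t in both_red) / len(both_red)

print(p_red_given_today)  # ~0.5
print(p_red_given_both)   # exactly 0.0
```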
Update: For any random experiment, there can be several related processes, some of which
have the Markov property and others that don't.
For instance, if you change sampling "without replacement" to sampling "with replacement" in the urn experiment above, the process of observed colors will have the Markov property.
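For contrast, the same check with replacement (again my own sketch) shows that yesterday's draw adds no information once today's is known:

```python
import random

def draw_with_replacement():
    """Three independent draws from the same urn of 2 red, 1 green."""
    return [random.choice(["r", "r", "g"]) for _ in range(3)]

trials = [draw_with_replacement() for _ in range(100_000)]
today_red = [t for t in trials if t[1] == "r"]
both_red = [t for t in today_red if t[0] == "r"]

# Both estimates are ~2/3: conditioning on yesterday changes nothing.
print(sum(t[2] == "r" for t in today_red) / len(today_red))
print(sum(t[2] == "r" for t in both_red) / len(both_red))
```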
Another example: if $(X_n)$ is any stochastic process, you get a related Markov
process by considering the historical process defined by
$$H_n=(X_0,X_1,\dots ,X_n).$$ In this setup, the Markov property is trivially fulfilled
since the current state includes all the past history.
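As a small illustration (my own sketch), wrapping a sequence into its history process is a one-liner:

```python
import random

# Any sequence of values at all.
x = [random.gauss(0, 1) for _ in range(10)]

# The historical process H_n = (X_0, ..., X_n) is Markov by construction:
# H_{n+1} is obtained from H_n alone by appending X_{n+1}.
h = [tuple(x[: n + 1]) for n in range(len(x))]
print(h[3])  # (X_0, X_1, X_2, X_3)
```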
In the other direction, you can lose the Markov property by combining states, or
"lumping". An example that I used in this MO answer, is to take a random walk $(S_n)$ on
the integers, and define $Y_n=1[S_n>0]$. If there is a long string of time points with $Y_n=1$, then it is quite likely that the random walk is nowhere near zero and that the
next value will also be 1. If you only know that the current value is 1, you are not
as confident that the next value will be 1. Intuitively, this is why $Y_n$ doesn't have
the Markov property.
For cases of lumping that preserve the Markov property, see this MSE answer.
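A simulation makes the intuition concrete (my own sketch; the run length of 20 is an arbitrary choice):

```python
import random

# Simple symmetric random walk on the integers, lumped via Y_n = 1[S_n > 0].
steps = 1_000_000
s, y = 0, []
for _ in range(steps):
    s += random.choice([-1, 1])
    y.append(1 if s > 0 else 0)

# P(next = 1 | current = 1) vs. P(next = 1 | last 20 values all = 1).
num1 = den1 = num20 = den20 = 0
for n in range(20, steps - 1):
    if y[n] == 1:
        den1 += 1
        num1 += y[n + 1]
        if all(y[n - k] == 1 for k in range(20)):
            den20 += 1
            num20 += y[n + 1]

print(num1 / den1)    # noticeably less than...
print(num20 / den20)  # ...this, so Y is not Markov
```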
It might be important to differentiate between the various stochastic process types based on both state space and time variable. (Note: discrete space/time can also be called countable.)
So there are 4 types:
- Discrete-spacetime: the process moves from state to state (each of which can be represented by an integer) in discrete steps. For example, imagine a random walk on a graph that takes a step for each $t\in \mathbb{Z}_{\geq 0}$.
- Discrete-time continuous-space: the process moves in discrete turns, but takes continuous values. For instance, the classic (discrete-time) random walk of unit step-size on $\mathbb{R}^n$. (See also here).
- Continuous-time discrete-space: the process moves continuously in time, but in a countable space (e.g. see Continuous-time discrete-space models for animal movement).
- Continuous spacetime: the time variable is continuous, and the process moves in a continuous space (e.g. $\mathbb{R}^n$). This includes Brownian motion and other Ito processes.
The next part is not so clearly agreed upon in the literature. I will simply state the definitions I am used to seeing.
A Markov process is any stochastic process that satisfies the Markov property. It doesn't matter which of the 4 process types it is.
A Markov chain is a Markov process with a discrete state space (i.e. can be type 1 or 3).
A Discrete-time Markov chain (or discrete Markov chain) is a Markov process in discrete time with a discrete state space (i.e. type 1, above).
A Continuous-time Markov chain (or continuous Markov chain)
is a Markov process with a discrete state space in continuous time (i.e. of type 3). (E.g. see here).
A Stationary process is a stochastic process with a joint probability distribution that does not change when translated in time (see here).
A Time-homogeneous Markov chain is a Markov chain with stationary transition probabilities: the probability of going from one state $s_1$ to another state $s_2$ in one step, given that you are at $s_1$, is always the same (i.e. it doesn't matter when you get there). Note that this is weaker than the chain being a stationary process in the sense above: a time-homogeneous chain is stationary only if it is started from an invariant distribution.
A discrete-time time-homogeneous Markov chain is the most classic case (and in fact what most people mean when they say Markov chain).
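As a concrete instance (a minimal sketch; the two-state transition matrix is made up for illustration), such a chain is specified by a fixed transition matrix and simulated one step at a time:

```python
import random

# Fixed (time-homogeneous) transition probabilities over states {0, 1}.
P = [[0.9, 0.1],   # from state 0: stay with 0.9, move with 0.1
     [0.4, 0.6]]   # from state 1: move with 0.4, stay with 0.6

def step(state):
    """One transition: the law depends only on the current state."""
    return 0 if random.random() < P[state][0] else 1

state, visits = 0, [0, 0]
for _ in range(100_000):
    state = step(state)
    visits[state] += 1

print([v / sum(visits) for v in visits])  # ~invariant distribution (0.8, 0.2)
```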
Best Answer
The difference between Markov chains and Markov processes is in the index set: chains have discrete time, while processes (usually) have continuous time.
Random variables are much like guinea pigs: neither pigs, nor from Guinea. Random variables are functions (which are deterministic by definition). They are defined on a probability space, which most often represents all possible outcomes of your experiment/model. In school, their value set is almost always a subset of $\mathbb{R}$.
Sequences of random variables don't need to be memoryless; for example, sequences of random variables that track some cumulative quantity usually aren't memoryless. On the other hand, sequences of independent identically distributed random variables do not depend on the past at all, so they are trivially memoryless. Those two examples are something like extremes: the next variable in the sequence depends on all of the previous ones (in the former) or on none of them (in the latter).

The Markov property says that the variables may depend on the past, but if they do, the past gives no more information than the present already does (e.g. in the case of discrete time, that is, Markov chains, the distribution of the next state can be determined using only the current state and nothing else).

Finally, note that there is a difference between "does not depend on" and "does not give us any new information". For example, consider a Markov chain defined on the set of finite binary sequences where each step appends a (uniformly) random bit. Clearly, the next state does depend on all the previous "coin flips" (these are embedded in the bits of the sequence prefix), but the current state already contains everything we need.
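A sketch of that last example (my own illustration, not part of the original answer):

```python
import random

# Markov chain on finite binary strings: each step appends a uniform bit.
state = ""
for _ in range(8):
    state = state + random.choice("01")
    # The next state "depends" on every past flip, since they are all
    # embedded in the prefix, yet the current state alone determines
    # the distribution of the next one -- the Markov property.
print(state)
```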
I hope it explained something $\ddot\smile$