While going through some C++ code about stochastic processes, I came across the concept of discretization repeatedly. I have checked the Wikipedia link, but the description goes into deeper details too quickly for a non-mathematician like me. Can someone explain in simple terms what the concept means, what its use is, and how it is applied?
[Math] How to describe discretization to a novice
discrete-mathematics, intuition, stochastic-processes, terminology
Related Solutions
It's a good question. The naive answer is that the two concepts are different.
Random walks on ${\mathbb Z}^d$ or ${\mathbb R}$, etc. are examples of a random walk on an additive group $G$. There is a fixed distribution $\mu$ on $G$, and at each time you randomly select an element of the group (using distribution $\mu$) and add it to the current state.
In contrast, for a random walk on a (locally finite) graph, you randomly choose an edge uniformly and follow that edge to the next state (node).
However, the concepts may be related at a deeper level. The first chapter of Woess's Random Walks on Infinite Graphs and Groups explains how a random walk on a graph is related to a random walk on a quotient of its automorphism group, and also how a random walk on a group is related to a walk on its Cayley graph.
I haven't read the book in detail, but my impression is that, nevertheless, the two concepts are not exactly equivalent.
Update (Nov. 7 2010)
Any stochastic process $(X_n)$ taking values in a countable state space can be built up out of the following pieces:
(1) an initial distribution $P(X_0=x)$ that tells us how to choose the starting point.
(2) transition kernels that tell us how to choose the next state given the history of the process up to the present, that is, $P(X_{n+1}=x_{n+1} | X_n=x_n, \dots, X_0=x_0)$.
Now, a process is called Markov if the kernels depend only on $n$, $x_{n+1}$ and $x_n$, that is, $P(X_{n+1}=x_{n+1} | X_n=x_n, \dots, X_0=x_0) = P(X_{n+1}=x_{n+1} | X_n=x_n)$. This is a big assumption and a huge simplification, but enough models of real-world random behaviour have this "memoryless" property to make Markov processes widespread and extremely useful.
A further simplification is to assume that the Markov process is time homogeneous, so that the transition rules only depend on the states and not on time. That is, we assume that $P(X_{n+1}=y | X_n=x)=P(X_1=y | X_0=x)$ for all $x,y$. The crucial point here is that $p(x,y):=P(X_{n+1}=y | X_n=x)$ is the same for all $n$.
Let's now assume that the state space is an additive group. The transition kernels for a general process can be rewritten in terms of increments: $$P(X_{n+1}=x_{n+1} | X_n=x_n, \dots, X_0=x_0)$$ can be written $$P(X_{n+1}-X_n=x_{n+1}-x_n | X_n-X_{n-1}=x_n-x_{n-1}, \dots, X_0=x_0).$$ If the increments are independent, then this becomes $$P(X_{n+1}-X_n=x_{n+1} -x_n)=:\mu_n(x_{n+1}-x_n).$$ In this case, the process is automatically Markov and the transition kernels are an arbitrary sequence $(\mu_n)$ of probabilities on the state space.
Finally, for a random walk we assume both time homogeneity and independent increments. Thus, the transition kernel $p(x,y)$ is represented by a single measure: $p(x,y)=\mu(y-x)$. This imposes a kind of homogeneity in space and in time. We are down to an extremely special, yet still interesting process, that only depends on the initial distribution and the single measure $\mu$. This is why Ross insists that for random walks the increments are identically distributed; just being independent is not enough.
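To make the definition concrete, here is a minimal Python sketch of a random walk on ${\mathbb Z}$. The choice of $\mu$ (a fair $\pm 1$ coin flip) is purely illustrative; any fixed increment distribution would do:

```python
import random

def random_walk(n_steps, mu=lambda: random.choice([-1, 1]), x0=0):
    """Simulate a random walk: the whole process is determined by the
    starting point x0 and a single increment distribution mu, which is
    sampled independently at every step (homogeneous in time and space)."""
    path = [x0]
    for _ in range(n_steps):
        path.append(path[-1] + mu())  # X_{n+1} = X_n + independent increment
    return path

random.seed(0)
walk = random_walk(10)  # one sample path of length 11
```

Note that the function takes nothing but the start and the single measure $\mu$, mirroring the statement above that these two ingredients determine the whole process.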
Because random walks are so special, they can be analyzed by mathematical tools that don't work for most Markov processes. In particular, tools from harmonic analysis are very fruitful here. A standard reference, though a little old, is Principles of Random Walk (2nd ed) by Frank Spitzer. A more recent treatment is given in Random Walk: A Modern Introduction by Gregory Lawler and Vlada Limic (2010).
Here's an exercise to convince you that random walks are only a small part of all Markov chains. Take the state space to be the corners of a triangle, labelled $\{ 0, 1, 2\}$, considered as the group ${\mathbb Z}$ mod 3. Any 3 by 3 stochastic matrix (non-negative entries and row sums all equal to 1) defines a time homogeneous Markov chain on this state space. Which of these are random walks?
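As a hint for checking candidates mechanically: on ${\mathbb Z}$ mod 3 the condition $p(x,y)=\mu(y-x)$ says the transition matrix must be circulant. A small Python sketch (the two matrices below are my own illustrative examples):

```python
def is_random_walk(P, tol=1e-12):
    """On the group Z/3Z, a time-homogeneous Markov chain is a random
    walk exactly when p(x, y) depends only on the increment (y - x)
    mod 3, i.e. when the transition matrix P is circulant."""
    mu = P[0]  # candidate increment distribution: mu(j) = p(0, j)
    return all(abs(P[x][y] - mu[(y - x) % 3]) < tol
               for x in range(3) for y in range(3))

# Circulant, hence a random walk with mu = (0.5, 0.3, 0.2):
rw = [[0.5, 0.3, 0.2],
      [0.2, 0.5, 0.3],
      [0.3, 0.2, 0.5]]

# Stochastic (rows sum to 1) but not circulant: Markov, not a random walk:
non_rw = [[0.1, 0.9, 0.0],
          [0.0, 1.0, 0.0],
          [0.5, 0.5, 0.0]]
```

A stochastic $3\times 3$ matrix has 6 free parameters, while a circulant one has only 2, which quantifies how special random walks are among Markov chains.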
I can't explain this approach to a five-year-old child, but here are my considerations, which work for any graph, though perhaps not very efficiently from a computer-science point of view.
1. Any graph of order $n$ has exactly one empty subgraph.
2. It has $n$ single-vertex subgraphs.
3. More generally, there are $n\choose k$ possibilities to choose $k$ vertices from $n$.
4. If a graph of order $k$ has $m$ edges, then it has $2^m$ different subgraphs on those $k$ vertices.

Applying considerations 3 and 4 for each $k=2,\ldots,n$, we can compute the total number of subgraphs of a given graph.
Addition. For example, for the graph in this post How many subgraphs of this simple graph? it looks like this:
0. the empty subgraph $(1)$
1. subgraphs with a single vertex $(4)$
2. subgraphs with two vertices - $6$ of them:
  2a. three have no edges $(3)$
  2b. three have exactly one edge $(3\cdot2)$
3. subgraphs with three vertices - $4$ of them:
  3a. one has no edges $(1)$
  3b. three have exactly two edges $(3\cdot2^2)$
4. subgraphs with four vertices - $1$ of them; it has three edges $(2^3)$
Now let's sum up all the numbers in brackets $$ 1+4+3+3\cdot2+1+3\cdot2^2+2^3=35. $$
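The case analysis can also be verified mechanically. A minimal Python sketch, assuming the graph in the linked post is the star with centre $0$ and leaves $1,2,3$ (the shape that matches the counts above):

```python
from itertools import combinations

def count_subgraphs(n, edges):
    """Count all subgraphs of a graph on vertices 0..n-1: every choice
    of a vertex set S together with any subset of the edges lying
    inside S is a subgraph, contributing 2**m with m = #edges in S."""
    total = 0
    for k in range(n + 1):
        for S in combinations(range(n), k):
            m = sum(1 for u, v in edges if u in S and v in S)
            total += 2 ** m
    return total

# Star with centre 0 and leaves 1, 2, 3 (assumed shape of the graph
# in the linked post; it reproduces the case analysis above).
star_edges = [(0, 1), (0, 2), (0, 3)]
```

Running `count_subgraphs(4, star_edges)` returns $35$, agreeing with the sum in brackets.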
Best Answer
I am not sure if this is what you are looking for, but I'll try. (It would have been better if you'd mentioned what you understood and what you're looking for.)
First of all, you should know the difference between a continuous variable and a discrete variable.
A continuous variable is one that can take infinitely many values between any two chosen points. A discrete variable, on the other hand, can take only a finite number of values between any two chosen points.
The process of discretization involves converting a continuous variable into a discrete variable. This has the advantage of reducing the computational complexity of the resulting model (fewer values mean fewer calculations). Exactly how it is done depends on the phenomenon we're modelling. In engineering applications, for example, it involves discarding values that are insignificant (i.e., values that introduce only a small error when we physically realize the model). There are two key things to keep in mind while creating a discrete model:
1. the accuracy of the discrete model, and
2. its computational efficiency.
As you can imagine, we have to maintain an optimal tradeoff between the two: increasing one will decrease the other. The bulk of the process of discretization centers on this optimization.
The meaning of the word accuracy needs explaining here: you are using the discrete model as a substitute for the continuous (original) model, so you should be able to extract the main features of the original model from the discrete one. (Doing this while retaining computational efficiency is where the skill of the mathematician/engineer/scientist creating the model lies.)
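The tradeoff can be seen in a minimal Python sketch (the function and the step counts are my own illustrative choices): replacing a continuous quantity by a grid of samples is cheaper the coarser the grid, but less accurate.

```python
import math

def discretize_integral(f, a, b, n):
    """Approximate the integral of a continuous function by replacing it
    with n discrete samples (a left Riemann sum): more points mean more
    calculations but a smaller error."""
    h = (b - a) / n
    return sum(f(a + i * h) for i in range(n)) * h

# The integral of sin over [0, pi] is exactly 2.
coarse = discretize_integral(math.sin, 0.0, math.pi, 10)    # cheap, rougher
fine = discretize_integral(math.sin, 0.0, math.pi, 1000)    # costlier, closer to 2
```

Here `coarse` uses 10 function evaluations and `fine` uses 1000; the fine grid is two orders of magnitude more work but far closer to the true value, which is exactly the accuracy-versus-efficiency tradeoff described above.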
This is a general explanation of discretization, without any specific field of application in mind (the engineering reference was because of my own background).
I hope it helps.... if not, please let me know.