Just to summarize, if you substitute your 0 values with some very small value such as 1e-17, your impossible states will still be the furthers apart. As you are trying to derive some sort of ordering of your data, it doesn't matter whether your smallest value is -Inf or -39, as either number will be the smallest value in the dataset.
Without any actual events or a physical model that indicates the transition, I think the best you can do is a reasonable upper bound on the probability. The probability could, in fact, be zero.
Assuming N opportunities to make the transition, 0 of which made that particular one, then we could assume that the probability of what we saw is has no less than a 50% probability. In this case:
$P[0$ of $N] \ge 0.5, (1-p)^N \ge 0.5, 1-p \ge 0.5^{\frac{1}{N}}$,
$p \le 1 - 0.5^{\frac{1}{N}}$
Note that this upper bound gets small very quickly with large $N$.
For an even greater confidence, one could say that the probability of the observed event is no less than 5% probable, thus:
$p \le 1 - 0.05^{\frac{1}{N}}$
Here, alot more data is needed to get a low upper bound, but you can have pretty high confidence in it.
I'm guessing what you'd really like is a lower bound other than 0, but I'm afraid we can't get there from here.
Best Answer
The order of a Markov chain is how far back in the history the transition probability distribution is allowed to depend on.
For a first-order Markov chain, the probability distribution of the next state can only depend on the current state. So your transition matrix will be 4x4, like so:
$$\left(\begin{array}{cccc} P(A \to A) & P(A \to C) & P(A \to T) & P(A \to G) \\ P(C \to A) & P(C \to C) & P(C \to T) & P(C \to G) \\ P(T \to A) & P(T \to C) & P(T \to T) & P(T \to G) \\ P(G \to A) & P(G \to C) & P(G \to T) & P(G \to G) \\ \end{array}\right)$$
(where $P(A \to C)$ is the probability of seeing a $C$ if you just read an $A$, etc.
Now, if you go up to order 2, the probability distribution of the next state can depend on the previous two states. So instead of just looking at $P(A \to C)$, we now need to look at $P(AA \to C), P(CA \to C), P(TA \to C), P(GA \to C)$. This will result in a $16 \times 16$ transition matrix that I don't really want to write out, but hopefully you get the idea.
As for drawing the state diagram, you can find some examples online--the basic idea is that you draw a circle for each state, an arrow between each two states with transitions, and label each arrow with the transition probability. Here's an example with four states (from here):