[Math] Find CFGs that generate the regular language for all strings with exactly one a or one b

automatacontext-free-grammar

I have tackled this question by applying the Theorem that states "All regular languages are context-free languages. For any FA, there is a CFG such that the language generated by the grammar is the same as the language accepted by the FA."

So I firstly jotted down the regular expression for this language:
$(b*ab*) + (a*ba*)$

I then generated an NFA to parse the regular language:

enter image description here

Finally, I managed to put together a CFG as follows:

S -> A b A | C a C
A -> a B | null
B -> a A | null
C -> b D | null
D -> b C | null

I have managed to parse a handful of sentences that I would expect to be in the language:

b b b a b
b b a b b
b a b b b
b a a a a
a b b b b
a b a a a

I would really just appreciate a bit of feedback as to whether I am on the right track or not.

1) I cannot seem to derive a DFA easily so I have chosen to illustrate an NFA in the answer to this question. Is it okay to do this?

2) Is my CFG in the correct format? In other words, should I be using non-terminals before terminals and NULL so often?

I am about to read up about CNF shortly.

Thanks for taking the time.

Best Answer

Your question title says "exactly one a and exactly one b". It's worth noting that the language you construct is the strings with "exactly one a" or "exactly one b". Assuming that's what you intended, then...

1) There's a systematic way to derive a DFA from an NFA. The states in the DFA represent possible subsets of the NFA. The state transitions indicate how each of the different states transition as a result. In your case, you start out in what could be either A or C. After seeing an 'a', you can be in either A or D; if instead you saw a 'b' you would be in either C or B. And so on. For each possible set of states in the NFA you encounter, create a single state in the DFA.

2) Whether or not the CFG is in the "correct format" totally depends on what you're using it for. For the purposes of communicating the CFG to other humans, that seems like a totally reasonable format. Some computer software, on the other hand, might have constraints that terminals can only come after the non-terminals, or whatever.

Your CFG seems correct, but it could be described significantly more simply as

S -> A b A | B a B A -> a A | Null B -> b B | Null

Related Question