[Math] Show that minimal CFG is undecidable (Sipser 5.36)

computational complexitycontext-free-grammarturing-machines

Question: Say that a CFG (context-free grammar) is minimal if none of its rules can be removed without changing the language generated. Let $MIN_{\text{CFG}}$ = $\{\, \langle G \rangle$ | $G$ is a minimal CFG$\}$.

a) Show that $MIN_{\text{CFG}}$ is Turing-recognizable.

b) Show that $MIN_{\text{CFG}}$ is undecidable.

Now part a) is relatively straightforward. If all rules are indispensable for some CFG $G$, we just need to find for each rule $R$ of $G$ a string $w_R$ that can be generated by $G$, but not with $R$ omitted. Thus, enumerating all the strings one by one for potential "certificates" for indispensability yields a recognizer.

However, I got stuck on part b). It would be easier (to prove undecidability) if the question concerns a specific rule of $G$, because then it can be reduced from the known undecidable language $ALL_{\text{CFG}} = \{\, \langle G \rangle$ | $G$ generates all strings $\}$. But I'm not sure whether it is possible to reduce between those two variants.

Thank you for any help!

Best Answer

It suffices to focus on the case $\Sigma = \{0,1\}$. We argue that $ALL_{\textsf{minCFG}} \leq_T MIN_{\textsf{CFG}}$. That is, we can determine whether a minimal CFG can generate all possible strings, by using the oracle for $MIN_{\textsf{CFG}}$.

Let $G$ denote the minimal CFG we are interested. Since $G$ is minimal, there exists a length $p$ such that, for each rule $R$ in $G$ there exists a string $w_R$ with $|w_R|<p$ that can be generated by $G$, but not with $R$ deleted. Note here that $p$ is computable in a brute force manner. Construct $G^+_{a_1a_2\cdots a_p}$ where $a_i \in \Sigma$, by adding following rules in $G$: $$ S \to a_1 a_2 \cdots a_pT \\ T \to T0 \ | \ T1 \ | \ \epsilon $$ First, consider what will happen if $G^+_{a_1a_2\cdots a_p}$ is not minimal, thus, there exists a rule $R$ in $G^+$ being dispensable. By our construction it is clear that each rule $R$ in $G$ is indispensable.

  • If $S \to a_1 a_2 \cdots a_pT$ or $T \to \epsilon$ is dispensable, then we have $a_1 a_2 \cdots a_p \Sigma^* \subseteq L(G)$.
  • If $T \to T0$ is dispensable, then we have $(a_1 a_2 \cdots a_p \Sigma^* - a_1 a_2 \cdots a_p 1^*) \subseteq L(G)$. Because whether $a_1 a_2 \cdots a_p1^* \subseteq L(G)$ is easy to determine by some classical algorithm, we can also determine whether $a_1 a_2 \cdots a_p \Sigma^* \subseteq L(G)$.
  • It is similar for $T \to T1$.

Therefore, if $G^+_{a_1a_2\cdots a_p}$ is not minimal, we can determine whether $a_1 a_2 \cdots a_p \Sigma^* \subseteq L(G)$. What if it is minimal? In this case we will immediately know $L(G) \neq \Sigma^*$. So we can suppose that all possible $G^+_{a_1a_2\cdots a_p}$ is not minimal. Then by checking whether $a_1 a_2 \cdots a_p \Sigma^* \subseteq L(G)$ one by one and checking those strings of less-than-$p$ length, we can still decide whether $L(G) = \Sigma^*$.

Update. (Oct 12, 2020) After a short discussion with @xdavidliu I realized that the argument above can directly prove what we want, without a justification of uncomputability of $ALL_{\textsf{minCFG}}$. Recall that the reduction we have established achieves the following: Given a minimal CFG $G$, determine whether $L(G) = \Sigma^*$. Now if we want to decide whether $L(G_0) = \Sigma^*$ for an arbitrary CFG $G_0$, just simply feed into the (reduction) algorithm all minimal sub-CFG (test each sub-CFG by the $MIN_{\mathsf{CFG}}$ oracle to determine whether it is minimal) of $G_0$ and see if there is one that generates all strings.


Since this problem appears in Chapter 5, in which Turing reduction has not introduced. I personally believe that there is an elegant proof using only mapping reduction.


Update. @Sylvain gives an elegant proof on https://cstheory.stackexchange.com/a/39454/46760, by showing $$ PCP \leq_T FullPCP \leq_m MIN_{\textsf{CFG}}$$ in which I personally thought the key is that context-free language is closed under $h^{-1}$ he defined.