I want someone to explain this table.
We are studying six factors ($A$, $B$, $C$, $D$, $E$, and $F$) in a full factorial design. Thus we have $2^6 = 64$ runs. We want to carry out the experiment in eight blocks of eight runs each and we want to confound $ABCD$, $ACE$, and $ABEF$ with blocks.
The key to understanding the table is to consider the eight blocks as being determined by three more factors, each with two levels, as discussed in Chapter 10 of Montgomery (1991). The number eight is rather convenient since $2^3=8$. Sometimes, these extra factors are called "pseudo-factors".
Think about the original design with columns labeled from $A$ to $F$ and with 64 rows. Now, mentally add three more columns labeled $G$, $H$, and $J$ ($J$ chosen to not conflict with the identity $I$). Unique combinations of $G$, $H$, and $J$ are going to represent blocks.
Actually, you should now be able to see that we are essentially designing a $2^{9-3}$ fractional factorial experiment. Per the exercise, we confound the additional factors as follows: $G=ABCD$, $H=ACE$, and $J=ABEF$. This gives us three defining relations: $I= ABCDG$, $I=ACEH$, and $I=ABEFJ$.
To obtain the alias pattern, expand the expression
$$(I+ABCDG)(I+ACEH)(I+ABEFJ)$$
to obtain
$$\begin{align} I & =ABCDG = ACEH = BDEGH \\ &= ABEFJ = CDEFGJ = BCFHJ = ADFGHJ.\end{align}$$
Then multiply each effect ($A$, $B$, $AB$, $C$, $AC$, $BC$, $ABC$, and so on) through these words to find the complete alias pattern for the design.
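If you would rather not grind through that algebra by hand, it is easy to mechanize. Below is a minimal Python sketch (the `multiply` helper and the variable names are mine, not Montgomery's) that represents each word as a set of letters and multiplies effects modulo 2 via the symmetric difference:

```python
from itertools import combinations

def multiply(word1, word2):
    # Mod-2 multiplication of two effects: letters appearing twice cancel,
    # which is exactly the symmetric difference of the two letter sets.
    return "".join(sorted(set(word1) ^ set(word2)))

generators = ["ABCDG", "ACEH", "ABEFJ"]   # from G = ABCD, H = ACE, J = ABEF

# Expanding (I + ABCDG)(I + ACEH)(I + ABEFJ): every product of a nonempty
# subset of the generators is a word of the complete defining relation.
defining_relation = []
for r in range(1, len(generators) + 1):
    for subset in combinations(generators, r):
        word = ""
        for g in subset:
            word = multiply(word, g)
        defining_relation.append(word)

print(defining_relation)
# ['ABCDG', 'ACEH', 'ABEFJ', 'BDEGH', 'CDEFGJ', 'BCFHJ', 'ADFGHJ']

# Aliases of an effect: multiply it by every word of the defining relation.
for effect in ["A", "B", "AB"]:
    print(effect, "=", " = ".join(multiply(effect, w) for w in defining_relation))
```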
Now we need the defining contrasts for the blocks, which are obtained from the confounded effects $ABCD$, $ACE$, and $ABEF$:
$$\begin{align}
L_1 &= x_1 + x_2 + x_3 + x_4 \\
L_2 &= x_1 + x_3 + x_5 \\
L_3 &= x_1 + x_2 + x_5 + x_6
\end{align}$$
each reduced modulo 2, where $x_i$ is the level (0 or 1) of the $i$th factor in a given run. (In the pseudo-factor view, $x_7 = L_1$, $x_8 = L_2$, and $x_9 = L_3$.)
The block containing $(1)$ is the principal block. You can see that in the table under "Block 6". Now, the pattern of defining contrasts for $(1)$ is given by $(L_1, L_2, L_3)=(0,0,0)$. To get the rest of the runs that belong in that block, we need to find all runs that yield the same value for $(L_1, L_2, L_3)$ as $(1)$, namely $(0,0,0)$. What we could do is just compute $(L_1, L_2, L_3)$ for all 64 runs, and then sort them.
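Here is a minimal Python sketch of that computation (the labels and ordering are mine, and the dictionary keys will not necessarily line up with the block numbers used in the table):

```python
from itertools import product
from collections import defaultdict

factors = "abcdef"   # lower-case labels for the treatment-combination notation

blocks = defaultdict(list)
for levels in product((0, 1), repeat=6):          # all 64 runs of the 2^6 design
    x1, x2, x3, x4, x5, x6 = levels
    L = ((x1 + x2 + x3 + x4) % 2,                 # L1, from ABCD
         (x1 + x3 + x5) % 2,                      # L2, from ACE
         (x1 + x2 + x5 + x6) % 2)                 # L3, from ABEF
    # Standard treatment-combination label, e.g. (1), a, abcd, ...
    label = "".join(f for f, x in zip(factors, levels) if x) or "(1)"
    blocks[L].append(label)

for key in sorted(blocks):
    print(key, blocks[key])
# The block keyed (0, 0, 0) is the principal block; it contains (1) and abcd.
```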
Notice, for example, that $abcd$ evaluates to $(0,0,0)$, as do the rest of the entries in the table under "Block 6". Now suppose we were given only the principal block. If you take the first run in "Block 3", $a$, and multiply it by each row of the principal block using modulo 2 arithmetic, you get the remaining entries of "Block 3". (E.g., $a \times abcd = a^2bcd = bcd$.)
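Continuing the sketch above, the same mod-2 multiplication can be done directly on the run labels (again, just an illustration; it reuses the `blocks` dictionary from the previous snippet):

```python
def times(run1, run2):
    # Mod-2 product of two treatment combinations, e.g. times("a", "abcd") == "bcd".
    prod = "".join(sorted(set(run1.strip("()1")) ^ set(run2.strip("()1"))))
    return prod or "(1)"

principal = blocks[(0, 0, 0)]
print([times("a", run) for run in principal])   # reproduces the block containing a
print(times("a", "abcd"))                       # 'bcd', as in the text
```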
So you could get the rest of the blocks from the principal block by filling in the first row of the table (one representative run per block) and multiplying as above. Notice in the table that the first row has been filled out as a $2^3$ factorial in the factors $A$, $B$, and $C$.
Also notice from the defining relation, or by inspection of the table, that main effects are aliased with three-factor interactions (for example, $A = CEH$ from $I = ACEH$) and that two-factor interactions are aliased with one another (for example, $AC = EH$). Since the shortest word in the defining relation has length four, this is a resolution IV design. That is, the design in blocks can be implemented as a $2^{9-3}_{IV}$ fractional factorial design.
If we change the number of blocks, how do we construct the design?
Well, we would want the number of blocks to fit a factorial structure: a power of two ($2^1$, $2^2$, $2^3$, ...) for a two-level design, a power of three for a three-level design, and so on. Then there would be some tedious work along the lines above. Or, more likely, generate the design using software.
Montgomery, D. (1991) Design and Analysis of Experiments. Third Edition. John Wiley & Sons.
Reading your question, I gather you think that "global optimization methods" always find the global optimum, no matter what function you are working on. This idea is wrong. First of all, some of these algorithms can in principle find a global optimum, but in practice, for most functions, they don't. And don't forget the no-free-lunch theorem.
Some classes of functions are easy to optimize (convex functions, for example), but most of the time, and especially for neural-network training criteria, you have nonlinearity and nonconvexity (though for simple networks this is not the case). These are nasty functions to optimize, mostly because of their pathological curvature. So why gradient methods? Because first-order methods have good properties; in particular, they are scalable. Why not higher order? Newton's method cannot be applied because there are too many parameters, so you cannot hope to form and invert the Hessian matrix effectively.
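To make the scaling argument concrete, here is a toy sketch on a synthetic quadratic "loss" (not any particular network) contrasting what a gradient step and a Newton step each require:

```python
import numpy as np

n = 1_000                          # imagine n in the millions for a real network
rng = np.random.default_rng(0)
A = rng.standard_normal((n, n))
A = A @ A.T / n + np.eye(n)        # symmetric positive definite stand-in "Hessian"
b = rng.standard_normal(n)
w = np.zeros(n)

loss = lambda w: 0.5 * w @ A @ w - b @ w
grad = A @ w - b                   # first-order information: O(n) memory

# Gradient step: cheap, needs only the gradient.
w_gd = w - 0.01 * grad
# Newton step: needs the full n x n Hessian (O(n^2) memory) and a solve (O(n^3) work).
w_newton = w - np.linalg.solve(A, grad)

print("gradient-step loss:", loss(w_gd), " Newton-step loss:", loss(w_newton))
```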
There are many variants that are motivated by second-order methods but rely only on first-order computations: Hessian-free optimization, Nesterov's accelerated gradient, the momentum method, and so on.
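For reference, here is a sketch of the classical momentum and Nesterov updates applied to a toy quadratic (the hyperparameters are illustrative only):

```python
import numpy as np

def momentum_step(w, v, grad, lr=0.01, mu=0.9):
    v = mu * v - lr * grad(w)            # accumulate a velocity
    return w + v, v

def nesterov_step(w, v, grad, lr=0.01, mu=0.9):
    v = mu * v - lr * grad(w + mu * v)   # look ahead before taking the gradient
    return w + v, v

# Toy example: f(w) = 0.5 * ||w||^2, so grad(w) = w.
w, v = np.ones(3), np.zeros(3)
for _ in range(100):
    w, v = nesterov_step(w, v, lambda x: x)
print(w)   # close to the minimizer at 0
```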
So yes, first-order methods and approximate second-order methods are the best we can do for now, because nothing else works very well.
For further detail I suggest "Learning Deep Architectures for AI" by Y. Bengio, the book "Neural Networks: Tricks of the Trade", and Ilya Sutskever's PhD thesis.
You refer to chapter 11 of Montgomery's textbook, titled "Response Surface Methods and Other Approaches to Process Optimization". So I will interpret your question as asking why response surface methods do not use global optimization methods.
Global optimization methods, as studied in optimization theory, optimize some known mathematical function. Implicitly, one assumes that evaluating the function at arbitrary points is cheap.
In contrast, response surface methods try to optimize a real-world system, often an industrial process. There is no known mathematical function representing the system, and "evaluating" it is not cheap: it may mean running the process under non-standard conditions for some time (hours? days?), paying the cost of changing process parameters, and perhaps even the cost of destroyed or reduced production.
So this is very different from optimizing a known mathematical function. In reality, one is simultaneously trying to build a model of the system and to optimize it, under the restriction that evaluation is costly. That points to an analogy with active learning; see Motivations for experiment design in statistical learning?
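As a toy illustration of "building a model and optimizing it at the same time", here is a hedged Python sketch of one steepest-ascent step with a first-order response surface model; the stand-in `process` function and all numbers are made up for the example:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for the real system; in practice its true form is unknown and each
# run is expensive (the quadratic shape and noise level here are invented).
def process(x1, x2):
    return 80 - (x1 - 3) ** 2 - 2 * (x2 - 5) ** 2 + 0.5 * rng.standard_normal()

# A small 2^2 factorial around the current operating point (x1, x2) = (1, 2).
design = np.array([[-1, -1], [1, -1], [-1, 1], [1, 1]], dtype=float)
center = np.array([1.0, 2.0])
runs = center + design                       # actual factor settings for the runs
y = np.array([process(*x) for x in runs])    # four costly experiments

# Least-squares fit of the first-order model y ~ b0 + b1*x1 + b2*x2 in coded units.
X = np.column_stack([np.ones(len(design)), design])
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

# Path of steepest ascent: move from the center in the direction (b1, b2).
direction = np.array([b1, b2]) / np.hypot(b1, b2)
print("next operating point to try:", center + direction)
```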
Another keyword is evolutionary operation; see Best DoE method to fit Gaussian Process Regressor.