Reading your question, I understand that you assume "global optimization methods" always find the global optimum, whatever the function you are working on. That assumption is wrong. Some of these algorithms do come with global-optimality guarantees in theory, but in practice, for most functions, they don't deliver... And don't forget the no free lunch theorem.
Some classes of functions are easy to optimize (convex functions, for example), but most of the time, and especially for neural network training criteria, you have non-linearity and non-convexity (even though for simple networks this is not the case). These are pretty nasty functions to optimize, mostly because of their pathological curvature. So why gradients? Because first-order methods have good properties, in particular they are scalable. Why not higher order? Newton's method can't be applied because you have too many parameters, and because of this you can't hope to effectively invert the Hessian matrix.
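To make the scalability point concrete, here is a small sketch on a toy quadratic loss (a hypothetical example, not from the answer above): a first-order step only needs the gradient, while a Newton step needs to solve a linear system with the Hessian, which costs O(n^3) time and O(n^2) memory and is hopeless when n is in the millions.

```python
import numpy as np

# Toy quadratic loss f(w) = 0.5 * w^T A w - b^T w (hypothetical example).
# Gradient descent: only needs A @ w - b per step (first-order information).
# Newton's method: needs to solve A x = b, i.e. effectively invert the Hessian.
rng = np.random.default_rng(0)
n = 100
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)           # symmetric positive-definite Hessian
b = rng.standard_normal(n)

def grad(w):
    return A @ w - b                  # first-order information only

w_gd = np.zeros(n)
lr = 1.0 / np.linalg.eigvalsh(A)[-1]  # step size ~ 1/L, L = largest eigenvalue
for _ in range(500):
    w_gd -= lr * grad(w_gd)

# For a quadratic, a single exact Newton step lands on the minimizer,
# but it requires the O(n^3) solve that is infeasible at network scale.
w_newton = np.linalg.solve(A, b)
print(np.allclose(w_gd, w_newton, atol=1e-3))
```

On this convex toy problem both reach the same minimizer; the point is that only the gradient loop generalizes to millions of parameters.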
So there are a lot of variants around which are based on second-order ideas but only rely on first-order computations: Hessian-free optimization, Nesterov accelerated gradient, the momentum method, etc.
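As a minimal sketch of one of these variants, here is the classical momentum update on a 1-D quadratic (hypothetical toy values for the learning rate and momentum coefficient). The velocity accumulates past gradients, which damps oscillations along directions of pathological curvature; Nesterov's variant differs only in evaluating the gradient at a "lookahead" point.

```python
# Classical momentum on f(w) = 0.5 * a * w**2 (toy example, made-up constants).
a = 10.0
grad = lambda w: a * w

w, v = 5.0, 0.0
lr, mu = 0.02, 0.9            # learning rate and momentum coefficient
for _ in range(300):
    v = mu * v - lr * grad(w)  # velocity: exponential average of past gradients
    w = w + v                  # parameter update uses the velocity, not the raw gradient
print(abs(w))
```

Plain gradient descent with the same learning rate would crawl on ill-conditioned problems; the velocity term is what makes these methods competitive while staying first-order.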
So yes, first-order methods and approximate second-order methods are the best we can do for now, because everything else doesn't work very well.
For further detail I suggest: "Learning Deep Architectures for AI" by Y. Bengio, the book "Neural Networks: Tricks of the Trade", and Ilya Sutskever's PhD thesis.
Best Answer
Genetic algorithms (GA) are a family of heuristics which are empirically good at providing a decent answer in many cases, although they are rarely the best option for a given domain.
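For readers unfamiliar with how GAs work, here is a minimal sketch on a hypothetical toy problem (maximizing the number of 1-bits in a bitstring, often called "OneMax"), showing the three standard genetic operators: selection, crossover, and mutation.

```python
import random

# Minimal genetic algorithm on the "OneMax" toy problem (hypothetical example):
# maximize the number of 1-bits in a fixed-length bitstring.
random.seed(0)
N, L, GENS = 40, 20, 100       # population size, genome length, generations

def fitness(genome):
    return sum(genome)          # count of 1-bits

def pick():
    # tournament selection: best of 3 random individuals
    return max(random.sample(pop, 3), key=fitness)

pop = [[random.randint(0, 1) for _ in range(L)] for _ in range(N)]
for _ in range(GENS):
    nxt = []
    for _ in range(N):
        p1, p2 = pick(), pick()
        cut = random.randrange(1, L)                  # one-point crossover
        child = p1[:cut] + p2[cut:]
        child = [b ^ (random.random() < 0.02) for b in child]  # bit-flip mutation
        nxt.append(child)
    pop = nxt

best = max(pop, key=fitness)
print(fitness(best))
```

This illustrates the "decent answer in many cases" character: the GA reliably gets close to the optimum without any gradient or problem-specific structure, but a dedicated algorithm would solve this particular problem trivially.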
You mention derivative-based algorithms, but even in the absence of derivatives there are plenty of derivative-free optimization algorithms that perform way better than GAs. See this and this answer for some ideas.
What many standard optimization algorithms have in common (even derivative-free methods) is the assumption that the underlying space is a smooth manifold (perhaps with a few discrete dimensions), and the function to optimize is somewhat well-behaved.
However, not all functions are defined on a smooth manifold. Sometimes you want to optimize over a graph or other discrete structures (combinatorial optimization) -- here there are dedicated algorithms, but GAs would also work.
The more you go towards functions defined over complex, discrete structures, the more GAs can be useful, especially if you can find a representation in which the genetic operators work at their best (which requires a lot of hand-tuning and domain knowledge).
Of course, the future might lead us to forget GAs altogether and to develop methods for mapping discrete spaces to continuous ones, so that we can use the optimization machinery we already have on the continuous representation.