It really depends on the function. For some functions you can compute the derivative to obtain the local minima analytically, e.g. $\sin x$ is non-convex but it is straightforward to find all its minima ($-\frac{\pi}{2} + 2k\pi$, $k\in \mathbb{Z}$, the points where its derivative $\cos x$ vanishes and its second derivative $-\sin x$ is positive).
Gradient descent also works well on non-convex functions, but in general it will only find a local minimum. There are techniques to "push" gradient descent out of a local minimum in the hope that it will find the global minimum or a better local minimum.
For example, stochastic gradient descent calculates the gradient based on only one sample at a time. This not only speeds up computation of the gradient (because we only use one sample), it can also help to avoid getting stuck in a local minimum: gradients computed from single samples are noisy, and that noise can be enough to push the iterate out of a shallow local minimum toward a better solution.
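As a sketch of the update rule on a toy one-parameter least-squares problem (the data and constants here are purely illustrative), each step uses the gradient of the loss on a single randomly chosen sample:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: one-parameter linear model y = 3*x + noise.
x = rng.normal(size=200)
y = 3.0 * x + 0.1 * rng.normal(size=200)

w = 0.0    # parameter to fit
lr = 0.05  # learning rate

# Stochastic gradient descent: each update uses the gradient of the
# squared error on one randomly chosen sample, not the full dataset.
for _ in range(1000):
    i = rng.integers(len(x))
    grad = 2.0 * (w * x[i] - y[i]) * x[i]  # d/dw of (w*x_i - y_i)^2
    w -= lr * grad

print(w)  # close to the true slope 3
```

Each update is cheap (one sample instead of 200), and the sample-to-sample noise is what can jostle the iterate out of shallow local minima in non-convex problems.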
There are also second-order methods like Newton's method, which uses the second derivative (the Hessian, in multiple dimensions) to improve both the step direction and the step size.
Two other widely used methods are BFGS (a quasi-Newton method) and the conjugate gradient method.
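A minimal sketch of the Newton idea in one dimension (the function and starting point here are made up for illustration): the step is the first derivative scaled by the inverse of the second derivative, so curvature sets the step size automatically.

```python
# Minimize f(x) = x^4 - 3x^2 + x (a hypothetical non-convex example).
def fp(x):  return 4 * x**3 - 6 * x + 1  # f'(x)
def fpp(x): return 12 * x**2 - 6         # f''(x)

x = 2.0  # starting point
for _ in range(20):
    # Newton step: curvature f'' rescales the gradient f'.
    x -= fp(x) / fpp(x)

print(x, fp(x))  # a stationary point with f'(x) ~ 0
```

Note that plain Newton converges to whatever stationary point is nearby (here one with positive curvature, so a local minimum); on non-convex functions it can just as well land on a maximum or saddle, which is why practical implementations modify the Hessian.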
This also depends on what exactly you are trying to achieve. If you are training a neural network, there are more specific techniques you can use. For example, you can train multiple neural networks from different starting values. Training will then tend to reach different local minima, and you can combine the resulting networks into an ensemble.
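The same restart idea can be sketched without neural networks: run gradient descent from several random starting values on a toy non-convex function (made up for this example) and collect the minima reached. Here we just keep the best one; with networks you would instead average the predictions of all of them.

```python
import numpy as np

rng = np.random.default_rng(1)

# A hypothetical non-convex function with several local minima.
f  = lambda x: np.sin(3 * x) + 0.1 * x**2
df = lambda x: 3 * np.cos(3 * x) + 0.2 * x

def descend(x, lr=0.01, steps=2000):
    # Plain gradient descent from a given starting point.
    for _ in range(steps):
        x -= lr * df(x)
    return x

# Restart from several random initial values; each run settles into
# the local minimum whose basin contains its starting point.
starts = rng.uniform(-4, 4, size=10)
minima = np.array([descend(x0) for x0 in starts])
best   = minima[np.argmin(f(minima))]
print(best, f(best))
```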
There are a ton of other methods you can use to optimize a non-convex function, and it would be quite difficult (if at all possible) to include all of them in a post. There is also quite a lot of research being done in this area.
To answer your comment on why you would use gradient descent instead of just finding the minima analytically, two main reasons come to mind:

a) Even if an analytical solution exists, it might be computationally more expensive than gradient descent. After you derive the formula for the gradient, you still need to set it to 0 and solve for the parameters you are trying to optimize, which might involve computing the inverse of some matrix. None of that is needed if all you want is to evaluate the gradient so that you can use it in gradient descent.

b) An analytical solution might not exist, or might be too complicated to derive. Even then, you can approximate the gradient using numerical methods and use that approximation in gradient descent.
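Point (a) can be illustrated on linear least squares, where both routes are available (the data below is synthetic and the constants are illustrative): the analytical route solves the normal equations, which requires a matrix solve, while gradient descent only ever evaluates the gradient itself.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.01 * rng.normal(size=100)

# (a) Analytical route: set the gradient of ||y - Xw||^2 to zero
#     and solve the normal equations (X'X) w = X'y -- a matrix solve.
w_exact = np.linalg.solve(X.T @ X, X.T @ y)

# (b) Gradient descent: only needs the gradient, never a solve.
w = np.zeros(3)
lr = 1e-3
for _ in range(5000):
    grad = 2 * X.T @ (X @ w - y)
    w -= lr * grad

print(np.allclose(w, w_exact, atol=1e-6))
```

In this small well-conditioned case the analytical solve is of course trivial; the trade-off matters when the parameter dimension is large, since the solve scales roughly cubically while each gradient step is only a matrix-vector product.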
Yes, there are situations where regularization can speed up convergence of gradient descent. But if you add a regularization term to the objective function, you will no longer be optimizing the same function, and the solution will differ. This is an intended effect when regularization is used to impose an inductive bias in a learning problem (e.g. when seeking to avoid overfitting), but it may be undesirable if your goal is simply to speed up convergence in an optimization problem. In that case, you'd be better off changing the optimization algorithm rather than the objective function. That's outside the scope of this question, but briefly, take a look at Newton, quasi-Newton, and conjugate gradient methods. Also note that preconditioning can speed up optimization for the same reasons described below for regularization.
To answer the original question...
How regularization can speed up convergence
One situation where regularization can speed up gradient descent occurs when the objective function contains long, narrow valleys whose walls are steeply sloped, but whose floor slopes only gradually toward the minimum. In this case, the steepest descent direction at most locations points toward the valley floor. Gradient descent will make only slow progress toward the minimum (due to the shallow slope in that direction), and might oscillate back and forth between the walls (due to their steepness). See the left example plot below.
Some forms of regularization change the objective function to be more bowl-shaped. In this case, steepest descent directions point more toward the minimum, and the slope along the valley floor is increased. Here, gradient descent can make faster progress toward the minimum. See the right example plot below.
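The valley behavior is easy to reproduce on an ill-conditioned quadratic (a hypothetical example, not the plotted functions below): the iterate bounces between the steep walls while creeping along the shallow floor.

```python
import numpy as np

# Elongated quadratic: steep across the valley (x2), shallow along it (x1).
# f(x) = 0.5 * (0.01 * x1^2 + 10 * x2^2)
H = np.diag([0.01, 10.0])
grad = lambda x: H @ x

x = np.array([10.0, 1.0])
lr = 0.18  # near the stability limit 2/10 set by the steep direction
path = [x.copy()]
for _ in range(50):
    x = x - lr * grad(x)
    path.append(x.copy())
path = np.array(path)

# x2 flips sign every step (oscillation between the walls), while
# x1 barely moves (slow progress along the valley floor).
print(path[:5, 1])  # alternating signs
print(path[-1, 0])  # still close to the starting value 10
```

The step size cannot be raised to speed up the shallow direction because the steep direction would diverge; that mismatch between curvatures is exactly what a more bowl-shaped objective removes.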
Example
Here's a simple example: solving for linear regression coefficients in 2 dimensions. $X$ contains Gaussian inputs with highly correlated columns. $y$ contains targets, generated as a linear function of the inputs, plus i.i.d. Gaussian noise.
Ordinary least squares (unregularized) is used in the left plot:
$$\min_{w} \ \|y - X w\|^2$$
Ridge regression ($\ell_2$ regularization) is used in the right plot:
$$\min_{w} \ \|y - X w\|^2 + \lambda \|w\|^2$$
The gradient descent path (starting from the same initial point) is shown superimposed on the objective function:
Highly correlated inputs make the ordinary least squares objective function highly elongated. $\ell_2$ regularization makes the objective function more bowl-shaped, and gradient descent requires fewer iterations to converge (17 vs. 100). Note that the solution is different as a consequence of the regularization term.
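A rough sketch of this comparison on synthetic correlated data (the exact iteration counts will differ from the 17 vs. 100 of the plotted example; the data and constants here are made up):

```python
import numpy as np

rng = np.random.default_rng(3)

# Highly correlated 2-D inputs make X'X badly conditioned.
z = rng.normal(size=200)
X = np.column_stack([z, z + 0.1 * rng.normal(size=200)])
y = X @ np.array([1.0, 1.0]) + 0.1 * rng.normal(size=200)

def iters_to_converge(lam, lr=1e-4, tol=1e-6, max_iter=100_000):
    # Gradient descent on ||y - Xw||^2 + lam * ||w||^2;
    # returns the number of iterations until the gradient is ~0.
    w = np.zeros(2)
    for k in range(max_iter):
        grad = 2 * X.T @ (X @ w - y) + 2 * lam * w
        if np.linalg.norm(grad) < tol:
            return k
        w -= lr * grad
    return max_iter

print(iters_to_converge(0.0))   # OLS: many iterations
print(iters_to_converge(10.0))  # ridge: noticeably fewer
```

Adding $\lambda \|w\|^2$ shifts every eigenvalue of the Hessian up by $2\lambda$, which mostly helps the tiny eigenvalue along the valley floor, so the conditioning (and hence the iteration count) improves.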
Best Answer
The function you have graphed is indeed not convex. However, it is quasiconvex.
Gradient descent is a generic method for continuous optimization, so it can be, and is very commonly, applied to nonconvex functions. With a smooth function and a reasonably selected step size, it will generate a sequence of points $x_1, x_2, \ldots$ with strictly decreasing values $f(x_1) > f(x_2) > \ldots$.
Gradient descent will eventually converge to a stationary point of the function, regardless of convexity. If the function is convex, this will be a global minimum, but if not, it could be a local minimum or even a saddle point.
Quasiconvex functions are an interesting case. Any local minimum of a quasiconvex function is also a global minimum, but quasiconvex functions can also have stationary points that are not local minima (take $f(x) = x^3$ for example). So it's theoretically possible for gradient descent to get stuck at such a stationary point and never progress to a global minimum. In your example, if the shoulder on the left side of the graph were to level out perfectly, gradient descent could get stuck there. However, variants such as the heavy-ball method might be able to "roll through" and reach the global minimum.
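A minimal sketch of that "rolling through" on a hypothetical quasiconvex function (not the one in your graph): $f(x) = x^4 - x^3$ is nonincreasing up to its global minimum at $x = 3/4$, but has a flat non-minimum stationary point at $x = 0$ where the gradient vanishes quadratically.

```python
# f(x) = x^4 - x^3: quasiconvex, global minimum at x = 3/4,
# flat (non-minimum) stationary point at x = 0.
f  = lambda x: x**4 - x**3
df = lambda x: 4 * x**3 - 3 * x**2

# Plain gradient descent from x = -1: steps shrink quadratically as
# the iterate nears 0, so it stalls on the shoulder.
x_gd = -1.0
for _ in range(5000):
    x_gd -= 0.01 * df(x_gd)

# Heavy-ball method (gradient descent with momentum): the accumulated
# velocity carries the iterate across the flat point to the minimum.
x_hb, v = -1.0, 0.0
for _ in range(5000):
    v = 0.9 * v - 0.01 * df(x_hb)
    x_hb += v

print(x_gd)  # stuck just left of 0
print(x_hb)  # near the global minimum 0.75
```

The momentum coefficient (0.9 here) acts like inertia minus friction: large enough to coast across regions where the gradient is negligible, small enough that the iterate eventually settles into the minimum instead of oscillating forever.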