I think that "batch gradient descent" is just another name for "gradient descent". And "mini-batch gradient descent" is just another name for the mini-batch version of stochastic gradient descent (which I would call mini-batch SGD).
So below I'll explain the meaning of gradient descent, stochastic gradient descent (with a mini-batch size of $1$), and mini-batch stochastic gradient descent.
Suppose we're minimizing
$$
\tag{1} f(x) = \frac{1}{N} \sum_{i=1}^N f_i(x).
$$
Let's assume the functions $f_i: \mathbb R^n \to \mathbb R$ are differentiable.
The gradient descent iteration is
$$
x^{k+1} = x^k - t \nabla f(x^k).
$$
Here $t$ is the step size (also known as the learning rate) of the optimization algorithm.
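To make this concrete, here is a minimal NumPy sketch of gradient descent on a toy least-squares objective with $f_i(x) = \tfrac{1}{2}(a_i^\top x - b_i)^2$; the data `A`, `b` and every other name below are made up purely for illustration.

```python
import numpy as np

# Toy finite-sum objective: f_i(x) = 0.5 * (A[i] @ x - b[i])**2,
# so the full gradient is grad f(x) = A.T @ (A @ x - b) / N.
rng = np.random.default_rng(0)
N, n = 100, 5
A = rng.normal(size=(N, n))    # rows of A are the vectors a_i
b = rng.normal(size=N)

def grad_f(x):
    # Gradient of f(x) = (1/N) * sum_i f_i(x), using all N terms.
    return A.T @ (A @ x - b) / N

t = 0.1                        # step size (learning rate)
x = np.zeros(n)
for k in range(500):
    x = x - t * grad_f(x)      # gradient descent iteration
```

Every iteration touches all $N$ terms of the sum, which is exactly what SGD below avoids.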
In stochastic gradient descent (with a mini-batch size of $1$), each time we update $x$ we compute the gradient using only one of the terms, selected at random, from the big sum (1). So in SGD we update $x$ as follows:
$$
x^{k+1} = x^k - t \nabla f_i(x^k),
$$
where $i$ is selected at random from $\{1,2, \ldots, N \}$.
The index $i$ can be selected either with replacement or without replacement. I believe without replacement is more common and tends to work a little better.
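Here is the same toy least-squares problem updated with single-sample SGD; this sketch draws the index with replacement at each step (sampling without replacement amounts to shuffling the indices once per sweep, as in the mini-batch sketch further down).

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 100, 5
A = rng.normal(size=(N, n))    # toy data: f_i(x) = 0.5 * (A[i] @ x - b[i])**2
b = rng.normal(size=N)

def grad_f_i(x, i):
    # Gradient of the single term f_i.
    return A[i] * (A[i] @ x - b[i])

t = 0.01
x = np.zeros(n)
for k in range(2000):
    i = rng.integers(N)            # index chosen uniformly at random (with replacement)
    x = x - t * grad_f_i(x, i)     # SGD update using only one term of the sum
```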
In the mini-batch version of stochastic gradient descent, with a mini-batch size of $M$, each time we update $x$ we compute the gradient using only $M$ of the terms, selected at random, from the big sum (1).
So the mini-batch SGD update is
$$
x^{k+1} = x^k - t \left( \frac{1}{M} \sum_{i \in S_k} \nabla f_i(x^k) \right),
$$
where $S_k$ is a randomly selected $M$-element subset of $\{1, 2, \ldots, N \}$.
Usually the subset $S_k$ is not allowed to intersect any of the previous subsets $S_1, \ldots, S_{k-1}$ until we have completely exhausted the set of possible indices $S = \{1,2, \ldots, N \}$, at which point we make another full sweep through $S$, then another, and so on. Each full sweep through $S$ is called one "epoch".
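Here is a sketch of the mini-batch version organized into epochs: the indices are shuffled once at the start of each epoch, and the mini-batches are taken as disjoint slices of that shuffled order, so they do not intersect until $S$ is exhausted. As before, the data and names are just for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n, M = 100, 5, 10               # M is the mini-batch size
A = rng.normal(size=(N, n))        # toy data: f_i(x) = 0.5 * (A[i] @ x - b[i])**2
b = rng.normal(size=N)

def grad_f_batch(x, idx):
    # Average of grad f_i(x) over the indices in the mini-batch idx.
    Ai, bi = A[idx], b[idx]
    return Ai.T @ (Ai @ x - bi) / len(idx)

t = 0.05
x = np.zeros(n)
for epoch in range(20):
    order = rng.permutation(N)             # one full sweep (epoch) through S = {1, ..., N}
    for start in range(0, N, M):
        S_k = order[start:start + M]       # disjoint mini-batches within the epoch
        x = x - t * grad_f_batch(x, S_k)   # mini-batch SGD update
```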
When using stochastic gradient descent, people often use a diminishing step size strategy in order to guarantee convergence. With a fixed step size, SGD typically does not converge to a minimizer (the iterates keep bouncing around in a neighborhood of it), although in practice it often finds a good approximate minimizer of (1).
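One classical condition on a diminishing step size schedule that guarantees convergence (under standard assumptions on the problem) is to use a step size $t_k$ at iteration $k$ with
$$
\sum_{k=1}^\infty t_k = \infty \qquad \text{and} \qquad \sum_{k=1}^\infty t_k^2 < \infty,
$$
which is satisfied, for instance, by $t_k = t_0 / k$.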
Best Answer
I think it is neither. That the objective function is differentiable means that it has a derivative. This allows you to use methods like (2) in your question to approximate it in a numerical scheme, or (1) to work with it analytically. It is a restriction which allows one to use many helpful results about the location of the extrema.
That it is differentiable does not mean you can find out what the derivative is.
By the way, optimization is not restricted to differentiable functions :-).