There is a fairly short proof (usually called the martingale proof) once you have established some major theorems. In particular, we assume the fundamental theorem of asset pricing and some properties of Brownian motion.
So, Black and Scholes start by assuming that the stock dynamics are
$$\frac{dS_t}{S_t} = \mu dt + \sigma dB_t$$
Using Ito's lemma, you can show that the solution to this SDE is
$$S_t = S_0e^{(\mu - \sigma^2/2)t + \sigma B_t}$$
i.e. $S_t$ follows a so-called geometric Brownian motion.
Next, one can show that the market defined by $S_t$ and the risk-free rate is complete (intuitively, there is only one source of randomness, in this case the Brownian motion $B_t$). If the market is complete, everything can be priced by discounting the expectation of the payoff under the risk-neutral measure; for a call option, this is
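As a quick sanity check on the closed-form solution, here is a minimal Python sketch. The function names and parameter values are illustrative assumptions, not part of the proof. It draws $B_t \sim N(0, t)$, evaluates the formula for $S_t$, and compares the sample mean with the known value $E_P[S_t] = S_0 e^{\mu t}$.

```python
import math
import random

def gbm_value(s0, mu, sigma, t, b_t):
    """Closed-form GBM solution S_t = S_0 * exp((mu - sigma^2/2) t + sigma B_t)."""
    return s0 * math.exp((mu - 0.5 * sigma ** 2) * t + sigma * b_t)

def simulated_mean(s0, mu, sigma, t, n_paths=200_000, seed=0):
    """Average of S_t over independent draws B_t ~ N(0, t)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    total = 0.0
    for _ in range(n_paths):
        b_t = rng.gauss(0.0, math.sqrt(t))
        total += gbm_value(s0, mu, sigma, t, b_t)
    return total / n_paths
```

For example, `simulated_mean(100.0, 0.08, 0.2, 1.0)` should land close to `100.0 * math.exp(0.08)`, up to Monte Carlo noise.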
$$V(t, T) = e^{-r(T-t)} E_{Q}[(S_T - K)^+ \mid \mathcal F_t]$$
Note that we are using the risk-neutral measure $Q$ instead of $P$, the "real world" probability measure. This means that we don't know (yet) the dynamics of $S_t$ under the $Q$ measure.
How do we find the dynamics under $Q$? Well, we start from the observation that under $Q$, the drift of the stock price $S_t$ must be equal to the risk-free rate $r$. We will also use Girsanov's theorem, which says that if
$$\frac{dQ}{dP}\Big|_{\mathcal F_t} = \mathcal E(\alpha B)_t = \exp\left(\alpha B_t - \frac 12 \alpha^2t\right)$$
then $W_t = B_t - \alpha t$ (i.e. $dW_t = dB_t - \alpha dt$) is a Brownian motion under $Q$. Substituting $dB_t = dW_t + \alpha dt$ into the initial SDE gives the drift $\mu + \sigma\alpha$; imposing that this equals $r$, we get $\alpha = \frac{r - \mu}{\sigma}$, and the dynamics of $S_t$ under $Q$ are
$$S_t = S_0 e^{(r - \sigma^2/2)t + \sigma W_t}$$
Notice how $\mu$ is no longer part of the equation! In fact, the price of an option is independent of the drift of the stock under the real-world measure $P$, which is pretty counter-intuitive at first.
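To make this concrete, here is a rough Monte Carlo sketch of the pricing expectation under $Q$. The function name, path count and seed are arbitrary choices for illustration. Note that the signature has no `mu` argument at all: only $r$ and $\sigma$ enter the simulation of $S_T$.

```python
import math
import random

def mc_call_price(s0, k, r, sigma, t, n_paths=200_000, seed=0):
    """Monte Carlo estimate of e^{-rT} E_Q[(S_T - K)^+] using the Q-dynamics of S_T."""
    rng = random.Random(seed)
    drift = (r - 0.5 * sigma ** 2) * t   # risk-neutral drift of log S_T
    vol = sigma * math.sqrt(t)           # standard deviation of sigma * W_T
    payoff_sum = 0.0
    for _ in range(n_paths):
        s_t = s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0))
        payoff_sum += max(s_t - k, 0.0)
    return math.exp(-r * t) * payoff_sum / n_paths
```

With $S_0 = K = 100$, $r = 5\%$, $\sigma = 20\%$ and $T = 1$, the estimate should be near the textbook Black-Scholes value of about 10.45, within sampling error.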
Now, it's almost done! We just need to compute an expectation ;)
$$V(t,T) = e^{-r(T-t)} E_{Q}[(S_T - K)^+ \mid \mathcal F_t]$$
and we know the dynamics of $S_T$ under $Q$. For simplicity, take $t = 0$ (otherwise we use the fact that $S_t$ is known when we condition on $\mathcal F_t$; nothing really changes). We just need to compute
$$\begin{align} V(0,T) &= e^{-rT} E_{Q}[(S_T - K)1_{S_T > K}] \\ &= e^{-rT} E_{Q}[S_T1_{S_T > K} - K1_{S_T > K}] \\ &= e^{-rT}E_{Q}[S_T1_{S_T > K}] - e^{-rT}KQ(S_T > K) \end{align} $$
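Now $Q(S_T > K)$ is easy to compute:
$$Q(S_T > K) = Q\left(W_T > \frac{\log \frac K{S_0} - \left(r - \frac{\sigma^2}{2}\right) T}{\sigma}\right) = N(d_2), \qquad d_2 = \frac{\log \frac{S_0}{K} + \left(r - \frac{\sigma^2}{2}\right) T}{\sigma\sqrt T}$$
because $W_T$ (under $Q$) is normally distributed with mean $0$ and variance $T$.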
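A small Python sketch of this step, with hypothetical helper names. It computes $Q(S_T > K)$ in two ways: directly as $N(d_2)$ with $d_2 = \frac{\log(S_0/K) + (r - \sigma^2/2)T}{\sigma\sqrt T}$, and via the equivalent threshold on $W_T \sim N(0, T)$. The two should agree.

```python
import math

def norm_cdf(x):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def prob_in_the_money(s0, k, r, sigma, t):
    """Q(S_T > K) = N(d_2) for S_T = S_0 exp((r - sigma^2/2) T + sigma W_T)."""
    d2 = (math.log(s0 / k) + (r - 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))
    return norm_cdf(d2)

def prob_via_threshold(s0, k, r, sigma, t):
    """Same probability, as Q(W_T > w_star) with W_T ~ N(0, T)."""
    w_star = (math.log(k / s0) - (r - 0.5 * sigma ** 2) * t) / sigma
    return 1.0 - norm_cdf(w_star / math.sqrt(t))
```

For $S_0 = K = 100$, $r = 5\%$, $\sigma = 20\%$, $T = 1$ this gives $d_2 = 0.15$, so both functions return roughly 0.56.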
For the first term there is a similar trick, but I don't remember it at the moment. But you get the idea: you now just have the expectation of a function of a normal random variable ($S_T$ is a function of $W_T$), and finding it should be easy.
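For completeness, here is one standard way to handle the first term. It is a sketch, and the measure $\tilde Q$ and the process $\tilde W$ are notation introduced here, not part of the argument above. Since $e^{-rT}S_T = S_0\exp\left(\sigma W_T - \frac 12 \sigma^2 T\right)$, the discounted stock price is itself a Girsanov density, so it can be used to change measure once more:

$$\begin{align} \frac{d\tilde Q}{dQ}\Big|_{\mathcal F_T} &= \exp\left(\sigma W_T - \frac 12 \sigma^2 T\right), \qquad \tilde W_t = W_t - \sigma t \text{ is a Brownian motion under } \tilde Q \\ S_T &= S_0 e^{(r + \sigma^2/2)T + \sigma \tilde W_T} \\ e^{-rT}E_{Q}[S_T1_{S_T > K}] &= S_0 E_{\tilde Q}[1_{S_T > K}] = S_0 \tilde Q(S_T > K) = S_0 N(d_1) \end{align}$$

with $d_1 = \frac{\log \frac{S_0}{K} + \left(r + \frac{\sigma^2}{2}\right) T}{\sigma\sqrt T}$. Putting the two terms together gives $V(0,T) = S_0 N(d_1) - Ke^{-rT}N(d_2)$, where $d_2 = d_1 - \sigma\sqrt T$.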
I asked my lecturer this question, and he gave me the following explanation, which I think makes sense:
Basically, an asset has both a spot price and a forward price, so there are two definitions of an ATM option. If we use the spot price of a stock, then an option is at-the-money when the strike price ($K$) is equal to the spot price ($S_t$). If we use the forward price of a stock, then an option is ATM when the strike price equals the forward stock price, which is $S_te^{(r-q)(T-t)}$ in this case.
Now if we rearrange $(\ast)$, we immediately get:
$$K=S_te^{(r-q)(T-t)}e^{{\sigma^2 \over 2}(T-t)}$$
where $S_te^{(r-q)(T-t)}$ is the forward price of the stock.
Now, let's consider the real market. Since there is only a fixed number of strike prices in the market while the forward stock prices change continuously, it is rare that the strike price is the same as the forward stock price. Therefore, we can relax the ATM forward definition above to reflect this reality.
Relaxed definition: Using the forward price of the stock, an option is at-the-money if the strike price $K\in(S_te^{(r-q)(T-t)}-\epsilon, S_te^{(r-q)(T-t)}+\epsilon)$ for some small $\epsilon > 0$.
Therefore, given that $e^{{\sigma^2 \over 2}(T-t)}$ is close to $1$ when $\sigma$ is small, it is reasonable to assume that $$K=S_te^{(r-q)(T-t)}e^{{\sigma^2 \over 2}(T-t)}\in(S_te^{(r-q)(T-t)}-\epsilon, S_te^{(r-q)(T-t)}+\epsilon)$$
which supports the statement that the vega of the Black-Scholes model is at its maximum for ATM options.
In summary, this question is answered based on the ATM definition that uses the forward stock price, together with the relaxed definition of ATM forward that is consistent with reality.
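A numerical illustration of this point, as a hedged sketch. It assumes the standard Black-Scholes vega with continuous dividend yield $q$, namely $S_te^{-q(T-t)}N'(d_1)\sqrt{T-t}$, and the function names, grid and parameter values are made up for the example. It scans a grid of strikes and returns the one with the largest vega, which can then be compared with $S_te^{(r-q)(T-t)}e^{{\sigma^2 \over 2}(T-t)}$ and with the forward price.

```python
import math

def norm_pdf(x):
    """Standard normal density N'(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def vega(s, k, r, q, sigma, tau):
    """Black-Scholes vega with dividend yield q: S e^{-q tau} N'(d_1) sqrt(tau)."""
    d1 = (math.log(s / k) + (r - q + 0.5 * sigma ** 2) * tau) / (sigma * math.sqrt(tau))
    return s * math.exp(-q * tau) * norm_pdf(d1) * math.sqrt(tau)

def vega_maximising_strike(s, r, q, sigma, tau, lo, hi, step=0.01):
    """Brute-force search over a strike grid [lo, hi] for the largest vega."""
    n = int(round((hi - lo) / step))
    strikes = [lo + i * step for i in range(n + 1)]
    return max(strikes, key=lambda k: vega(s, k, r, q, sigma, tau))
```

With $S_t = 100$, $r = 5\%$, $q = 2\%$, $\sigma = 20\%$ and $T - t = 1$, the forward is about 103.05 and the vega-maximising strike comes out near 105.13, so the two differ by roughly 2% of the forward here.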
I hope this explanation given by my lecturer is helpful for those who have the same question as mine. If you have different ideas, please leave a comment!
Best Answer
In the Black-Scholes model, the underlying price $S_0$ is positive.
Then for $T>0$, "vega", the partial derivative of the option price with respect to volatility, is positive: $$\frac{\partial X}{\partial \sigma} = S_0N'(d_1)\sqrt{T}=\frac1{\sqrt{2\pi}}\exp(-d_1^2/2)S_0\sqrt{T}>0$$
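The formula can be checked numerically. The sketch below assumes $X$ is the Black-Scholes price of a European call with no dividends, with $d_1 = \frac{\log(S_0/K) + (r + \sigma^2/2)T}{\sigma\sqrt T}$; the function names and the step size `h` are illustrative. It compares the closed-form vega with a central finite difference of the price in $\sigma$.

```python
import math

def norm_cdf(x):
    """Standard normal CDF N(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    """Standard normal density N'(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def d1(s0, k, r, sigma, t):
    return (math.log(s0 / k) + (r + 0.5 * sigma ** 2) * t) / (sigma * math.sqrt(t))

def call_price(s0, k, r, sigma, t):
    """Black-Scholes call price S_0 N(d_1) - K e^{-rT} N(d_2)."""
    d_1 = d1(s0, k, r, sigma, t)
    d_2 = d_1 - sigma * math.sqrt(t)
    return s0 * norm_cdf(d_1) - k * math.exp(-r * t) * norm_cdf(d_2)

def vega(s0, k, r, sigma, t):
    """Closed-form vega S_0 N'(d_1) sqrt(T); strictly positive for S_0, sigma, T > 0."""
    return s0 * norm_pdf(d1(s0, k, r, sigma, t)) * math.sqrt(t)

def vega_finite_difference(s0, k, r, sigma, t, h=1e-5):
    """Central difference of the call price with respect to sigma."""
    up = call_price(s0, k, r, sigma + h, t)
    down = call_price(s0, k, r, sigma - h, t)
    return (up - down) / (2.0 * h)
```

For $S_0 = K = 100$, $r = 5\%$, $\sigma = 20\%$, $T = 1$, both functions give a vega of about 37.5, and the closed form stays positive for every strike.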