Imagine that the bees made the cells individually but with walls half as thick. They then pack these prefab cells together to make their honeycomb tiling. Clearly, the amount of wax per cell is constant here, so the efficiency of the tiling as a whole is then exactly the same as the efficiency of the individual tiles, regardless of how large it grows.
As hexagonal cells have a better area to perimeter ratio than squares or triangles, they are the best choice to build with for large numbers of cells.
Note however that to turn this into a sturdy honeycomb, they will still have to reinforce the outer boundary edges to a full wall thickness, and this loss is what counts. It shows that the efficiency of the honeycomb is best when the outer boundary is as small as possible.
Assuming they build the honeycomb out in two dimensions, the outer boundary grows linearly, whereas the total edge length (and number of cells) grows quadratically. So the relative loss shrinks as the number of cells grows. In fact, the efficiency approaches the theoretical limit (what it would be if they didn't need to reinforce the outer boundary).
Now the question is: If the bees have to stop building at some point, what is the best shape?
I.e. what shape has the shortest boundary relative to the number of cells?
If there were big dents in the outer boundary, i.e. two consecutive 240 degree angles, then you could add a cell without changing the boundary length. So we can assume that there are no such dents.
Similarly, you can also assume that there is no cell that has four boundary edges. If there were one, you could reattach it anywhere else without changing the boundary length, and in doing so create a dent. (I'm assuming there are enough cells so that there is some side of 3 or more cells to attach it to). Since we can rearrange and add cells without changing the total border length when there are cells with four sides on the border, we can assume there are no cells sticking out like that.
From this you will find that the only shapes left are essentially hexagonal. Not all hexagons are equally efficient however.
Suppose one side of the hexagon shape consisted of k cells, and a non-adjacent side had k+2 (or more) cells, then you could move one whole row of k cells from the first side and stick it onto the other side. This does not change the total boundary length. You have however created a dent that you can fill by adding a cell, so you can improve the efficiency.
Therefore the most efficient shapes are hexagons where the sides are almost of the same length (differ by at most 1 cell). These are local maxima, in that there is no way to rearrange and add cells without also increasing the boundary. Nevertheless, if you compare them, I think you'll find that the regular hexagon arrangement (i.e. all sides have exactly the same number of cells) will be the most efficient for its size.
MAJOR UPDATE:
What I am about to show is not a proof for the minimum number of rectangles. However, in some cases I found the number of rectangles can be less than $f(n)^2$. The smallest $N×N$ grid that I have found that can have less than $f(n)^2$ rectangles is $15×15$, which is displayed below:
$$\begin{array}{|c|c|c|c|c|c|c|c|c|c|c|c|c|c|c|} \hline
1&1&1&1&2&2&3&4&4&4&4&4&4&4&4\\ \hline
1&1&1&1&2&2&3&4&4&4&4&4&4&4&4\\ \hline
1&1&1&1&2&2&3&4&4&4&4&4&4&4&4\\ \hline
1&1&1&1&2&2&3&4&4&4&4&4&4&4&4\\ \hline
1&1&1&1&2&2&3&5&5&5&5&5&5&5&5\\ \hline
1&1&1&1&2&2&3&5&5&5&5&5&5&5&5\\ \hline
1&1&1&1&2&2&3&6&6&6&6&6&6&6&6\\ \hline
1&1&1&1&2&2&3&7&8&9&9&10&10&10&10\\ \hline
11&11&11&11&11&11&11&11&8&9&9&10&10&10&10\\ \hline
12&12&12&12&12&12&12&12&8&9&9&10&10&10&10\\ \hline
12&12&12&12&12&12&12&12&8&9&9&10&10&10&10\\ \hline
13&13&13&13&13&13&13&13&8&9&9&10&10&10&10\\ \hline
13&13&13&13&13&13&13&13&8&9&9&10&10&10&10\\ \hline
13&13&13&13&13&13&13&13&8&9&9&10&10&10&10\\ \hline
13&13&13&13&13&13&13&13&8&9&9&10&10&10&10\\ \hline
\end{array}$$
The method used in the above $15×15$ square can be generalized not just to other squares but to rectangles as well. In order to make full use of this method, I will expand the op's method to rectangles. Let the length of a rectangle be equal to $m$ units and the width be $n$ units. Then the number of base-2 rectangles used to cover a $m × n$ rectangle by the op's method is $f(m)f(n)$. My method isn't fundamentally different from the op's method. It splits the $m×n$ rectangle into five sub-rectangles, then the op's method is applied to each of the five rectangles. In some cases the number of base-2 rectangles that covers the five sub-rectangles is less than the number of base-2 rectangles that cover original $m$×$n$ rectangle using the op's method. The five rectangles are arranged so that their are two pairs of rectangles that occupy the corners and one rectangle that is in the middle (not touching the perimeter). Each pair of rectangles are the same size and orientation but in opposite corners. (Top left and bottom right or Top right and bottom left.) The length and width of the five rectangles are constructed from two other unit lengths $a$ and $b$. $a$ is the smallest number such that $m+a$ is a power of two. $b$ is the smallest number such that $n+b$ is a power of two. This means that $f(m+a)$ and $f(n+b)$ are each one. The length and width of the two rectangles in the first pair are $f\left(\frac{m+a}{2}\right)$ and $f\left(\frac{n-b}{2}\right)$ respectively. The length and width of the two rectangles in the second pair are $f\left(\frac{m-a}{2}\right)$ and $f\left(\frac{n+b}{2}\right)$ respectively. The formula for the total number of base-2 rectangle used is $2f\left(\frac{m+a}{2}\right)
f\left(\frac{n-b}{2}\right)+2f\left(\frac{m-a}{2}\right) f\left(\frac{n+b}{2}\right)+f(a)f(b)$.
Note that if a square with a length of $n$ units is of the form $2^xy$ where $x,y\in\Bbb{N}|x\ge 1,y\ge 1$ and $y$ is odd. Then the number of base-2 rectangles used for both the op's method and my method are the the same as the number of base-2 rectangles used for a square of length $y$ because each of the dimensions of the sub-rectangles can be multiplied by $2^x$. For example if we want to determine how many base-2 rectangles is rectangles are required to cover a $30×30$ square using my method. We just use the $15×15$ example near the top of this post and multiply the length and width of each base-2 rectangle by $2$. So this means the $30×30$ square requires the same number of base-2 rectangles as the $15×15$ square. So the problem can be simplified to just rectangles where $m$ and $n$ are odd.
A simple inequality can be made which would indicate which method uses less base-2 rectangles. Let $N_l$ be the number of ones in the number for length of the rectangle in binary and $N_w$ be the number of ones in the width in binary $\bigl($or more simply $N_l=f(m)$ and $N_w=f(n)\bigr)$. Also Let $Z_l$ be the number of zeros in the number for length of the rectangle in binary, $Z_w$ be the number of zeros in the width in binary.
My method uses less rectangles than the op when $$2f\left(\frac{m+a}{2}\right) f\left(\frac{n-b}{2}\right)+2f\left(\frac{m-a}{2}\right) f\left(\frac{n+b}{2}\right)+f(a)f(b)\lt f(m)f(n)$$
$$f\left(\frac{m+a}{2}\right)=1$$
$$f\left(\frac{n+b}{2}\right)=1$$
$$f\left(\frac{m-a}{2}\right)=N_l-1$$
$$f\left(\frac{n-b}{2}\right)=N_w-1$$
$$f(m)=N_l$$
$$f(n)=N_w$$
$$f(a)=Z_l+1$$
$$f(b)=Z_w+1$$
With the above substitutions the inequality can be changed to:
$$2(N_l-1)+2(N_w-1)+(Z_l+1)(Z_w+1)\lt N_lN_w$$
$$2N_l+2N_w-4+(Z_l+1)(Z_w+1)\lt N_lN_w$$
$$(Z_l+1)(Z_w+1)\lt N_lN_w-2N_l-2N_w+4$$
$$(Z_l+1)(Z_w+1)\lt (N_l-2)(N_w-2)$$
In the specific case of the square (where the length equals the width) my method uses less base-2 rectangles than the op when the number ones in the binary representation of the length is at least four more than than the number of zeros. To get the maximum utility out of my method the inequality shouldn't only be applied to the entire length and width of the main square it should also be applied to components of the square. For example consider the square $1927×1927$. The binary representation of 1927 is 11110000111. There are three more ones than zeros in this number so my method would normally break even with the op, covering the square with 49 base-2 rectangles. There is a way to cover the square using less base-2 rectangles by spliting the square into four rectangles $1920×1920$, $1920×7$, $7×1920$, and $7×7$. Splitting this way doesn't change the net result of the op's method. The first three sub rectangles satisfies the inequality. If I use my method on the first three sub rectangles I use 13, 11, and 11 base-2 rectangles respectively. Using the op's method on the last sub rectangle then counting up all of the base-2 rectangles I can cover the $1927×1927$ square using 44 base-2 rectangles.(13+11+11+9) So if a combination of sub-strings in the binary value of the length and width satisfies the inequality like it did three times with the sub rectangles then my method will use less base-2 rectangles than the op's method. Finding the minimum number of base-2 rectangles for some squares will inevtably involve searching for the best way to split the square.
For large enough squares the worst digit combination where my method does no better than the op is a block of three ones and the rest are alternating zeros and ones.
For example the square $\require{enclose}\enclose{horizontalstrike}{343×343}$, its binary representation is 101010111. This square requires 36 base-2 rectangles and is tied for most number of required base-2 rectangles amoung the nine digit squares. This means that a upper bound can be made for the minimum number of rectangles required. Let $\enclose{horizontalstrike}{d_l}$ be the number of digits in the binary representation of the length of the rectangle. ($\enclose{horizontalstrike}{d_l=N_l+Z_l}$) Let $\enclose{horizontalstrike}{d_w}$ be the number of digits in the binary representation of the width of the rectangle. ($\enclose{horizontalstrike}{d_w=N_w+Z_w}$) Then the upper bound is: $$\enclose{horizontalstrike}{\left(\left\lceil\frac{d_l}{2}\right\rceil+1\right)\left(\left\lceil\frac{d_w}{2}\right\rceil+1\right)}$$
I conjecture that the combination of my method and the op's method is the optimal way of minimizing the number of base-2 rectangles. The only way that someone might use be able to use less rectangles is to find a another way of spliting the square into sub-rectangles such that using the op's method on those sub-rectangles uses less base-2 rectangles than using my method and the op's method on the whole square.
Rob Pratt's(RP's) post shows that there is a third method for covering the $n×n$ square with less base-2 rectangles than my method or the op's method for some $n×n$ squares. In order to describe how many rectangles RP's method uses I will continue to use the the term $b$ from my method (where $b$ is the smallest number such that $b+n$ is a power of 2). I will also need a new sets of terms $c_k$ and $s_k$ where $k\in\Bbb{N}|1\le k\le f(b)$. $c_1$ is the value of left most ones digit of b in binary form. $c_2$ is the value of the second ones digit from the left of b in binary form. $c_3$ is the value of the third ones digit from the left of b in binary form. Etc. $s_v=\sum_{j=1}^vc_v$. For example if $n=23$ then $b=9$, $c_1=8$, $c_2=1$, $s_1=8$, $s_2=9$. RP's method uses $$2f\left(\frac{n+b}{2}\right)f\left(\frac{n-b}{2}\right)+f\left(\frac{n+b}{2}\right)f\left(\frac{n-b}{2}+s_k\right)+f\left(\frac{n-b}{2}\right)f\left(\frac{n+b}{2}-s_k\right)+f(b)f(b-s_k)$$
base-2 rectangles. Each $f(•)f(•)$ product contains the length and width of each of the sub-rectangles that covers the square inside the f function. RP's method has $k$ ways of covering the $n×n$ square one for each $s$ element. Obviously the particular $s_k$ element that uses the least number of base-2 rectangles according to the above formula is the one that is used for the minimum. A sufficient condition for when RP's method uses less base-2 rectangles than both my method and the op's method when the binary representation of $n$ has at least three more ones than zeros, the second digit to the left is a zero, and the spliting method that was mentioned for the $1927×1927$ square doesn't apply.
Best Answer
Given a solution with $m$ rectangles, we can turn it into a solution with $m+3$ rectangles by subdividing one of the rectangles into four smaller copies of half the side length. We can also do $m+4$ via the method you described. Here is a construction that produces $m+5$:
With these extensions and the base case $n=3$, we can get every integer except $n=1,2,4,5$; I haven't gone through a careful proof that all of these are impossible, but it's not too hard for $4$ and I suspect $5$ is doable with a bit of casework.
The general problem with $1\times k$ rectangles will have the trivial $m=k$ solution and the ability to extend solutions from $m$ to $m+4$ and $m+n^2-1$ for all $n$; this means that there will always be solutions past $k+5$. I expect each of $k+1,k+2,k+5$ could be tackled in generality, thereby solving the problem in all cases, but it might be fairly annoying to handle everything that comes up (especially in the $k+5$ case).