I'm not sure it can be done.
First of all,
you have to look at
all $n$ items.
Then,
you have to keep the top
$\sqrt{n}$ items.
To find where an item
that is in the top $\sqrt{n}$ places,
you have to do
$O(\log(\sqrt{n}))
=O(\log(n))
$
comparisons.
If the items are
badly ordered,
you will have to do this
$n$ times.
This gives a worst case time
of $O(n \log n)$.
If the items are
will distributed,
the placing in the top $\sqrt{n}$
might only be done
$\sqrt{n}$ times.
In this case,
the placing would only take
$O(\sqrt{n}\log(n))$
operations,
so this would be
$O(n)$.
You can in the iterative manner, just substitute the recurrence expression back
So you have a basic $$T(n) = 2T\left(\frac{n}{2}\right) + cn$$
Therefore, for any $k$
$$T\left(\frac{n}{2^k}\right) = 2T\left(\frac{n}{2^{k+1}}\right) + c\cdot\frac{n}{2^k}$$
So,
$$T(n) = 2T\left(\frac{n}{2}\right) + cn =$$
$$ = 2\left(2T\left(\frac{n}{4}\right) + c\cdot\frac{n}{2}\right) + c\cdot n = $$
$$ 4T\left(\frac{n}{4}\right) + 2c\cdot n = \ldots = 2^kT\left(\frac{n}{2^k}\right) + kcn$$
Here, you should think "how deep you can go in the recursion?". Recall, in mergesort, the array of size $1$ is sorted and we stop diving.
I.e. $T\left(\frac{n}{2^{k_{\text{max}}}}\right) = T(1)$, then $k_{\text{max}} = \log n$.
This concludes in
$$T(n) = 2^{k_{\text{max}}}T(1) + k_{\text{max}}cn = nT(1) + cn\log n$$
$T(1)$ is some constant.
Regarding this $k$-split. No matter, how optimal you are making it, we know you can't theoretically drop under $\Theta(n \log n)$, why bother than?
Best Answer
The problem comes from merging the arrays.
What I think you're thinking when using a heap is to keep track of the smallest elements from each of the $k$-element sorted arrays, removing the smallest element $s$ them from the heap, putting $s$ in the output, and adding a new element onto the heap from the array which contained $s$.
The problem with this is that there are $n/k$ arrays, so your heap will actually contain $n/k$ elements.
As an extreme case, let $k=1$. Then your algorithm would consist of sorting each 1-element array (which takes no time) and then merging them using heapsort.