Here's a geometrical reformulation of the problem that yields an
$O(n \log n)$ solution for $m=2$ and suggests a context that
may yield good answers for arbitrary fixed $m$.
[EDIT but see below for $m=3$ and $m \geq 4$.]
Denote the $i$-th coordinate of $u_k^{\phantom.}$ and $v_k^{\phantom.}$ by
$u_k^{(i)}$ and $v_k^{(i)}$ respectively.
Consider the $n$ vectors in $V := {\bf R}^m$ given by
$$
x_i = (u_1^{(i)}, u_2^{(i)}, u_3^{(i)}, \ldots, u_m^{(i)})
$$
($i=1,2,\ldots,n$), and the $n$ dual vectors
$$
y_j = (v_1^{(j)}, v_2^{(j)}, v_3^{(j)}, \ldots, v_m^{(j)}) \in V^*.
$$
($j=1,2,\ldots,n$). Then the $(i,j)$ entry of $A$ is $y_j(x_i)$.
Thus the problem asks for the minimum or maximum of $y_j(x_i)$
as $i,j$ range independently over $\lbrace 1, 2, \dots, n \rbrace$.
Note that given $A$ there are many choices of $u_k$ and $v_k$, but
the choice is tantamount to a choice of basis on $V$ and of dual
basis on $V^*$, so geometrically our $y_j(x_i)$ problem depends only
on $A$.
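As a quick sanity check of this reformulation, assuming the factorization $A = \sum_{k=1}^m u_k^{\phantom .} v_k^{\mathsf T}$ with $u_k, v_k \in {\bf R}^n$ (i.e. $A_{ij} = \sum_k u_k^{(i)} v_k^{(j)}$), one can verify numerically that the $(i,j)$ entry of $A$ equals $y_j(x_i)$; a small sketch:

```python
import random

rng = random.Random(1)
m, n = 3, 5
# u_k, v_k in R^n for k = 1..m; A = sum_k u_k v_k^T is the rank-m matrix
u = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(m)]
v = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(m)]
A = [[sum(u[k][i] * v[k][j] for k in range(m)) for j in range(n)]
     for i in range(n)]
# x_i collects the i-th coordinates of the u_k; y_j those of the v_k
x = [[u[k][i] for k in range(m)] for i in range(n)]
y = [[v[k][j] for k in range(m)] for j in range(n)]
# now A[i][j] == <x_i, y_j> for all i, j
```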
Now it's clear that the minimizing and maximizing $x_i, y_j$
must be vertices of the convex hull of $\lbrace x_i \rbrace$ and
$\lbrace y_j \rbrace$ respectively. This recovers the known solution
for $m=1$, when any bounded convex subset of $V$ has (at most) two
vertices, which can be found in $O(n)$ comparisons.
For $m=2$, it is still known how to find the vertices of the convex hull
(in cyclic order) in $O(n \log n)$ steps [see for instance the
"Convex hull algorithms" Wikipedia page for references]. Once we know
the vertices of the convex hull of the $x_i$, we can for each $j$
find the minimal and maximal $y_j(x_i)$ in $O(\log n)$ steps by bisection,
making the overall computational cost still $O(n \log n)$.
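The $m=2$ pipeline can be sketched as follows (a sketch only: for brevity it scans the hull vertices of both point sets directly, which already beats the $n^2$ exhaustive search whenever the hulls are small, rather than implementing the $O(\log n)$ bisection per direction):

```python
def cross(o, a, b):
    """Cross product of vectors o->a and o->b (positive = left turn)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    """Andrew's monotone chain, O(n log n); returns hull vertices in ccw order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def extreme_entries(xs, ys):
    """Min and max of <x_i, y_j>: only hull vertices can be extremal,
    since the form is linear in each argument separately."""
    hx, hy = convex_hull(xs), convex_hull(ys)
    prods = [a[0] * b[0] + a[1] * b[1] for a in hx for b in hy]
    return min(prods), max(prods)
```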
Finding the convex hull and its structure for $m=3$, and larger fixed $m$,
is harder, but at least there's some literature on this problem, and
experts who can suggest good ways to proceed.
EDIT Alas the current state of the art for computing convex hulls
apparently limits this approach to $m=2$ and maybe $m=3$.
The Mathworld entry for
"Convex Hull"
reports on an $O(n \log n)$ algorithm also in dimension 3, referring to
Skiena, S. S. "Convex Hull." $\S$8.6.2 in The Algorithm Design Manual. New
York: Springer-Verlag, pp. 351–354, 1997.
But this still leaves open the question of computing in time $\tilde O(n)$
a data structure that will find the extreme point in each direction in
only $n^{o(1)}$ operations, or even in significantly less than
the number of vertices of the convex hull. Still, if at least
one of $\lbrace x_i \rbrace$ and $\lbrace y_j \rbrace$ has a convex hull
with significantly fewer than $n$ vertices then this will improve on the
exhaustive search over $n^2$ entries of the matrix.
Once $m \geq 4$ it seems that it is not even known how to compute
the set of vertices of the convex hull in as few as $n^2$ operations;
unless the number of operations can be brought significantly below $n^2$,
this means that the technique described here is limited to $m=2$ and maybe $m=3$.
All of this analysis assumes that we don't run into difficulties like
$m=1$, $u_1 = (-1, 2.54)$, $v_1 = (1, -0.3937)$, where there are
two or more very close candidates for the minimum (here $-1 \cdot 1 = -1$
and $2.54 \cdot (-0.3937) = -0.999998$). To deal with that,
we might assume that we can do exact arithmetic (perhaps the coordinates
are quantized with fixed denominator), or tolerate an error that can be
brought below $\epsilon$ in $O(\log(1/\epsilon))$ steps.
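For instance, on the example above, exact rational arithmetic (a sketch using Python's standard `fractions` module) separates the two nearby candidates unambiguously:

```python
from fractions import Fraction

# The m = 1 example: u_1 = (-1, 2.54), v_1 = (1, -0.3937),
# with coordinates quantized with fixed denominator.
u1 = [Fraction(-1), Fraction("2.54")]
v1 = [Fraction(1), Fraction("-0.3937")]
entries = [a * b for a in u1 for b in v1]   # the four products u_1^(i) v_1^(j)
smallest = min(entries)
# The candidate minima -1 and 2.54 * (-0.3937) = -0.999998 differ by
# only 2e-6, but exact arithmetic distinguishes them with no tolerance.
```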
First, a meta-comment. Numerical linear algebra is de facto a part of engineering or, more precisely, computer science: a practical science that studies how we compute things, dealing with our practical relationship to mathematical objects rather than with the objects themselves. Sites devoted to computer science, similar to this one, may therefore be more helpful with such questions.
Now, my guess. The problem of finding the eigenvectors of a Toeplitz matrix is not exactly solvable in the sense that it is for the special case of a circulant matrix, which makes me expect that the solution of this problem won't be substantially faster than for general matrices.
But if you can find the eigenvalues by the method you mentioned, I believe that finding the eigenvectors is a special case of solving a Toeplitz system,
https://en.wikipedia.org/wiki/Toeplitz_matrix#Solving_a_Toeplitz_system
although a singular one, and there exists an $O(n^2)$ algorithm for that, the Levinson algorithm
https://en.wikipedia.org/wiki/Levinson_recursion
The Toeplitz system is the problem of finding $x$ obeying $Ax=b$ for a given Toeplitz matrix $A$ and a column vector $b$. If you know an eigenvalue $\lambda$ and you want to find the eigenvector $x$, you are solving $(A-\lambda\cdot {\bf 1})x=0$, which looks like a Toeplitz system with the shifted (still Toeplitz) matrix $A-\lambda\cdot {\bf 1}$, albeit one with $b=0$. Of course, the solution is not unique, since the scaling is arbitrary. My experience here is nonexistent, so I cannot say whether the method breaks down for $b=0$; if it does, you may try $b\to 0$ instead. But it seems an obvious attempt to construct from the pieces that are out there.
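One concrete way to realize the $b \to 0$ idea is inverse iteration with a slightly shifted $\lambda$, so that each step is a nonsingular Toeplitz solve; `scipy.linalg.solve_toeplitz` performs each solve in $O(n^2)$ via Levinson recursion. This is only an illustrative sketch of the suggestion above, not a vetted method:

```python
import numpy as np
from scipy.linalg import solve_toeplitz, toeplitz

def toeplitz_eigenvector(c, lam, iters=3, shift=1e-8):
    """Inverse iteration for an eigenvector of the symmetric Toeplitz
    matrix with first column c, at a known eigenvalue lam.  Each step
    solves a Toeplitz system by Levinson recursion in O(n^2); the tiny
    shift keeps A - lam*I nonsingular, standing in for b -> 0."""
    c_shifted = np.asarray(c, dtype=float).copy()
    c_shifted[0] -= lam + shift        # first column of A - (lam+shift)*I
    x = np.ones(len(c_shifted))
    for _ in range(iters):
        x = solve_toeplitz(c_shifted, x)
        x /= np.linalg.norm(x)         # rescale: the solution blows up near lam
    return x
```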
Best Answer
Sarlós, and Clarkson and Woodruff, give efficient streaming algorithms for the more general problem of computing an approximate singular value decomposition of an arbitrary matrix. Because of the sublinear-space streaming constraints, they look at "tall" matrices: far fewer columns than rows. But the algorithms should work, and give efficiency gains, for square matrices as well. The idea is to project the columns onto a space of much smaller dimension, similarly to the proof of the Johnson-Lindenstrauss lemma, and then work with this smaller matrix.
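The projection idea can be conveyed by a toy numpy sketch (this only captures the flavor; the actual algorithms use more refined sketching matrices and come with error guarantees, and the function name here is mine):

```python
import numpy as np

def sketched_right_singular_vectors(A, sketch_size, rank, rng):
    """Compress the m rows of a tall m x n matrix A into sketch_size rows
    by a random Gaussian projection, then take the SVD of the small
    sketch -- the flavor of the Sarlos / Clarkson-Woodruff approach."""
    m, n = A.shape
    S = rng.standard_normal((sketch_size, m)) / np.sqrt(sketch_size)
    B = S @ A                      # sketch: accumulable in one streaming pass
    _, _, Vt = np.linalg.svd(B, full_matrices=False)
    return Vt[:rank]               # approximate top right singular vectors of A
```

For an exactly low-rank $A$ the sketch preserves the row space almost surely, so the recovered singular vectors reproduce $A$; for general $A$ the approximation quality depends on the sketch size.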