You are having confusion because you aren't being clear about what you are taking the gradient of. If $f(x,y)$ is a function of two variables, then the gradient $\nabla f(x,y)$ is a vector in the plane which points in the direction of greatest rate of change of $f(x,y)$ at each point. Moreover, it is normal to the level curves of $f(x,y)$ (these are the equations $f(x,y)=k$ for various real numbers $k$). This makes sense because traveling along the level curves of $f(x,y)$ results in zero change.
Note that the graph of $f(x,y)$ and the level curves are very different. The graph of $f(x,y)$ is the surface $f(x,y)=z$ (here z is a dependent variable, not a constant!). We can find a normal vector to the graph of $f(x,y)$ by letting $g(x,y,z)=z-f(x,y)$ and seeing that the graph of $f(x,y)$ is a level surface to $g(x,y,z)$ (where $k=0$). Then for each $(x,y,z)$ such that $g(x,y,z)=0$ we have that $\nabla g(x,y,z)$ is normal to the level surface $g(x,y,z)=0$, which as mentioned, is the graph of $f(x,y)$. Of course $\nabla g(x,y,z)$ and $\nabla f(x,y)$ are different - they don't even have the same number of components.
We can first ask what is meant by the direction of "steepest ascent" for a function $f(\vec{x})$ around $\vec{x}_0$? What we usually mean by this is the direction given by a vector $\vec{\delta}$ so that a minute (infinitesimal step) in that direction gives the greatest increase of the function while keeping the length of the minute step fixed. You can think of the following process as one that converges to finding the direction of steepest ascent:
First start with all unit vectors $\vec{\delta}$, and ask: of all the possible $\vec{\delta}$, which one maximizes $f(\vec{x}_0 + \vec{\delta})$? Let the answer be $\vec{\delta}_1$.
Then reduce the length of $\vec{\delta}$ to say, $0.1$ and ask the same question: of all such possible $\vec{\delta}$, which one maximizes $f(\vec{x}_0 + \vec{\delta})$? Let the answer be $\vec{\delta}_1$.
Continue to reduce the length of $\vec{\delta}$ to be fixed at some number arbitrarily close (but not equal) to zero, and ask the same question, thereby constructing an infinite sequence $$\vec{\delta}_1, \vec{\delta}_2, \vec{\delta}_3, \cdots$$ For differentiable functions, the direction each $\vec{\delta}_i$ as $i \to \infty$ points in will approach a well-defined limit!
The limit of this infinite process can then be thought of the direction of steepest ascent. It maximizes the increase of the function per unit distance traveled around the point $\vec{x}_0$, in the limit where the distance traveled from $\vec{x}_0$ is very small.
To find exactly what direction $\vec{\delta}$ approaches, we can use the idea behind defining the derivative, which is to provide a linear approximation of a function around a point. The idea is that for a multivariate function $f(\vec{x})$ that has a gradient $\nabla f$, moving an infinitesimal amount $\vec{\delta}$ from $\vec{x}_0$ to $\vec{x}_0 + \vec{\delta}$ changes the value of the function from $f(\vec{x}_0)$ to $f(\vec{x}_0) + \nabla f(\vec{x}_0) \cdot \vec{\delta}$. Even though there will always be corrections on the order of $\vec{\delta}^2$ and beyond, they are negligible in the limit as $\delta \to \vec{0}$. That's because every differentiable function can be approximated as linear in the neighborhood of any arbitrary point in its domain.
The problem then becomes to find $\vec{\delta}$ such that $\nabla f(\vec{x}_0) \cdot \vec{\delta}$ is maximized (since this is the increase $f(\vec{x}_0 + \vec{\delta}) - f(\vec{x}_0)$ that is recorded. But remember there's a caveat: the length of all $\vec{\delta}$ must be held fixed. Because of this, finding the vector $\vec{\delta}$ that maximizes $\nabla f(\vec{x}_0) \cdot \vec{\delta}$ is equivalent to finding the unit vector $\hat{\delta}$ that maximizes $\nabla f(\vec{x}_0) \cdot \hat{\delta}$. From there you can check that choosing $\hat{\delta}$ to be the unit vector in the direction of $\nabla f (\vec{x}_0)$ maximizes this quantity.
Best Answer
$\vec{k} \times \vec{n}$ is a vector that sits in both the $x$-$y$ plane and $\mathscr{P}$, because it is normal to the normal of both planes. This means that its angle to the $x$-$y$ is always $0$, so this doesn't work.
In your drawing, it's not clear how $\vec{w}$ is defined, as it depends on how you crop the plane into the picture. Drawing both planes, their intersections and the vectors $\vec{k}$ and $\vec{n}$ coming out of the origin probably helps to visualise.
Intuitively you want a vector that's in $\mathscr{P}$ and points as 'far away' from the $x$-$y$ plane as possible, and this is achieved by being normal to the line where the $x$-$y$ plane and $\mathscr{P}$ intersect. The vector $\vec{k} \times \vec{n}$ runs along that line. So $\vec{w}$ needs to be in $\mathscr{P}$ (i.e. it is normal to $\vec{n}$) and normal to $\vec{k} \times \vec{n}$, and so $\vec{w}=\vec{n} \times (\vec{k} \times \vec{n})$ does the job.