As others have pointed out, your statement of Zorn's lemma is ambiguous, and the one reasonable interpretation in English is not what is wanted (and in fact makes the statement false). This might be the source of your confusion. However, to address your questions at face value:
A maximal element of $P$ is an element such that no other is greater than it. That is, $x$ is maximal $p\not\geq x$ for any $p\in P$. Of course, because $P$ is not necessarily totally ordered, this is weaker than the notion of a maximum element, which demands that every other element is less than it.
Upper bounds require a bit more subtlety to define: Given a partially ordered set $P$ and a subset $S\subseteq P$, we say that $x$ is an upper bound of $S$ if every element of $S$ is less than $x$. Therefore, an upper bound is more similar to a maximum element than a maximal element. But it is importantly different because the upper bound does not need to belong to $S$ itself, just to $P$.
(A classic example for when this distinction is important is that open sets on the real line never have maximum elements, but they could have upper bounds.)
The sketch is somewhat formally problematic; it is intuitively appealing, but it would take a fair amount of formal machinery to actually get it work. That’s part of where “ordinals” come in.
To answer your questions: yes, that is the negation of Zorn’s Lemma. You can write Zorn’s Lemma as:
If
(for all $T\subseteq P$
(If $T$ is totally ordered,
then there exists $y_T\in P$ such that $x\leq y_T$ for all $x\in T$))
then
(there exists $m\in P$ such that
(for all $y\in P$
(if $m\leq y$
then $m=y$))).
The negation of “if $P$ then $Q$” is “$P$ and not($Q$)”. So the negation of Zorn’s Lemma would be
For all $T\subseteq P$
(if $T$ is totally ordered,
then there exists $y_T\in P$ such that $x\leq y_T$ for all $x\in T$)
and
(for all $m\in P$
(there exists $y\in P$ such that
$m\leq y$ and $m\neq y$)).
The last, “$m\leq y$ and $m\neq y$” is the same as $m<y$. So the latter clause reads: for every element of $P$ there is an element of $P$ that is strictly greater than it.
The idea of the proof is: since the empty set is a totally ordered set and has an upper bound, $P$ is not empty. Say $a_1$. But $a_1$ is not maximal, so there exists $a_2>a_1$. But $a_2$ is not maximal, so there exists $a_3>a_2$. “Continuing this way” (Choice, since it involves infinitely many choices) we get $a_1<a_2<a_3<\cdots$; let $T=\{a_1,a_2,\ldots\}$. This is totally ordered, so it has an upper bound $a_{\omega}$. This is not maximal, so there is $a_{\omega+1}>a_{\omega}$. “Continuing this way” (Choice) you keep going and going and going and going. Using some technical set theoretic machinery, you can then guarantee that you “keep going” past countability and really past any set size. So this would allow you find a collection of elements of $P$ that is strictly larger (in the sense of cardinality) than $P$ itself. This is of course impossible, leading to the contradiction.
The “ordinals” are types of well-ordered sets, and they are the generalization of natural numbers. They allow you to define “inductions” that go past the natural numbers and through to other infinite well-ordered sets as indices.
If you want a different version of the proof, I recommend George Bergman’s handout, The Axiom of Choice, Zorn’s Lemma, and all that, available from his website
Best Answer
The conforming subsets are unique for each order type, and their exact elements depends on the choice of the choice function.
The point of this definition is to hide the transfinite recursion, but it is exactly that. A transfinite recursion. It means that we start with $\varnothing$, which is a chain, and use the choice function to select an upper bound of this chain. Then we have some $\{x\}$, where $x$ was the upper bound chosen for $\varnothing$; so it is either that $x$ is maximal, or the chain has a strict upper bound, say $y$, so we have the chain $\{x,y\}$, etc.
Now you can still ask, why do we need the choice function? In general there is no unique choice of upper bounds. And we will generally need to make infinitely many of these choices.
When looking at this from the recursion point of view, the choice function is needed for the recursion process to be well-defined to begin with. Which is necessary for the conforming subset definition to be useful (i.e., to produce a chain, rather than just a subset).