Generally the procedure is to guess and verify.
In this case your intuition is correct: since
${\cal A} = \operatorname{co} \{ \pm e_k\}_{k=1}^3 $ (the convex hull of
the signed unit vectors), you need only check that these six points are
indeed extreme points, since no other point can be an extreme point.
To see why no other point can be an extreme point, suppose
$x \in \operatorname{co} \{b_k\}_k$ (with the $b_k$ being distinct
and finite in number) and
$x \notin \{ b_k \}_k $. Then $x = \sum_k \lambda_k b_k$ where
$\lambda_k \ge 0$, $\sum_k \lambda_k = 1$. Since $x \notin \{ b_k \}_k $, there must be at least one $\lambda_i \in (0,1)$ (otherwise exactly one $\lambda_k$ would equal $1$ and $x$ would equal that $b_k$). Then
$x = \lambda_i b_i + (1 -\lambda_i) {1 \over \sum_{k \neq i} \lambda_k }\sum_{k \neq i} \lambda_k b_k$ and since
$b_i, {1 \over \sum_{k \neq i} \lambda_k }\sum_{k \neq i} \lambda_k b_k \in \operatorname{co} \{b_k\}_k$ we see that $x$ cannot be an extreme point. Hence the extreme points must be a subset of the $b_k$.
(Aside: one might worry that
$b_i = {1 \over \sum_{k \neq i} \lambda_k }\sum_{k \neq i} \lambda_k b_k$, in which case the two points above coincide. But then
$x = \lambda_i b_i + (1-\lambda_i) b_i = b_i \in \{ b_k \}_k$, contradicting our assumption on $x$, so this case cannot occur.)
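Here is a numeric sketch of this decomposition, assuming numpy and some arbitrary illustrative points $b_k$ in the plane: a non-vertex point of the hull is rewritten as a proper convex combination of two other hull points.

```python
# A non-vertex point of co{b_k} is not extreme: it sits strictly between
# b_i and the normalized average of the remaining terms.
import numpy as np

b = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.0, 0.0])]
lam = np.array([0.5, 0.3, 0.2])                # weights in (0,1), summing to 1
x = sum(l * bk for l, bk in zip(lam, b))       # a hull point, not a vertex

i = 0                                          # an index with lam[i] in (0,1)
rest = sum(lam[k] * b[k] for k in range(3) if k != i) / (1 - lam[i])

assert np.allclose(x, lam[i] * b[i] + (1 - lam[i]) * rest)
assert not np.allclose(b[i], rest)             # the two points are distinct,
                                               # so x is not extreme
```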
To see why the $\pm e_k$ are extreme, we have the following useful
result:
Suppose $C$ is convex, $h$ some direction, and that $b \in C$ is the unique solution to the problem $\langle h, b \rangle = \sup_{c \in C} \langle h, c \rangle$. Then $b$ is an extreme point of $C$. (This is
straightforward to prove by contradiction.) Note that this is a
sufficient, but not necessary, condition for $b$ to be extreme.
Proof: Suppose that $b \in C$ is the unique solution to the problem $\langle h, b \rangle = \sup_{c \in C} \langle h, c \rangle$
for some direction $h$, but that $b$ is not an extreme point. Then there are $y,z \in C$ distinct from $b$ and
$\alpha \in (0,1)$ such that $b= \alpha y + (1-\alpha)z$. Since
$\langle h, y \rangle, \langle h, z \rangle \le \sup_{c \in C} \langle h, c \rangle = \langle h, b \rangle$ and, by linearity,
$\alpha \langle h, y \rangle + (1-\alpha) \langle h, z \rangle = \langle h, b \rangle$, neither inner product can be strictly smaller than $\langle h, b \rangle$. Hence
$\langle h, y \rangle = \langle h, z \rangle= \langle h, b \rangle$,
which contradicts the fact that $b$ is the unique maximiser.
If we choose $h = e_1$ then we see that
$\langle e_1, e_1 \rangle = 1 =\sup_{x \in {\cal A}} \langle e_1, x \rangle$, and $e_1$ is the only point of ${\cal A}$ attaining this supremum, hence $e_1$ is extreme. The other five points follow in a
similar manner with $h = \pm e_k$.
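If you want to check this mechanically, note that the supremum of a linear functional over a convex hull is attained at one of the generating points, so it suffices to scan the six vertices. A small sketch, assuming numpy:

```python
# Confirm that h = e_1 has e_1 as its unique maximiser over co{±e_k}.
import numpy as np

vertices = [s * e for e in np.eye(3) for s in (1.0, -1.0)]
h = np.array([1.0, 0.0, 0.0])                  # the direction h = e_1

values = [h @ v for v in vertices]
best = max(values)
argmax = [v for v, val in zip(vertices, values) if np.isclose(val, best)]

assert np.isclose(best, 1.0)                   # sup over A of <e_1, x> is 1
assert len(argmax) == 1                        # attained at a unique point...
assert np.allclose(argmax[0], h)               # ...namely e_1, so e_1 is extreme
```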
I will use the usual inequalities; it should be clear what they mean. So let us look at your first question. You want to show that $x \in C$ iff $\exists y\geq 0:\mathbf{1}^Ty\leq1 \text{ and }x=v_0+By$. Let us take an $x \in C$. By definition there exist nonnegative $\theta_i$'s that sum to unity, such that $x=\sum_{i=0}^k\theta_iv_i$. Notice that since $\sum_{i=0}^k \theta_i=1$, we have $\theta_0=1-\sum_{i=1}^k \theta_i$. With this in mind, let us expand our expression for $x$:
$$x=\sum_{i=0}^k\theta_iv_i=\theta_0v_0+\theta_1v_1+\dots+\theta_kv_k=(1-\sum_{i=1}^k \theta_i)v_0+\theta_1v_1+\dots+\theta_kv_k$$
$$= v_0-\theta_1v_0-\theta_2v_0-\dots-\theta_kv_0+\theta_1v_1+\dots+\theta_kv_k$$
$$=v_0+\theta_1(v_1-v_0)+\theta_2(v_2-v_0)+\dots+\theta_k(v_k-v_0) $$
$$=v_0+B \begin{bmatrix}
\theta_1 & \cdots & \theta_k
\end{bmatrix}^T$$
Notice that $\begin{bmatrix}
\theta_1 & \cdots & \theta_k
\end{bmatrix}^T$ has nonnegative entries, since each $\theta_j$ is nonnegative. And since all the $\theta_j$ (including $\theta_0 \ge 0$) sum to unity, dropping $\theta_0$ from the sum gives
$$
\mathbf{1}^T\begin{bmatrix}
\theta_1 & \cdots & \theta_k
\end{bmatrix}^T \leq 1.
$$
The proof of the "if" statement can be done similarly, backwards.
For your second question, you ask whether the columns of $B$ aren't clearly linearly dependent. Counterexamples to this are easy to construct and you should find them yourself, but the independence of the vectors $\{v_1-v_0,v_2-v_0,\dots,v_k-v_0\}$ comes from requiring affine independence of $\{v_0,v_1,\dots,v_k\}$. You write this yourself before defining the convex hull of a set of vectors.
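For illustration, a short sketch assuming numpy and hypothetical vertices: affinely independent vertices give $B$ full column rank, while a vertex lying in the affine span of the others makes the columns of $B$ dependent.

```python
import numpy as np

v = [np.zeros(3), np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
B = np.column_stack([vi - v[0] for vi in v[1:]])
assert np.linalg.matrix_rank(B) == 2          # k = 2 independent columns

v_bad = v + [np.array([0.5, 0.5, 0.0])]       # v_3 = (v_1 + v_2)/2: not affinely independent
B_bad = np.column_stack([vi - v_bad[0] for vi in v_bad[1:]])
assert np.linalg.matrix_rank(B_bad) == 2      # rank < k = 3: columns dependent
```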
For your last question, we have that $B \in \mathbb{R}^{n\times k}$ has rank $k$ by assumption. Notice $k\leq n$, since we can't have $k>n$ and still have the columns be linearly independent. From your first course in linear algebra, you know that it is possible to reduce the matrix $B$ to reduced row echelon form, where the last $n-k$ rows are zero rows since all the columns are linearly independent. Each time you perform a row operation on a matrix, you actually do a left multiplication with an $n \times n$ elementary matrix. Each elementary matrix is invertible. The product of all your elementary matrices, corresponding to all your row operations (applied in order), is your matrix $A$. The following notes give a good example of using elementary matrices to construct such a matrix: https://people.math.carleton.ca/~kcheung/math/notes/MATH1107/wk05/05_elementary_matrices_example.html
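Here is a minimal sketch of that construction, assuming numpy: Gauss-Jordan elimination applied to the augmented matrix $[B \mid I_n]$ accumulates the product of all the elementary matrices in the right block, so the final right block is exactly such an $A$.

```python
# Row-reduce [B | I]; the right block ends up as A with A B = [I_k; 0].
import numpy as np

B = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])                    # n = 3, k = 2, rank 2
n, k = B.shape
M = np.hstack([B, np.eye(n)])                 # [B | I]

for col in range(k):
    piv = col + np.argmax(np.abs(M[col:, col]))
    M[[col, piv]] = M[[piv, col]]             # row swap     (elementary matrix)
    M[col] /= M[col, col]                     # row scaling  (elementary matrix)
    for r in range(n):
        if r != col:
            M[r] -= M[r, col] * M[col]        # row addition (elementary matrix)

A = M[:, k:]                                  # the accumulated product
assert np.allclose(A @ B, np.vstack([np.eye(k), np.zeros((n - k, k))]))
```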
So first of all we remember that a product of convex sets is convex too, and moreover we remember that any interval of the real line $\Bbb R$ is convex: so any rectangle is convex, being a product of convex sets, and thus the unit cube $[0,1]^k$ is convex. Now any linear map and any translation between topological vector spaces preserve convexity, since if $f:V\rightarrow W$ is such a map then $$ f\big(x+t(y-x)\big)=f(x)+t\big(f(y)-f(x)\big) $$ for any $x,y\in V$ and $t\in[0,1]$. So we conclude that any $k$-parallelopiped is convex, because it is the image of the unit cube $[0,1]^k$ under the composition of a translation and a linear map.
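A quick numeric check of that identity, assuming numpy and an arbitrary (hypothetical) affine map $f(x) = Lx + c$:

```python
import numpy as np

rng = np.random.default_rng(0)
L, c = rng.normal(size=(3, 3)), rng.normal(size=3)
f = lambda x: L @ x + c                        # linear map followed by translation

x, y = rng.normal(size=3), rng.normal(size=3)
for t in np.linspace(0.0, 1.0, 11):
    # affine maps send the segment [x, y] onto the segment [f(x), f(y)]
    assert np.allclose(f(x + t * (y - x)), f(x) + t * (f(y) - f(x)))
```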
In any case, it is also possible (if it is of interest) to prove the statement by a direct argument, which I give below.
So if $x,y\in\mathcal P_O(\vec v_1,\dots,\vec v_k)$ then there must exist $\xi^i,\eta^i\in[0,1]$ for $i=1,\dots,k$ such that $$ x=O+\xi^i\vec v_i\,\,\,\text{and}\,\,\,y=O+\eta^i\vec v_i $$ (summation over the repeated index $i$) and thus we have to prove that $$ x+t\cdot(y-x)=\big(O+\xi^i\vec v_i\big)+t\cdot(\eta^i-\xi^i)\vec v_i\in\mathcal P_O(\vec v_1,\dots,\vec v_k) $$ for any $t\in[0,1]$. So observing that $$ \big(O+\xi^i\vec v_i\big)+t\cdot(\eta^i-\xi^i)\vec v_i=O+\big(\xi^i+t\cdot(\eta^i-\xi^i)\big)\vec v_i=O+\big((1-t)\cdot\xi^i+t\cdot\eta^i\big)\vec v_i $$ for any $t\in[0,1]$, we see that $$ \begin{cases}0\le\xi^i,\eta^i\le1\\0\le t\le1\end{cases}\Rightarrow\begin{cases}0\le\xi^i,\eta^i\le1\\0\le t\le 1\\0\le1-t\le1\end{cases}\Rightarrow\\ \begin{cases}(1-t)\cdot\xi^i\ge0\\t\cdot\eta^i\ge0\\t\cdot\eta^i\le t\le1\\(1-t)\cdot\xi^i\le(1-t)\le1\end{cases}\Rightarrow0\le(1-t)\cdot\xi^i+t\cdot\eta^i\le(1-t)+t=1 $$ for any $i=1,\dots, k$, and this proves the statement.
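For what it's worth, the coefficient bound at the heart of this argument is easy to check numerically (a sketch assuming numpy):

```python
# For xi, eta in [0,1]^k and t in [0,1], (1-t)*xi + t*eta stays in [0,1]^k.
import numpy as np

rng = np.random.default_rng(1)
xi, eta = rng.uniform(size=5), rng.uniform(size=5)
for t in np.linspace(0.0, 1.0, 11):
    coeff = (1 - t) * xi + t * eta
    assert np.all((coeff >= 0) & (coeff <= 1))
```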