For completeness, I answer my own question. Since I am still a beginner, I will write in full detail. There are no original insights in the following. In fact, the original questions/confusions came from not fully understanding the notions/constructions, and such confusions are resolved once the statements/definitions are made clear. Any criticisms/clarifications are welcome.
Let $M$ be a complex manifold, let $L$ be a line bundle on $M$, equipped with the projection $\pi:L\to M$.
The transition functions arise in the following way. Let $\{U_i\}_I$ be an open cover of $M$ such that $L$ is trivial over each $U_i$, meaning that there is a diffeomorphism $\varphi_i:\pi^{-1}(U_i)\to U_i\times\mathbb C$ of the form $\varphi_i(\eta) =(\pi(\eta), \hat\varphi_i(\eta))$, where $\hat\varphi_i$ restricts to a linear isomorphism on each fiber. Then $\varphi_j\circ\varphi_k^{-1}(x,\lambda)=(x, \hat\varphi_j\circ\hat\varphi_k^{-1}(\lambda))$ whenever defined, where $\hat\varphi_k^{-1}$ denotes the inverse of $\hat\varphi_k$ restricted to the fiber over $x$. For each fixed $x\in U_j\cap U_k$, the map $\lambda \mapsto \hat\varphi_j\circ\hat\varphi_k^{-1}(\lambda)$ is a linear isomorphism of $\mathbb C$, hence multiplication by a nonzero constant depending on $x$. This gives rise to the transition functions $g_{U_jU_k}$, determined by $(x, g_{U_jU_k}(x)\lambda)= \varphi_j\circ\varphi_k^{-1}(x,\lambda)$.
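As a short check (a sketch, writing $g_{jk}$ as shorthand for $g_{U_jU_k}$, which is my abbreviation), the cocycle condition follows directly from this definition. On a triple overlap $U_j\cap U_k\cap U_l$ we have $(\varphi_j\circ\varphi_k^{-1})\circ(\varphi_k\circ\varphi_l^{-1})=\varphi_j\circ\varphi_l^{-1}$, and evaluating both sides at $(x,\lambda)$ gives
$$\begin{aligned}
(\varphi_j\circ\varphi_k^{-1})\big(\varphi_k\circ\varphi_l^{-1}(x,\lambda)\big) &= (\varphi_j\circ\varphi_k^{-1})\big(x, g_{kl}(x)\lambda\big) = \big(x, g_{jk}(x)\,g_{kl}(x)\,\lambda\big),\\
\varphi_j\circ\varphi_l^{-1}(x,\lambda) &= \big(x, g_{jl}(x)\,\lambda\big),
\end{aligned}$$
so $g_{jk}\,g_{kl}=g_{jl}$ on $U_j\cap U_k\cap U_l$.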
Conversely, suppose $\{U_i\}_I$ is an open cover and we are given a family of nowhere-vanishing transition functions $g_{UV}$ on $U\cap V$, one for each pair of open sets $U, V$ in the cover, satisfying $g_{UV}g_{VW} =g_{UW}$ whenever defined. Note that this implies $g_{UU}\equiv 1$ for any $U$ and $g_{UV} =\frac1{g_{VU} }$. We may define the line bundle $L=\bigsqcup_I U_i\times\mathbb C/\sim$, where $(x_j, \lambda_j) \sim (x_k, \lambda_k)$ iff $x_j=x_k$ and $g_{U_jU_k}(x_k)\lambda_k=\lambda_j$. Denote the equivalence class of $(x, \lambda)$ by $[(x, \lambda)]$; the projection map is given by $\pi([(x, \lambda)]) =x$. The claim is that it is always possible to construct local trivializations on each $U_i$ so that the transition functions are precisely the $g$'s. To see this, note that for any $U_i$ we have $\pi^{-1}(U_i)=[U_i\times\mathbb C]$. Since every $[(x, \lambda)] \in [U_i\times\mathbb C]$ has precisely one representative $(x_i, \lambda_i)$ in $U_i\times\mathbb C$, we may define the map $\varphi_{U_i}:\pi^{-1}(U_i)\to U_i\times\mathbb C$ by $\varphi_{U_i}([(x, \lambda)]) = (x_i, \lambda_i)$, and the claim follows.
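To spell out two steps that are used implicitly here (a sketch, writing $g_{jk}$ for $g_{U_jU_k}$ and $(x,\lambda)_k$ for a point of the $k$-th copy $U_k\times\mathbb C$, both of which are my notation): the relation $\sim$ is an equivalence relation precisely because of the cocycle condition,
$$\begin{aligned}
\text{reflexive:}\quad & g_{jj}(x)\lambda=\lambda,\\
\text{symmetric:}\quad & \lambda_j=g_{jk}(x)\lambda_k \;\Rightarrow\; \lambda_k=g_{jk}(x)^{-1}\lambda_j=g_{kj}(x)\lambda_j,\\
\text{transitive:}\quad & \lambda_j=g_{jk}(x)\lambda_k,\ \lambda_k=g_{kl}(x)\lambda_l \;\Rightarrow\; \lambda_j=g_{jk}(x)g_{kl}(x)\lambda_l=g_{jl}(x)\lambda_l,
\end{aligned}$$
and the trivializations just constructed have the $g$'s as transition functions, since $(x,\lambda)_k\sim(x,g_{jk}(x)\lambda)_j$ gives
$$\varphi_{U_j}\circ\varphi_{U_k}^{-1}(x,\lambda)=\varphi_{U_j}\big([(x,\lambda)_k]\big)=\big(x,\,g_{jk}(x)\lambda\big),\qquad x\in U_j\cap U_k.$$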
Now answering the questions: for two open covers $\mathcal U_1,\mathcal U_2$ and two families of transition functions, we can construct corresponding line bundles $L_1,L_2$ as outlined above. If for each pair of open sets $U\in\mathcal U_1$ and $V\in\mathcal U_2$ we can find a transition function $g_{UV}$ that is compatible with the original families of transition functions, then the two bundles must be isomorphic. Indeed, we may use the open cover $\mathcal U_1\cup\mathcal U_2$ and put together the families of transition functions to form a third line bundle $L_3$. This line bundle is isomorphic to both $L_1$ and $L_2$: we may define the fiber-preserving morphism $f:L_1\to L_3$ by $f([(x, \lambda)]_1) =[(x, \lambda)]_3$, which admits an inverse mapping $\eta=[(x, \lambda)]_3 \in L_3$ to the unique equivalence class of $L_1$ that is contained in $\eta$ as a subset. Now we may refine the open covers in the following way: define $\mathcal U=\{U_1\cap U_2\mid U_1\in\mathcal U_1, U_2\in\mathcal U_2\}$. We may define transition functions on the refined open cover by restricting the transition functions we started with. We see by the same argument that the line bundles obtained from the refinements are actually isomorphic to the original bundles $L_i$, provided that the transition functions are homotopic, as described in this question.
The second question may be answered similarly. Again we construct the line bundle $L$ as a quotient. The idea is that for holomorphic transition functions, we can define a complex structure on $L$ via the charts $(h_i\times\operatorname{Id})\circ\varphi_{U_i}$, where $h_i$ is a chart on $U_i\subset M$ (in full detail we need to find another open cover consisting of charts and intersect it with the original cover, but we may assume the original cover already consists of charts, using refinement again). If $U$ is another open set over which $L$ trivializes, then by definition we have a biholomorphism $\varphi_U: \pi^{-1}(U)\to U\times\mathbb C$. If $U_i, U_j$ are any open sets in the open cover that intersect $U$, then $$(h_j\times\operatorname{Id})\circ \varphi_U\circ\varphi_{U_i}^{-1}\circ(h_i^{-1}\times\operatorname{Id})(x, \lambda) =(h_j(h_i^{-1}(x)), g_{UU_i}( h_i^{-1} (x)) \lambda) $$ is a biholomorphism wherever it is defined. Hence $g_{UU_i} $ must be a holomorphic function.
This approach may be generalized to vector bundles.
First, the isomorphism between the complex bundle $(T_{\Bbb{R}}X,I)$ and $T^{(1,0)}X$ is really just linear algebra. In a local frame it is given by $$\begin{aligned}(T_{\Bbb{R}}X,I) &\to T^{(1,0)}X\\ \frac{\partial}{\partial x_i}&\mapsto \frac{1}{2}\left(\frac{\partial}{\partial x_i } - i \frac{\partial}{\partial y_i}\right)\end{aligned}$$ extended complex-linearly, where $i$ acts on the left-hand side through $I$.
It can be checked that this is a complex-linear isomorphism on each fiber. Since it identifies the local frames, the vector bundles are also isomorphic.
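The check can be sketched as follows, under the assumptions (mine, since the conventions are not stated above) that $I\frac{\partial}{\partial x_i}=\frac{\partial}{\partial y_i}$ and that the map is $\Phi(v)=\frac12(v-iIv)$, which agrees with the formula on $\frac{\partial}{\partial x_i}$. Using $I^2=-1$,
$$\begin{aligned}
\Phi(Iv) &= \tfrac12\big(Iv - iI^2v\big) = \tfrac12\big(Iv + iv\big) = i\cdot\tfrac12\big(v - iIv\big) = i\,\Phi(v),\\
\Phi\Big(\frac{\partial}{\partial x_i}\Big) &= \frac{\partial}{\partial z_i},\qquad
\Phi\Big(\frac{\partial}{\partial y_i}\Big) = \Phi\Big(I\frac{\partial}{\partial x_i}\Big) = i\,\frac{\partial}{\partial z_i}.
\end{aligned}$$
So $\Phi$ intertwines $I$ with multiplication by $i$. It is injective because $\Phi(v)=0$ forces $v=iIv$ with $v$ and $Iv$ both real, hence $v=0$. Both sides have complex rank $n$, so $\Phi$ is an isomorphism on each fiber.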
For the first and the third definitions:
Since $$\frac{\partial}{\partial z_i} =\frac{1}{2}\left(\frac{\partial}{\partial x_i} -i\frac{\partial}{\partial y_i}\right)$$
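We know that in the real case, under a change of coordinates $(x,y)\mapsto(\tilde x,\tilde y)$ (summing over the repeated index $j$), $$\frac{\partial}{\partial x_i} = \frac{\partial \tilde{x}_j}{\partial x_i} \frac{\partial}{\partial \tilde{x}_j} + \frac{\partial \tilde{y}_j}{\partial x_i} \frac{\partial}{\partial \tilde{y}_j}$$
and similarly for $\frac{\partial}{\partial y_i}$.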
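Substituting these into the definition above, and using the Cauchy-Riemann equations for the holomorphic coordinate change $z\mapsto\tilde z$, gives $$\frac{\partial }{\partial z_i} = \frac{\partial \tilde{z}_j}{\partial z_i} \frac{\partial}{\partial \tilde{z}_j}$$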
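In more detail, here is a sketch of the substitution, assuming $\tilde z_j=\tilde x_j+i\tilde y_j$ is holomorphic in $z$ and summing over the repeated index $j$. Applying the real chain rule to both $\frac{\partial}{\partial x_i}$ and $\frac{\partial}{\partial y_i}$ gives
$$\frac{\partial}{\partial z_i}
=\frac12\left[\left(\frac{\partial\tilde x_j}{\partial x_i}-i\frac{\partial\tilde x_j}{\partial y_i}\right)\frac{\partial}{\partial\tilde x_j}
+\left(\frac{\partial\tilde y_j}{\partial x_i}-i\frac{\partial\tilde y_j}{\partial y_i}\right)\frac{\partial}{\partial\tilde y_j}\right].$$
The Cauchy-Riemann equations $\frac{\partial\tilde x_j}{\partial y_i}=-\frac{\partial\tilde y_j}{\partial x_i}$ and $\frac{\partial\tilde y_j}{\partial y_i}=\frac{\partial\tilde x_j}{\partial x_i}$ turn the two coefficients into $a_{ji}$ and $-i\,a_{ji}$, where $a_{ji}=\frac{\partial\tilde x_j}{\partial x_i}+i\frac{\partial\tilde y_j}{\partial x_i}=\frac{\partial\tilde z_j}{\partial x_i}=\frac{\partial\tilde z_j}{\partial z_i}$. Hence
$$\frac{\partial}{\partial z_i}=a_{ji}\cdot\frac12\left(\frac{\partial}{\partial\tilde x_j}-i\frac{\partial}{\partial\tilde y_j}\right)=\frac{\partial\tilde z_j}{\partial z_i}\frac{\partial}{\partial\tilde z_j}.$$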
Therefore the cocycle is really the one given in the first definition,
and we see that these three definitions are equivalent.
These three characterizations each have their own strengths.
For example, the cocycle characterization is useful in building abstract theory about vector bundles (for example when dealing with classification problems).
The third one is very useful in local computations: since we are familiar with differential calculus in local coordinates in the real case, the calculations extend naturally to the complexified case in the third setting.
Best Answer
In order to have an inclusion $\mathcal{O}\to\mathcal{A}$, you have to take $\mathcal{A}$ to be the sheaf of complex-valued smooth functions. Then $\mathcal{A}^\times$ is the sheaf of smooth functions with values in $GL(1,\mathbb C)$, and hence $H^1(X,\mathcal{A}^\times)$ classifies smooth complex line bundles up to isomorphism.