The trace is simply a (properly normalised) ad-invariant inner product on the Lie algebra; that is, a nondegenerate symmetric bilinear form $\langle-,-\rangle$ which obeys the "associativity" condition
$$\langle [x,y],z \rangle = \langle x, [y,z] \rangle$$
for every $x,y,z$ in $\mathfrak{g}$.
Lie algebras admitting such inner products are said to be metric. The normalisation of the inner product is such that $k$ is an integer. Such a normalisation only makes sense for indecomposable metric Lie algebras, that is, those which are not isomorphic to a direct sum of mutually orthogonal proper ideals; otherwise the inner product could be rescaled independently on each ideal.
The notation "tr" stems from the fact that if $\rho: \mathfrak{g} \to \operatorname{End}(V)$ is a faithful finite-dimensional representation, then
$$\langle x, y\rangle := c \operatorname{tr}\rho(x)\rho(y)$$
works for a suitable nonzero $c$. (For a simple Lie algebra, just take $\rho$ to be the adjoint representation.)
For the explicit case where $\mathfrak{g}$ is the Lie algebra of $SU(2)$, you can take $\rho$ to be the fundamental representation and $c= -\frac12$, I believe.
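A quick numerical sanity check of the $SU(2)$ case, as a minimal sketch. The basis $T_a = -\frac{i}{2}\sigma_a$ (with $\sigma_a$ the Pauli matrices) is a convention assumed here, not something fixed by the text above; with it, $c\operatorname{tr}\rho(x)\rho(y)$ for $c=-\frac12$ comes out symmetric, positive definite and ad-invariant. This says nothing about whether $c=-\frac12$ is the normalisation that makes $k$ an integer.

```python
import itertools
import numpy as np

# Pauli matrices.
sigma = [
    np.array([[0, 1], [1, 0]], dtype=complex),
    np.array([[0, -1j], [1j, 0]], dtype=complex),
    np.array([[1, 0], [0, -1]], dtype=complex),
]

# Assumed convention: anti-Hermitian basis T_a = -(i/2) sigma_a of su(2)
# in the fundamental (2-dimensional) representation.
basis = [-0.5j * s for s in sigma]

C = -0.5  # the constant c suggested in the text


def bracket(x, y):
    """Matrix commutator [x, y]."""
    return x @ y - y @ x


def form(x, y):
    """Candidate inner product <x, y> = c tr(x y)."""
    return C * np.trace(x @ y)


# Gram matrix of the form in the chosen basis.
gram = np.array([[form(x, y) for y in basis] for x in basis])

# Largest violation of <[x,y],z> = <x,[y,z]> over all basis triples.
invariance_defect = max(
    abs(form(bracket(x, y), z) - form(x, bracket(y, z)))
    for x, y, z in itertools.product(basis, repeat=3)
)

print(gram.real)
print(invariance_defect)
```

Since the form is bilinear, checking the invariance condition on basis triples is enough. In this basis the Gram matrix is a positive multiple of the identity, so the form is nondegenerate.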
Edit
Notice that $\operatorname{tr}(A \wedge dA)$ is really $\langle A \stackrel{\wedge}{,} dA \rangle$, where $\langle -\stackrel{\wedge}{,}-\rangle$ means that we are both taking the wedge product of the forms and the inner product on the Lie algebra. Similarly,
$$\operatorname{tr}(A \wedge A \wedge A) = \frac12 \langle [A\stackrel{\wedge}{,}A] \stackrel{\wedge}{,} A \rangle,$$
with a similar notational caveat about $[A\stackrel{\wedge}{,}A]$.
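To see where the factor $\frac12$ comes from, here is a short check in a matrix representation. The expansion $A = A^a T_a$ in a basis $T_a$ of $\mathfrak{g}$, with $A^a$ ordinary 1-forms, is notation introduced only for this check:

$$[A\stackrel{\wedge}{,}A] = A^a \wedge A^b \, [T_a, T_b] = A^a \wedge A^b \, T_a T_b - A^a \wedge A^b \, T_b T_a = 2\, A^a \wedge A^b \, T_a T_b = 2\, A \wedge A,$$

where the third equality relabels $a \leftrightarrow b$ in the second term and uses $A^b \wedge A^a = - A^a \wedge A^b$ for 1-forms. Hence $A \wedge A = \frac12 [A\stackrel{\wedge}{,}A]$, and taking the (normalised) trace of $A \wedge A \wedge A$ gives the formula above.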
Further addition
In response to Anirbit's comment, I would say that there is, in general, no trace on vector-valued differential forms; although if the forms take values in endomorphisms, then of course there is: simply compose with the trace of endomorphisms to obtain a map
$$\Omega^\bullet(M;\operatorname{End}(V)) \to \Omega^\bullet(M).$$
I'm far from an expert, and I apologize if this is too basic / philosophical / vague.
In instanton Floer homology, the functional $CS(A)$ plays the role of the potential energy function for a $4$d field theory. In Chern-Simons theory, $CS(A)$ plays the role of the action for a $3$d field theory.
Let's consider the analogous situation in the original setting for Floer homology (as in Supersymmetry + Morse theory). We have a Riemannian manifold $(M,g)$ and a function $f: M \to \mathbb R$, analogous to the Chern-Simons functional. We have two options:
First, we can use $f$ as a potential. This corresponds to a $1$d field theory (or classical mechanical system) whose fields are paths $\gamma \in {\rm Map}([a,b],M)$, and whose action is $S(\gamma) = \int_a^b \left( |\dot \gamma(t)|^2 + f(\gamma(t)) \right) dt$. This theory depends on the metric $g$, but the "vacuum states" of the quantum mechanical system stay the same as we deform the metric; the vector space of vacuum states is the analog of the Floer groups.
Second, we can use $f$ as the action of a $0$d field theory. The fields are ${\rm Map}(*,M) = M$, and the action is just $f(m)$. Physically, this describes a static system. The integral of $\exp(if)$ over $M$ is the analog of the Chern-Simons invariants.
These two theories are related in the sense that if we let the kinetic energy term $|\dot \gamma(t)|^2$ of the $1$d field theory tend to zero (for instance by making the metric $g$ small), the "limit" should be related to the $0$d field theory. This is the limit where the potential energy is very large relative to the kinetic energy, and the dynamical system approaches a static system.
This is to say, the $0$d field theory is a "dimensional reduction" of the $1$d field theory.
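To make the limit slightly more concrete, here is a heuristic sketch. The scaling parameter $\varepsilon$ is introduced only for illustration: replace $g$ by $\varepsilon g$, so that the kinetic term picks up a factor of $\varepsilon$:

$$S_\varepsilon(\gamma) = \int_a^b \left( \varepsilon\, |\dot\gamma(t)|_g^2 + f(\gamma(t)) \right) dt \;\longrightarrow\; \int_a^b f(\gamma(t))\, dt \quad \text{as } \varepsilon \to 0.$$

The limiting expression contains no derivatives of $\gamma$, and on a constant path $\gamma(t) \equiv m$ it reduces to $(b-a)\, f(m)$, which is the $0$d action up to the overall factor $b-a$. I make no claim that the path integrals converge in any rigorous sense.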
Analogously, we expect the $3$d Chern-Simons theory to be a dimensional reduction of the $4$d Donaldson/Floer theory. Dimensional reduction is closely related to decategorification, for instance by taking Euler characteristics. The Kronheimer-Mrowka theorem you mention implies that Khovanov homology has the same Euler characteristic as instanton Floer homology. So they both categorify the Jones polynomial / Chern-Simons invariant, as this (very) heuristic picture would suggest.
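As a toy illustration of "decategorification by taking Euler characteristics" in the finite-dimensional Morse setting (not in the Floer or Khovanov setting), here is a minimal sketch. The helper `euler_characteristic` and the dictionaries below are made up for this example; the input is the number of critical points of each index of a Morse function, which is also the rank of the Morse complex in each degree.

```python
def euler_characteristic(ranks):
    """Alternating sum of ranks; `ranks` maps degree -> rank in that degree."""
    return sum(-rank if degree % 2 else rank for degree, rank in ranks.items())


# Height function on the 2-sphere: one minimum (index 0), one maximum (index 2).
sphere = {0: 1, 2: 1}

# Height function on an upright torus: one minimum, two saddles, one maximum.
torus = {0: 1, 1: 2, 2: 1}

print(euler_characteristic(sphere))  # 2
print(euler_characteristic(torus))   # 0
```

The graded vector space remembers more than the single number, which is the sense in which the homology theory "categorifies" the numerical invariant.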
Best Answer
In the literature, "Chern-Simons theory" often means by default $G$-Chern-Simons theory whose gauge group is a connected and simply connected semisimple compact group $G$, such as $G = SU$. In this case it so happens that all $G$-principal bundles on a 3-manifold $\Sigma_3$ are trivializable, and hence one can identify the space of $G$-principal connections on $\Sigma_3$ with the space of $\mathfrak{g}$-valued differential 1-forms. So one gets away with the naive formula that you recall above.
In stark contrast to this is what may seem to be a simpler example, namely $U(1)$-Chern-Simons theory. Since $U(1)$ is not simply connected, there are in general non-trivial $U(1)$-principal bundles on $\Sigma_3$, and hence the naive approach above fails, as you notice.
In this case the correct Chern-Simons action is instead obtained this way: given a field configuration $\nabla$ which is a circle-principal connection, we can form its differential cup-product square in ordinary differential cohomology. This yields a $\mathbf{B}^2 U(1)$-principal 3-connection $\nabla \cup \nabla$, often known as a bundle 2-gerbe with connection or else as a degree-4 cocycle in Deligne cohomology. This now has a connection 3-form and hence has a volume holonomy over $\Sigma_3$. And this now is the correct action functional for Chern-Simons theory. For more on this see at nLab:higher dimensional Chern-Simons theory.
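As a consistency check, here is a sketch for the special case of a trivial circle bundle, where $\nabla$ is given by a globally defined 1-form $A$ with curvature $F = dA$ (normalisation constants are omitted, since conventions vary). There the connection 3-form of $\nabla \cup \nabla$ is, up to normalisation, $A \wedge dA$, and its curvature 4-form is

$$d(A \wedge dA) = dA \wedge dA = F \wedge F.$$

So in this case the volume holonomy is determined by $\int_{\Sigma_3} A \wedge dA$, which is the naive formula in the abelian case, where the cubic term $A \wedge A \wedge A$ vanishes.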
Secretly this higher principal connection structure also governs the first, seemingly simpler case. The action functional of Chern-Simons theory is always the volume holonomy of a 3-connection, the Chern-Simons circle 3-connection.
This is in fact the general abstract characterization of Chern-Simons theory and all of its higher (and lower) dimensional variants. A Chern-Simons-type action functional is always the volume holonomy of a refinement of a universal characteristic class to ordinary differential cohomology. Further remarks along these lines are, for instance, in
Domenico Fiorenza, Hisham Sati, Urs Schreiber, A higher stacky perspective on Chern-Simons theory.