The "identification" step refers to identifying an estimator for the quantity of interest and whether or not it can be computed given the causal model. For example, if there are unknown confounders, or undirected edges in a causal graph, an effect may not be identifiable.
A counterfactual is estimated by changing the value of a given set of nodes, leaving the rest in their "factual" state, and re-computing the state of the system. It is similar to an intervention, with the important difference that we are interested in the alternative state of the system, rather than only the change due to the intervention. This requires knowledge about the values of the other system variables that are not intervened upon.
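To make the abduct-intervene-predict recipe concrete, here is a toy sketch with a single (hypothetical) structural equation; all numbers are made up for illustration:

```python
# Toy structural causal model: Y = 2*X + U, where U is an unobserved noise term.
# This is a made-up example, not a model from the question.

def f_y(x, u):
    """Structural equation for Y."""
    return 2 * x + u

# Factual observation: we saw X = 1 and Y = 3.
x_fact, y_fact = 1, 3

# Step 1 (abduction): infer the noise consistent with the factual observation.
u = y_fact - 2 * x_fact  # U = 1

# Step 2 (action): set X to a counterfactual value.
x_cf = 2

# Step 3 (prediction): recompute Y holding the *same* noise fixed. This is
# where knowledge of the other (non-intervened) variables comes in: it pins
# down U, rather than drawing a fresh one as a plain intervention would.
y_cf = f_y(x_cf, u)  # -> 5
```

The key contrast with a plain intervention is step 1: the noise is recovered from the factual state rather than sampled anew.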
I think in an A/B test like this, where you have an encouragement design with one-sided non-compliance, you can make some progress under reasonable assumptions.
Using the notation from here, where C stands for compliers (not clickers), the LATE identified by IV is
$$\Delta_{IV} =\frac{E(Y_1 \vert C) \cdot Pr(C)−E(Y_0 \vert C) \cdot Pr(C)}{Pr(C)}=E(Y_1 \vert C)−E(Y_0 \vert C)$$
Putting this in relative terms gives:
$$\%\Delta_{IV}=\frac{E(Y_1 \vert C)−E(Y_0 \vert C)}{E(Y_0 \vert C)}.$$
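In sample terms, the absolute version above is just the Wald estimator: the intention-to-treat effect divided by the first stage. A minimal sketch with hypothetical group means (Z is assignment, D is clicking, Y is the outcome):

```python
# Wald / IV estimate of the LATE from an encouragement design.
# All numbers are hypothetical.

def wald(y_mean_z1, y_mean_z0, d_mean_z1, d_mean_z0):
    """ITT on the outcome divided by ITT on take-up (the first stage)."""
    itt = y_mean_z1 - y_mean_z0          # reduced form
    first_stage = d_mean_z1 - d_mean_z0  # = Pr(C) under one-sided non-compliance
    return itt / first_stage

# With one-sided non-compliance, nobody clicks in control: d_mean_z0 = 0.
late = wald(y_mean_z1=0.10, y_mean_z0=0.09, d_mean_z1=0.04, d_mean_z0=0.0)
# late ≈ 0.25
```

Note how the complier share cancels, exactly as in the algebra above.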
The issue is that you don't know the denominator. Assuming treatment is randomized, in the control group:
$$\require{cancel} E(Y_0) = \cancel{E(Y_0 \vert AT)\cdot Pr(AT)}+E(Y_0 \vert C)\cdot Pr(C)+ \cancel{E(Y_0 \vert DF) \cdot Pr(DF)}+E(Y_0 \vert NT) \cdot Pr(NT)$$
Here always-takers drop out, since control users cannot click; ditto for defiers. This is different from the typical labor economics experiment, where people can take up job training somewhere else even if they are in the control group. So the LATE equals the ATT. You also know that your treatment group consists of compliers, who all click, and never-takers, who don't, which lets you separate the two groups cleanly. The same logic applies to the control group, since the types are fixed. The outcome for the never-takers should be the same in treatment and control as long as the video is the only channel by which the treatment can change the outcome (the exclusion restriction). This rules out behavior like control users getting pissed about being denied the video and reducing purchases.
But if you are willing to make these reasonable assumptions, you can back out the share of never-takers in the treatment group (96%) and their mean untreated outcome (0.42). You can also get the share of compliers in the treatment group (4%), which should be the same as in control. You can then calculate the mean of the untreated outcome for never-takers and compliers together in control (0.41). That is enough to pin down the mean untreated outcome for compliers (0.08), which should be the same in treatment. That gives a relative lift of $\frac{0.26}{0.08} \approx 3.25$X. This is pretty large, but not statistically significant. Your first stage is strong, so this is probably not a weak-instrument artifact.
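The backing-out step is just a mixture decomposition: the control mean is a weighted average of the never-taker and complier untreated means, so the complier mean falls out by subtraction. A sketch with hypothetical inputs (the figures above are rounded, so the control mean here is constructed to make the arithmetic exact):

```python
# Back out the compliers' mean untreated outcome under one-sided
# non-compliance. Inputs are hypothetical: in practice the never-taker
# share and mean come from the treatment group (non-clickers), and the
# overall untreated mean comes from the control group.

def untreated_complier_mean(p_nt, y0_nt, y0_overall):
    """Solve y0_overall = p_nt * y0_nt + (1 - p_nt) * y0_c for y0_c."""
    p_c = 1.0 - p_nt
    return (y0_overall - p_nt * y0_nt) / p_c

def relative_lift(late, y0_c):
    """Relative (percent) version of the LATE."""
    return late / y0_c

# 96% never-takers with mean 0.42; overall control mean chosen so the
# complier mean works out to 0.08 exactly.
y0_c = untreated_complier_mean(p_nt=0.96, y0_nt=0.42, y0_overall=0.4064)
lift = relative_lift(late=0.26, y0_c=y0_c)  # ≈ 3.25
```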
Issues of statistical significance aside, this result implies that you have very low take-up but an enormous effect for those who do take it up. You may want to explore making take-up easier (product changes that make the video more prominent, like screen takeovers, subsidies for watching, etc.). You can also try to fit a model for $\Pr(Complier \vert X)$. Maybe all the compliers are new users, in which case the strategy above is limited by the inflow of new users.
Standard errors are a bit trickier, but you can bootstrap the IV regression plus the complier arithmetic jointly.
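A sketch of that joint bootstrap on synthetic user-level data (all shares and effect sizes below are made up, and the complier share is inflated for numerical stability): resample users with replacement, redo the Wald estimate and the complier decomposition on each resample, and take percentile quantiles.

```python
import random

# Joint bootstrap of the IV estimate and the complier arithmetic on
# synthetic user-level data. All numbers below are hypothetical.
random.seed(0)

def simulate_user():
    z = random.random() < 0.5         # randomized encouragement
    complier = random.random() < 0.2  # latent type (hypothetical share)
    d = z and complier                # one-sided non-compliance: only
                                      # encouraged compliers click
    base = 0.1 if complier else 0.4   # untreated outcome mean by type
    y = base + (0.25 if d else 0.0) + random.gauss(0, 0.05)
    return z, d, y

users = [simulate_user() for _ in range(5000)]

def mean(xs):
    return sum(xs) / len(xs)

def relative_late(sample):
    """Wald estimate divided by the backed-out complier untreated mean."""
    z1 = [u for u in sample if u[0]]
    z0 = [u for u in sample if not u[0]]
    itt = mean([u[2] for u in z1]) - mean([u[2] for u in z0])
    p_c = mean([1.0 if u[1] else 0.0 for u in z1])  # first stage
    late = itt / p_c
    # complier decomposition of the control mean
    y0_nt = mean([u[2] for u in z1 if not u[1]])
    y0_c = (mean([u[2] for u in z0]) - (1 - p_c) * y0_nt) / p_c
    return late / y0_c

# Percentile bootstrap: resample users and redo the whole pipeline each time.
estimates = sorted(
    relative_late(random.choices(users, k=len(users))) for _ in range(500)
)
ci = (estimates[int(0.025 * 500)], estimates[int(0.975 * 500)])
```

Resampling whole users, rather than bootstrapping the IV step alone, propagates the uncertainty in the complier shares and means into the interval.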
Best Answer
In order to isolate a causal effect, we need the causal effect to be "identifiable."
At a high level, assuming binary variables here, a causal effect is identifiable if we can express the treatment effect that we care about — in this case $P(Cancer(Drug = 1)) - P(Cancer(Drug = 0))$ — in terms of quantities computable from our observed data.
There are a few conditions that need to be satisfied for our causal effect to be identifiable, but since you're asking about "what should I control for," the one that is most relevant is exchangeability/conditional exchangeability. Formally, for your setting, we'd express this as $Cancer(Drug) \perp Drug \mid L$ — conditioned on some set of confounders $L$, there is no dependence between the counterfactual value of "Cancer" and the observed treatment "Drug."
"The hard part" is determining "what goes in $L$." Luckily, the "backdoor criterion" exists to determine which variables you need to control for in a given causal DAG in order to achieve (conditional) exchangeability. This criterion states that, given a causal DAG, you need to "block" all "paths" between treatment and outcome that aren't the treatment -> outcome arrow denoting the effect you're trying to estimate.
You can think of a path in a DAG as a chain of arrows (ignoring the direction for now). To block a path, there needs to be either a "collider" ($\rightarrow X \leftarrow$, where $X$ is some placeholder variable) that we are not conditioning on (+ one other condition that I'll omit for simplicity), or we need to condition on a non-collider ($\rightarrow X \rightarrow$ or $\leftarrow X \rightarrow$).
If you apply these conditions to your DAG, you'll see that, to achieve conditional exchangeability, we need to block the path $Drug \leftarrow Age \rightarrow Cancer$. Since $Age$ is a non-collider, we need to condition on it. We do not need to condition on $Area$, since it does not lie on a path between $Cancer$ and $Drug$. There may be settings or specific designs where you might condition on $Area$, but for identifying the causal effect of $Drug$ on $Cancer$, there is no need.
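To make the mechanics concrete, here is a small dependency-free Python sketch that enumerates undirected paths in the assumed three-node DAG and applies the blocking rules above. The graph is my reconstruction of the question's setup ($Area$ is omitted since its parents aren't specified), and the check ignores the descendants-of-a-collider subtlety omitted above:

```python
# Minimal backdoor-path check for a tiny DAG. Edges are (parent, child) pairs.
# The DAG below is an assumed reconstruction of the question's example.
EDGES = {("Age", "Drug"), ("Age", "Cancer"), ("Drug", "Cancer")}

def neighbors(v, edges):
    """Undirected neighbors of v."""
    return {b for a, b in edges if a == v} | {a for a, b in edges if b == v}

def simple_paths(x, y, edges, path=None):
    """All simple paths from x to y in the undirected skeleton."""
    path = path or [x]
    if x == y:
        yield list(path)
        return
    for n in neighbors(x, edges):
        if n not in path:
            yield from simple_paths(n, y, edges, path + [n])

def blocked(path, edges, cond):
    """Is this path blocked given conditioning set `cond`?
    Simplified rules: an unconditioned collider blocks, a conditioned
    non-collider blocks (the descendants-of-a-collider condition is ignored,
    as in the text)."""
    for i in range(1, len(path) - 1):
        v = path[i]
        collider = (path[i - 1], v) in edges and (path[i + 1], v) in edges
        if collider and v not in cond:
            return True
        if not collider and v in cond:
            return True
    return False

def backdoor_satisfied(x, y, edges, cond):
    """Every backdoor path (first edge pointing *into* x) must be blocked."""
    backdoors = [p for p in simple_paths(x, y, edges)
                 if (p[1], p[0]) in edges]
    return all(blocked(p, edges, cond) for p in backdoors)

print(backdoor_satisfied("Drug", "Cancer", EDGES, {"Age"}))  # -> True
print(backdoor_satisfied("Drug", "Cancer", EDGES, set()))    # -> False
```

The direct edge $Drug \rightarrow Cancer$ is excluded because its first edge points out of $Drug$; the only backdoor path is $Drug \leftarrow Age \rightarrow Cancer$, which conditioning on $Age$ blocks.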
Further reading
My summary of the backdoor criterion is derived from these lecture notes — slides 27-48 — which give a further overview of "what do I condition on."
For further details, I'd recommend reading the first 3 chapters (approximately) of What If? — it's a fairly approachable textbook on causal inference.