It's definitely doable. Let's consider a simpler example first: let $X=[0,1]$, and let $A=\{0\}$.
You can retract $X\times I$ (a square) to $(X\times\{0\})\cup(A\times I)$ (the union of the "bottom" and "left" sides of the square) by projecting each point along the ray from $(2,2)$ through that point until it hits one of those two sides.
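In coordinates $(x,t)$ on the square, this projection can be written out explicitly. The following is a sketch worked out from the ray description above; the name $r$ for the retraction is notation chosen here.

$$
r(x,t)=\begin{cases}
\left(0,\ \dfrac{2(t-x)}{2-x}\right) & \text{if } t\ge x,\\[2ex]
\left(\dfrac{2(x-t)}{2-t},\ 0\right) & \text{if } x\ge t.
\end{cases}
$$

Both branches give $(0,0)$ on the diagonal $x=t$, points with $x=0$ or $t=0$ are fixed, and the corner $(1,1)$ is sent to $(0,0)$. Since the square is convex, $(1-s)\,(x,t)+s\,r(x,t)$ for $s\in[0,1]$ is then a deformation retraction.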
To move this intuition to your example of $X=$ a disk and $A=$ a smaller disk inside $X$, just "swing this around" (as one would to form a solid of revolution) and leave the interior of $A$ alone.
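As a sketch of what "swinging around" means in formulas: write points of $X$ in polar coordinates $(\rho,\theta)$, let $a<R$ be the radii of $A$ and $X$ (notation chosen here, playing the roles of `RadiusOfA` and `RadiusOfX` in the code), take $I=[0,1]$, and set $\xi=(\rho-a)/(R-a)$. Applying the square retraction in each half-plane $\theta=\text{const}$ gives

$$
r(\rho,\theta,t)=\begin{cases}
(\rho,\ \theta,\ t) & \text{if } \rho\le a,\\[1ex]
\left(a,\ \theta,\ \dfrac{2(t-\xi)}{2-\xi}\right) & \text{if } \rho\ge a,\ t\ge \xi,\\[2ex]
\left(a+(R-a)\dfrac{2(\xi-t)}{2-t},\ \theta,\ 0\right) & \text{if } \rho\ge a,\ \xi\ge t.
\end{cases}
$$

At $\rho=a$ the second branch reduces to $(a,\theta,t)$, so it matches the identity on $A\times I$, and at $t=0$ the third branch reduces to $(\rho,\theta,0)$, so $X\times\{0\}$ is fixed as well.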
For fun:
```mathematica
(* Fixed gray pieces: the side and top of A x I, and the base X x {0}. *)
PlotACylinder[RadiusOfA_, Height_, theta_, u_] :=
  {RadiusOfA*Cos[theta], RadiusOfA*Sin[theta], Height*u}
PlotATop[RadiusOfA_, Height_, theta_, u_] :=
  {RadiusOfA*u*Cos[theta], RadiusOfA*u*Sin[theta], Height}
PlotX[RadiusOfX_, theta_, u_] :=
  {RadiusOfX*u*Cos[theta], RadiusOfX*u*Sin[theta], 0}

(* Top of X x I outside A: each point (x, Height) moves linearly in t toward
   its projection from (2 RadiusOfX - RadiusOfA, 2 Height) onto the cylinder
   of radius RadiusOfA. *)
PlotTopSurface[RadiusOfA_, RadiusOfX_, Height_, t_, theta_, u_] :=
  Module[{x, y},
    x = RadiusOfA + (RadiusOfX - RadiusOfA) u;
    y = 2 Height*(1 - (2 RadiusOfX - 2 RadiusOfA)/(2 RadiusOfX - RadiusOfA - x))
      + Height*(2 RadiusOfX - 2 RadiusOfA)/(2 RadiusOfX - RadiusOfA - x);
    {(x (1 - t) + RadiusOfA*t)*Cos[theta], (x (1 - t) + RadiusOfA*t)*Sin[theta],
     Height (1 - t) + y*t}]

(* Outer wall of X x I: each point (RadiusOfX, y) moves linearly in t toward
   its projection from the same center onto the base plane z = 0. *)
PlotSideSurface[RadiusOfA_, RadiusOfX_, Height_, t_, theta_, u_] :=
  Module[{x, y},
    y = Height*u;
    x = (2 RadiusOfX - RadiusOfA)*(1 - (2 Height/(2 Height - y)))
      + RadiusOfX (2 Height/(2 Height - y));
    {(RadiusOfX (1 - t) + x*t)*Cos[theta], (RadiusOfX (1 - t) + x*t)*Sin[theta],
     y (1 - t)}]

(* One frame of the retraction at time t in [0, 1]. *)
PlotRetract[RadiusOfA_, RadiusOfX_, Height_, t_] := ParametricPlot3D[
  {PlotACylinder[RadiusOfA, Height, theta, u],
   PlotATop[RadiusOfA, Height, theta, u],
   PlotX[RadiusOfX, theta, u],
   PlotTopSurface[RadiusOfA, RadiusOfX, Height, t, theta, u],
   PlotSideSurface[RadiusOfA, RadiusOfX, Height, t, theta, u]},
  {theta, 0, 2 Pi}, {u, 0, 1}, Mesh -> None, Axes -> None,
  Boxed -> False, PlotPoints -> 30,
  Lighting -> {{"Directional", White, {{1, 1, 1}, {0, 0, 0}}}},
  PlotStyle -> {Gray, Gray, Gray, Directive[Blue, Opacity[0.5]],
    Directive[Blue, Opacity[0.5]]}]

(* Negative t values are clamped to 0, so the animation holds on the first frame briefly. *)
Export["animation.gif", Table[PlotRetract[1, 3, 4, Max[0, t]],
  {t, -0.1, 0.98, 0.02}], "DisplayDurations" -> {0.125}]
```
Let $(X,A)$ be a CW pair. We can inductively construct an open neighborhood $N_\epsilon(A)$ of $A$, where $\epsilon$ is a function assigning to each cell $e_\alpha$ a positive number $\epsilon_\alpha<1$.
Assume that $N^n_\epsilon(A)$, a neighborhood of $A\cap X^n$ in $X^n$, has been constructed, starting the process with $N^0_\epsilon(A)=A\cap X^0$. We then define $N^{n+1}_\epsilon(A)$ by specifying its preimage under the characteristic map $\Phi_\alpha:D^{n+1}\to X$ of each cell $e_\alpha^{n+1}$. This preimage is the product $(1-\epsilon_\alpha,1]\times\Phi^{-1}_\alpha(N^{n}_\epsilon(A))$ with respect to 'spherical' coordinates $(r,\theta)$ in $D^{n+1}$, where $r\in[0,1]$ is the radial coordinate and $\theta$ lies in $\partial D^{n+1}=S^n$. If $e_\alpha$ is a cell of $A$, then $\Phi^{-1}_\alpha(N^{n+1}_\epsilon(A))$ is instead defined to be all of $D^{n+1}$.
We can perform the deformation retraction of $N_\epsilon^{n+1}(A)$ onto $N^n_\epsilon(A)$ during the time interval $[1/2^{n+1},\ 1/2^n]$. This is a map $h^n:N^{n+1}_\epsilon(A)\times[1/2^{n+1},\ 1/2^n]\to N_\epsilon^{n+1}(A)$, a homotopy from the identity on $N^{n+1}_\epsilon(A)$ at time $1/2^{n+1}$ to a retraction $r^n$ of $N^{n+1}_\epsilon(A)$ onto $N^n_\epsilon(A)$ at time $1/2^n$.

The next homotopy $h^{n-1}$ is defined as a map $N^n_\epsilon(A)\times[1/2^n,\ 1/2^{n-1}]\to N^n_\epsilon(A)$, but we would like a map $N^{n+1}_\epsilon(A)\times[1/2^{n+1},\ 1/2^{n-1}]\to N_\epsilon^{n+1}(A)$. For $t\ge 1/2^n$ we can simply compose $h^{n-1}$ with $r^n$, that is, use $h^{n-1}(r^n(x),t)$.

In the end we want a map $h:N_\epsilon(A)\times I\to N_\epsilon(A)$. To see where a point $x\in N^{m+1}_\epsilon(A)$ is mapped at a time $t\in[1/2^n,\ 1/2^{n-1}]$, apply the composition $h^{n-1}(-,t)\circ r^n\circ\cdots\circ r^m$ to $x$; for $t\le 1/2^{m+1}$ the point $x$ stays fixed. This is continuous since its composition with each characteristic map is continuous and CW-complexes have the final topology.
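The assembled homotopy can be written piecewise. This is a sketch that uses only the maps $h^n$ and $r^n$ described above; it assumes each $h^n$ fixes $N^n_\epsilon(A)$ pointwise, so that the choice of $m\ge 0$ with $x\in N^{m+1}_\epsilon(A)$ does not matter.

$$
h(x,t)=\begin{cases}
x, & 0\le t\le 1/2^{m+1},\\[1ex]
h^{m}(x,t), & 1/2^{m+1}\le t\le 1/2^{m},\\[1ex]
h^{n-1}\bigl((r^n\circ\cdots\circ r^m)(x),\,t\bigr), & 1/2^{n}\le t\le 1/2^{n-1},\quad 1\le n\le m.
\end{cases}
$$

The branches agree at the breakpoints because $h^n(-,1/2^{n+1})$ is the identity and $h^n(-,1/2^n)=r^n$.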
It seems to me the difference between the pair $(X,A)$ being a good pair and having the HEP is very slight, so this answer is meant as more of a comment to illustrate the differences.
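Hatcher (in "Algebraic Topology", just after Theorem 2.13) defines $(X,A)$ to be a good pair if $A$ is a nonempty closed subspace of $X$ that is a deformation retract of some neighborhood in $X$.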
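In the same text, soon after Example 0.14, he defines the pair $(X,A)$ to have the homotopy extension property, where $A$ is now any subspace of $X$ (no condition on $A$ being closed or non-empty), if every map $f_0:X\to Y$ together with a homotopy $f_t:A\to Y$ of $f_0|_A$ can be extended to a homotopy $f_t:X\to Y$ of $f_0$.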
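Most other sources do not use the phrase "good pair" and simply stick to HEP. One such source (in my opinion, the best source) is May. There (in "A Concise Course in Algebraic Topology", Chapter 6, Section 1), still only assuming $A$ is a subspace of $X$, the pair $(X,A)$ along with a map (not necessarily the inclusion) $i:A\to X$ is defined to have the homotopy extension property if, for every space $Y$, every map $f:X\to Y$, and every homotopy $h:A\times I\to Y$ with $h(a,0)=f(i(a))$, there is a homotopy $\tilde h:X\times I\to Y$ with $\tilde h(x,0)=f(x)$ and $\tilde h(i(a),t)=h(a,t)$. Such a map $i$ is called a cofibration.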
In Section 4 of the same chapter, now assuming $A$ is closed in $X$, May shows that $(X,A)$ is a neighborhood deformation retract pair if and only if the inclusion $i:A\to X$ is a cofibration, that is, has the HEP.
In fact, now it seems Hatcher's definitions of good pair and having the HEP are equivalent from May's viewpoint. That's why, in my opinion, the phrase "good pair" is not the best approach to use, and instead we should talk about cofibrations that are or are not inclusions.
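To see concretely why a retraction $r:X\times I\to X\times\{0\}\cup A\times I$ gives the HEP, here is the standard short argument, sketched under the assumption that $A$ is closed in $X$ (so that the glued map $g$ below is continuous). The names $g$, $F$ and $\tilde F$ are notation chosen here. Given $f_0:X\to Y$ and a homotopy $F:A\times I\to Y$ with $F(a,0)=f_0(a)$, define

$$
g:X\times\{0\}\cup A\times I\to Y,\qquad g(x,0)=f_0(x),\quad g(a,t)=F(a,t),
$$

$$
\tilde F=g\circ r:X\times I\to Y.
$$

Since $r$ is the identity on $X\times\{0\}\cup A\times I$, we get $\tilde F(x,0)=f_0(x)$ and $\tilde F(a,t)=F(a,t)$, so $\tilde F$ is the required extension.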