Assuming that the data cover a large range and do not contain much noise, and using as a model $$f(x)=a\sin \left(\frac{2\pi}{b} (x-c)\right)+d$$ we can get some estimates. The intercept $$f(0)=d-a \sin \left(\frac{2 \pi c}{b}\right)$$ gives an estimate of $d$.
For the point $x_1$ where $f(x_1)=0$ we have $$x_1=c-\frac{b \sin ^{-1}\left(\frac{d}{a}\right)}{2 \pi }$$ which gives an estimate of $c$.
For the point $x_2$ where $f'(x_2)=0$ we have $$x_2=\frac{b}{4}+c$$ which gives an estimate of $b$ and $f(x_2)=a+d$ gives an estimate of $a$.
So, basically, looking at the plot of the data, we have, in principle, at least consistent estimates of all parameters, and we can safely start the full nonlinear regression.
However, this would imply solving a tedious equation for $a$. Using the above relations instead, we can reduce the problem, in a first step, to the fit of the single parameter $a$, parameters $b,c,d$ being expressed as functions of $a$. Once the optimum $a$ has been found (this can be done using a plot of the sum of squares computed for a few discrete values of $a$ until a minimum value is detected), we can safely start the full nonlinear regression with consistent estimates.
Edit
Thinking more about the problem, I suppose that I should write the model as
$$f(x)=a \sin(\alpha x+\beta)+d\qquad \text{with}\qquad \alpha=\frac {2\pi}b\qquad \text{and}\qquad \beta=-\frac {2\pi c}b$$ Expanding the sine $$f(x)=a \cos (\beta ) \sin (\alpha x)+a \sin (\beta ) \cos (\alpha x)+d=A\sin (\alpha x)+B\cos(\alpha x)+d $$ which is linear if $\alpha$ is given a specific value.
For this value of $\alpha$, use a linear regression to get the best values of parameters $A( \alpha)$, $B( \alpha)$, $d( \alpha)$ and compute the sum of squares of the residuals. You will find, by trial and error, a zone where $SSQ( \alpha)$ is minimum. For this value $\alpha_*$, go back to $$A(\alpha_*)=a \cos(\beta)\qquad B(\alpha_*)=a \sin(\beta)\qquad \implies\qquad a\qquad \text{and} \qquad\beta$$ which themselves lead to $b$ and $c$.
For illustration purposes, I used the following (synthetic) data set
$$\left(
\begin{array}{cc}
x & f(x) \\
0.0 & -11.47 \\
0.5 & -15.62 \\
1.0 & -18.19 \\
1.5 & -19.17 \\
2.0 & -18.02 \\
2.5 & -15.30 \\
3.0 & -11.06 \\
3.5 & -6.34 \\
4.0 & -1.50 \\
4.5 & +2.33 \\
5.0 & +4.92 \\
5.5 & +5.49 \\
6.0 & +4.33
\end{array}
\right)$$
Below are reported the results of the linear regressions for different values of $\alpha$
$$\left(
\begin{array}{cc}
\alpha & \{SSQ,\{A,B,d\}\} \\
0.1 & \{145.375,\{A\to -7.60697,B\to -174.895,d\to 158.722\}\} \\
0.2 & \{135.708,\{A\to -5.44567,B\to -48.7632,d\to 32.6894\}\} \\
0.3 & \{119.635,\{A\to -5.52408,B\to -25.2303,d\to 9.34186\}\} \\
0.4 & \{97.4249,\{A\to -6.2399,B\to -16.7424,d\to 1.16049\}\} \\
0.5 & \{69.9893,\{A\to -7.29279,B\to -12.459,d\to -2.63686\}\} \\
0.6 & \{39.7696,\{A\to -8.58334,B\to -9.62791,d\to -4.7105\}\} \\
0.7 & \{12.4717,\{A\to -10.0273,B\to -7.20268,d\to -5.97143\}\} \\
0.8 & \{\color{red}{0.0564199},\{A\to -11.4518,B\to -4.6306,d\to -6.79914\}\} \\
0.9 & \{24.2511,\{A\to -12.4754,B\to -1.59026,d\to -7.37301\}\} \\
1.0 & \{115.154,\{A\to -12.4273,B\to 1.90681,d\to -7.78404\}\}
\end{array}
\right)$$
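The whole scan, plus the recovery of the original parameters, can be sketched in a few lines of Python/NumPy (a sketch of the procedure described above, not the code used to produce the table; `fit_for_alpha` is just an illustrative name):

```python
import numpy as np

# Synthetic data set from the table above
x = 0.5 * np.arange(13)
y = np.array([-11.47, -15.62, -18.19, -19.17, -18.02, -15.30,
              -11.06, -6.34, -1.50, 2.33, 4.92, 5.49, 4.33])

def fit_for_alpha(alpha):
    """Linear least squares for A sin(alpha x) + B cos(alpha x) + d at fixed alpha."""
    M = np.column_stack([np.sin(alpha * x), np.cos(alpha * x), np.ones_like(x)])
    coef, *_ = np.linalg.lstsq(M, y, rcond=None)
    resid = y - M @ coef
    return coef, float(resid @ resid)

# Scan a grid of alpha values and keep the one with the smallest SSQ
alphas = np.linspace(0.1, 1.0, 10)
ssqs = [fit_for_alpha(al)[1] for al in alphas]
alpha_star = float(alphas[np.argmin(ssqs)])   # 0.8, as in the table

# Recover the original parameters from A = a cos(beta), B = a sin(beta)
(A, B, d), ssq = fit_for_alpha(alpha_star)
a = float(np.hypot(A, B))
beta = float(np.arctan2(B, A))
b = 2 * np.pi / alpha_star
c = -beta / alpha_star
```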
Using $\alpha_*=0.8$, we get $a=12.3526$, $d=-6.79914$ and $\beta=-2.75734$, and then $b=7.85398$ and $c=3.44668$. Using these numbers, the results of the nonlinear regression are
$$\begin{array}{clclclclc}
\text{} & \text{Estimate} & \text{Standard Error} & \text{Confidence Interval} \\
a & 12.346 & 0.0234427 & \{12.2919,12.4001\} \\
b & 7.89158 & 0.0148157 & \{7.85741,7.92574\} \\
c & 3.45132 & 0.003368 & \{3.44355,3.45909\} \\
d & -6.77303 & 0.0209294 & \{-6.82129,-6.72477\} \\
\end{array}$$ As you can notice, the final parameters are quite close to the guesses. Comparing data and prediction
$$\left(
\begin{array}{ccc}
x & f(x) & \text{prediction} \\
0.0 & -11.47 & -11.509 \\
0.5 & -15.62 & -15.558 \\
1.0 & -18.19 & -18.234 \\
1.5 & -19.17 & -19.117 \\
2.0 & -18.02 & -18.070 \\
2.5 & -15.30 & -15.255 \\
3.0 & -11.06 & -11.115 \\
3.5 & -6.34 & -6.2947 \\
4.0 & -1.50 & -1.5496 \\
4.5 & +2.33 & +2.3786 \\
5.0 & +4.92 & +4.8754 \\
5.5 & +5.49 & +5.5505 \\
6.0 & +4.33 & +4.2982
\end{array}
\right)$$
Update
Using the data you posted in your edit
$$\left(
\begin{array}{cc}
x & f(x) \\
1 & 3.42 \\
2 & 0.73 \\
3 & 0.12 \\
4 & 2.16 \\
5 & 4.97 \\
6 & 5.97
\end{array}
\right)$$ applying the procedure given in my edit, we obtain
$$\begin{array}{clclclclc}
\text{} & \text{Estimate} & \text{Standard Error} & \text{Confidence Interval} \\
a & 3.00141 & 0.00178404 & \{2.97874,3.02408\} \\
b & 6.28596 & 0.00321329 & \{6.24513,6.32679\} \\
c & 4.28388 & 0.00094226 & \{4.27190,4.29585\} \\
d & 2.99993 & 0.00174274 & \{2.97779,3.02207\} \\
\end{array}$$
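As a quick check (a Python/NumPy sketch, using the rounded parameters above), evaluating the fitted model at the data points:

```python
import numpy as np

# Fitted parameters from the table above (rounded as displayed)
a, b, c, d = 3.00141, 6.28596, 4.28388, 2.99993

x = np.arange(1, 7)
pred = a * np.sin(2 * np.pi / b * (x - c)) + d   # model f(x)
```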
Comparing data and prediction
$$\left(
\begin{array}{ccc}
x & f(x) & \text{prediction} \\
1 & 3.42 &3.42124\\
2 & 0.73 &0.72783\\
3 & 0.12 &0.12170\\
4 & 2.16 &2.15966\\
5 & 4.97 &4.96954\\
6 & 5.97 &5.97003
\end{array}
\right)$$ Your very simplified procedure leads to very different results.
You want to fit the model
$$y=21-e^{a x}\tag 1$$
For sure, you can get an estimate by writing
$$21-y=e^{a x}\implies \log(21-y)=ax\implies z=a x\tag 2$$ and a preliminary linear regression gives $a=0.147233$ (just as you did).
In fact, you do not need to use regression since you can get $a$ directly from the normal equation
$$a=\frac{\sum_{i=1}^n x_iz_i } { \sum_{i=1}^n x_i^2 }=\frac{\sum_{i=1}^n x_i \log(21-y_i)} { \sum_{i=1}^n x_i^2 }$$
But this is only the preliminary step since what is measured is $y$ and not $\log(21-y)$. So, you need to continue with a nonlinear regression using this estimate. This would lead to $a=0.149140$.
Let us compare the results for $y$ using both models
$$\left(
\begin{array}{cccc}
x & y & (2) & (1) \\
0 & 20 & 20.0000 & 20.0000 \\
1 & 20 & 19.8414 & 19.8392 \\
2 & 20 & 19.6576 & 19.6525 \\
3 & 20 & 19.4447 & 19.4357 \\
4 & 20 & 19.1979 & 19.1841 \\
5 & 20 & 18.9121 & 18.8921 \\
6 & 18 & 18.5809 & 18.5531 \\
7 & 18 & 18.1972 & 18.1595 \\
8 & 18 & 17.7526 & 17.7027 \\
9 & 18 & 17.2374 & 17.1723 \\
10 & 16 & 16.6406 & 16.5567 \\
11 & 16 & 15.9491 & 15.8421 \\
12 & 14 & 15.1479 & 15.0125 \\
13 & 14 & 14.2196 & 14.0495 \\
14 & 12 & 13.1441 & 12.9316 \\
15 & 12 & 11.8980 & 11.6339 \\
16 & 10 & 10.4542 & 10.1275 \\
17 & 8 & 8.78136 & 8.37881 \\
18 & 6 & 6.84319 & 6.34887 \\
19 & 4 & 4.59758 & 3.99245 \\
20 & 2 & 1.99576 & 1.25704
\end{array}
\right)$$
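The preliminary estimate and the two sums of squares quoted below can be verified with a short Python/NumPy sketch (variable names are mine):

```python
import numpy as np

# Data from the table above
x = np.arange(21, dtype=float)
y = np.array([20, 20, 20, 20, 20, 20, 18, 18, 18, 18,
              16, 16, 14, 14, 12, 12, 10, 8, 6, 4, 2], dtype=float)

# Preliminary estimate from the normal equation for z = a x, with z = log(21 - y)
z = np.log(21 - y)
a0 = float(np.sum(x * z) / np.sum(x ** 2))   # ~ 0.147233

# Sums of squares in the y's for both estimates of a
ssq2 = float(np.sum((y - (21 - np.exp(a0 * x))) ** 2))        # model (2)
ssq1 = float(np.sum((y - (21 - np.exp(0.149140 * x))) ** 2))  # model (1)
```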
Using model $(2)$ and transforming back to the $y$'s, the sum of squares is $8.28$, while using model $(1)$ leads to a sum of squares equal to $6.66$, which is noticeably better.
Moreover, it is interesting to look at the statistics.
For model $(2)$, we have
$$\begin{array}{clclclclc}
\text{} & \text{Estimate} & \text{Standard Error} & \text{Confidence Interval} \\
a & 0.147233 & 0.005034 & \{0.136698,0.157769\} \\
\end{array}$$ while for model $(1)$
$$\begin{array}{clclclclc}
\text{} & \text{Estimate} & \text{Standard Error} & \text{Confidence Interval} \\
a & 0.149140 & 0.000873 & \{0.147312,0.150967\} \\
\end{array}$$ showing that, using the "true" model, the standard error is basically divided by a factor of almost $6$.
If you do not want to use nonlinear regression, you could use Excel to solve for $a$ the equation
$$f(a)=\sum_{i=1}^n e^{ax_i}\left(21-e^{ax_i}-y_i \right)=0$$ starting from the preliminary guess. Even graphing the function could be sufficient.
For solving the equation, you could also use Newton's method. Differentiating,
$$f'(a)=\sum_{i=1}^n x_i\, e^{ax_i}\left(21-2e^{ax_i}-y_i \right)$$ and use
$$a_{n+1}=a_n-\frac{f(a_n)}{f'(a_n)}$$ using for $a_0$ the value obtained from the preliminary step.
For your problem, Newton iterates would be
$$\left(
\begin{array}{cc}
n & a_n \\
0 & 0.1472330000 \\
1 & 0.1492437955 \\
2 & 0.1491401458 \\
3 & 0.1491398530
\end{array}
\right)$$
Edit
If we consider the data set outside its specific context, we could have obtained a better fit using
$$y=a-b\, e^{cx}\tag 3$$ which leads to a sum of squares equal to $4.97$ with the following parameters
$$\begin{array}{clclclclc}
\text{} & \text{Estimate} & \text{Standard Error} & \text{Confidence Interval} \\
a & 22.1098 & 0.5276 & \{20.9968,23.2229\} \\
b & 1.57255 & 0.3101 & \{0.91830,2.22680\} \\
c & 0.12823 & 0.0092 & \{0.10875,0.14771\} \\
\end{array}$$ leading to the following results
$$\left(
\begin{array}{ccc}
x & y & (3) \\
0 & 20 & 20.5373 \\
1 & 20 & 20.3221 \\
2 & 20 & 20.0775 \\
3 & 20 & 19.7995 \\
4 & 20 & 19.4834 \\
5 & 20 & 19.1241 \\
6 & 18 & 18.7156 \\
7 & 18 & 18.2513 \\
8 & 18 & 17.7234 \\
9 & 18 & 17.1233 \\
10 & 16 & 16.4410 \\
11 & 16 & 15.6655 \\
12 & 14 & 14.7838 \\
13 & 14 & 13.7816 \\
14 & 12 & 12.6422 \\
15 & 12 & 11.3469 \\
16 & 10 & 9.87440 \\
17 & 8 & 8.20046 \\
18 & 6 & 6.29750 \\
19 & 4 & 4.13420 \\
20 & 2 & 1.67494
\end{array}
\right)$$
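For the record, a Python/SciPy sketch of the three-parameter fit (the starting values are my own guesses, not from the original computation):

```python
import numpy as np
from scipy.optimize import curve_fit

# Data from the tables above
x = np.arange(21, dtype=float)
y = np.array([20, 20, 20, 20, 20, 20, 18, 18, 18, 18,
              16, 16, 14, 14, 12, 12, 10, 8, 6, 4, 2], dtype=float)

def model3(x, a, b, c):
    """Model (3): y = a - b e^{cx}."""
    return a - b * np.exp(c * x)

# Starting point chosen near the preliminary single-parameter solution
popt, _ = curve_fit(model3, x, y, p0=(21.0, 1.0, 0.13))
ssq3 = float(np.sum((y - model3(x, *popt)) ** 2))   # ~ 4.97
```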
Best Answer
The Gauss-Newton algorithm directly deals with this type of problem. Given $m$ data points $(x_i,y_i)$ for regression with a function of $n$ parameters $\vec \beta =(\beta_1,\dots,\beta_n)$, we solve $$\min_{\vec \beta }\ S(\vec \beta)\quad \text{where}\quad S(\vec \beta)=\sum_{i=1}^m r_i(\vec \beta)^2=\sum_{i=1}^m \big(y_i-f(\vec \beta,x_i)\big)^2$$ I skip the derivation of the algorithm, which you can find in every textbook (first use a Taylor approximation, then Newton's method): $$\Delta \vec \beta=\big(J^T\ J\big)^{-1}\ J^T\ \vec r$$ $$\vec \beta \leftarrow \vec \beta + \alpha\ \Delta \vec\beta$$ where $\alpha$ is a damping coefficient and $$J=\begin{pmatrix}\bigg(\frac{\partial f}{\partial \beta_1}\bigg)_{x=x_1}&\cdots&\bigg(\frac{\partial f}{\partial \beta_n}\bigg)_{x=x_1}\\ \vdots&\ddots&\vdots\\ \bigg(\frac{\partial f}{\partial \beta_1}\bigg)_{x=x_m}&\cdots&\bigg(\frac{\partial f}{\partial \beta_n}\bigg)_{x=x_m}\end{pmatrix}\quad \vec r=\begin{pmatrix}y_1-f(\vec \beta,x_1)\\ \vdots \\ y_m-f(\vec \beta,x_m) \end{pmatrix}$$ For your specific case $$\frac{\partial f}{\partial A}=\sin(Bx_i+C)$$ $$\frac{\partial f}{\partial B}=Ax_i\cos(Bx_i+C)$$ $$\frac{\partial f}{\partial C}=A\cos(Bx_i+C)$$ $$\frac{\partial f}{\partial D}=1$$
In Matlab, I generated 60 uniformly distributed random sample points and used them to evaluate a known sine curve, $$y=0.5\sin(1.2x+0.3)+0.6$$ to which I added error terms $N(0,0.2)$ at each point. My initial guess was $A=0.1,\ B=0.5,\ C=0.9,\ D=0.1$, and I set the damping coefficient to $0.01$. After 407 iterations, the algorithm finds the approximation $$\hat y=0.497\sin(1.178x+0.352)+0.580$$ Below you can see the graph (black: original curve $y(x)$; red: curve with error terms $y(x)+\epsilon (x)$; blue: approximation curve $\hat y(x)$).
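A compact Python/NumPy sketch of the Gauss-Newton loop above; for brevity it uses noiseless data, an initial guess near the truth, and full steps ($\alpha=1$) instead of the damped run described:

```python
import numpy as np

# Model f(x) = A sin(Bx + C) + D and its Jacobian, as in the answer
def f(beta, x):
    A, B, C, D = beta
    return A * np.sin(B * x + C) + D

def jacobian(beta, x):
    A, B, C, D = beta
    s, co = np.sin(B * x + C), np.cos(B * x + C)
    return np.column_stack([s, A * x * co, A * co, np.ones_like(x)])

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 6.0, 60))      # 60 uniform sample points
y = 0.5 * np.sin(1.2 * x + 0.3) + 0.6       # noiseless data for this sketch

beta = np.array([0.45, 1.15, 0.25, 0.55])   # initial guess near the truth
for _ in range(50):
    r = y - f(beta, x)
    J = jacobian(beta, x)
    beta = beta + np.linalg.solve(J.T @ J, J.T @ r)   # full Gauss-Newton step
```

With noisy data and a poorer starting point, a small damping coefficient (as in the $0.01$ used above) trades speed for robustness.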