You don't need the survey
package or anything complicated. Wooldridge (2010, p. 920 onwards), "Econometric Analysis of Cross Section and Panel Data", gives a simple procedure for obtaining the standard errors you need to construct the confidence intervals.
Under the assumption that you have correctly specified the propensity score, which we denote as $p(\textbf{x}_i,\boldsymbol{\gamma})$, define the score from the propensity score estimation (i.e. your first logit or probit regression) as
$$\textbf{d}_i = \frac{\nabla_\gamma p(\textbf{x}_i,\boldsymbol{\gamma})'\,[Z_i-p(\textbf{x}_i,\boldsymbol{\gamma})]}{p(\textbf{x}_i,\boldsymbol{\gamma})[1-p(\textbf{x}_i,\boldsymbol{\gamma})]}$$
and let
$$\text{ATE}_i = \frac{[Z_i-p(\textbf{x}_i,\boldsymbol{\gamma})]\,Y_i}{p(\textbf{x}_i,\boldsymbol{\gamma})[1-p(\textbf{x}_i,\boldsymbol{\gamma})]}$$
as you have it in your expression above. Then take the sample analogues of these two expressions and regress $\widehat{\text{ATE}}_i$ on $\widehat{\textbf{d}}_i$. Make sure you include an intercept in this regression. Let $e_i$ be the residual from that regression, then the asymptotic variance of $\sqrt{N}(\widehat{\text{ATE}} - \text{ATE})$ is simply $\text{Var}(e_i)$. So the asymptotic standard error of your ATE is
$$\frac{\left[ \frac{1}{N}\sum^N_{i=1}e_i^2 \right]^{\frac{1}{2}}}{\sqrt{N}}$$
You can then calculate the confidence interval in the usual way (see, for example, the comments to the answer here for a code example). You don't need to adjust the confidence interval again for the inverse propensity score weights because the weighting is already accounted for in the calculation of the standard errors.
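Since the question asked for code, here is a minimal sketch of Wooldridge's procedure in Python (not R). The simulated data, variable names, and the logit/Newton-Raphson fit are my own illustration, not from the book:

```python
import numpy as np

# Simulated data for illustration: true ATE = 1
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])                      # covariates incl. intercept
Z = rng.binomial(1, 1 / (1 + np.exp(-(0.3 + 0.8 * x))))   # treatment indicator
Y = Z + x + rng.normal(size=n)                            # outcome

# Step 1: estimate the propensity score p(x_i, gamma) by logit (Newton-Raphson)
g = np.zeros(X.shape[1])
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ g))
    H = (X * (p * (1 - p))[:, None]).T @ X                # Hessian of the log-likelihood
    g += np.linalg.solve(H, X.T @ (Z - p))
p = 1 / (1 + np.exp(-X @ g))

# Step 2: score of the logit estimation. For a logit, grad_gamma p = p(1-p)x_i,
# so d_i reduces to x_i * (Z_i - p_i)
d = X * (Z - p)[:, None]

# Step 3: the per-observation ATE terms
ate_i = (Z - p) * Y / (p * (1 - p))
ate_hat = ate_i.mean()

# Step 4: regress ATE_i on d_i (with an intercept); the residual variance
# gives the asymptotic variance, hence the standard error
D = np.column_stack([np.ones(n), d])
e = ate_i - D @ np.linalg.lstsq(D, ate_i, rcond=None)[0]
se = np.sqrt(np.mean(e**2) / n)
ci = (ate_hat - 1.96 * se, ate_hat + 1.96 * se)
print(ate_hat, se, ci)
```

Note that the simplification in Step 2 is specific to the logit; with a probit you would plug the normal density into $\nabla_\gamma p$ instead.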
Unfortunately I am not an R guy, so I can't provide you with the specific R code, but the procedure outlined above should be straightforward to follow. As a side note, this is also the way in which the treatrew
command in Stata works. This command was written and introduced in the Stata Journal by Cerulli (2014). If you don't have access to the article, you can check his slides, which also outline the procedure for calculating the standard errors from inverse propensity score weighting. He also discusses some slight conceptual differences between estimating the propensity score via logit versus probit, but since they are not crucial for this answer I omitted that part.
All matching estimators for the treatment on the treated effect can be written in the form
$$ \frac{1}{n_T} \sum_{i \in \{d_i=1\}} \left[ y_{1i} - \sum_{j \in \{d_j = 0 \}} w_{ij} \cdot y_{0j} \right] ,$$
where $w_{ij}$ is the weight placed on the $j$th untreated observation as a counterfactual for the $i$th treated observation, and $n_T$ is the number of treated persons. The weights satisfy $\sum_j w_{ij}=1$ for all $i$.
Effectively, from each treated observation $i$, you subtract a weighted average of the control observations. Then you take the average of these differences. These weights are specific to observation $i$. Different matching estimators differ in how they construct the weights.
For example, nearest neighbor matching sets the weight to 1 for the single untreated observation closest to $i$ in terms of the propensity score and to 0 for all others. k-NN matching instead averages over the $k$ closest neighbors.
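As a toy illustration of the 1-NN weights (made-up numbers; Python rather than any particular package):

```python
import numpy as np

# Toy data: propensity scores and outcomes for treated / untreated units
ps_t = np.array([0.30, 0.55, 0.70])   # treated units' propensity scores
y1   = np.array([5.0, 6.0, 8.0])      # treated outcomes
ps_c = np.array([0.25, 0.50, 0.80])   # control units' propensity scores
y0   = np.array([4.0, 5.0, 6.0])      # control outcomes

# 1-NN matching: w_ij = 1 for the single closest control j, 0 for all others
diffs = []
for p_i, y_i in zip(ps_t, y1):
    j = np.argmin(np.abs(ps_c - p_i))   # nearest control by PS distance
    diffs.append(y_i - y0[j])           # its outcome is the counterfactual
att = float(np.mean(diffs))
print(att)
```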
Interval matching consists of dividing the range of propensity scores into a fixed number of intervals (which need not be of equal length). An interval-specific estimate is obtained by taking the difference between the mean outcomes of the treated and untreated units in each interval.
Radius/caliper matching takes the mean of the outcomes for untreated units within a fixed radius of each treated unit as the estimated expected counterfactual. You pick the radius.
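On the same kind of toy data, radius matching replaces the single nearest neighbor with the mean over all controls inside the radius (the radius `r` below is an arbitrary choice for illustration):

```python
import numpy as np

# Toy data (illustration only)
ps_t = np.array([0.30, 0.55, 0.70]); y1 = np.array([5.0, 6.0, 8.0])
ps_c = np.array([0.25, 0.50, 0.60, 0.80]); y0 = np.array([4.0, 5.0, 5.5, 6.0])

r = 0.12  # chosen radius (caliper)
diffs = []
for p_i, y_i in zip(ps_t, y1):
    inside = np.abs(ps_c - p_i) <= r           # controls within the radius
    if inside.any():
        diffs.append(y_i - y0[inside].mean())  # mean control outcome as counterfactual
att = float(np.mean(diffs))                    # treated units with no match are dropped
print(att)
```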
Kernel matching uses weights that decline with the PS distance. You can think about kernel matching as running a weighted regression for each treated observation using the comparison group data and the regression includes only an intercept term. Here you have to pick the kernel and the bandwidth. Larger bandwidth means further observations will have larger weights.
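A sketch of kernel matching with an Epanechnikov kernel (my choice for illustration; the kernel and bandwidth are up to you, and the toy numbers are made up):

```python
import numpy as np

def kernel_att(ps_t, y1, ps_c, y0, h):
    """ATT by kernel matching on the propensity score, Epanechnikov kernel, bandwidth h."""
    diffs = []
    for p_i, y_i in zip(ps_t, y1):
        u = (ps_c - p_i) / h
        w = np.where(np.abs(u) < 1, 0.75 * (1 - u**2), 0.0)  # weights decline with PS distance
        if w.sum() > 0:
            diffs.append(y_i - np.dot(w, y0) / w.sum())      # weighted-average counterfactual
    return float(np.mean(diffs))

# Toy data (illustration only)
ps_t = np.array([0.30, 0.55]); y1 = np.array([5.0, 6.0])
ps_c = np.array([0.25, 0.50, 0.60]); y0 = np.array([4.0, 5.0, 5.5])
att = kernel_att(ps_t, y1, ps_c, y0, h=0.1)
print(att)
```

With a larger `h`, more distant controls receive non-negligible weight, which is exactly the bandwidth trade-off described above.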
Local linear matching is very similar, but also includes a linear term in the propensity score in each local regression. Some people also include higher-order polynomial terms.
Finally, you have inverse probability weighting. The basic idea is that you can figure out the expected untreated outcome (in either the treated population or the full population) by reweighting the observed values using the treatment probabilities.
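For the treated population, for example, you reweight the observed control outcomes by the odds $p/(1-p)$ so that the controls mimic the treated. A simulated sketch (true effect of 2; the propensity is treated as known here purely for clarity):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000
x = rng.normal(size=n)
p = 1 / (1 + np.exp(-x))            # propensity score (assumed known here)
Z = rng.binomial(1, p)              # treatment indicator
Y0 = x + rng.normal(size=n)         # untreated potential outcome
Y = np.where(Z == 1, Y0 + 2.0, Y0)  # treatment shifts the outcome by 2

# Expected untreated outcome in the treated population:
# reweight the observed controls by the odds p/(1-p)
w = p / (1 - p)
ey0_treated = np.sum((1 - Z) * w * Y) / np.sum((1 - Z) * w)
att = Y[Z == 1].mean() - ey0_treated
print(att)
```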
There are some guidelines about how to pick a method here.
There is a list of software and packages that can do matching here. Stata also now has native PSM estimators. In my experience, replicating the output by hand is often very hard once you go past the simplest estimators. However, you can also find examples with output for all of these online, so even if you don't have the software, they will give you a useful benchmark since you can usually track down the data.
The article is blocked behind a paywall. Nonetheless, I think the major terms and components can be addressed based on your description.
Propensity score weighting does not weight by the "odds", nor by "the inverse" alone: it weights observations by the inverse of the probability of receipt of the treatment.
A difference-in-differences is an estimand, not a response variable. The advantages of ANCOVA, modeling the outcome adjusting for baseline values as a covariate, over a change-score approach have been discussed several times on this site. See here for a lively and thorough discussion. Even so, the difference between the two approaches is a fixed effect vs. an offset; thus the outcome is always just the response variable; hence the formatting of the response variable and interpretation of the treatment receipt coefficient as a difference-in-differences is the same in both approaches.
Neither "the average treatment effect on the treated" nor "the average treatment effect (on the sample)" is a designation I've heard before. By definition we estimate the ATE by subtracting off a comparable set of differences that would be found in an untreated group. In a clinical study this would be called the Hawthorne effect; in observational studies it is usually a type of prevalent case bias. Together, they are types of pre/post differences that do not arise as a form of confounding, so they are not addressable by propensity score weighting.
Conversely, regardless of the presence of these effects, confounding by indication is capable of exaggerating (or attenuating) treatment effects. Propensity score methods (matching or weighting) are still needed to control for confounding effects.