Not precisely.
About histograms, KDEs and ECDFs.
(1) Roughly speaking, a histogram (on a density scale, so that the areas of the bars sum to unity) can be viewed as an estimate of the density function. A KDE is a more sophisticated method of density estimation. Generally speaking, one cannot reconstruct the exact values of the data from either a histogram or a KDE.
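For intuition about what a Gaussian KDE computes: it is just an average of normal-density 'bumps' centered at the data points. A minimal sketch, where the sample, bandwidth rule, and evaluation grid are illustrative choices:

# Hand-rolled Gaussian KDE: average of normal densities centered at the data.
set.seed(1)
x  <- rnorm(25)                      # small illustrative sample
bw <- bw.nrd0(x)                     # R's default bandwidth rule
g  <- seq(-4, 4, length.out = 200)   # grid on which to evaluate
kde <- sapply(g, function(t) mean(dnorm(t, mean = x, sd = bw)))
plot(g, kde, type = "l", col = "red")       # hand-rolled KDE
lines(density(x, bw = bw), lty = "dotted")  # closely matches density()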
(2) By contrast, an empirical CDF (ECDF) retains exact information about all of the data. An ECDF is made as follows: (a) sort the data from smallest to largest; (b) make a stair-step function that begins at 0 below the minimum and increases by $1/n$ at each data value, where $n$ is the sample size. If $k$ values are tied, then the increase is $k/n$ at the tied value.
Thus the ECDF approximates the CDF of the distribution, with increasingly accurate approximations as the sample size increases. Generally speaking, an ECDF gives a better approximation to the population CDF than a histogram gives to the population density function. (Information is lost in binning the data to make a histogram.)
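For concreteness, the stair-step construction can be coded directly; here is a minimal sketch with a small made-up sample (including a tie):

# Manual ECDF: sort, then accumulate jumps of 1/n (ties stack to k/n).
x  <- c(2.1, 3.5, 3.5, 4.0, 6.2)          # illustrative sample with a tie
n  <- length(x)
xs <- sort(x)                             # step (a): sort
Fn <- cumsum(rep(1/n, n))                 # step (b): jumps of 1/n each
plot(xs, Fn, type = "s", ylim = c(0, 1))  # tied values give a jump of 2/n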
[By suitable manipulation (a kind of numerical integration), the information in a KDE could be used to make a function that imitates the population CDF, but such a function does not use the actual data values. In my experience, this is rarely done.]
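If one did want to do that, the numerical integration might look like the following sketch (the sample and reference distribution are illustrative):

# Crude Riemann-sum integration of a KDE to imitate a CDF.
set.seed(1)
x  <- rnorm(100)
d  <- density(x)                     # KDE: grid d$x with heights d$y
dx <- diff(d$x)[1]                   # grid spacing (equally spaced)
Fhat <- cumsum(d$y) * dx             # running integral of the KDE
plot(d$x, Fhat, type = "l", col = "red")
curve(pnorm(x), add = TRUE)          # population CDF for comparison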
Graphical illustrations.
(1) A sample of size $n = 100$ from $$\mathsf{Gamma}(\text{shape} = \alpha = 5,\,\text{rate} = \lambda = 1/6)$$ is simulated. The figure shows a density histogram (blue bars), the default KDE from R statistical software (red curve), and the population density function (black).
set.seed(930)
x = rgamma(100, 5, 1/6)                    # n = 100 from Gamma(5, rate 1/6)
summary(x)
hist(x, prob=TRUE, ylim=c(0,.035),
     col="skyblue2", main="n = 100")       # density-scale histogram
rug(x)                                     # tick marks below x-axis
lines(density(x), lwd=2, lty="dotted", col="red")  # default KDE
curve(dgamma(x, 5, 1/6), add=TRUE)         # population density
(2) Sampling from the same distribution, we show the ECDF for a sample of size $n = 20,$ so that the
steps are easy to see.
set.seed(2019)
x = rgamma(20, 5, 1/6)                     # small sample so steps show
plot(ecdf(x), main="n = 20", col="blue"); rug(x)
curve(pgamma(x, 5, 1/6), add=TRUE, lwd=2)  # population CDF
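To see the claim about increasing accuracy numerically, here is a small sketch (sample sizes and seed are arbitrary choices) computing the Kolmogorov-Smirnov distance $\max_x|\hat F_n(x)-F(x)|$ between the ECDF and the population CDF:

# The worst-case ECDF error shrinks as n grows; Gamma(5, 1/6) as above.
set.seed(1)
for (n in c(20, 200, 2000)) {
  x  <- sort(rgamma(n, 5, 1/6))
  Fi <- pgamma(x, 5, 1/6)
  err <- max(pmax((1:n)/n - Fi, Fi - (0:(n-1))/n))  # KS sup-distance
  cat("n =", n, " max |ECDF - CDF| ~", round(err, 3), "\n")
}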
A probability distribution is any specification of the probabilities of the possible outcomes of an experiment. It can be defined in different ways, including by the probability density function (PDF) for continuous variables, the probability mass function (PMF) for discrete variables, or the cumulative distribution function (CDF) for either continuous or discrete variables. If a function is scaled properly and can be used to determine the likelihood of any possible outcome, it represents some kind of probability distribution.
In practice, I would typically read "probability distribution" as referring to the non-cumulative density or mass function, but it's not a precise term. One would be reasonably well understood when referring to a "normal probability distribution", although the more precise term would be "normal probability density function". Density and mass functions often come in common named families (normal, uniform, Weibull, binomial, and others), and referring to those families typically calls to mind an image of the density or mass function rather than the cumulative distribution function: when someone says "normal probability distribution", I'd wager that most people picture the Gaussian bell-curve PDF and not the sigmoidal CDF. Either one is an equally valid representation of the same probability distribution, though; the underlying distribution is not affected by how one chooses to represent it.
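In R terms, for the normal case, dnorm is the PDF and pnorm is the CDF, and each determines the other; a two-line check (the cutoff 1.96 is an arbitrary example):

# Integrating the PDF recovers the CDF value: two views of one distribution.
integrate(dnorm, lower = -Inf, upper = 1.96)$value   # ~0.975 via the PDF
pnorm(1.96)                                          # same quantity via the CDF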
Consider the following example. Let $Y\sim\mathsf{Exp}(\lambda)$, where $\lambda\in \mathbb{R}_{>0}$. The value of the parameter $\lambda$ is unknown, so one considers the family of probability distributions on $\mathbb{R}$ indexed by $\lambda$ such that $p_{\lambda}(y)=\lambda\exp(-\lambda y)$, $y\ge 0$. Letting $P_{\lambda}$ denote the distribution corresponding to $p_{\lambda}$, i.e. $P_{\lambda}(A)=\int_A p_{\lambda}(y)\,dy$, this family can be written as $\{P_{\lambda}:\lambda\in(0,\infty)\}$. The log-likelihood in this case is given by $$ l(\lambda)=\ln p_{\lambda}(y)=\ln\lambda-\lambda y. $$
If we are given $n$ i.i.d. copies of $Y$, i.e. $(Y_1,\ldots,Y_n)$, then $p_{\lambda}(y)=\prod_{i=1}^n \lambda \exp(-\lambda y_i)$, where $y\equiv(y_1,\ldots,y_n)\ge 0$. The log-likelihood in this case becomes $$ l(\lambda)=n\ln\lambda-\lambda\sum_{i=1}^n y_i. $$
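Setting $l'(\lambda)=n/\lambda-\sum_{i=1}^n y_i=0$ gives the maximizer $\hat\lambda=n/\sum_{i=1}^n y_i=1/\bar y$. A minimal numerical check in R (simulated data; the seed, sample size, and true rate are arbitrary choices):

# Maximize the log-likelihood numerically and compare to the closed form.
set.seed(1)
y <- rexp(50, rate = 2)                        # n = 50, true lambda = 2
loglik <- function(lam) length(y)*log(lam) - lam*sum(y)
optimize(loglik, interval = c(0.01, 10), maximum = TRUE)$maximum
1 / mean(y)                                    # closed-form MLE, should agree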