The Fisher information only has a precise meaning when you are dealing with a normally distributed value. In that case, the log-likelihood function is exactly parabolic, and the Fisher information equals its curvature at the MLE. It turns out mathematically that this curvature of the log-likelihood is the inverse of the variance of the associated normal random variable.
This is what guides the intuition surrounding Fisher information, even though it holds only approximately for non-normal variables (although, subject to some technical conditions, it is usually asymptotically true). Its inverse also serves as a lower bound on the variance of unbiased estimators; see here.
Demonstration of the relationship between $I$ and $\sigma$ for a Gaussian likelihood for the mean
Let $f(x;\theta):= f(x;\mu,\sigma) = \frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
Take the logarithm of this:
$$\log(f) = -\log\sqrt{2\pi} - \log\sigma -\frac{(x-\mu)^2}{2\sigma^2}$$
We are looking at the likelihood for the mean ($\mu$), so we treat the above as a function of $\mu$. Given a sample $\mathbf{x} = (x_1,\dots,x_n)$, the log-likelihood function for $\mu$ is
$$L(\mu|\mathbf{x},\sigma) =-n\log\sqrt{2\pi} - n\log\sigma -\frac{1}{2\sigma^2}\sum\limits_{x_i \in \mathbf{x}}(x_i-\mu)^2$$
This function is quadratic in $\mu$, so its second derivative $\frac{d^2L}{d\mu^2}$ is a constant. Specifically:
$$\frac{dL}{d\mu} =\frac{1}{\sigma^2}\sum\limits_{x_i \in \mathbf{x}}(x_i-\mu) \implies \frac{d^2L}{d\mu^2} = \frac{-n}{\sigma^2} = \textrm{constant} $$
Therefore,
$$-E_{\mu}\left(\frac{-n}{\sigma^2}\right) = \frac{n}{\sigma^2} = I(\mu)$$
which is the Fisher information about $\mu$. But
$$ se(\hat \mu) = \frac{\sigma}{\sqrt{n}} \implies I(\mu) = \frac{1}{se(\hat \mu)^2} = \frac{1}{\sigma^2_{\hat \mu}}$$
Therefore, the Fisher Information is the inverse of the variance of the MLE.
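The relationship derived above is easy to check numerically. Below is a minimal Python sketch (the true parameters, sample size, seed, and step size `h` are arbitrary choices of mine, not from the answer): it estimates the curvature of the Gaussian log-likelihood at the MLE by a central finite difference and compares it with $-n/\sigma^2$, and compares the implied variance $1/I(\mu)$ with $\sigma^2/n$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu_true, sigma, n = 2.0, 3.0, 1000
x = rng.normal(mu_true, sigma, size=n)

# Log-likelihood of mu (sigma known), up to an additive constant.
def log_lik(mu):
    return -np.sum((x - mu) ** 2) / (2 * sigma ** 2)

mu_hat = x.mean()  # MLE of mu

# Central-difference estimate of the second derivative at the MLE.
# Since the log-likelihood is exactly quadratic in mu, this curvature
# is the same constant -n/sigma^2 everywhere, not just at mu_hat.
h = 1e-3
curvature = (log_lik(mu_hat + h) - 2 * log_lik(mu_hat) + log_lik(mu_hat - h)) / h ** 2

fisher_info = n / sigma ** 2       # I(mu) from the derivation above
print(curvature, -fisher_info)     # the two agree
print(1 / fisher_info, sigma ** 2 / n)  # inverse information = Var(mu_hat)
```

The finite difference is exact up to rounding here precisely because the log-likelihood is quadratic in $\mu$.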
Let $X_1,\dots,X_n \sim f(x;\theta)$. The Fisher information is a theoretical measure defined by
$$
\mathcal{I}(\theta) = - \mathbb{E}\left[\frac{\partial^2}{\partial\theta^2}\ln f(x;\theta) \right],
$$
where $\theta$ is the unknown parameter of interest. Hence, for a sample of size $n$ and MLE $\hat{\theta}_n$, you can estimate the Fisher information by $n\mathcal{I}(\hat{\theta}_n)$.
Observed information is defined by
$$
\mathcal{I}_{obs}(\theta) = - n\left[\frac{1}{n}\sum_{i=1}^n\frac{\partial^2}{\partial\theta^2}\ln f(x_i;\hat{\theta}_n) \right],
$$
which is simply the sample equivalent of the above. So, as you can see, these two notions are defined differently; however, if you plug the MLE into the Fisher information you get exactly the observed information, $\mathcal{I}_{obs}(\theta)=n\mathcal{I}(\hat{\theta}_n)$.
To show this in a fairly general setting, you can work out the algebra for a one-parameter exponential family distribution (it is a straightforward calculation).
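As an illustration (my own, not from the answer), here is a short Python sketch using the Poisson distribution, a one-parameter exponential family. For $f(x;\lambda)$ we have $\frac{\partial^2}{\partial\lambda^2}\ln f(x;\lambda) = -x/\lambda^2$ and MLE $\hat\lambda_n = \bar{x}$, so both $n\mathcal{I}(\hat\lambda_n)$ and the observed information reduce to $n/\bar{x}$ exactly:

```python
import numpy as np

rng = np.random.default_rng(1)
lam_true, n = 4.0, 500
x = rng.poisson(lam_true, size=n)

lam_hat = x.mean()  # MLE of lambda for the Poisson model

# Expected (Fisher) information per observation is I(lambda) = 1/lambda,
# since d^2/dlambda^2 ln f(x;lambda) = -x/lambda^2 and E[X] = lambda.
expected_info = n * (1 / lam_hat)       # n * I(lambda_hat)

# Observed information: minus the sum of second derivatives at the MLE.
observed_info = np.sum(x / lam_hat ** 2)

# Both equal n * xbar / xbar^2 = n / xbar, so they coincide exactly.
print(expected_info, observed_info)
```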
From the way you write the information, it seems that you assume you have only one parameter to estimate ($\theta$) and you consider one random variable (the observation $X$ from the sample). This makes the argument much simpler, so I will carry it out in this way.
You use the information when you want to conduct inference by maximizing the log-likelihood. That log-likelihood is a function of $\theta$ that is random because it depends on $X$. You would like to find a unique maximum by locating the value of $\theta$ that attains it. Typically, you solve the first-order conditions by equating the score $\frac{\partial\ell \left( \theta ; x \right)}{\partial \theta} = \frac{\partial\log p \left( x ; \theta \right)}{\partial \theta}$ to 0. Now you would like to know how accurate that estimate is. How much curvature the likelihood function has around its maximum is going to give you that information (if it is peaked around the maximum, you are fairly certain of the estimate; if the likelihood is flat, you are quite uncertain). Probabilistically, you would like to know the variance of the score "around there" (this is a heuristic, non-rigorous argument; one can actually show the equivalence between the geometric and probabilistic/statistical concepts).
Now, we know that on average, the score is zero (see proof of that point at the end of this answer). Thus \begin{eqnarray*} E \left[ \frac{\partial \ell \left( \theta ; x \right)}{\partial \theta} \right] & = & 0\\ \int \frac{\partial \ell \left( \theta ; x \right)}{\partial \theta} p \left( x ; \theta \right) d x & = & 0 \end{eqnarray*} Take derivatives at both sides (we can interchange integral and derivative here but I am not going to give rigorous conditions here) \begin{eqnarray*} \frac{\partial}{\partial \theta} \int \frac{\partial \ell \left( \theta ; x \right)}{\partial \theta} p \left( x ; \theta \right) d x & = & 0\\ \int \frac{\partial^2 \ell \left( \theta ; x \right)}{\partial \theta^2} p \left( x ; \theta \right) d x + \int \frac{\partial \ell \left( \theta ; x \right)}{\partial \theta} \frac{\partial p \left( x ; \theta \right)}{\partial \theta} d x & = & 0 \end{eqnarray*}
The second term on the left-hand side is \begin{eqnarray*} \int \frac{\partial \ell \left( \theta ; x \right)}{\partial \theta} \frac{\partial p \left( x ; \theta \right)}{\partial \theta} d x & = & \int \frac{\partial \log p \left( x ; \theta \right)}{\partial \theta} \frac{\partial p \left( x ; \theta \right)}{\partial \theta} d x\\ & = & \int \frac{\partial \log p \left( x ; \theta \right)}{\partial \theta} \frac{\frac{\partial p \left( x ; \theta \right)}{\partial \theta}}{p \left( x ; \theta \right)} p \left( x ; \theta \right) d x\\ & = & \int \left( \frac{\partial \log p \left( x ; \theta \right)}{\partial \theta} \right)^2 p \left( x ; \theta \right) d x\\ & = & V \left[ \frac{\partial \ell \left( \theta ; x \right)}{\partial \theta} \right] \end{eqnarray*}
(Here the second line follows from dividing and multiplying by $p(x;\theta)$; the third line follows from applying the chain rule to the derivative of the log; and the final line follows from the expectation of the score being zero, so the variance equals the expectation of the square with no need to subtract the square of the expectation.)
From which you can see
\begin{eqnarray*} V \left[ \frac{\partial \ell \left( \theta ; x \right)}{\partial \theta} \right] & = & - \int \frac{\partial^2 \ell \left( \theta ; x \right)}{\partial \theta^2} p \left( x ; \theta \right) dx\\ & = & - E \left[ \frac{\partial^2 \ell \left( \theta ; x \right)}{\partial \theta^2} \right] \end{eqnarray*}
Now you can see why summarizing the uncertainty (curvature) about the likelihood function takes the particular formula of the Fisher information.
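The identity just derived, $V\left[\frac{\partial\ell}{\partial\theta}\right] = -E\left[\frac{\partial^2\ell}{\partial\theta^2}\right]$, can also be checked by simulation. Here is a minimal sketch (the Bernoulli model, parameter value, and sample size are my choices for illustration): for a single Bernoulli($p$) observation, both sides equal $\frac{1}{p(1-p)}$.

```python
import numpy as np

rng = np.random.default_rng(2)
p = 0.3                                   # Bernoulli success probability
x = rng.binomial(1, p, size=1_000_000)    # large Monte Carlo sample

# For ell(p; x) = x*log(p) + (1-x)*log(1-p):
score = x / p - (1 - x) / (1 - p)                       # first derivative
second_deriv = -x / p ** 2 - (1 - x) / (1 - p) ** 2     # second derivative

# Both estimates should approximate 1/(p*(1-p)) = 4.7619...
print(score.var(), -second_deriv.mean(), 1 / (p * (1 - p)))
```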
We can even go further and show that the best possible variance of an unbiased estimator is bounded below by the inverse of the information (this is called the Cramér–Rao lower bound).
To answer an additional question by the OP, I will show that the expectation of the score is zero. Since $p \left( x ; \theta \right)$ is a density \begin{eqnarray*} \int p \left( x ; \theta \right) \mathrm{d} x & = & 1 \end{eqnarray*} Take derivatives on both sides \begin{eqnarray*} \frac{\partial}{\partial \theta} \int p \left( x ; \theta \right) \mathrm{d} x & = & 0 \end{eqnarray*} Looking at the left-hand side \begin{eqnarray*} \frac{\partial}{\partial \theta} \int p \left( x ; \theta \right) \mathrm{d} x & = & \int \frac{\partial p \left( x ; \theta \right)}{\partial \theta} \mathrm{d} x\\ & = & \int \frac{\frac{\partial p \left( x ; \theta \right)}{\partial \theta}}{p \left( x ; \theta \right)} p \left( x ; \theta \right) \mathrm{d} x\\ & = & \int \frac{\partial \log p \left( x ; \theta \right)}{\partial \theta} p \left( x ; \theta \right) \mathrm{d} x\\ & = & E \left[ \frac{\partial \ell \left( \theta ; x \right)}{\partial \theta} \right] \end{eqnarray*} Thus the expectation of the score is zero.
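The mean-zero property of the score is likewise easy to see numerically. A minimal Python sketch (a normal model with known $\sigma$ and arbitrary constants of my choosing): the score of a $N(\mu,\sigma^2)$ observation with respect to $\mu$ is $(x-\mu)/\sigma^2$, and its sample average, evaluated at the true $\mu$, is close to zero.

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma = 1.0, 2.0
x = rng.normal(mu, sigma, size=1_000_000)

# Score for mu of a single N(mu, sigma^2) observation: (x - mu) / sigma^2.
score = (x - mu) / sigma ** 2
print(score.mean())  # close to 0 at the true parameter value
```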
This was a non-rigorous exposition. I recommend that you follow up on these arguments in a very good textbook on statistical inference. (I personally recommend the book by Casella and Berger, but there are many other excellent books.)