To give an unmathematical, catchy answer, let's look at Fraunhofer diffraction in the double-slit experiment.
Interference at the observation plane depends on the slit separation $d$. What is the frequency of the slits? The quantity $\frac{1}{d}$, e.g. in units of $\text{mm}^{-1}$, is the number of slits per unit length, i.e., a spatial frequency of the setup. The following argument links this frequency to the Fourier transform. The physical significance lies in the real optical setup, which is easier to describe when transformed into Fourier space.
The high-school-level double-slit link above gives all the math without integrals. Once you have visualized the following concept, you will see that the integrals are just the math that converts the diffraction pattern into Fourier space via the Fourier transform.
First, use trigonometry to compute the phase difference $\Delta\phi(\theta,d)$.
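For reference, the trigonometry gives the standard far-field result, assuming normal incidence, wavelength $\lambda$, and a screen far enough away that the rays from the two slits are effectively parallel:

$$
\Delta s = d \sin\theta, \qquad \Delta\phi(\theta, d) = \frac{2\pi}{\lambda}\, d \sin\theta, \qquad \Delta\phi = 2\pi n \;\Longleftrightarrow\; d \sin\theta = n \lambda
$$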
Go deeper into this concept with a sketch that shows path differences of $n\lambda$ (phase differences of $2\pi n$), $n\in\mathbb{Z}$, as the bright maxima of the diffraction pattern. There is no magic in the next step; it's just another point of view: try to grasp $\frac{1}{d}$ as a parameter in its own right, $\Delta\phi(\theta,\frac{1}{d})$.
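A minimal numerical sketch of this Fourier view, under stated assumptions: a 1D aperture sampled on a grid, two slits of a hypothetical width of 0.01 mm, and the Fraunhofer far field computed as the discrete Fourier transform of the aperture. The spatial frequency $u = \sin\theta/\lambda$ of the first bright fringe should come out as $\frac{1}{d}$. The function name `first_fringe_frequency` and all numbers are illustrative, not taken from the linked material.

```python
import numpy as np


def first_fringe_frequency(d_mm, a_mm=0.01, n_samples=20_000, dx_mm=0.001):
    """Spatial frequency u = sin(theta)/lambda (in 1/mm) of the first bright fringe.

    Hypothetical helper: d_mm is the slit separation, a_mm the slit width.
    """
    d_s = round(d_mm / dx_mm)  # slit separation in samples
    a_s = round(a_mm / dx_mm)  # slit width in samples

    # Transmission function of the double slit: 1 inside a slit, 0 elsewhere.
    aperture = np.zeros(n_samples)
    left = n_samples // 2 - d_s // 2
    for start in (left, left + d_s):
        aperture[start - a_s // 2 : start - a_s // 2 + a_s] = 1.0

    # Fraunhofer far field: the amplitude is the Fourier transform of the aperture.
    intensity = np.abs(np.fft.fft(aperture)) ** 2
    u = np.fft.fftfreq(n_samples, d=dx_mm)  # spatial frequency axis in 1/mm

    # Look for the first side maximum between 0.5/d and 1.5/d.
    window = (u > 0.5 / d_mm) & (u < 1.5 / d_mm)
    return u[window][np.argmax(intensity[window])]


for d in (0.25, 0.5, 1.0):
    u1 = first_fringe_frequency(d)
    print(f"d = {d} mm -> first fringe at u = {u1:.2f} 1/mm, 1/d = {1 / d:.2f} 1/mm")
```

Multiplying by the wavelength converts back to an angle, $\sin\theta = \lambda u$; for example, $\lambda = 500\,\text{nm}$ and $d = 0.5\,\text{mm}$ give $\sin\theta = 10^{-3}$ for the first fringe.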
Fourier space is a synonym for the frequency domain. Acoustics examples are given in Eichenlaub's SE answer and in ptomato's optics explanation.
No literature needed: calculate and understand it yourself with the links above.
If I take the Fourier transform of the autocorrelation of a signal in time, I will get the power spectral density.
It so happens that the autocorrelation function and the power spectral density form a Fourier transform pair (the Wiener–Khinchin theorem). This is not to say that the autocorrelation function is the only way to calculate the power spectral density.
As I stated at https://physics.stackexchange.com/a/309544/59023, the power spectral density, $S_{k}$, is proportional to the square of the magnitude of the Fourier transform of a signal, i.e., $S_{k} \propto \lvert X_{k} \rvert^{2}$.
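A quick numerical check of both routes, assuming the discrete, circular version of the relation (DFT plus periodic autocorrelation); the test signal, its parameters, and the division by `n` are arbitrary choices made for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1024
t = np.arange(n)
# Hypothetical test signal: a sinusoid (50 cycles per record) plus white noise.
x = np.sin(2 * np.pi * 50 * t / n) + 0.5 * rng.standard_normal(n)

# Route 1: S_k proportional to |X_k|^2.
X = np.fft.fft(x)
psd_direct = np.abs(X) ** 2 / n

# Route 2: Fourier transform of the circular autocorrelation
# r[m] = sum_j x[j] * x[(j + m) mod n].
r = np.array([np.dot(x, np.roll(x, -m)) for m in range(n)])
psd_from_acf = np.fft.fft(r).real / n

print(np.allclose(psd_direct, psd_from_acf))  # True
```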
But what is the physical meaning of the power spectra of $E(t)$ and $I(t)$, respectively, and how do they differ?
First, let me use the generic symbol $X_{k}$ to represent the Fourier transform of the time domain signal $x_{n}$.
The words "power spectrum" are somewhat ambiguous here. In principle, one can compute a power spectrum (i.e., power vs. frequency) from each component of $\mathbf{E}$ or from its magnitude. One can also compute the amplitude spectrum, $A_{k} \propto \lvert X_{k} \rvert$, of the signal.
In the following, I will assume you are asking about $S_{k}$ and not $A_{k}$ for each of these.
The power spectrum of $\mathbf{E}$, whether of its components ($E_{j}$) or of its vector magnitude ($\lvert \mathbf{E} \rvert$), describes the power of the field as a function of frequency, with units (if properly normalized) of $(\text{V m}^{-1})^{2}\,\text{Hz}^{-1}$. This is useful when trying to determine whether there exists, e.g., a wave at a given frequency, which would show up as a peak above the background in $S_{k}$. If the oscillations exist only along the x-component of $\mathbf{E}$ (i.e., a longitudinal, electrostatic oscillation), then the spectra of both $\lvert \mathbf{E} \rvert$ and $E_{x}$ would show a peak at that frequency, but those of $E_{y}$ and $E_{z}$ would not.
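A minimal sketch of the component-wise case, assuming evenly sampled field components in V/m and a plain one-sided periodogram normalized to $(\text{V m}^{-1})^{2}\,\text{Hz}^{-1}$. The helper `one_sided_psd`, the sample rate, the 5 Hz wave, and the noise level are all hypothetical, chosen only to show a peak that appears in $E_x$ and not in $E_y$ or $E_z$:

```python
import numpy as np


def one_sided_psd(x, fs):
    """One-sided periodogram in (units of x)^2 / Hz, e.g. (V/m)^2/Hz for E in V/m."""
    n = len(x)
    X = np.fft.rfft(x)
    psd = np.abs(X) ** 2 / (fs * n)  # sum(psd_full) * fs/n equals mean(x**2)
    psd[1:] *= 2                     # fold negative frequencies onto positive ones
    if n % 2 == 0:
        psd[-1] /= 2                 # the Nyquist bin has no negative-frequency partner
    return np.fft.rfftfreq(n, d=1 / fs), psd


rng = np.random.default_rng(1)
fs, n, f0 = 100.0, 4000, 5.0  # hypothetical sample rate [Hz], length, wave frequency [Hz]
t = np.arange(n) / fs
noise = 1e-4                  # V/m, background noise in every component
E = {
    "Ex": 1e-3 * np.cos(2 * np.pi * f0 * t) + noise * rng.standard_normal(n),
    "Ey": noise * rng.standard_normal(n),
    "Ez": noise * rng.standard_normal(n),
}

results = {}
for name, comp in E.items():
    f, S = one_sided_psd(comp, fs)
    k = np.argmax(S[1:]) + 1  # strongest bin, skipping DC
    results[name] = (f[k], S[k] / np.median(S))
    print(f"{name}: max at {f[k]:.3f} Hz, peak/median = {results[name][1]:.1e}")
```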
The intensity, as you have written it, is just the field energy density multiplied by a constant. Thus, the power spectrum of $I$ would be qualitatively similar to that of $\lvert \mathbf{E} \rvert$.
But why is this true? Wouldn't this power spectrum only indicate the frequency of the signal itself and not that of the photons comprising it?
I am not sure what you were told or whether you are conveying that information correctly. A discrete Fourier transform, or DFT (i.e., what you use in practice on real signals through algorithms like the FFT), is not the same as a continuous Fourier transform (CFT). In a DFT, the frequency bin width is defined as:
$$
\Delta f = \frac{ f_{s} }{ N } \tag{1}
$$
where $f_{s}$ is the sample rate of the signal [e.g., vectors per second] and $N$ is the number of individual points used in the DFT.
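A quick check of the bin spacing that NumPy reports for an $N$-point DFT, assuming an arbitrary sample rate and length chosen for illustration; the spacing equals $f_s/N$, i.e., the inverse of the record length:

```python
import numpy as np

fs = 1000.0  # hypothetical sample rate [samples per second]
N = 500      # number of points used in the DFT

freqs = np.fft.fftfreq(N, d=1 / fs)
delta_f = freqs[1] - freqs[0]
print(delta_f, fs / N)  # 2.0 2.0

T = N / fs      # record length [s]
print(1 / T)    # 2.0: the bin width is the inverse record length
```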
In a CFT, the minimum $\Delta f$ is mathematically zero (i.e., infinitesimally small), but quantum mechanics shows us that energy/momentum are quantized and thus have discrete values. Therefore, there are physical limits on the lower bound of $\Delta f$. In this case, a variant of the uncertainty principle applies, the time-energy uncertainty principle, which is roughly given as:
$$
\Delta E \ \Delta t \geq \frac{\hbar}{2} \tag{2}
$$
where $\hbar$ is the reduced Planck constant and $\Delta Q$ is the minimum resolution of quantity $Q$.
Thus, the transition has a known energy change, but we cannot know it better than Equation 2 allows. For photons, energy converts directly to frequency via $E = h \nu$, where $h$ is the Planck constant, so $\Delta \nu \, \Delta t \geq \frac{1}{4\pi}$ and Equation 2 also limits the frequency resolution of the emitted photons.
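A small worked example of that conversion, assuming Equation 2 is taken at equality and $\Delta t$ is identified with, e.g., the lifetime of the emitting state; the 10 ns value and the helper name `min_linewidth` are purely illustrative:

```python
import math

hbar = 1.054571817e-34   # reduced Planck constant [J s]
h = 2 * math.pi * hbar   # Planck constant [J s]


def min_linewidth(delta_t):
    """Lower bound on the photon frequency spread [Hz] for a time scale delta_t [s]."""
    delta_E = hbar / (2 * delta_t)  # Equation 2 taken at equality
    return delta_E / h              # E = h*nu, so delta_nu = 1 / (4*pi*delta_t)


print(min_linewidth(10e-9))  # about 7.96e6 Hz for a hypothetical 10 ns lifetime
```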
A Fourier transform over spatial data gives a spectrum of spatial frequencies. Whereas a transform of temporal data gives the amplitude versus cycles per second, spatial frequency has units of cycles per meter (or whatever length unit is used).
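A minimal sketch of the spatial case, assuming a hypothetical profile sampled every 0.1 mm with a 2 mm period: giving the sample spacing in metres makes the frequency axis come out in cycles per metre, exactly as seconds would give cycles per second.

```python
import numpy as np

dx = 1e-4                   # hypothetical sample spacing: 0.1 mm, in metres
x = np.arange(2000) * dx    # 0.2 m of spatial data
period = 2e-3               # 2 mm spatial period
profile = np.cos(2 * np.pi * x / period)

spectrum = np.abs(np.fft.rfft(profile))
k = np.fft.rfftfreq(x.size, d=dx)   # spatial frequency in cycles per metre
print(k[np.argmax(spectrum)])       # 500.0 cycles per metre = 1 / (2 mm)
```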