It is literally mostly implemented via the FFT, but it is usually derived with the aid of the DFT. The idea is that much of the statistical properties of the periodogram can be obtained in this way (e.g., you would like not only to know where the peak is but also the significance of that peak!). If it is or not directly the FFT depends on the algorithm; the "classical" periodogram is 'just" the FFT, but it is not only noisy, but also has other serious problems (see the paper from Scargle that I cite below).
The most used form of the periodogram is the Lomb-Scargle (LS) Periodogram (Scargle, 1989; in this paper in fact Scargle does a critique to the "classical" periodogram; it is a MUST READ if you are trying to detect the period of a time series!). You can look at a fast implementation of it in this paper of Press & Rybicki (1989) (if you get lost, you can also look at the explicit implementation in the Numerical Recipes).
I should also add that there are generalizations of the LS Periodogram that account for floating means of the sinusoid, or for unequal variances among the points. The implementation is straightforward once you read the papers that I just gave ;-).
The Fourier transform of discrete data with sampling rate $\Delta t = 1$ is periodic with period $2\pi$ (if we're talking about angular frequency, $\omega$) OR with period $1$ (if we're talking about linear frequency, $f$). The relationship between angular and linear frequency is $\omega = 2\pi f$. We'll assume for simplicity that $\Delta t = 1$ for now and I'll be using linear frequency, $f$.
In fact, with real-valued data the Fourier transform (which is complex) is conjugate-symmetric about $\omega = f = 0$. If $X(f)$ is the Fourier transform of $x(t) \in \mathbb{R}$, then $X(f) = X^{*}(-f)$. The spectrum is proportional to $|X(f)|^{2}$, so the spectrum is symmetric about $f = 0$.
So, with a discretely sampled data ("index limited") we end up with a "band limited" Fourier transform. The Nyquist frequency (the highest frequency which is not an alias of lower frequencies. The below is an example of aliasing (can't tell one sinusoid from another - see Aliasing on Wikipedia).
If $\Delta t \neq 1$ then the Nyquist frequency is $1 / (2\Delta t)$ and the frequencies where the spectrum is calculated (if you have $N$ data points) is
seq(0, 1/(2*dt), by = 1/N)
Usually you want to zeropad for the FFT, so you add some zeroes on at the end.
One of the better options for estimating the spectrum is to use the Multitaper Method (package multitaper in R from CRAN).
Sorry, this was all over the place... An example is below. The plot that it creates demonstrates the bias (broadband bias) properties of the periodogram vs. the multitaper.
An example I thoroughly enjoy comes from Percival and Walden, page 46 - an order 4 autoregressive simulation (so we know the theoretical spectrum):
$$x_{t} = 2.7607x_{t-1} - 3.8106x_{t-2} + 2.6535x_{t-3} - 0.9238x_{t-4}$$
library('TSA')
library('multitaper')
phi <- c(2.7607, -3.8106, 2.6535, -0.9238)
# our data
N <- 500 # number of data points
M <- 2048 # zeropadded length of series
freq <- seq(0, 0.5, by = 1/M)
x <- arima.sim(n = N, model = list(ar = phi))
# theoretical spectrum
spec.thry <- ARMAspec(model = list(ar = phi), plot = FALSE)
h.pgram <- rep(1/sqrt(N), N) #periodogram taper / window
# prepared data
xh.pgram <- x * h.pgram
# calculate the periodogram
spec.pgram <- abs(fft(c(xh.pgram, rep(0, M-N)))[1:(M/2+1)])^2
# multitaper - better solution very often
spec.mtm <- spec.mtm(x, nFFT = M, plot = FALSE)
# plotting
par(mar = c(4,4,1,1))
plot(spec.thry$freq, spec.thry$spec, type = 'l', log='y', ylab = "Spectrum", xlab = "Frequency", lwd = 2)
lines(freq, spec.pgram, col = rgb(1,0,0,0.7))
lines(spec.mtm$freq, spec.mtm$spec, col = rgb(0,0,1,0.7))
legend("topright", c("Theory", "Periodogram", "Multitaper"), col = c("black", rgb(1,0,0,0.7), rgb(0,0,1,0.7)), lwd = c(2,1,1))
Best Answer
Periodogram is one way of estimating the spectral density, perhaps, the simplest way:
There are many other ways to estimate the spectral density. In R
spectrum
call uses the same method that's in your first plot, but I think it additionally smoothes the data to achieve consistency