The question concerns calculating the correlation between two irregularly sampled time series (one-dimensional stochastic processes) and using that to find the time offset where they are maximally correlated (their "phase difference").
This problem is not usually addressed in time series analysis, because time series data are presumed to be collected systematically (at regular intervals of time). It is rather the province of geostatistics, which concerns the multidimensional generalizations of time series. The archetypal geostatistical dataset consists of measurements of geological samples at irregularly spaced locations.
With irregular spacing, the distances among pairs of locations vary: no two distances may be the same. Geostatistics overcomes this with the empirical variogram, which computes a "typical" (often the mean or median) value of $(z(p) - z(q))^2 / 2$, the "semivariance", where $z(p)$ denotes a measured value at point $p$ and the distance between $p$ and $q$ is constrained to lie within an interval called a "lag". If we assume the process $Z$ is stationary and has a covariance, then the expectation of the semivariance equals the maximum covariance (equal to $Var(Z(p))$ for any $p$) minus the covariance between $Z(p)$ and $Z(q)$. This binning into lags copes with the irregular spacing problem.
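To make the binning concrete, here is a minimal sketch of an empirical semivariogram for irregularly sampled one-dimensional data (the function name, NumPy usage, and choice of the mean as the "typical" value are mine; the all-pairs computation is $O(n^2)$, so it is only suitable for modest $n$ or for subsamples):

```python
import numpy as np

def empirical_variogram(t, z, lag_edges):
    """Empirical semivariogram for irregularly sampled 1-D data.

    t, z      : sample times and values
    lag_edges : bin edges defining the lags
    Returns the mean semivariance (z(p) - z(q))^2 / 2 in each lag bin.
    """
    t = np.asarray(t, float)
    z = np.asarray(z, float)
    # All pairwise time separations and squared half-differences.
    dt = np.abs(t[:, None] - t[None, :])
    sv = 0.5 * (z[:, None] - z[None, :]) ** 2
    # Keep each unordered pair once (upper triangle, excluding diagonal).
    iu = np.triu_indices(len(t), k=1)
    dt, sv = dt[iu], sv[iu]
    gamma = np.empty(len(lag_edges) - 1)
    for i in range(len(lag_edges) - 1):
        in_bin = (dt >= lag_edges[i]) & (dt < lag_edges[i + 1])
        gamma[i] = sv[in_bin].mean() if in_bin.any() else np.nan
    return gamma
```

Each entry of `gamma` estimates the semivariance at one lag; plotting it against the bin midpoints gives the familiar variogram cloud summary.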
When an ordered pair of measurements $(z(p), w(p))$ is made at each point, one can similarly compute the empirical cross-variogram between the $z$'s and the $w$'s and thereby estimate the covariance at any lag. You want the one-dimensional version of the cross-variogram. The R packages gstat and sgeostat, among others, will estimate cross-variograms. Don't worry that your data are one-dimensional; if the software won't work with them directly, just introduce a constant second coordinate: that will make them appear two-dimensional.
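For readers who want to see the computation spelled out rather than delegated to gstat or sgeostat, here is a hedged Python sketch of the empirical cross-variogram, which bins $(z(p) - z(q))(w(p) - w(q))/2$ by separation exactly as the variogram bins the squared differences (the function name and binning-by-mean choice are illustrative, not any package's API):

```python
import numpy as np

def cross_variogram(t, z, w, lag_edges):
    """Empirical cross-variogram of co-located series z and w.

    Bins (z(p) - z(q)) * (w(p) - w(q)) / 2 by the separation |p - q|.
    Under joint stationarity its expectation at a lag equals the
    cross-covariance at lag zero minus the cross-covariance at that lag.
    """
    t, z, w = (np.asarray(a, float) for a in (t, z, w))
    iu = np.triu_indices(len(t), k=1)          # each unordered pair once
    dt = np.abs(t[:, None] - t[None, :])[iu]
    cv = (0.5 * (z[:, None] - z[None, :]) * (w[:, None] - w[None, :]))[iu]
    gamma = np.empty(len(lag_edges) - 1)
    for i in range(len(lag_edges) - 1):
        in_bin = (dt >= lag_edges[i]) & (dt < lag_edges[i + 1])
        gamma[i] = cv[in_bin].mean() if in_bin.any() else np.nan
    return gamma
```

With $w = z$ this reduces to the ordinary semivariogram, which is a handy sanity check.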
With two million points you should be able to detect small deviations from stationarity. It's possible the phase difference between the two time series could vary over time, too. Cope with this by computing the cross-variogram separately for different windows spaced throughout the time period.
@cardinal has already brought up most of these points in comments. The main contribution of this reply is to point towards the use of spatial statistics packages to do your work for you and to use techniques of geostatistics to analyze these data. As far as computational efficiency goes, note that the full convolution (cross-variogram) is not needed: you only need its values near the phase difference. This makes the effort $O(nk)$, not $O(n^2)$, where $k$ is the number of lags to compute, so it might be feasible even with out-of-the-box software. If not, the direct convolution algorithm is easy to implement.
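If out-of-the-box software will not cooperate, the direct computation is short. The sketch below estimates the cross-covariance at a handful of candidate lags by pairing each $z$ sample with the nearest $w$ sample in time (within a tolerance); the names, the nearest-neighbour pairing rule, and the tolerance are my own choices, and the sorted search makes each lag cost $O(n \log n)$ rather than the strict $O(n)$ of the quoted $O(nk)$:

```python
import numpy as np

def lagged_crosscov(tz, z, tw, w, lags, tol):
    """Direct cross-covariance estimate at a few candidate lags.

    For each lag, pair every z sample at time t with the w sample whose
    time is nearest to t + lag (if within tol), and average the product
    of the mean-centered values.  tw must be sorted ascending.
    """
    z = np.asarray(z, float) - np.mean(z)
    w = np.asarray(w, float) - np.mean(w)
    tw = np.asarray(tw, float)
    out = np.full(len(lags), np.nan)
    for k, lag in enumerate(lags):
        target = np.asarray(tz, float) + lag
        # Nearest neighbour in tw via binary search.
        j = np.clip(np.searchsorted(tw, target), 1, len(tw) - 1)
        j -= np.abs(tw[j - 1] - target) < np.abs(tw[j] - target)
        ok = np.abs(tw[j] - target) <= tol     # discard distant pairs
        if ok.any():
            out[k] = np.mean(z[ok] * w[j[ok]])
    return out
```

The lag at which this estimate peaks is the phase-difference estimate; scanning only a window of lags around a rough initial guess keeps $k$ small.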
Best Answer
Apply a lag operator to one time series while holding the other fixed, and compute the coherence of the cross-spectrum at each lag. The lag that gives you the maximum coherence estimates the phase difference; interpret it accordingly.
Coherence is computed at each frequency, and hence is a vector, so a weighted sum of the coherences makes a good scalar measure. You would typically weight the coherence at each frequency by the energy there in the power spectral density. That way you measure similarity at the frequencies that dominate the time series, instead of giving a large weight to the coherence at a frequency whose content in the series is negligible.
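A sketch of this weighting, assuming SciPy's Welch-based `coherence` and `welch` estimators are acceptable (the function names, the choice to weight by the PSD of the first series, and the integer-shift lag search are all my own illustrative choices):

```python
import numpy as np
from scipy.signal import coherence, welch

def psd_weighted_coherence(x, y, fs=1.0, nperseg=256):
    """Scalar similarity: coherence weighted by the power spectrum of x.

    Frequencies that dominate x contribute most; frequencies with
    negligible energy in x are down-weighted.
    """
    _, cxy = coherence(x, y, fs=fs, nperseg=nperseg)
    _, pxx = welch(x, fs=fs, nperseg=nperseg)
    weights = pxx / pxx.sum()
    return float(np.sum(weights * cxy))

def best_lag(x, y, max_lag, **kw):
    """Shift y against x; a positive result means y is delayed vs x."""
    best, best_score = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            xs, ys = x[: len(x) - lag], y[lag:]
        else:
            xs, ys = x[-lag:], y[: len(y) + lag]
        score = psd_weighted_coherence(xs, ys, **kw)
        if score > best_score:
            best, best_score = lag, score
    return best
```

Note that `nperseg` controls the Welch averaging: more segments give a less noisy coherence estimate at the cost of frequency resolution.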
http://www.stat.rutgers.edu/home/rebecka/Stat565/lab5-2007.pdf is a good place to get started, and http://www.atmos.washington.edu/~dennis/552_Notes_6c.pdf is an excellent introduction to cross-spectral analysis.