That is OK, and quite reasonable. It is referred to as the two-sample Kolmogorov-Smirnov test. Measuring the difference between two distribution functions by the sup norm is always sensible, but to do a formal test you need to know the distribution of that statistic under the hypothesis that the two samples are independent and each i.i.d. from the same underlying distribution. To rely on the usual asymptotic theory you will need continuity of the underlying common distribution (not of the empirical distributions). See the Wikipedia page linked to above for more details.
In R, you can use the ks.test function, which computes exact $p$-values for small sample sizes.
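For readers working in Python, here is a minimal sketch of the statistic itself (the function name ks_2samp_statistic is my own, hypothetical). It relies on the fact that both empirical CDFs are step functions that only jump at observed values, so the sup norm of their difference is attained at one of the pooled data points. It computes only $D$, not a $p$-value; SciPy's scipy.stats.ks_2samp returns both if SciPy is available.

```python
import numpy as np

def ks_2samp_statistic(x, y):
    """Two-sample KS statistic D = sup_t |F_x(t) - F_y(t)| (sketch)."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    # The difference of the two ECDFs is constant between pooled data
    # points, so it suffices to evaluate it at every observed value.
    grid = np.concatenate([x, y])
    # side="right" counts observations <= t (the right-continuous ECDF).
    fx = np.searchsorted(x, grid, side="right") / x.size
    fy = np.searchsorted(y, grid, side="right") / y.size
    return float(np.max(np.abs(fx - fy)))

print(ks_2samp_statistic([1, 2, 3, 4], [3, 4, 5, 6]))
```

The printed value is 0.5: the gap between the two ECDFs is largest (one half) at t = 2, 3 and 4.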
Interesting problem. I have two thoughts, one general and one about how to characterize your data...
First, with respect to comparing distributions I agree with @Glen_b and @Scortchi that you do not want to compare Fly vs All as shown in your chart (but nice idea to overlay the plot of the D statistic). Because you have a strong belief about where the distributions are likely to be different, and not just that they are different, you might want to consider comparing quantiles of the two distributions. There is a nice blog post on the subject which works through R code to develop the testing method. And there is an R package, WRS, which implements quantile-based testing methods.
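To make the idea of comparing quantiles concrete, here is a deliberately simple sketch: a percentile bootstrap of the differences between sample quantiles of the two groups, using np.quantile. This is not a port of the methods in WRS or of the linked blog post (those use more refined quantile estimators); the function name, the default quantile levels and the resampling settings are all my own assumptions.

```python
import numpy as np

def quantile_diff_ci(x, y, probs=(0.1, 0.25, 0.5, 0.75, 0.9),
                     n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CIs for Q_x(p) - Q_y(p) at several levels p (sketch)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    probs = np.asarray(probs, dtype=float)
    estimate = np.quantile(x, probs) - np.quantile(y, probs)
    boot = np.empty((n_boot, probs.size))
    for b in range(n_boot):
        # Resample each group independently, with replacement.
        xb = rng.choice(x, size=x.size, replace=True)
        yb = rng.choice(y, size=y.size, replace=True)
        boot[b] = np.quantile(xb, probs) - np.quantile(yb, probs)
    # Percentile interval for each quantile level separately
    # (no correction for looking at several levels at once).
    lo = np.quantile(boot, alpha / 2, axis=0)
    hi = np.quantile(boot, 1 - alpha / 2, axis=0)
    return estimate, lo, hi
```

Intervals that exclude zero only at some quantile levels would point to where the two distributions differ, which is the kind of targeted question described above.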
Second, I'd consider dropping the use of a formal comparison test altogether and instead use Weight of Evidence (WOE). This approach is commonly used in industries that need decision frameworks dealing with different levels of risk across various predictors. Examples include insurance underwriting, credit evaluation, and clinical trials.
In your setting there is a baseline "risk" of flight (you said 10%), but the odds of flight seem to increase greatly in the presence of ships at certain distances. Using the WOE approach you can convey the change in odds of flight as a function of a ship's distance, which is easy to understand for lay audiences (well, at least easier than understanding $p$-values associated with test statistics). Note that this is closely related to @Scortchi's suggestion to use logistic regression, but with WOE you are not trying to fit a regression model.
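A small sketch of the calculation, under assumptions: ship distance has already been cut into bins, and for each bin you have counts of flight and no-flight events. I define WOE as the log of (share of all flights falling in the bin) over (share of all non-flights falling in the bin); credit-scoring texts often put goods over bads instead, which just flips the sign. The counts below are made up purely for illustration and are not your data.

```python
import numpy as np

def weight_of_evidence(events, non_events):
    """Per-bin WOE = ln(share of events in bin / share of non-events in bin)."""
    events = np.asarray(events, dtype=float)
    non_events = np.asarray(non_events, dtype=float)
    # A bin with zero events or zero non-events gives +/-inf;
    # in practice such bins are merged or the counts are smoothed.
    return np.log((events / events.sum()) / (non_events / non_events.sum()))

# Hypothetical counts per ship-distance bin (nearest to farthest), NOT real data.
flights = np.array([30, 10, 5, 5])
no_flights = np.array([20, 60, 150, 220])   # overall flight rate 50/500 = 10%
woe = weight_of_evidence(flights, no_flights)

# WOE rescales the baseline odds: odds in a bin = overall odds * exp(WOE).
overall_odds = flights.sum() / no_flights.sum()
bin_odds = overall_odds * np.exp(woe)       # equals flights / no_flights
```

The last line is the link to the "change in odds" interpretation: exp(WOE) is the factor by which the odds of flight in a distance bin exceed (or fall below) the overall odds.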
There is nice documentation on Statistica's website for applying the method, but the best introduction I have found is in the book Credit Scoring, Response Modeling, and Insurance Rating: A Practical Guide to Forecasting Consumer Behavior. If you search for the term "WOE" you'll find multiple sections discussing the idea, and section 5.1 walks through a complete example of calculating WOE (it's pretty easy) and evaluating the results for decision-making. Finally, note that there is a Stack Overflow post on this topic; it is not very developed, but it links to a PDF that walks through another example in the context of SAS coding.
Python implementation
I have written a Python implementation using NumPy. You can find the code here; the docstring in the code gives more information.
And here's another one (not by me). This notebook provides a Python implementation of the two-sample 2D K-S test. The .py file can be downloaded here. The code seems to be a straight translation of C code, so efficiency might be a problem if the sample size is large. In any case, you'd better check the code (whichever one you use) against the original papers/books before you use it: the Python implementations of the 2D KS test have been checked far less than the ones in R.
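For orientation when checking such code, here is a vectorized sketch of the statistic as I understand the quadrant-counting method described in Numerical Recipes section 14.7: each data point in turn serves as an origin, the fractions of each sample in the four quadrants around it are compared, the largest absolute difference is taken over origins from sample 1 and separately over origins from sample 2, and the two maxima are averaged. I have not verified it line by line against the book or the implementations linked above; the tie handling (a point on an axis counts as "not greater") and the function names are my assumptions. It computes only the statistic; no $p$-value.

```python
import numpy as np

def _quadrant_fractions(origins, pts):
    """For each origin (row), fraction of pts in each of the 4 quadrants around it."""
    # Boolean (m, n) matrices; ties (equal coordinates) count as "not greater".
    above = pts[None, :, 1] > origins[:, None, 1]
    right = pts[None, :, 0] > origins[:, None, 0]
    counts = np.stack([
        (above & right).sum(axis=1),
        (above & ~right).sum(axis=1),
        (~above & ~right).sum(axis=1),
        (~above & right).sum(axis=1),
    ], axis=1)
    return counts / pts.shape[0]

def ks2d_statistic(a, b):
    """2D two-sample KS-type distance for point sets a (n1, 2) and b (n2, 2)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Largest quadrant discrepancy using points of a as origins, then b.
    d1 = np.abs(_quadrant_fractions(a, a) - _quadrant_fractions(a, b)).max()
    d2 = np.abs(_quadrant_fractions(b, a) - _quadrant_fractions(b, b)).max()
    return float(0.5 * (d1 + d2))
```

Note the memory cost: each call builds (n1 + n2) x n boolean matrices, so for large samples a loop over origins (as in the C code) may be preferable.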
More information
As far as I can tell, the algorithm was first developed in two papers.
A nice introduction and the C implementation can be found in Press, W.H. et al. 1992, Numerical Recipes in C, Section 14.7, p. 645. You can find C++/Fortran implementations in other editions of the book.
The post titled Beware the Kolmogorov-Smirnov test is also related to the subject, and you may want to have a look. It encourages using a resampling method to evaluate the $p$-value for a given KS distance.
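One common resampling scheme is a permutation test: pool the two samples, reshuffle the group labels many times, and count how often the reshuffled KS distance is at least as large as the observed one. The sketch below is my own illustration of that idea (the post may have a different scheme, such as a bootstrap, in mind); the function names and defaults are hypothetical. The stat argument accepts any function stat(x, y), for example a 2D KS distance applied to arrays whose rows are points, because np.concatenate and rng.permutation both act along the first axis.

```python
import numpy as np

def ks_statistic(x, y):
    """1D two-sample KS distance, evaluated at the pooled data points."""
    x, y = np.sort(x), np.sort(y)
    grid = np.concatenate([x, y])
    fx = np.searchsorted(x, grid, side="right") / x.size
    fy = np.searchsorted(y, grid, side="right") / y.size
    return float(np.max(np.abs(fx - fy)))

def permutation_pvalue(x, y, stat=ks_statistic, n_perm=2000, seed=0):
    """Permutation p-value for H0: both samples come from the same distribution."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = stat(x, y)
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        # Shuffle the pooled observations and re-split them into groups of the original sizes.
        perm = rng.permutation(pooled)
        if stat(perm[:len(x)], perm[len(x):]) >= observed:
            count += 1
    # Add-one correction: the observed labelling counts as one permutation,
    # so the estimate is never exactly zero.
    return (count + 1) / (n_perm + 1)
```

With n_perm reshuffles the smallest attainable value is 1/(n_perm + 1), so n_perm should be chosen large enough for the significance level you care about.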