Solved – Finding the similarity between two functions

functiongenetic algorithmskolmogorov-smirnov testsimilarities

I am a first-year grad student in Computer Science, and I need some help with a problem that I think is statistically oriented. I have taken a statistics course, but it was abysmal and I haven't had time to rectify that. But anyway, my problem stems from a project I'm working on involving genetic programming, where I'm randomly generating functions. Please bear with my description, as it's been a while since I've had a formal theory course too.

I have two continuous (but not onto) functions F and G, both of which map N variables to a single output. The domain of the input variables is the integers between -100 and 100. The range of the output is the Real numbers. I want to find some statistical measure of how "similar" the two functions are; given the finite inputs (of which there will be 201^N possible), how much variance(?) there is between the two functions outputs. Two identical functions should return no variance, and two wildly different functions should return a high variance.

Since N will typically be greater than 6, I can't iterate through all the possible inputs and compare the outputs, so I figured I could take some sampling at regular intervals (e.g. every multiple of 10, so that it's only 10^N). But here's about where I realize I have no idea what I'm doing. How do I determine if two numbers are "highly variant" from each other? What sample size do I need to use to have confidence in my results?

My current approach is to compare the functions with a two-sided Kolmogorov-Smirnov Test. Since that test doesn't seem to scale well to multi-variate problems, I've taken advantage of my limited domains to just treat the problem as having a single variable by concatenating my variables. So the first value of the variable is (-100:100:100:100:100:100), the second is (-100:100:100:100:100:099), and the last is (100:100:100:100:100:100). Does that even make sense?

Best Answer

Calculate a correlation of two functions over a set of random examples. The two-sided Kolmogorov-Smirnov test compares one-dimensional distributions, not multidimensional functions.