What you conclude about whether data is IID comes from outside information, not from the data itself. You, as the scientist, need to determine whether it is reasonable to assume the data is IID, based on how the data was collected and other outside information.
Consider some examples.
Scenario 1: We generate a set of data independently from a single distribution that happens to be a mixture of 2 normals.
Scenario 2: We first generate a gender variable from a binomial distribution, then within males and females we independently generate data from a normal distribution (but the normals are different for males and females), then we delete or lose the gender information.
In scenario 1 the data is IID, and in scenario 2 the data is clearly not identically distributed (different distributions for males and females). But the samples from the two scenarios are indistinguishable from the data alone; you have to know how the data was generated to tell the difference.
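A small simulation can illustrate this. The sketch below (all means, variances, and mixing proportions are hypothetical choices for illustration) generates data both ways; the resulting samples come from the same marginal distribution, so no test on the values alone can separate the scenarios.

```python
import numpy as np

rng = np.random.default_rng(0)  # hypothetical seed for reproducibility
n = 100_000

# Scenario 1: IID draws from a single mixture-of-two-normals distribution.
# The mixing indicator is part of each single draw, not a group label.
component = rng.random(n) < 0.5
sample1 = np.where(component,
                   rng.normal(0.0, 1.0, n),   # assumed component A
                   rng.normal(3.0, 1.0, n))   # assumed component B

# Scenario 2: generate gender first, then a different normal per group,
# then discard the gender column.
gender = rng.binomial(1, 0.5, n)
sample2 = np.where(gender == 1,
                   rng.normal(0.0, 1.0, n),
                   rng.normal(3.0, 1.0, n))

# Without the gender column, the two samples have identical
# marginal distributions; summaries line up closely.
print(round(sample1.mean(), 2), round(sample2.mean(), 2))
```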
Scenario 3: I take a simple random sample of people living in my city and administer a survey and analyze the results to make inferences about all people in the city.
Scenario 4: I take a simple random sample of people living in my city and administer a survey and analyze the results to make inferences about all people in the country.
In scenario 3 the subjects would be considered independent (a simple random sample of the population of interest), but in scenario 4 they would not, because they were selected from a small subset of the population of interest and the geographic closeness would likely impose dependence. Yet the two datasets are identical; it is the way we intend to use the data that determines whether they are independent or dependent in this case.
So there is no way, using only the data, to show that data is IID. Plots and other diagnostics can reveal some types of non-IID behavior, but the absence of such signs does not guarantee that the data is IID. You can also test against more specific assumptions (IID normal is easier to disprove than IID alone), but any such test is still just a rule-out: failure to reject never proves that the data is IID.
Decisions about whether you are willing to assume that IID conditions hold need to be made based on the science of how the data was collected, how it relates to other information, and how it will be used.
Edit: here is another set of examples for non-identical distributions.
Scenario 5: the data is residuals from a regression where there is heteroscedasticity (the variances are not equal).
Scenario 6: the data is from a mixture of normals with mean 0 but different variances.
In scenario 5 we can clearly see that the residuals are not identically distributed if we plot them against fitted values or other variables (predictors, or potential predictors), but the residuals themselves (without the outside information) would be indistinguishable from scenario 6.
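A quick numerical version of that diagnostic (the regression setup, seed, and coefficients below are hypothetical): with the fitted values in hand, the non-constant variance in scenario 5 is easy to detect, because the absolute residuals correlate with the fitted values.

```python
import numpy as np

rng = np.random.default_rng(1)  # hypothetical data for illustration
n = 5000
x = rng.uniform(0, 10, n)
# Heteroscedastic errors: the spread grows with x.
y = 2.0 + 0.5 * x + rng.normal(0, 0.2 + 0.3 * x, n)

# Fit a least-squares line and compute residuals and fitted values.
slope, intercept = np.polyfit(x, y, 1)
fitted = intercept + slope * x
resid = y - fitted

# With the outside information (fitted values), non-identical
# distributions are visible: |residuals| track the fitted values.
r = np.corrcoef(fitted, np.abs(resid))[0, 1]
print(round(r, 2))  # clearly positive under heteroscedasticity
```

Handed only `resid`, with `x` and `fitted` discarded, the same values would look like a draw from a scale mixture of normals, as in scenario 6.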
The Kolmogorov-Smirnov test can still be used, but if you use the tabulated critical values it will be conservative (which is only a problem because it pushes down your power curve). It is better to get the permutation distribution of the statistic, so that your significance levels are what you choose them to be. This only makes a big difference if there are a lot of ties, and the change is easy to implement. (The K-S test isn't the only possible such comparison, though; if you are computing permutation distributions anyway, there are other possibilities.)
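A permutation version of the two-sample K-S statistic is easy to sketch; the data, seed, and number of permutations below are hypothetical choices for illustration.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(2)  # hypothetical discrete data -> many ties
x = rng.poisson(3, 60)
y = rng.poisson(3, 80)

def ks_stat(a, b):
    """Two-sample K-S statistic: max gap between the empirical CDFs."""
    grid = np.union1d(a, b)
    fa = np.searchsorted(np.sort(a), grid, side="right") / len(a)
    fb = np.searchsorted(np.sort(b), grid, side="right") / len(b)
    return np.max(np.abs(fa - fb))

observed = ks_stat(x, y)

# Permutation distribution: reshuffle the pooled sample into two
# groups of the original sizes and recompute the statistic each time.
pooled = np.concatenate([x, y])
n_perm = 2000
perm = np.empty(n_perm)
for i in range(n_perm):
    rng.shuffle(pooled)
    perm[i] = ks_stat(pooled[:len(x)], pooled[len(x):])

# Permutation p-value (with the standard +1 correction).
p_value = (1 + np.sum(perm >= observed)) / (1 + n_perm)
print(round(p_value, 3))
```

Because the reference distribution is built from the data's own tie pattern, the test attains (approximately) its nominal level rather than the conservative tabulated one.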
Vanilla chi-square goodness-of-fit tests for discrete data are generally, to my mind, a really bad idea. If the potential loss of power above stopped you from using the K-S test, the problem with the chi-square is often much worse: it throws away the most critical information, which is the ordering among the categories (the observation values). That deflates its power by spreading it across alternatives that ignore the ordering, making it worse at detecting smooth alternatives (a shift of location and scale, for example). Even with the bad effects of heavy ties above, the K-S test in many cases still has better power (while its Type I error rate stays at or below the nominal level).
The chi-square can also be modified to take account of the ordering: partition the chi-square into linear, quadratic, cubic, etc. components via orthogonal polynomials and use only the first few low-order terms (4 to 6 are common choices). Papers by Rayner and Best (and others) discuss this approach, which grows out of the Neyman-Barton smooth tests. It is a good approach, but if you don't have access to software for it, it may take a little setting up.
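The setting up is not too bad. Here is a rough sketch of the partition, assuming numeric category scores and a Gram-Schmidt construction of polynomials orthonormal under the null probabilities (the function name and the example counts are mine, not from any particular package). Using all components reproduces Pearson's $X^2$ exactly; a smooth-type test keeps only the first few.

```python
import numpy as np

def smooth_components(counts, probs, scores, order=4):
    """Partition Pearson's chi-square for ordered categories into
    orthogonal-polynomial components (linear, quadratic, ...), in the
    spirit of Neyman-Barton smooth tests.  A sketch, not a polished
    implementation."""
    counts = np.asarray(counts, float)
    probs = np.asarray(probs, float)
    n = counts.sum()
    C = len(probs)
    # Gram-Schmidt: polynomials in the scores, orthonormal under
    # the null probabilities (basis[0] is the constant polynomial).
    basis = [np.ones(C)]
    for deg in range(1, C):
        v = np.asarray(scores, float) ** deg
        for g in basis:
            v = v - np.sum(v * g * probs) * g
        v = v / np.sqrt(np.sum(v * v * probs))
        basis.append(v)
    # Each component V_j^2 is asymptotically chi-square(1) under H0.
    comps = np.array([np.sum(counts * g) / np.sqrt(n) for g in basis[1:]])
    return comps[:order] ** 2, comps ** 2

# Hypothetical example: counts over 6 ordered categories, uniform null.
counts = np.array([3, 7, 12, 18, 25, 35])
probs = np.full(6, 1 / 6)
low, all_comps = smooth_components(counts, probs, scores=np.arange(6))

# Sanity check: the full set of components sums to Pearson's X^2.
n = counts.sum()
pearson = np.sum((counts - n * probs) ** 2 / (n * probs))
print(np.isclose(all_comps.sum(), pearson))  # True
```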
Either modified approach should be fine. But if you're not going to modify either one, it's not necessarily the case that the chi-square will beat the K-S test: in some situations it might be better, or it may be substantially worse.
If the ties are light (i.e. there are lots of different values taken by the data), I'd consider the KS as-is. If they're moderate, I'd look to calculate the permutation distribution. If they're very heavy (i.e. the data only take a few different values), the plain chi-square may be competitive.
Best Answer
Statistical distributional tests of two samples of 0s and 1s should not be done via a KS test, but can easily be done via Fisher's exact test or a chi-squared test of independence. I am not familiar enough with Python, but basically if you sample 80:20 from one group and 90:10 from the other, that can be displayed in a contingency table, and such tables are usually tested via one of the tests mentioned above.
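In Python, for example, both tests are available in `scipy.stats`; the 2x2 table below uses the hypothetical 80:20 and 90:10 counts as an illustration.

```python
import numpy as np
from scipy.stats import fisher_exact, chi2_contingency

# Hypothetical counts: group A sampled 80:20, group B sampled 90:10.
table = np.array([[80, 20],
                  [90, 10]])

# Fisher's exact test (exact; works even for small counts).
odds_ratio, p_fisher = fisher_exact(table)

# Chi-squared test of independence (asymptotic).
chi2, p_chi2, dof, expected = chi2_contingency(table)

print(round(p_fisher, 3), round(p_chi2, 3))
```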
Note that these tests will tell you whether there is statistical significance, but whether that constitutes "enough proof of difference" depends heavily on the sample sizes you take.
You have not explained why you want to investigate this, but you might want to consider plotting the distribution of odds ratios, or something else more meaningful than $p$-values.
The number $n$ of resamples is high enough if repeated runs of the whole procedure deliver return values that are close enough. Choose a number of resamples $n$, e.g. $n = 500$, run the procedure 10 times with $n$ resamples each, and judge from the range of the return values whether that is precise enough for you. If not, or if in doubt, increase $n$ substantially and start over.
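As a sketch of that stability check (the dataset and the statistic, a bootstrap standard error of the mean, are hypothetical stand-ins for whatever your resampling procedure returns):

```python
import numpy as np

rng = np.random.default_rng(3)
data = rng.exponential(1.0, 1000)  # hypothetical dataset

def bootstrap_se(data, n_resamples, rng):
    """Bootstrap estimate of the standard error of the mean."""
    means = np.array([rng.choice(data, size=len(data), replace=True).mean()
                      for _ in range(n_resamples)])
    return means.std(ddof=1)

# Run the whole bootstrap 10 times at n = 500 resamples and look at
# the spread of the returned values across reruns.
n = 500
reps = np.array([bootstrap_se(data, n, rng) for _ in range(10)])
spread = reps.max() - reps.min()
print(round(spread / reps.mean(), 3))
```

If the relative spread across reruns is too wide for your purposes, increase `n` substantially and repeat.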