The statistical term deviance gets thrown around a bit loosely. Most of the time, programs return the deviance
$$ D(y) = -2 \log{\{p(y | \hat{\theta})\}},$$
where $\hat{\theta}$ is your estimated parameter (or parameter vector) from model fitting and $y$ is an observed or observable value of the random quantity in question.
The more common deviance that you refer to would treat the deviance above as a function of two variables, both the data and the fitted parameters: $$ D(y,\hat{\theta}) = -2\log{\{p(y|\hat{\theta})\}}$$
and so if you had one $y$ value but two competing fitted parameter values, $\hat{\theta}_{1}$ and $\hat{\theta}_{2}$, then you'd get the deviance you mentioned from $$-2\bigl(\log{\{p(y|\hat{\theta}_{1})\}} - \log{\{p(y|\hat{\theta}_{2})\}}\bigr).$$
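For concreteness, here's a minimal sketch of that comparison in Python (the thread is about MATLAB's glmfit(), but the arithmetic is identical); the Poisson likelihood and the particular numbers are assumptions for illustration, not something from the thread:

```python
from scipy import stats

# Hypothetical example: one observed value y and two competing
# fitted parameter values for an assumed Poisson model.
y = 7
theta1_hat, theta2_hat = 6.2, 9.5

def deviance(y, theta):
    """D(y, theta) = -2 log p(y | theta) under a Poisson model."""
    return -2 * stats.poisson.logpmf(y, theta)

# The difference of the two deviances equals
# -2 * (log p(y|theta1) - log p(y|theta2)).
diff = deviance(y, theta1_hat) - deviance(y, theta2_hat)
print(diff)
```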
You can read about the MATLAB function you mentioned, glmfit(), linked here. A more fruitful, though shorter, discussion of the deviance is linked here.
The deviance statistic implicitly assumes two models. The first is your fitted model, returned by glmfit(); call its parameter vector $\hat{\theta}_{1}$. The second is the "full model" (also called the "saturated model"), a model with a free parameter for every data point; call its parameter vector $\hat{\theta}_{s}$. Having so many free parameters is obviously a stupid thing to do, but it does allow you to fit that data exactly.
So then, the deviance statistic is computed as the difference between the log-likelihoods computed at the fitted model and at the saturated model. Let $Y=\{y_{1}, y_{2}, \cdots, y_{N}\}$ be the collection of the $N$ data points. Then:
$$DEV(\hat{\theta}_{1},Y) = -2\biggl[\log{p(Y|\hat{\theta}_{1})} - \log{p(Y|\hat{\theta}_{s})} \biggr]. $$
The terms above expand into summations over the individual data points $y_{i}$ by the independence assumption. If you want to use this computation to recover the log-likelihood of the model, you'll first need the log-likelihood of the saturated model. Here is a link that explains some ideas for computing this... but the catch is that, in any case, you're going to need to write down a function that computes the log-likelihood for your type of data, and at that point it's probably better to compute the model's log-likelihood directly yourself rather than backing it out of a deviance calculation.
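As a sketch of the fitted-vs-saturated computation, assuming Poisson data (the counts and the intercept-only fitted model below are made up; for a Poisson saturated model the per-point rate is just $y_{i}$ itself):

```python
import numpy as np
from scipy import stats

# Made-up Poisson counts; the fitted model here is a hypothetical
# intercept-only model, whose MLE rate is just the sample mean.
Y = np.array([3, 5, 2, 8, 4])
theta1_hat = Y.mean()

# Log-likelihood at the fitted model (one shared rate for all points;
# the sum over points reflects the independence assumption).
loglik_fitted = stats.poisson.logpmf(Y, theta1_hat).sum()

# Saturated model: one free rate per point, so each mu_i = y_i,
# which fits every observation exactly.
loglik_saturated = stats.poisson.logpmf(Y, Y).sum()

# DEV(theta1_hat, Y) = -2 * [loglik_fitted - loglik_saturated]
dev = -2 * (loglik_fitted - loglik_saturated)
print(dev)
```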
See Chapter 6 of Bayesian Data Analysis for some good discussion of deviance.
As for your second point about the likelihood-ratio test statistic, yes, it sounds like you basically know the right thing to do. But in many cases the null hypothesis will be something that expert, external knowledge suggests ahead of time (like some coefficient being equal to zero), not necessarily something that comes out of model fitting.
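A minimal sketch of the likelihood-ratio mechanics, with made-up log-likelihoods and a hypothetical single-coefficient restriction:

```python
from scipy import stats

# Made-up log-likelihoods: the null model fixes one coefficient at
# zero (the kind of restriction external knowledge might suggest);
# the alternative model leaves it free.
loglik_null = -134.7   # restricted (null) model
loglik_alt = -131.2    # unrestricted (fitted) model
df = 1                 # number of restricted parameters

# Likelihood-ratio statistic, asymptotically chi-squared(df) under H0.
lr_stat = -2 * (loglik_null - loglik_alt)
p_value = stats.chi2.sf(lr_stat, df)
print(lr_stat, p_value)
```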
As Stijn pointed out, the K-S test returns a $D$ statistic and a p-value corresponding to it. The $D$ statistic is the maximum absolute distance (the supremum of the absolute difference) between the empirical CDFs of the two samples. The closer this number is to 0, the more likely it is that the two samples were drawn from the same distribution. The Wikipedia page for the K-S test gives a good explanation: https://en.m.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test
The p-value returned by the K-S test has the same interpretation as other p-values: you reject the null hypothesis that the two samples were drawn from the same distribution if the p-value is less than your significance level. You can find tables online for converting the $D$ statistic into a p-value if you're interested in the procedure.
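In Python (rather than MATLAB), the two-sample version looks like the following sketch; the samples are simulated here:

```python
import numpy as np
from scipy import stats

# Simulated samples; in practice these would be your two data sets.
rng = np.random.default_rng(0)
sample1 = rng.normal(loc=0.0, scale=1.0, size=200)
sample2 = rng.normal(loc=0.3, scale=1.0, size=200)

# Two-sample K-S test: result.statistic is D, the supremum of the
# absolute difference between the two empirical CDFs.
result = stats.ks_2samp(sample1, sample2)

alpha = 0.05
reject_null = result.pvalue <= alpha  # reject "same distribution"?
print(result.statistic, result.pvalue, reject_null)
```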
Simply compare the p-value to your desired significance level. If your p-value is less than (or equal to) your significance level (your chosen type I error rate, $\alpha$), you should reject the null hypothesis. (You may need to brush up on how hypothesis testing works.)
If you mean you want to combine information across many days, it depends on whether the days share a distribution (within each of the two groups being compared in the test) or not. One approach that works in either case is to test the distribution of the daily p-values for uniformity, against the alternative that they are typically smaller. That gives an overall test that applies across many days. However, if you're testing every day, you may want to consider the properties of such a procedure.
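One way to implement that idea, as a sketch with made-up daily p-values: a one-sided K-S test of the p-values against the Uniform(0, 1) distribution, with the alternative that they are stochastically smaller.

```python
import numpy as np
from scipy import stats

# Made-up p-values, one per day, from the repeated daily test.
daily_pvalues = np.array([0.03, 0.20, 0.01, 0.45, 0.07, 0.12, 0.02])

# Under the overall null, these should look Uniform(0, 1).
# alternative='greater' tests against the alternative that their CDF
# lies above the uniform CDF, i.e. the p-values tend to be smaller.
result = stats.kstest(daily_pvalues, 'uniform', alternative='greater')
print(result.statistic, result.pvalue)
```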
No. If you don't have continuous distributions, you probably shouldn't be doing a K-S test at all; it won't have the usual properties (e.g., type I error rates will be too low and power will be low).