It looks like you've done all you could. The strata with the high non-response rates apparently do not constitute a large portion of the population. In retrospect, I would have suggested a smaller sample, with pilot tests, and more time devoted to follow-up of a random sample of non-responders.
Here are my thoughts on what you should do:
- Reweighted analysis
You should reweight to correct for non-response. Define
$$N_h = \text{number of institutions in stratum }h $$
and
$$
m_h = \text{number of responding institutions in stratum }h
$$
Then you can remove non-response bias related to stratum membership by running a survey program with weight defined for each institution in stratum $h$ as
$$
w_h = \dfrac{N_h}{m_h}$$
This will not remove non-response biases related to other, within-stratum, factors, but it's better than doing nothing.
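As a minimal sketch of the weight calculation above: the stratum labels and counts are made up for illustration, and `nonresponse_weights` is a hypothetical helper, not part of any survey package.

```python
# Hypothetical stratum data: population counts N_h and respondent counts m_h.
population = {"A": 120, "B": 60, "C": 20}
respondents = {"A": 40, "B": 30, "C": 5}


def nonresponse_weights(population, respondents):
    """Return w_h = N_h / m_h for every stratum."""
    weights = {}
    for h, N_h in population.items():
        m_h = respondents.get(h, 0)
        if m_h == 0:
            # w_h is undefined when nobody in the stratum responded.
            raise ValueError(f"stratum {h!r} has no respondents")
        weights[h] = N_h / m_h
    return weights


weights = nonresponse_weights(population, respondents)

# Sanity check: the weighted respondents add back up to the population size.
total = sum(weights[h] * respondents[h] for h in weights)
print(weights, total)
```

Each responding institution in stratum $h$ then carries the weight $w_h$ in the survey program.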
- Do a stratified analysis with a survey program
The original sample size calculation apparently was the one appropriate for a simple random sample, not for a stratified random sample; i.e. it assumed that the estimated proportion would be the overall sample proportion.
Instead, you should use a survey analysis program that accepts stratum and weight information. Stata and SAS contain such programs. They will compute a stratified proportion, with an estimated standard error that will be smaller than that of the ordinary sample proportion. You won't know exactly what the bound on error will be until you do the calculation.
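To make the stratified estimate concrete, here is a rough sketch of the textbook stratified proportion and its standard error, with a finite population correction. It assumes the respondents in each stratum can be treated as a simple random sample of that stratum, which is exactly the assumption the reweighting makes. The data and the function name `stratified_proportion` are hypothetical, and a real survey package may differ in details.

```python
import math


def stratified_proportion(strata):
    """Estimate a population proportion from stratified data.

    strata: list of (N_h, m_h, yes_h) tuples, where N_h is the population
    count, m_h the number of respondents (at least 2), and yes_h the number
    of respondents answering "yes" in stratum h.
    """
    N = sum(N_h for N_h, _, _ in strata)
    p_st = 0.0
    variance = 0.0
    for N_h, m_h, yes_h in strata:
        W_h = N_h / N            # stratum share of the population
        p_h = yes_h / m_h        # observed proportion in the stratum
        fpc = 1.0 - m_h / N_h    # finite population correction
        p_st += W_h * p_h
        variance += W_h**2 * fpc * p_h * (1.0 - p_h) / (m_h - 1)
    return p_st, math.sqrt(variance)


# Made-up example with three strata.
strata = [(120, 40, 10), (60, 30, 15), (20, 5, 4)]
p_st, se = stratified_proportion(strata)
print(f"stratified proportion = {p_st:.3f}, standard error = {se:.4f}")
```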
- You can estimate a confidence interval for every stratum, but be aware that the relevant sample size is the number of responding observations ($m_h$) in the stratum. The average of these is 450/30 = 15, so some intervals will be very wide.
You can, of course, consider subsets of the population, including groups of strata or subsets defined by characteristics measured during the survey. Such subpopulation standard errors require a special formula, but every package with survey capabilities (e.g. Stata, SAS, the survey package in R) will use it.
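To get a feel for how wide a single-stratum interval can be with about 15 respondents, here is a back-of-the-envelope sketch. It uses the simple normal-approximation (Wald) half-width at the worst case $p = 0.5$ and ignores the finite population correction; with so few observations this approximation is itself crude, so treat the number as an order of magnitude only. The function name is hypothetical.

```python
import math


def wald_half_width(p, m, z=1.96):
    """Half-width of a normal-approximation interval for a proportion."""
    return z * math.sqrt(p * (1.0 - p) / m)


# Worst case p = 0.5 with the average stratum size of 15 respondents.
half_width = wald_half_width(0.5, 15)
print(f"half-width = {half_width:.3f}")
```

Under these assumptions the interval is roughly plus or minus 25 percentage points.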
Added: To answer your question about which sample to keep. The analysis will be based on the 450 responding institutions in the sample, but you will need to add information about the numbers of institutions in the population. You should keep the 250 non-responding institutions in the data set. They won't affect the analysis because the values of all measured variables will be missing (you can also set the weight variable to zero or missing), but you need them to make a table describing response rates by stratum.
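A response-rate table of that kind could be built along these lines. This is only a sketch: the records and the helper `response_rates` are made up, and in practice you would read one row per sampled institution from your data set.

```python
from collections import defaultdict

# Hypothetical sample records, one per sampled institution: (stratum, responded).
sample = [
    ("A", True), ("A", False), ("A", True),
    ("B", True), ("B", False), ("B", False),
]


def response_rates(sample):
    """Return {stratum: (number sampled, number responding, response rate)}."""
    sampled = defaultdict(int)
    responded = defaultdict(int)
    for stratum, did_respond in sample:
        sampled[stratum] += 1
        responded[stratum] += int(did_respond)
    return {
        h: (sampled[h], responded[h], responded[h] / sampled[h])
        for h in sorted(sampled)
    }


table = response_rates(sample)
for h, (n_h, m_h, rate) in table.items():
    print(f"{h}: sampled={n_h} responded={m_h} rate={rate:.2f}")
```

The non-responding rows are what make the "sampled" column possible, which is why they are worth keeping.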
Best Answer
That is a very basic (and essential!) question in statistics. The maths behind the answer is the central limit theorem. It tells you that no matter what the probability law (distribution) is, as long as it has finite variance, the average of N independent samples behaves approximately like a Gaussian when N is large (the bound is not explicit unless you know the variance of your law).
In the problem you are mentioning you can do something more explicit, since the law is rather simple (the law of one answer is called a Bernoulli, and the law of the sum is called a binomial). If p is the probability of "yes", then the variance of the sum over an N-sample is N p (1-p), so the variance of the sample proportion is p (1-p) / N. From this you can compute explicitly the N you need in order to make a mistake of less than, say, 5% with a 95% probability (you need both a margin of error and a confidence level for the question to be well-posed).
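The calculation described above can be sketched as follows. It relies on the Gaussian approximation, uses z = 1.96 for 95% confidence, and takes the worst case p = 0.5 by default when nothing is known about p. The function name is hypothetical.

```python
import math


def sample_size_for_proportion(margin, p=0.5, z=1.96):
    """Smallest N with z * sqrt(p * (1 - p) / N) <= margin."""
    return math.ceil(z**2 * p * (1.0 - p) / margin**2)


# Margin of error of 5 percentage points at 95% confidence, worst case p = 0.5.
n = sample_size_for_proportion(0.05)
print(n)
```

With these inputs the required N is 385; a tighter margin of 3 points would need 1068.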