I think the first thing you need to ensure is that you're not comparing apples to orangutans. Then we will discuss standard errors, statistical significance, and model selection.
Here's how you might compare OLS/LPM and logit coefficients for dummy-dummy interactions. We will model union membership as a function of race and education (both categorical) for US women from the NLSW88 extract.
First, we will use OLS with factor variable notation for the interactions:
. sysuse nlsw88, clear
(NLSW, 1988 extract)

. reg union i.race##i.collgrad

      Source |       SS       df       MS              Number of obs =    1878
-------------+------------------------------           F(  5,  1872) =    7.02
       Model |  6.40214176     5  1.28042835           Prob > F      =  0.0000
    Residual |  341.434386  1872  .182390164           R-squared     =  0.0184
-------------+------------------------------           Adj R-squared =  0.0158
       Total |  347.836528  1877  .185315146           Root MSE      =  .42707

--------------------------------------------------------------------------------------
              union |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
--------------------+-----------------------------------------------------------------
               race |
              black |   .0799445   .0250534     3.19   0.001     .0308089    .1290801
              other |   .1157454   .1076307     1.08   0.282    -.0953433    .3268342
                    |
           collgrad |
       college grad |   .0975234   .0261143     3.73   0.000     .0463072    .1487395
                    |
      race#collgrad |
 black#college grad |   .0415079   .0563381     0.74   0.461    -.0689841        .152
 other#college grad |  -.0350234   .1867622    -0.19   0.851    -.4013073    .3312606
                    |
              _cons |   .1967546   .0136007    14.47   0.000     .1700804    .2234288
--------------------------------------------------------------------------------------

For instance, the interaction coefficient says that the effect of college graduation on union membership is 4.15 percentage points larger for black women than for white women.
Now we fit a logit model:
. logit union i.race##i.collgrad, nolog

Logistic regression                               Number of obs   =       1878
                                                  LR chi2(5)      =      33.33
                                                  Prob > chi2     =     0.0000
Log likelihood = -1029.9582                       Pseudo R2       =     0.0159

--------------------------------------------------------------------------------------
              union |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
--------------------+-----------------------------------------------------------------
               race |
              black |   .4458082   .1361797     3.27   0.001      .178901    .7127154
              other |   .6182459   .5452764     1.13   0.257    -.4504762    1.686968
                    |
           collgrad |
       college grad |   .5320064   .1397767     3.81   0.000     .2580491    .8059637
                    |
      race#collgrad |
 black#college grad |   .0885629   .2791468     0.32   0.751    -.4585548    .6356807
 other#college grad |  -.2543746    .918575    -0.28   0.782    -2.054748    1.545999
                    |
              _cons |  -1.406703   .0801078   -17.56   0.000    -1.563712   -1.249695
--------------------------------------------------------------------------------------

The logit index-function coefficients are not particularly meaningful, since they are not effects on the probability of union membership. The sign and the significance might tell you something, but the magnitude of the effect is not clear. Also note that the standard errors are large, like in your own data. For instance, the SE on the coefficient for college graduates of other race is almost 1.
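To see why the raw magnitudes are hard to read, note that the same index-function shift translates into different probability changes depending on the baseline. Here is a quick sketch (in Python rather than Stata, purely for illustration) using the interaction coefficient reported above:

```python
import math

def invlogit(xb):
    """Map a logit index value into a probability."""
    return 1.0 / (1.0 + math.exp(-xb))

b = 0.0885629  # black#collgrad index coefficient from the logit output above

# The same shift in the index moves the probability by different amounts,
# depending on where the baseline index sits:
for base in (-2.0, -1.406703, 0.0, 2.0):
    print(base, round(invlogit(base + b) - invlogit(base), 4))
```

The probability change is largest near an index of zero (probability one half) and shrinks in the tails, which is why the index coefficient alone does not pin down an effect on the probability.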
To get something comparable to OLS, we will use margins with the contrast operator:
. margins r.race##r.collgrad

Contrasts of predictive margins
Model VCE    : OIM

Expression   : Pr(union), predict()

----------------------------------------------------------------------------------------
                                                     |         df        chi2     P>chi2
-----------------------------------------------------+----------------------------------
                                                race |
                                    (black vs white) |          1       14.34     0.0002
                                    (other vs white) |          1        1.20     0.2725
                                               Joint |          2       15.14     0.0005
                                                     |
                                            collgrad |          1       19.09     0.0000
                                                     |
                                       race#collgrad |
 (black vs white) (college grad vs not college grad) |          1        0.44     0.5085
 (other vs white) (college grad vs not college grad) |          1        0.03     0.8666
                                               Joint |          2        0.48     0.7869
----------------------------------------------------------------------------------------


-----------------------------------------------------------------------------------------------------
                                                     |              Delta-method
                                                     |   Contrast   Std. Err.     [95% Conf. Interval]
-----------------------------------------------------+-----------------------------------------------
                                                race |
                                    (black vs white) |   .0901999   .0238201      .0435134    .1368864
                                    (other vs white) |   .1070922   .0976013     -.0842029    .2983873
                                                     |
                                            collgrad |
                (college grad vs not college grad)   |    .108149   .0247526      .0596347    .1566633
                                                     |
                                       race#collgrad |
 (black vs white) (college grad vs not college grad) |    .041508   .0627785     -.0815355    .1645515
 (other vs white) (college grad vs not college grad) |  -.0350233   .2084485     -.4435749    .3735282
-----------------------------------------------------------------------------------------------------

These are pretty close to the OLS effects. For instance, the interaction contrast for black college graduates is again 4.15 percentage points. The SEs are somewhat smaller for the main effects, though slightly larger for the interactions.
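Because the model is saturated in race and college graduation, the interaction contrast from margins equals a difference-in-differences of predicted cell probabilities. A small Python sketch with the logit coefficients reported above reproduces it:

```python
import math

def invlogit(xb):
    """Map a logit index value into a probability."""
    return 1.0 / (1.0 + math.exp(-xb))

# Logit index coefficients from the output above
cons, black, coll, inter = -1.406703, 0.4458082, 0.5320064, 0.0885629

# College effect for black women minus college effect for white women
did = (invlogit(cons + black + coll + inter) - invlogit(cons + black)) \
    - (invlogit(cons + coll) - invlogit(cons))
print(round(did, 6))  # approximately .041508, the margins interaction contrast
```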
Sometimes you can't run the margins command because you don't have the data; all you have are the logit coefficients from someone's paper. While I said those were not particularly meaningful in their raw form, you can transform the logit index function coefficients into a multiplicative effect on the odds by exponentiating them, which is easy enough with a calculator. For example, the index function coefficient for black college graduates was .0885629. If I exponentiate it, I get $\exp(.0885629)=1.092603$. This tells me that the odds of union membership for black college graduates are 1.09 times the baseline odds of $\exp(-1.406703)=0.24494955$ (the baseline is the exponentiated constant from the logit). So the implied odds are $0.24\cdot 1.09\approx 0.27$, which corresponds to a probability of $0.27/1.27\approx 21$%. OLS and logit with margins give the additive effect, so there we get about $19.68+4.15=23.83$%. That's reasonably close. It won't always work out so nicely.
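The calculator arithmetic above can be sketched as follows (Python here only for concreteness). Note that the exponentiated constant is the baseline odds, which converts back to a probability matching the OLS constant:

```python
import math

or_inter  = math.exp(0.0885629)   # multiplicative effect on the odds, about 1.0926
base_odds = math.exp(-1.406703)   # baseline odds from the logit constant, about 0.2449
base_prob = base_odds / (1 + base_odds)  # about 0.1968, matching the OLS constant
print(round(or_inter, 4), round(base_odds, 4), round(base_prob, 4))
```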
Stata will give you exponentiated coefficients when you specify the odds-ratio option or:

. logit union i.race##i.collgrad, or nolog

or just use logistic:

. logistic union i.race##i.collgrad, nolog
I learned about these tricks from Maarten L. Buis, who has lots of examples of interactions of various sorts in nonlinear models.
In my toy example, I did not cluster my errors, but that doesn't change the main thrust of these results. Some people don't like clustered standard errors in logits and probits because, if the model's errors are heteroscedastic, the parameter estimates are inconsistent.
After that long detour, we finally get to statistical significance. In all the models above (OLS, logit index function, logit margins, and OR logit), all the interactions are statistically insignificant, though the main effects generally are not. The standard errors are large compared to the estimates, so the data are consistent with zero effects on all scales (the confidence intervals include zero in the additive case and 1 in the multiplicative one). If we surveyed enough women, it is possible that we would be able to detect some statistically significant interactions; statistical significance depends in part on the sample size. If you don't have too many Bhutanese students in your data, it will be hard to detect even the main effect, much less the foreign-friends interaction. On the other hand, if the effect is huge, you might be able to detect it with only a few students. Perhaps you can try grouping students by continent instead of country, though too much data-driven variable transformation is to be avoided.
Generally, OLS and nonlinear models will give you similar results. If they don't, as may be the case with your data, I think you should report both and let your audience pick. Some people believe OLS/LPM is more robust to departures from assumptions (like heteroscedasticity); others disagree vehemently. You can and should justify a preferred model in various ways, but that's a whole question in itself. Personally, I would report both clustered OLS and non-clustered logit marginal effects (unless there's little difference between the clustered and non-clustered versions). You can also use an LM test to rule out heteroscedasticity.
Finally, with dummy-dummy interactions, I believe the sign and the significance of the index-function interaction correspond to the sign and the significance of the marginal effects. For continuous-continuous interactions (and perhaps continuous-dummy as well), that is generally not the case in nonlinear models like the logit.
Yes.
We have a dummy variable regression in which (without loss of generality) the last category serves as the base case,
$$
y_i=\alpha+\sum_{j=1}^{p-1}\beta_jD_{ij}+u_i
$$
Let there be $n$ observations in total, with $n_j$ observations such that $D_{ij}=1$. We need to look at what happens to the formula for the variance of the regression coefficients, $\sigma^2(X'X)^{-1}$. Here, the regressor matrix $X$ has unit entries in the first column and another unit entry in column $j+1$ if observation $i$ belongs to group $j$ (unless it belongs to group $p$).
Consider $X'X$, which can be written as a block matrix
$$
X'X=\begin{pmatrix}
A&B\\
B'&D
\end{pmatrix}
$$
where $A=n$, $B=(n_1,\cdots,n_{p-1})$ and $D$ is a diagonal matrix with main diagonal $B$. This follows by direct multiplication, exploiting that no row of $X$ has more than one entry equal to one (except for the constant column).
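This block structure is easy to verify numerically. Here is a sketch in Python/NumPy (rather than the Stata/R used elsewhere in this post) that builds such an $X$ and checks the blocks:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 5
groups = rng.integers(0, p, size=n)     # group label 0..p-1 per observation
n_j = np.bincount(groups, minlength=p)  # group sizes n_1, ..., n_p

# Constant column plus dummies for the first p-1 groups (group p is the base)
X = np.column_stack([np.ones(n)] + [(groups == j).astype(float) for j in range(p - 1)])

XtX = X.T @ X
assert XtX[0, 0] == n                                    # A = n
assert np.array_equal(XtX[0, 1:], n_j[:p-1])             # B = (n_1, ..., n_{p-1})
assert np.array_equal(XtX[1:, 1:], np.diag(n_j[:p-1]))   # D = diag(B)
```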
To obtain $(X'X)^{-1}$, we use the formula for block inverses,
\begin{align}
(X'X)^{-1} &= \begin{pmatrix}
A&B\\
B'&D
\end{pmatrix}^{-1} \\[5pt]
&=\begin{pmatrix}
(A-BD^{-1}B')^{-1}&-(A-BD^{-1}B')^{-1}BD^{-1}\\
-D^{-1}B'(A-BD^{-1}B')^{-1}&D^{-1}+D^{-1}B'(A-BD^{-1}B')^{-1}BD^{-1}
\end{pmatrix}
\end{align}
The inverse of the diagonal matrix $D$ simply is a diagonal matrix with entries $1/n_j$. Direct multiplication then yields
$$
(A-BD^{-1}B')^{-1}=\left(n-\sum_{j=1}^{p-1}n_j\right)^{-1}=\frac{1}{n_p}
$$
Further, $BD^{-1}=\iota'$, a unit row vector, and hence $D^{-1}B'=\iota$. Putting things together gives
\begin{align}
(X'X)^{-1} &=\begin{pmatrix}
A&B\\
B'&D
\end{pmatrix}^{-1} \\[5pt]
&=\begin{pmatrix}
\frac{1}{n_p}&-\frac{1}{n_p}\iota'\\
-\frac{1}{n_p}\iota&D^{-1}+\frac{1}{n_p}\iota\iota'
\end{pmatrix}
\end{align}
This means that all off-diagonal elements of the variance-covariance matrix depend only on $1/n_p$, which you can read off from the first squared standard error (that of the intercept). In fact, the derivation shows a little more than what you asked: to get the off-diagonal elements, you do not even need all the variances, only that of the base category.
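A numerical check of this structure (again a Python/NumPy sketch): every off-diagonal element of $(X'X)^{-1}$ is $\pm 1/n_p$, negative in the intercept row and column, positive within the dummy block.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 5
groups = rng.integers(0, p, size=n)
n_p = np.sum(groups == p - 1)  # size of the base category

# Constant column plus dummies for the first p-1 groups
X = np.column_stack([np.ones(n)] + [(groups == j).astype(float) for j in range(p - 1)])
XtX_inv = np.linalg.inv(X.T @ X)

assert np.isclose(XtX_inv[0, 0], 1 / n_p)        # intercept variance term: 1/n_p
assert np.allclose(XtX_inv[0, 1:], -1 / n_p)     # intercept row: -1/n_p
off = XtX_inv[1:, 1:][~np.eye(p - 1, dtype=bool)]
assert np.allclose(off, 1 / n_p)                 # dummy-block off-diagonals: +1/n_p
```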
Here is a little numerical illustration.
n <- 100
y <- rnorm(n) # this is the dependent variable
p <- 5
X <- matrix(0, nrow=n, ncol=p)
X[cbind(1:n, sample(1:p, n, replace=TRUE))] <- 1 # insert a 1 into one of the columns for each row
reg3 <- summary(lm(y~X[,1:(p-1)])) # regression omitting the pth category
vcov(reg3)
                (Intercept) X[, 1:(p - 1)]1 X[, 1:(p - 1)]2 X[, 1:(p - 1)]3 X[, 1:(p - 1)]4
(Intercept)      0.05463828     -0.05463828     -0.05463828     -0.05463828     -0.05463828
X[, 1:(p - 1)]1 -0.05463828      0.14440116      0.05463828      0.05463828      0.05463828
X[, 1:(p - 1)]2 -0.05463828      0.05463828      0.13318080      0.05463828      0.05463828
X[, 1:(p - 1)]3 -0.05463828      0.05463828      0.05463828      0.10927656      0.05463828
X[, 1:(p - 1)]4 -0.05463828      0.05463828      0.05463828      0.05463828      0.10699996
Best Answer
While @Kjetil is of course right that there is nothing special about s.e.s in a dummy variable regression, it may be instructive to look at what the expressions look like explicitly.
Take the model (which slightly differs from yours in that there is a full set of dummies rather than an intercept and dummies for all but one category) $$ y_i=\beta_1D_{1i}+\beta_2D_{2i}+u_i $$ where the dummies are such that $D_{1i}+D_{2i}=1$ for all $i=1,\ldots,n$.
Let there be $n_1$ observations such that $D_{1i}=1$ and $n_2$ such that $D_{2i}=1$. Then, the formula for the variance of the regression coefficients, $\sigma^2(X'X)^{-1}$, simplifies to $$ Var(\hat\beta)=\begin{pmatrix}\sigma^2/n_1&0\\0&\sigma^2/n_2\end{pmatrix}, $$ which are indeed nothing but the respective variances of the sample means of the $y_i$ belonging to the two different groups. The off-diagonal entries must be zero because the second regressor always has a zero entry when the first has a unit entry, so computing the off-diagonal entries of $X'X$ only ever multiplies zeros with ones.
When we have one unit column and one dummy (here, $D_{1i}$), $X'X$ becomes $$ \begin{pmatrix}n&n_1\\n_1&n_1\end{pmatrix} $$ so that $$ (X'X)^{-1}=\frac{1}{nn_1-n_1^2}\begin{pmatrix}n_1&-n_1\\-n_1&n\end{pmatrix} $$ or, using $nn_1-n_1^2=n_1n_2$, $$ (X'X)^{-1}=\begin{pmatrix}\frac{1}{n_2}&-\frac{1}{n_2}\\-\frac{1}{n_2}&\frac{n}{n_1n_2}\end{pmatrix} $$ Hence, the off-diagonal is no longer zero, but basically cancels out the variance of the baseline category. I am not so sure what to make of this finding, but note that it implies that the variance of the sum of coefficients in the regression with intercept, $$ Var(\hat\beta_0+\hat\beta_1)=\frac{\sigma^2}{n_2}-2\frac{\sigma^2}{n_2}+\frac{n\sigma^2}{n_1n_2}=\frac{\sigma^2}{n_1}, $$ equals the variance of the coefficient on $D_{1i}$, $\sigma^2/n_1$, in the regression with two dummies.
Here is a little numerical illustration of these ideas.
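The snippet for this illustration appears to have been cut off, so here is a minimal sketch of the two claims in Python/NumPy (rather than the R used above): the full-dummy parametrization gives a diagonal $(X'X)^{-1}$, while the intercept-plus-dummy parametrization gives an off-diagonal of $-1/n_2$ and recovers $1/n_1$ for the variance of the coefficient sum.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
d1 = (rng.random(n) < 0.4).astype(float)  # dummy for group 1
d2 = 1.0 - d1                             # dummy for group 2 (the two sum to 1)
n1, n2 = d1.sum(), d2.sum()

# Full set of dummies, no intercept: (X'X)^{-1} is diagonal
X_full = np.column_stack([d1, d2])
inv_full = np.linalg.inv(X_full.T @ X_full)
assert np.allclose(inv_full, np.diag([1 / n1, 1 / n2]))

# Intercept plus one dummy: the off-diagonal becomes -1/n2
X_int = np.column_stack([np.ones(n), d1])
inv_int = np.linalg.inv(X_int.T @ X_int)
assert np.isclose(inv_int[0, 1], -1 / n2)
# Var(b0 + b1) (up to sigma^2) is the sum of all entries and recovers 1/n1
assert np.isclose(inv_int.sum(), 1 / n1)
```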