The type I / false rejection error rate $\alpha=.05$ isn't completely arbitrary, but yes, it is close. It's somewhat preferable to $\alpha=.051$ because it's less cognitively complex (people like round numbers and multiples of five). It's a decent compromise between skepticism and practicality, though maybe a little outdated: modern methods and research resources may make higher standards (i.e., lower $p$ value thresholds) preferable, if standards there must be (Johnson, 2013).
IMO, the greater problem than the choice of threshold is the often unexamined choice to use a threshold where it is not necessary or helpful. In situations where a practical choice has to be made, I can see the value, but much basic research does not necessitate the decision to dismiss one's evidence and give up on the prospect of rejecting the null just because a given sample's evidence against it falls short of almost any reasonable threshold. Yet the authors of much of this research seem to feel obligated to do so by convention, and resist it uncomfortably, inventing terms like "marginal" significance to beg for attention when they can feel it slipping away because their audiences often don't care about $p$s $\ge.05$. If you look around at other questions here on $p$ value interpretation, you'll see plenty of dissension about interpreting $p$ values in terms of binary fail-to-reject / reject decisions regarding the null.
Completely different – no. Meaningfully different – maybe. One reason to show a ridiculously small $p$ value is to imply information about effect size. Of course, just reporting effect size would be much better for several technical reasons, but authors often fail to consider this alternative, and audiences may be less familiar with it as well, unfortunately. In a null-hypothetical world where no one knows how to report effect sizes, one may be right most often in guessing that a smaller $p$ means a larger effect. To whatever extent this null-hypothetical world is closer to reality than the opposite, maybe there's some value in reporting exact $p$s for this reason. Please understand that this point is pure devil's advocacy...
Another use for exact $p$s that I've learned about by engaging in a very similar debate here is as indices of likelihood functions. See Michael Lew's comments and the article (Lew, 2013) linked in my answer to "Accommodating entrenched views of p-values".
I don't think the Bonferroni correction is arbitrary in the same way, really. It corrects a threshold that I think we agree is at least close to completely arbitrary, so it doesn't lose any of that fundamental arbitrariness, but I don't think it adds anything arbitrary to the equation. The correction is defined in a logical, pragmatic way, and minor variations toward larger or smaller corrections would seem to require rather sophisticated arguments to justify them as more than arbitrary, whereas I think it would be easier to argue for an adjustment of $\alpha$ itself without having to overcome any deeply appealing yet simple logic behind it.
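That "logical, pragmatic" definition is simple enough to sketch in a few lines: with $m$ tests, the family-wise error rate is at most $m$ times the per-test rate (by the union bound), so dividing $\alpha$ by $m$ caps it at $\alpha$. The specific values below are illustrative, not from the discussion above.

```python
# Sketch of the Bonferroni correction's logic: testing each of m hypotheses
# at alpha / m keeps the family-wise error rate at or below alpha, because
# P(any false rejection) <= m * (alpha / m) = alpha (union bound).

def bonferroni_threshold(alpha, m):
    """Per-test significance threshold for m comparisons."""
    return alpha / m

alpha, m = 0.05, 8          # illustrative values
threshold = bonferroni_threshold(alpha, m)
print(threshold)            # 0.00625
print(m * threshold <= alpha)  # the family-wise bound holds: True
```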
If anything, I think $p$ values should be more open to interpretation! I.e., whether the null is really more useful than the alternative ought to depend on more than just the evidence against it, including the cost of obtaining more information and the incremental value of the more precise knowledge thus gained. This is essentially the Fisherian no-threshold idea that, AFAIK, is how it all began. See "Regarding p-values, why 1% and 5%? Why not 6% or 10%?"
If fail-to-reject / reject crises aren't forced upon the null hypothesis from the outset, then the more continuous understanding of statistical significance certainly does admit the possibility of continuously increasing significance. In the dichotomized approach to statistical significance (I think this is sometimes referred to as the Neyman-Pearson framework; cf. Dienes, 2007), no, any significant result is as significant as the next – no more, no less. This question may help explain that principle: "Why are p-values uniformly distributed under the null hypothesis?" As for how many zeroes are meaningful and worth reporting, I recommend Glen_b's answer to this question: "How should tiny $p$-values be reported? (and why does R put a minimum on 2.22e-16?)" – it's much better than the answers to the version of that question you linked on Stack Overflow!
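The uniformity principle is easy to check by simulation: draw test statistics from a true null, convert each to a two-sided $p$, and the fraction falling below any cutoff matches the cutoff itself. This is a sketch under my own assumptions (a two-sided $z$-test, arbitrary seed and sample count), not code from the linked questions.

```python
# Simulate p-values under a true null: z-statistics drawn from a standard
# normal, each converted to a two-sided p. Roughly 5% land below .05,
# roughly 50% below .50, and so on, as uniformity predicts.
import random
import sys
from statistics import NormalDist

rng = random.Random(0)          # arbitrary seed for reproducibility
std_normal = NormalDist()

def two_sided_p(z):
    """Two-sided p-value for a z-statistic under the standard normal null."""
    return 2 * (1 - std_normal.cdf(abs(z)))

ps = [two_sided_p(rng.gauss(0, 1)) for _ in range(100_000)]
for cut in (0.05, 0.25, 0.50):
    frac = sum(p < cut for p in ps) / len(ps)
    print(cut, round(frac, 3))  # each fraction lands close to its cutoff

# Incidentally, the 2.22e-16 floor in the linked question is just
# double-precision machine epsilon:
print(sys.float_info.epsilon)   # 2.220446049250313e-16
```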
References
- Johnson, V. E. (2013). Revised standards for statistical evidence. Proceedings of the National Academy of Sciences, 110(48), 19313–19317. Retrieved from http://www.pnas.org/content/110/48/19313.full.pdf.
- Lew, M. J. (2013). To P or not to P: On the evidential nature of P-values and their place in scientific inference. arXiv:1311.0081 [stat.ME]. Retrieved from http://arxiv.org/abs/1311.0081.
Are smaller $p$-values "more convincing"? Yes, of course they are.
In the Fisher framework, $p$-value is a quantification of the amount of evidence against the null hypothesis. The evidence can be more or less convincing; the smaller the $p$-value, the more convincing it is. Note that in any given experiment with fixed sample size $n$, the $p$-value is monotonically related to the effect size, as @Scortchi nicely points out in his answer (+1). So smaller $p$-values correspond to larger effect sizes; of course they are more convincing!
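The monotone link between effect size and $p$ at fixed $n$ can be made concrete with a toy example. The setup below (a two-sided one-sample $z$-test with known sd, $n=25$, and the listed effect values) is my own illustrative assumption, not something from the answer.

```python
# At fixed n, larger observed effects give strictly smaller two-sided
# p-values: here, a one-sample z-test with known sd = 1 and n = 25.
from math import sqrt
from statistics import NormalDist

std_normal = NormalDist()

def z_test_p(effect, n, sd=1.0):
    """Two-sided p for an observed mean difference `effect` with known sd."""
    z = effect * sqrt(n) / sd
    return 2 * (1 - std_normal.cdf(abs(z)))

n = 25
ps = [z_test_p(effect, n) for effect in (0.1, 0.3, 0.5, 0.7)]
print([round(p, 4) for p in ps])
assert all(a > b for a, b in zip(ps, ps[1:]))  # monotonically decreasing
```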
In the Neyman-Pearson framework, the goal is to obtain a binary decision: either the evidence is "significant" or it is not. By choosing the threshold $\alpha$, we guarantee that the long-run false positive rate will not exceed $\alpha$. Note that different people can have different $\alpha$ in mind when looking at the same data; when I read a paper from a field I am skeptical about, I might not personally consider results with, e.g., $p=0.03$ "significant" even though the authors do call them significant. My personal $\alpha$ might be set to $0.001$ or something. Obviously, the lower the reported $p$-value, the more skeptical readers it will be able to convince! Hence, again, lower $p$-values are more convincing.
The currently standard practice is to combine the Fisher and Neyman-Pearson approaches: if $p<\alpha$, then the results are called "significant" and the $p$-value is [exactly or approximately] reported and used as a measure of convincingness (by marking it with stars, using expressions such as "highly significant", etc.); if $p\ge\alpha$, then the results are called "not significant" and that's it.
This is usually referred to as a "hybrid approach", and indeed it is hybrid. Some people argue that this hybrid is incoherent; I tend to disagree. Why would it be invalid to do two valid things at the same time?
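The hybrid can even be written down as a tiny reporting rule: a binary decision at $\alpha$, plus graded star markers for convincingness. The star cutoffs below (.05/.01/.001) are one common convention, not something prescribed by the answer above.

```python
# A sketch of the hybrid reporting convention: Neyman-Pearson supplies the
# binary significant / not-significant call at alpha; Fisher-style grading
# supplies the conventional star markers for smaller p-values.

def report(p, alpha=0.05):
    """Hybrid report: binary decision at alpha, plus graded star markers."""
    if p >= alpha:
        return "not significant"
    stars = "***" if p < 0.001 else "**" if p < 0.01 else "*"
    return f"significant {stars} (p = {p:.3g})"

print(report(0.20))    # not significant
print(report(0.03))    # significant * (p = 0.03)
print(report(0.0004))  # significant *** (p = 0.0004)
```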
Best Answer
I think there is not much wrong in saying that the results are "highly significant" (even though yes, it is a bit sloppy).
It means that if you had set a much smaller significance level $\alpha$, you would still have judged the results as significant. Or, equivalently, if some of your readers have a much smaller $\alpha$ in mind, then they can still judge your results as significant.
Note that the significance level $\alpha$ is in the eye of the beholder, whereas the $p$-value is (with some caveats) a property of the data.
Observing $p=10^{-10}$ is just not the same as observing $p=0.04$, even though both might be called "significant" by standard conventions of your field ($\alpha=0.05$). Tiny $p$-value means stronger evidence against the null (for those who like Fisher's framework of hypothesis testing); it means that the confidence interval around the effect size will exclude the null value with a larger margin (for those who prefer CIs to $p$-values); it means that the posterior probability of the null will be smaller (for Bayesians with some prior); this is all equivalent and simply means that the findings are more convincing. See Are smaller p-values more convincing? for more discussion.
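The CI point can be quantified: converting a two-sided $p$ back to the $|z|$ it implies shows how many standard errors the 95% CI's nearer bound sits from the null. This uses a normal approximation of my own choosing; only the two $p$-values echo the text above.

```python
# Convert two-sided p-values back to the |z| they imply under a normal
# approximation; (z - 1.96) is then how many standard errors the nearer
# bound of the 95% CI sits from the null value.
from statistics import NormalDist

std_normal = NormalDist()

def z_from_two_sided_p(p):
    """|z| implied by a two-sided p-value under a normal approximation."""
    return std_normal.inv_cdf(1 - p / 2)

for p in (0.04, 1e-10):
    z = z_from_two_sided_p(p)
    print(p, round(z, 2), round(z - 1.96, 2))
# p = 0.04 implies z near 2.05 (the CI barely excludes the null);
# p = 1e-10 implies z above 6 (the CI excludes it by a wide margin).
```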
The term "highly significant" is not precise and does not need to be. It is a subjective expert judgment, similar to observing a surprisingly large effect size and calling it "huge" (or perhaps simply "very large"). There is nothing wrong with using qualitative, subjective descriptions of your data, even in scientific writing; provided, of course, that the objective quantitative analysis is presented as well.
See also some excellent comments above, +1 to @whuber, @Glen_b, and @COOLSerdash.