The boxplot and histogram alone show that your data are skewed, especially in group A, so the Shapiro-Wilk test adds little. With data this skewed, ANOVA isn't really appropriate. The Kruskal-Wallis rank sum test is based on the ranks rather than the raw values and doesn't require normality, either in the measurements or in the residuals. It is the more appropriate test.
A quick Google search will tell you one requires normality and one does not.
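To make the "ranks, not absolute values" point concrete, here is a minimal sketch in Python with SciPy. The three groups are synthetic lognormal draws invented for illustration (they are not your data), and the group names and parameters are assumptions. It shows that a strictly increasing transform such as the log leaves the Kruskal-Wallis statistic unchanged, while the ANOVA F statistic depends on the scale.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic right-skewed "durations" for three hypothetical groups
# (illustrative only; not the asker's data).
group_a = rng.lognormal(mean=0.0, sigma=1.0, size=40)
group_b = rng.lognormal(mean=0.3, sigma=0.6, size=40)
group_c = rng.lognormal(mean=0.6, sigma=0.6, size=40)

# Kruskal-Wallis works on ranks, so it does not assume normality.
h_raw, p_raw = stats.kruskal(group_a, group_b, group_c)

# A strictly increasing transform (here, log) leaves the ranks unchanged,
# so the Kruskal-Wallis statistic is unchanged as well.
h_log, p_log = stats.kruskal(np.log(group_a), np.log(group_b), np.log(group_c))

# One-way ANOVA uses the values themselves, so it depends on the scale.
f_raw, p_anova_raw = stats.f_oneway(group_a, group_b, group_c)
f_log, p_anova_log = stats.f_oneway(np.log(group_a), np.log(group_b), np.log(group_c))

print(f"Kruskal-Wallis: H={h_raw:.3f} (raw), H={h_log:.3f} (log)")
print(f"ANOVA:          F={f_raw:.3f} (raw), F={f_log:.3f} (log)")
```

The two Kruskal-Wallis lines should agree up to floating-point error; the two ANOVA results generally will not.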
One thing you might consider is that durations are an arbitrary representation of time. For example, you can give the duration of an event as 2 s, or you can say the event occurs at a rate of 0.5 events/s. It's the exact same information, and either number can stand in for the other. However, rates tend to be much less skewed and more appropriate for statistical analysis. It's possible your rates are normally distributed, in which case you can use ANOVA.
If you do decide to look at rates, keep in mind that the direction reverses: a higher duration corresponds to a lower rate. Some people use a negative rate just to avoid that confusion.
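A rough sketch of the duration-to-rate conversion, again in Python. The data are synthetic and built on the assumption that the underlying rates are roughly normal, so this shows a case where the claim holds rather than evidence that it holds for your data. The names `latent_rates` and `neg_rates` are made up for the example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Assumption for illustration: the underlying rates (events/s) are roughly
# normal and strictly positive; durations are their reciprocals.
latent_rates = rng.normal(loc=2.0, scale=0.3, size=500)
durations = 1.0 / latent_rates          # seconds per event

# Convert durations to rates, and to negated rates that keep the
# same ordering as the durations.
rates = 1.0 / durations                 # events per second
neg_rates = -rates                      # larger duration -> larger value

print(f"skewness of durations: {stats.skew(durations):.2f}")
print(f"skewness of rates:     {stats.skew(rates):.2f}")

# The longest duration has the lowest rate: the ordering is reversed.
print(rates[np.argmax(durations)] == rates.min())
```

Whether the reciprocal helps depends on the distribution. For example, if the durations were lognormal, the rates would be lognormal too and just as skewed, so check a histogram of the rates before switching to ANOVA.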
As ttnphns commented, neither the Kruskal-Wallis test nor the rank sum test makes any assumption about distributional similarity between groups. A point of confusion sometimes arises because, in the most general sense, these are tests for stochastic dominance (e.g., $H_{0}\text{: } P(X_{A} > X_{B}) = \frac{1}{2}$). With two additional assumptions, (1) that the distributions have the same shape and (2) that any differences between the groups' distributions are differences of central location, the tests can be interpreted as tests for median difference (e.g., $H_{0}\text{: } \tilde{x}_{A} = \tilde{x}_{B}$).
Therefore, significance is not an issue, and there is nothing to "mitigate." However, the substantive interpretation (i.e., stochastic dominance versus a difference in medians, means, etc.) will depend on whether those additional assumptions hold.
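The difference between the two interpretations can be illustrated with a small simulation. This is only a sketch under stated assumptions: the two groups are synthetic, built to share a population median of 0 while differing in shape (a standard normal versus a shifted Exp(1)), and the names `x_a`, `x_b`, and `p_hat` are invented for the example. It estimates $P(X_{A} > X_{B})$ directly from all pairs and runs the rank sum test via `scipy.stats.mannwhitneyu`.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 2000

# Two synthetic groups built to share a population median of 0
# but to differ in shape (symmetric vs. right-skewed).
x_a = rng.normal(loc=0.0, scale=1.0, size=n)
x_b = rng.exponential(scale=1.0, size=n) - np.log(2.0)  # Exp(1) has median ln 2

# Estimate P(X_A > X_B) over all pairs, counting ties as one half.
greater = (x_a[:, None] > x_b[None, :]).mean()
ties = (x_a[:, None] == x_b[None, :]).mean()
p_hat = greater + 0.5 * ties

# Two-sided Wilcoxon-Mann-Whitney rank sum test on the same data.
result = stats.mannwhitneyu(x_a, x_b, alternative="two-sided")

print(f"sample medians: A={np.median(x_a):.3f}, B={np.median(x_b):.3f}")
print(f"estimated P(X_A > X_B) = {p_hat:.3f}")
print(f"rank sum test p-value  = {result.pvalue:.4g}")
```

Under these assumed distributions the estimated probability should come out below 1/2 even though the medians match by construction. That is why a rejection here reads as evidence about stochastic dominance, not about a difference in medians.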
Best Answer
I don't think the statement in the quote is accurate.
The Kruskal-Wallis is effectively a test for at least one variable being stochastically larger than at least one other, which doesn't require identity of shape. Indeed, even if it were being used as a test of identical distribution under the null, it would only be necessary for the shape to be identical under the null; if the null is false, there's still no requirement for the shape to be the same.
If, however, one were looking specifically at, say, a location-shift alternative, in order to use it as a test of location difference (a test of medians, or of means, or of tenth percentiles, and so on, against a shift in the same), then the shapes would be assumed the same under both the null and the alternative, so that rejection of the null implies a location shift.