The mathematical definition is very easy. Two events $A$ and $B$ are independent if and only if $$P(A\cap B) = P(A)P(B).$$
In "pure" probability theory there's no interpretation of this, it's just a definition. It's a purely mathematical statement I can make about two events and a probability distribution.
To explain what it "means" you have to explain what probability means. There's no acceptable answer to this question. It's a big philosophical problem that mathematicians avoid by writing down some equations and solving them.
The motivation comes from the idea of conditional probability.
Suppose you throw a die. The probability you throw a six is $\frac 16$ and the probability you throw an even number is $\frac 12$. You can check with the formula above that the two events are not independent.
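Spelling the check out: the only roll that is both a six and even is the six itself, so $$P(\text{six}\cap\text{even}) = \tfrac16 \neq \tfrac1{12} = P(\text{six})\,P(\text{even}).$$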
To get an idea of why, suppose you throw a die but don't look at it. You want a six. I tell you whether or not it's even, and you decide whether to keep it or roll again. If I tell you it's odd, then you know it's not a six and you roll again. If I tell you it's even, then there are only three numbers it could be, and one of them is a six, so the probability that you got a six is now one in three. You'd be crazy to throw again.
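If you'd rather see this empirically, here is a minimal simulation sketch in Python (the variable names are mine, not from the text) that estimates the conditional probability by throwing many dice and only counting the even ones:

```python
import random

random.seed(0)          # fixed seed for reproducibility
trials = 100_000
even_count = 0          # rolls that came up even
six_and_even = 0        # rolls that came up six (hence also even)

for _ in range(trials):
    roll = random.randint(1, 6)   # one throw of a fair die
    if roll % 2 == 0:
        even_count += 1
        if roll == 6:
            six_and_even += 1

# Empirical estimate of P(six | even); should be close to 1/3,
# versus the unconditional P(six) = 1/6.
print(six_and_even / even_count)
```

Running it prints a value close to $\frac13$, matching the intuitive argument above.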
In maths we define conditional probability as follows (for $P(B)>0$): $$P(A|B) = \frac{P(A\cap B)}{P(B)}.$$
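Applied to the die example, the formula reproduces the intuitive answer: $$P(\text{six}\mid\text{even}) = \frac{P(\text{six}\cap\text{even})}{P(\text{even})} = \frac{1/6}{1/2} = \frac13.$$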
Again in "pure" maths there's no interpretation of this, it's just a formula.
But in the real world $P(A|B)$ is the probability that $A$ happens if you already know that $B$ happened.
So the interpretation of independence is that $A$ and $B$ are independent if and only if $P(A|B) = P(A)$: if you know that $B$ happened, it doesn't affect the probability that $A$ happened.
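For $P(B)>0$ the two formulations really are equivalent: $$P(A\mid B) = P(A) \iff \frac{P(A\cap B)}{P(B)} = P(A) \iff P(A\cap B) = P(A)P(B).$$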
This concept makes intuitive sense to people. If my team is winning at half time it's more likely to win the game than if it wasn't, so not independent. If my team is winning at half time it doesn't make it less likely that it's going to rain tomorrow, so independent.
It's worth noting, though, that independence is an assumption that I might be wrong about. If my team happens to play well in the rain, then their winning at half time makes it more likely to be raining during the game, which might in turn make it more likely to rain tomorrow. So the two events might not be independent after all.
So in practice a better description of independence would be: an assumption I make to simplify my model, which is usually wrong, but hopefully not that wrong.
"Is this the same as saying these two events are dependent?".
No.
If $A$ denotes some fixed event with $P(A)>0$, then it induces a new probability measure on the same collection of events. As a first attempt, for any event $B$ we might consider the map $B\mapsto P(A\cap B)$. However, this map is not in general a probability measure, because it sends the outcome set $\Omega$ to $P(A)$, and it is not excluded that $P(A)\neq1$. To repair this we divide by $P(A)$, and the function $P_A$ on events defined by $B\mapsto P(A\cap B)/P(A)$ is a probability measure. Instead of $P_A(B)$ we use a different notation: $P(B\mid A)$. This is how the conditional probability measure with respect to the event $A$ is "born". It answers the question:
"What is the probability that event $B$ occurs under the extra condition that event $A$ occurs?"
Independence comes in when we observe that $P(B\mid A)$ and $P(B)$ are the same. That happens iff $P(A\cap B)=P(A)P(B)$, which is exactly the statement that $A$ and $B$ are independent events.
In that case the question above can be answered with:
"Occurrence (or non-occurrence) of $A$ has no effect on the probability that $B$ occurs, so the probability that $B$ occurs under this condition is still $P(B)$. More succinctly: $P(B\mid A)=P(B)$."
Why do we define independence in a way that allows for an event of probability zero to be independent of another event?
The case you are referring to can be seen as degenerate. If one event has probability $0$ then there isn't a clear way to interpret the notion of independence intuitively (in order to know if an event occurring has some impact on another event, we first need that event to be possible). The reason that we define independence in the way that we do is simply out of convenience (as you speculate in the question). It is convenient to have one notion of independence so that we can apply lots of results that apply to independent events in as broad a way as possible. If we were to exclude the cases that you refer to, then this would just create additional work to prove that the standard results that follow for independent events also work in the case that you have excluded.
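It's worth spelling out why the degenerate case is harmless under the standard definition: if $P(A)=0$, then $A\cap B\subseteq A$ forces $$P(A\cap B) = 0 = P(A)P(B),$$ so $A$ is automatically independent of every event $B$.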
What would the consequences be if we modified the definition?
If we were to change the definition now, then there would be some real consequences. For example, the standard proof of Kolmogorov's Zero-One Law uses the fact that if the event $A$ is independent of itself then the probability of $A$ must be $0$ or $1$ (in other words: $P(A \cap A) = P(A)P(A)$ if and only if $P(A) = 0$ or $P(A)=1$). This proof can be found in the book "Measure Theory" by Donald Cohn.
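The calculation behind that fact is one line: since $A\cap A = A$, self-independence says $$P(A) = P(A)^2 \iff P(A)\bigl(1-P(A)\bigr) = 0 \iff P(A)\in\{0,1\}.$$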
If we were to modify the definition of independence in the way that you suggest, then this proof breaks down, as the event $A$ can no longer be said to be independent of itself. It is, of course, still possible to modify the proof, but it becomes unnecessarily complicated: we can no longer refer to the notion of $A$ being independent of itself, and we can no longer apply any of the standard results that follow from independence either (without additional justification).
This is, of course, just one example, but many other proofs use similar ideas, and so the definition of independence that we currently have lends itself nicely to simplifying these types of proofs. The only downside (as you also pointed out) is that you lose some intuition when you think about events of this type logically. However, as examples involving such events are degenerate, this isn't a big concern in the mathematical community.
Why do some definitions exclude edge cases?
There are, of course, some definitions that do exclude edge cases. However, there are usually much more serious reasons for this.
For example, one could ask why $1$ is not a prime number. We could easily have allowed $1$ to be a prime number, and to some this might be more intuitive (just as in the case above).
However, if we did modify the definition of prime numbers to include the number $1$, then we would run into problems. Numbers would no longer have unique prime factorisations:
$$6 = \color{red}{3 \times 2} = \color{blue}{3 \times 2 \times 1} = \color{green}{3 \times 2 \times 1 \times 1} = \dotsb$$
This would create a lot more work for mathematicians and would also violate the Fundamental Theorem of Arithmetic (requiring it to be rewritten).
Therefore, in some cases, like this one, it is sensible to exclude an "edge case". However, in the case you describe, there are no serious ramifications of including events of probability $0$. So, from the perspective of convenience, it makes sense to include the degenerate cases to simplify our analyses and proofs.