- can someone generalize it, so as to make my understanding more clear? Say for $n$ events?
If $(B_k)_{k=1}^n$ is a sequence of $n$ events that partition the sample space (or if, at least, $(B_k\cap A_1)_{k=1}^n$ partitions $A_1$), then $\mathsf P(A_2\mid A_1) = \sum_{k=1}^n \mathsf P(A_2\mid A_1\cap B_k)\,\mathsf P(B_k\mid A_1)$
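Not part of the original thread, but as a quick sketch of why this generalized identity holds, one can check it exhaustively on a small finite sample space. All of the sets, weights, and the three-block partition below are arbitrary illustrative choices:

```python
import random

random.seed(0)

# A toy finite sample space with random positive weights (hypothetical example).
omega = list(range(12))
w = [random.random() + 0.01 for _ in omega]
total = sum(w)
prob = {x: wi / total for x, wi in zip(omega, w)}

def P(event):
    """Probability of a set of outcomes."""
    return sum(prob[x] for x in event)

def Pcond(event, given):
    """Conditional probability P(event | given)."""
    return P(event & given) / P(given)

A1 = {0, 1, 2, 3, 4, 5, 6, 7}
A2 = {2, 3, 4, 8, 9}
# B_1, ..., B_3 partition the sample space.
B = [{0, 1, 2, 3}, {4, 5, 6, 7}, {8, 9, 10, 11}]

lhs = Pcond(A2, A1)
# Terms with P(B_k ∩ A1) = 0 contribute nothing, so they are skipped.
rhs = sum(Pcond(A2, Bk & A1) * Pcond(Bk, A1) for Bk in B if P(Bk & A1) > 0)
```

The same telescoping that appears in the derivation further down the thread is what makes `lhs` and `rhs` agree: each summand collapses to $\mathsf P(A_2\cap B_k\cap A_1)/\mathsf P(A_1)$.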
- Also, in $P(A_2|A_1)=P(A_2|\color{red}{AA_1})P(\color{red}{A|A_1})+P(A_2|\color{magenta}{A^cA_1})P(\color{magenta}{A^c|A_1})$, I feel the red-colored terms should be the same and the magenta-colored terms should be the same, as in the simple form of the law of total probability.
They are not the same in the case of the simple form. So why should they be?
Where $\Omega$ is the entire sample space, then:
$${{\mathsf P(A_2)~}{= \mathsf P(A_2\mid \Omega)\\=\mathsf P(A_2\mid \color{red}{A}, \Omega)P(\color{red}{A}\mid \Omega)+\mathsf P(A_2\mid \color{magenta}{A^c}, \Omega)\,\mathsf P(\color{magenta}{A^c}\mid \Omega)\\=\mathsf P(A_2\mid \color{red}{A})P(\color{red}{A})+\mathsf P(A_2\mid \color{magenta}{A^c})\,\mathsf P(\color{magenta}{A^c})}}$$
- I felt it should be $P(A_2|\color{red}{(A_1|A)})P(\color{red}{A_1|A})+P(A_2|\color{magenta}{(A_1|A^c)})P(\color{magenta}{A_1|A^c})$. Am I absolutely stupid here?
:) Well, I would not say absolutely. But seriously, it is a rather common misunderstanding.
The conditioning bar is not a set operation. It separates the event from the condition that the probability function is being measured over. There can be only one conditioning bar inside any probability expression; they do not nest.
- For a moment I felt it's related to $P(E_1E_2E_3\dots E_n)=P(E_1)P(E_2|E_1)P(E_3|E_1E_2)\cdots P(E_n|E_1\dots E_{n-1})$. Is it so?
Yes, this is so. Specifically $\mathsf P(A_2,A,A_1)=\mathsf P(A_2\mid A,A_1)\mathsf P(A\mid A_1)\mathsf P(A_1)\\ \mathsf P(A_2,A^\mathsf c,A_1)=\mathsf P(A_2\mid A^\mathsf c,A_1)\mathsf P(A^\mathsf c\mid A_1)\mathsf P(A_1)$
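As a small aside (not from the original answer), both factorizations can be illustrated on a toy uniform space; the events below are hypothetical, and exact rationals avoid any rounding questions:

```python
from fractions import Fraction as F

# Uniform toy sample space (hypothetical choice of 8 outcomes).
Omega = set(range(8))

def P(ev):
    """Probability of an event under the uniform measure on Omega."""
    return F(len(ev), len(Omega))

A1 = {0, 1, 2, 3, 4, 5}
A  = {0, 1, 2, 6}
Ac = Omega - A
A2 = {1, 2, 4, 7}

# P(A2, A, A1) = P(A2 | A, A1) P(A | A1) P(A1)
chain_A  = (P(A2 & A & A1) / P(A & A1)) * (P(A & A1) / P(A1)) * P(A1)
# P(A2, A^c, A1) = P(A2 | A^c, A1) P(A^c | A1) P(A1)
chain_Ac = (P(A2 & Ac & A1) / P(Ac & A1)) * (P(Ac & A1) / P(A1)) * P(A1)
```

Each product telescopes back to the joint probability, which is exactly why the chain rule holds.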
$$\begin{align}\mathsf P(A_2\mid A_1)
~ & = \mathsf P((A\cup A^\mathsf c){\cap} A_2\mid A_1) && \text{Union of Complements}
\\[1ex] & = \mathsf P((A{\cap}A_2)\cup(A^\mathsf c{\cap}A_2)\mid A_1) && \text{Distributive Law}
\\[1ex] & = \mathsf P(A{\cap}A_2\mid A_1) + \mathsf P(A^\mathsf c{\cap}A_2\mid A_1)
&& \text{Additive Rule for Union of Exclusive Events}
\\[1ex] & = \dfrac{\mathsf P(A{\cap}A_1{\cap}A_2)+\mathsf P(A^\mathsf c{\cap}A_1{\cap}A_2)}{\mathsf P(A_1)} && \text{by Definition}
\\[1ex] & = \dfrac{\mathsf P(A_2\mid A{\cap}A_1)\,\mathsf P(A{\cap}A_1)+\mathsf P(A_2\mid A^\mathsf c{\cap}A_1)\,\mathsf P(A^\mathsf c{\cap}A_1)}{\mathsf P(A_1)} && \text{by Definition}
\\[1ex] & = {\mathsf P(A_2\mid A{\cap}A_1)\,\mathsf P(A\mid A_1)+\mathsf P(A_2\mid A^\mathsf c{\cap}A_1)\,\mathsf P(A^\mathsf c\mid A_1)} && \text{by Definition of Conditional Probability}
\end{align}$$
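Again as a hedged numerical sketch rather than part of the proof, the first, third, and last lines of the derivation can be compared directly on a small non-uniform space (all weights and events below are arbitrary):

```python
import random

random.seed(2)

# Random non-uniform weights on a small sample space -- purely illustrative.
Omega = set(range(10))
w = {x: random.uniform(0.1, 1.0) for x in Omega}
Z = sum(w.values())

def P(ev):
    """Probability of an event under the normalized weights."""
    return sum(w[x] for x in ev) / Z

A1 = {0, 1, 2, 3, 4, 5}
A  = {0, 1, 2, 8}
Ac = Omega - A
A2 = {1, 2, 4, 9}

# First line: the conditional probability itself.
lhs = P(A2 & A1) / P(A1)

# Third line: additivity over the exclusive pieces A∩A2 and A^c∩A2.
middle = (P(A & A2 & A1) + P(Ac & A2 & A1)) / P(A1)

# Final line: total probability with the extra condition A1 carried along.
rhs = (P(A2 & A & A1) / P(A & A1)) * (P(A & A1) / P(A1)) \
    + (P(A2 & Ac & A1) / P(Ac & A1)) * (P(Ac & A1) / P(A1))
```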
This is exactly the same as the problem you've already solved, but instead of asking "what is the probability he is a high-risk driver given that he had no accidents last year?" we are asking "what is the probability he is a high-risk driver given that he had no accidents in the last four years?" (the year from 3.29 plus three further years of driving).
So instead of $P(\text{no accidents}\mid\text{high-risk})=P(\text{Poi}(1)=0)=e^{-1}$ for a one-year period, now we have $P(\text{no accidents}\mid\text{high-risk})=P(\text{Poi}(4)=0)=e^{-4}$ for a four-year period, and likewise the low-risk drivers have probability $e^{-0.4}$ instead of $e^{-0.1}$.
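The Bayes update described above can be sketched in a few lines. The likelihoods $e^{-4}$ and $e^{-0.4}$ come from the text; the prior $P(\text{high-risk})$ is *not* stated in this excerpt, so the `0.2` below is a hypothetical placeholder to be replaced by the value from problem 3.29:

```python
import math

# Likelihoods from the text: a high-risk driver (rate 1 accident/year) has
# P(no accidents in 4 years) = P(Poi(4) = 0) = e^{-4}; a low-risk driver
# (rate 0.1/year) has e^{-0.4}.
like_high = math.exp(-4.0)
like_low  = math.exp(-0.4)

# HYPOTHETICAL prior -- substitute the actual prior from problem 3.29.
prior_high = 0.2
prior_low  = 1.0 - prior_high

# Bayes' rule: P(high-risk | no accidents in 4 years).
posterior_high = (like_high * prior_high) / (
    like_high * prior_high + like_low * prior_low
)
```

Whatever the prior, four accident-free years pull the posterior probability of being high-risk well below the one-year figure, since $e^{-4}/e^{-0.4}$ is much smaller than $e^{-1}/e^{-0.1}$.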
Best Answer
Keeping your notation (although capital letters for events, unlike your $p$, usually make things a bit clearer), the only term you have left to evaluate is $$ P(A_2 \cap A_1^c ) = P(A_2 \cap A_1^c \mid p)P(p) + P(A_2 \cap A_1^c \mid p^c )P(p^c), $$ and these terms should be easy to evaluate: $$ P(A_2 \cap A_1^c \mid p) = P(A_2 \mid A_1^c,p)\, P(A_1^c \mid p )=P(A_2\mid A_1^c,p)\,(1-P(A_1\mid p)). $$ Finally, without any other specified information, the probability of an accident in the second fixed period given no accident in the first is just the probability of an accident in a fixed one-year period: $$ P(A_2 \mid A_1^c,p) = P(A_2 \mid p). $$
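Putting the three steps together, here is a minimal sketch with placeholder numbers; the per-year accident probabilities and the prior below are hypothetical, and the labels follow the answer's notation ($p$ = "high-risk driver", $A_i$ = "accident in period $i$"):

```python
# HYPOTHETICAL values -- replace with the numbers from your problem.
P_p = 0.2                 # prior P(p)
P_A_given_p  = 0.4        # P(A_i | p)
P_A_given_pc = 0.1        # P(A_i | p^c)

# P(A2 ∩ A1^c | p) = P(A2 | A1^c, p) P(A1^c | p) = P(A2 | p)(1 - P(A1 | p)),
# using the conditional-independence step from the answer.
term_p  = P_A_given_p  * (1 - P_A_given_p)
term_pc = P_A_given_pc * (1 - P_A_given_pc)

# Law of total probability over {p, p^c}:
P_A2_and_not_A1 = term_p * P_p + term_pc * (1 - P_p)
```

With these placeholder numbers the result is $0.4\cdot0.6\cdot0.2 + 0.1\cdot0.9\cdot0.8 = 0.12$.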