In a class (I'm the teacher), we are crossing Drosophila with different traits to see whether certain characteristics are inherited on autosomes or on sex chromosomes. To do that, we perform reciprocal crosses.
When we cross a male with normal wings (NN) x a female with vestigial wings (nn), all the offspring should have normal wings (Nn) if the gene is located on an autosome: the offspring would be 50% males and 50% females, and 100% of the males and 100% of the females would have normal wings.
If we then cross two of these offspring (one male and one female), we should again get 50% males and 50% females. Of the total, $3/4 \times 1/2 = 3/8 = 0.375$, or 37.5%, should be males with normal wings and $1/4 \times 1/2 = 1/8 = 0.125$, or 12.5%, should be males with vestigial wings (the same logic applies to females).
Alternatively, the gene for wing type could be located on the X (sex) chromosome. In that case, a male with normal wings ($X^NY$) x a female with vestigial wings ($X^nX^n$) would produce 50% males and 50% females, but 100% of the males would have vestigial wings ($X^nY$) and 100% of the females would have normal wings ($X^NX^n$). If we then cross two of these offspring (one male ($X^nY$) and one female ($X^NX^n$)), we should get 50% males and 50% females, with 1/2 of the males having normal wings and 1/2 having vestigial wings. The same logic applies to females.
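The X-linked expectations can be checked by enumerating the gametes of each parent. This is only a minimal sketch: the helper name `x_linked_cross` and the gamete labels "XN", "Xn" and "Y" are assumptions made for illustration, and N is taken to be dominant.

```r
# Hypothetical helper: percentage of each sex/phenotype class for an X-linked cross.
# Gametes are written "XN" (normal allele), "Xn" (vestigial allele) or "Y".
x_linked_cross <- function(male.gametes, female.gametes) {
  # Every male gamete can meet every female gamete with equal probability
  kids <- expand.grid(from.male = male.gametes,
                      from.female = female.gametes,
                      stringsAsFactors = FALSE)
  # A Y from the father gives a son, an X gives a daughter
  kids$sex <- factor(ifelse(kids$from.male == "Y", "male", "female"),
                     levels = c("male", "female"))
  # Assumption: N is dominant, so one copy is enough for normal wings
  has.N <- grepl("N", paste0(kids$from.male, kids$from.female), fixed = TRUE)
  kids$phenotype <- factor(ifelse(has.N, "normal", "vestigial"),
                           levels = c("normal", "vestigial"))
  100 * prop.table(table(sex = kids$sex, phenotype = kids$phenotype))
}

# Parental cross: normal male (XN Y) x vestigial female (Xn Xn)
f1 <- x_linked_cross(c("XN", "Y"), c("Xn", "Xn"))
# F1 cross: vestigial son (Xn Y) x heterozygous daughter (XN Xn)
f2 <- x_linked_cross(c("Xn", "Y"), c("XN", "Xn"))
f1
f2
```

The resulting percentages can be compared with the `theoretial.f1.sex.chromosome` and `theoretial.f2.sex.chromosome` columns for cross 1 in the data.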
We can apply the same reasoning to eye colour, the other trait we want to investigate.
Here is a visual summary of what is explained (1 type of cross out of 4):
We then tested this experimentally by crossing the parents and counting the offspring. We bred flies and got the following results:
df <- structure(list(cross = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L,
3L, 3L, 3L, 4L, 4L, 4L, 4L), cross.name = c("male normal X females vestigial",
"male normal X females vestigial", "male normal X females vestigial",
"male normal X females vestigial", "male vestigial X females normal",
"male vestigial X females normal", "male vestigial X females normal",
"male vestigial X females normal", "male red X female white",
"male red X female white", "male red X female white", "male red X female white",
"male white X female red", "male white X female red", "male white X female red",
"male white X female red"), sex = c("male", "male", "female",
"female", "male", "male", "female", "female", "male", "male",
"female", "female", "male", "male", "female", "female"), trait = c("wing",
"wing", "wing", "wing", "wing", "wing", "wing", "wing", "eye",
"eye", "eye", "eye", "eye", "eye", "eye", "eye"), phenotype = c("normal",
"vestigial", "normal", "vestigial", "normal", "vestigial", "normal",
"vestigial", "red", "white", "red", "white", "red", "white",
"red", "white"), nb.f1 = c(98L, 1L, 70L, 0L, 28L, 0L, 22L, 0L,
2L, 92L, 109L, 4L, 53L, 0L, 71L, 0L), nb.f2 = c(120L, 43L, 134L,
50L, 37L, 22L, 47L, 14L, 93L, 82L, 90L, 84L, 72L, 73L, 167L,
0L), theoretial.f1.autosome = c(50L, 0L, 50L, 0L, 50L, 0L, 50L,
0L, 50L, 0L, 50L, 0L, 50L, 0L, 50L, 0L), theoretial.f1.sex.chromosome = c(0L,
50L, 50L, 0L, 50L, 0L, 50L, 0L, 0L, 50L, 50L, 0L, 50L, 0L, 50L,
0L), theoretial.f2.autosome = c(37.5, 12.5, 37.5, 12.5, 37.5,
12.5, 37.5, 12.5, 37.5, 12.5, 37.5, 12.5, 37.5, 12.5, 37.5, 12.5
), theoretial.f2.sex.chromosome = c(25L, 25L, 25L, 25L, 25L,
25L, 50L, 0L, 25L, 25L, 25L, 25L, 25L, 25L, 50L, 0L)), class = "data.frame", row.names = c(NA,
-16L))
I've been thinking of using a chi-square test on the data, but when a theoretical percentage (expected value) equals 0, the chi-square statistic doesn't return a usable value.
What could be used to test which hypothesis is true, i.e. whether each trait (wing or eye) is on an autosome or on a sex chromosome?
Based on one answer to this question, here is the problem that I face when calculating the chi-square:
library(dplyr)  # provides %>%, group_by(), mutate(), select()

# Total number of flies counted per cross, for F1 and F2
df <- df %>%
  group_by(cross) %>%
  mutate(sum.per.cross.f1 = sum(nb.f1),
         sum.per.cross.f2 = sum(nb.f2)) %>%
  ungroup()

# Expected counts under each hypothesis, then per-cell chi-square contributions
df.q.chi2 <- df %>%
  mutate(exp.nb.f1.auto = sum.per.cross.f1 * theoretial.f1.autosome / 100,
         exp.nb.f1.sex = sum.per.cross.f1 * theoretial.f1.sex.chromosome / 100,
         exp.nb.f2.auto = sum.per.cross.f2 * theoretial.f2.autosome / 100,
         exp.nb.f2.sex = sum.per.cross.f2 * theoretial.f2.sex.chromosome / 100,
         q.1.auto = (nb.f1 - exp.nb.f1.auto)^2 / exp.nb.f1.auto,
         q.1.sex = (nb.f1 - exp.nb.f1.sex)^2 / exp.nb.f1.sex,
         q.2.auto = (nb.f2 - exp.nb.f2.auto)^2 / exp.nb.f2.auto,
         q.2.sex = (nb.f2 - exp.nb.f2.sex)^2 / exp.nb.f2.sex) %>%
  group_by(cross) %>%
  select(q.1.auto, q.1.sex, q.2.auto, q.2.sex)
Here is the output :
# A tibble: 16 × 5
# Groups: cross [4]
cross q.1.auto q.1.sex q.2.auto q.2.sex
<int> <dbl> <dbl> <dbl> <dbl>
1 1 2.16 Inf 0.788 12.7
2 1 Inf 82.5 0.00324 22.1
3 1 2.49 2.49 0.115 25.7
4 1 NaN NaN 1.01 15.6
5 2 0.36 0.36 1.42 1.63
6 2 NaN NaN 3.27 2.13
7 2 0.36 0.36 0.0889 2.82
8 2 NaN NaN 0.0667 Inf
9 3 99.5 Inf 11.0 0.379
10 3 Inf 1.28 33.8 0.316
11 3 0.292 0.292 12.8 0.0867
12 3 Inf Inf 37.4 0.121
13 4 1.31 1.31 17.3 0.462
14 4 NaN NaN 29.6 0.321
15 4 1.31 1.31 21.4 0.776
16 4 NaN NaN 39 NaN
You can see that if I sum the 'q' columns to get the statistic from which R could compute a p-value, many of the sums are NaN or Inf. That is my question: how do I deal with this (if using the chi-square)? Is there another test that would allow me to do this?
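The non-finite values come from dividing by an expected count of 0. A minimal sketch, using the F2 counts of rows 8 and 16 of the output above under the sex-chromosome hypothesis (the helper name `q_cell` is hypothetical):

```r
# Hypothetical helper: one cell's contribution to the chi-square statistic
q_cell <- function(observed, expected) (observed - expected)^2 / expected

q.row8  <- q_cell(observed = 14, expected = 0)  # flies counted where none are expected
q.row16 <- q_cell(observed = 0,  expected = 0)  # nothing counted, nothing expected

is.infinite(q.row8)  # positive number divided by 0
is.nan(q.row16)      # 0 divided by 0

# Any sum that includes such a cell is itself non-finite
q.sum <- sum(c(1.63, 2.13, 2.82, q.row8))
is.finite(q.sum)
```

So the statistic for a whole cross is undefined as soon as one of its categories has an expected count of 0.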
If I continue and calculate the p-values, I cannot make a statistical call on whether one scenario explains the data better:
library(tidyr)  # provides pivot_longer()

df.q.chi2
df.q.chi2.l <- pivot_longer(df.q.chi2, cols = !cross)
df.q.chi2.l.no.na <- na.omit(df.q.chi2.l)
# Keep only finite contributions (drops Inf; NaN was already removed above)
df.q.all <- df.q.chi2.l.no.na[is.finite(df.q.chi2.l.no.na$value), ]
df.q.all %>%
  group_by(cross, name) %>%
  summarise(sum.cross = sum(value, na.rm = TRUE),
            nb = n(),
            pv = 1 - pchisq(sum.cross, nb - 1),
            sign = ifelse(pv <= 0.05, "sg", "ns")) %>%
  mutate(f = substring(name, 3, 3)) %>%
  filter(f == "2")
Below is the table of all the possible outcomes (all are significant, so I would not be able to discriminate whether the genes are on an autosome or a sex chromosome; but when I look directly at the data, it seems possible to distinguish between the 2).
`summarise()` has grouped output by 'cross'. You can override using the `.groups` argument.
# A tibble: 8 × 7
# Groups: cross [4]
cross name sum.cross nb pv sign f
<int> <chr> <dbl> <int> <dbl> <chr> <chr>
1 1 q.2.auto 1.92 4 5.90e- 1 ns 2
2 1 q.2.sex 76.1 4 2.22e-16 sg 2
3 2 q.2.auto 4.84 4 1.84e- 1 ns 2
4 2 q.2.sex 6.58 3 3.72e- 2 sg 2
5 3 q.2.auto 94.9 4 0 sg 2
6 3 q.2.sex 0.903 4 8.25e- 1 ns 2
7 4 q.2.auto 107. 4 0 sg 2
8 4 q.2.sex 1.56 3 4.59e- 1 ns 2
For completeness, here are the cells where the expected number of individuals is 0.
df %>%
  mutate(exp.nb.f1.auto = sum.per.cross.f1 * theoretial.f1.autosome / 100,
         exp.nb.f1.sex = sum.per.cross.f1 * theoretial.f1.sex.chromosome / 100,
         exp.nb.f2.auto = sum.per.cross.f2 * theoretial.f2.autosome / 100,
         exp.nb.f2.sex = sum.per.cross.f2 * theoretial.f2.sex.chromosome / 100,
         q.1.auto = (nb.f1 - exp.nb.f1.auto)^2 / exp.nb.f1.auto,
         q.1.sex = (nb.f1 - exp.nb.f1.sex)^2 / exp.nb.f1.sex,
         q.2.auto = (nb.f2 - exp.nb.f2.auto)^2 / exp.nb.f2.auto,
         q.2.sex = (nb.f2 - exp.nb.f2.sex)^2 / exp.nb.f2.sex) %>%
  select(cross.name, cross, sex, phenotype, nb.f1, nb.f2,
         exp.nb.f1.auto, exp.nb.f1.sex, exp.nb.f2.auto, exp.nb.f2.sex)
See rows 8 and 16, for example. Taking row 16: the expected count is 0 simply because, when crossing the F1 for eye colour with the gene on a sex chromosome (R for red and r for white), no female should have white eyes. The reason is that the original parents ($X^rY$ and $X^RX^R$) give offspring $X^RX^r$ and $X^RY$, and breeding only these offspring together gives $X^RY$ and $X^rY$ males and $X^RX^R$ and $X^RX^r$ females, so there can only be red-eyed females and no white-eyed females.
# A tibble: 16 × 9
cross sex phenotype nb.f1 nb.f2 exp.nb.f1.auto exp.nb.f1.sex exp.nb.f2.auto exp.nb.f2.sex
<int> <chr> <chr> <int> <int> <dbl> <dbl> <dbl> <dbl>
1 1 male normal 98 120 84.5 0 130. 86.8
2 1 male vestigial 1 43 0 84.5 43.4 86.8
3 1 female normal 70 134 84.5 84.5 130. 86.8
4 1 female vestigial 0 50 0 0 43.4 86.8
5 2 male normal 28 37 25 25 45 30
6 2 male vestigial 0 22 0 0 15 30
7 2 female normal 22 47 25 25 45 60
8 2 female vestigial 0 14 0 0 15 0
9 3 male red 2 93 104. 0 131. 87.2
10 3 male white 92 82 0 104. 43.6 87.2
11 3 female red 109 90 104. 104. 131. 87.2
12 3 female white 4 84 0 0 43.6 87.2
13 4 male red 53 72 62 62 117 78
14 4 male white 0 73 0 0 39 78
15 4 female red 71 167 62 62 117 156
16 4 female white 0 0 0 0 39 0
Best Answer
Consider a die that has equal probabilities for its six faces. However, the faces are labeled 1, 1, 2, 3, 4, 5. So you have five possible outcomes with respective probabilities $p = (1/3, 1/6, 1/6, 1/6, 1/6).$ Your table will have 'categories' 1, 2, 3, 4, 5. You will ignore the category 6 that would have been possible with a standard die.
Example in R:
The chi-squared test has P-value $0.57 > 0.05 = 5\%,$ so the null hypothesis that categories have the probabilities $p$ is not rejected.
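A minimal sketch of this kind of check is given here. The seed and the sample size of 600 rolls are arbitrary assumptions, so the P-value it prints will generally differ from the 0.57 quoted above.

```r
set.seed(2023)  # arbitrary seed (assumption), only for reproducibility

# Five possible outcomes; face "1" appears twice on the die
p <- c(1/3, 1/6, 1/6, 1/6, 1/6)

# Simulate 600 rolls (assumed sample size) and tabulate categories 1..5 only;
# the impossible category 6 is simply not part of the table
rolls <- sample(1:5, size = 600, replace = TRUE, prob = p)
obs <- table(factor(rolls, levels = 1:5))

# Goodness-of-fit test against the known probabilities
die.test <- chisq.test(obs, p = p)
die.test
```

With five categories the test has 4 degrees of freedom, available as `die.test$parameter`.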
Similarly, in your study, just suppress the impossible categories. Degrees of freedom for the chi-squared statistic will be the number of remaining categories, minus one.
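Applied to the question's data, this might look as follows for the F2 of cross 4 (male white x female red). The counts and theoretical proportions are copied from the question; the helper `test_cross` is a hypothetical name, and the warning it raises is an addition of this sketch rather than part of the suggestion above.

```r
# Cross 4, F2 counts from the question's table
obs <- c(male.red = 72, male.white = 73, female.red = 167, female.white = 0)

# Theoretical proportions from the question
p.sex  <- c(0.25, 0.25, 0.50, 0.00)       # gene on the X chromosome
p.auto <- c(0.375, 0.125, 0.375, 0.125)   # gene on an autosome

# Hypothetical helper: drop categories that are impossible under the hypothesis
test_cross <- function(obs, p) {
  possible <- p > 0
  # Flies counted in an "impossible" category already contradict the hypothesis
  if (any(obs[!possible] > 0)) {
    warning("flies were observed in a category the hypothesis rules out")
  }
  chisq.test(obs[possible], p = p[possible] / sum(p[possible]))
}

res.sex  <- test_cross(obs, p.sex)    # 3 categories left, so 2 degrees of freedom
res.auto <- test_cross(obs, p.auto)   # 4 categories, so 3 degrees of freedom

res.sex$statistic
res.sex$p.value
res.auto$statistic
res.auto$p.value
```

Under the X-linked hypothesis the statistic is about 1.56 on 2 degrees of freedom, in line with the `q.2.sex` sum reported for cross 4 in the question, while the autosomal hypothesis gives about 107 on 3 degrees of freedom. Note that for cross 2 the question reports 14 vestigial females in a category whose expected count under the X-linked hypothesis is 0; dropping that cell would hide evidence against the hypothesis, which is why the sketch warns in that situation.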