Statistical Analysis – Fisher Information Not Positive Definite for Truncated Normal with Heteroskedasticity

fisher-information, likelihood, meta-analysis, multilevel-analysis

This question is about having a non-positive-definite expected Fisher information in a normal model in which observations have different dispersions.

Consider this simple normal model:

$$Y_i \sim N(\mu, \tau^2 + \sigma^2_i)$$

where $\sigma^2_i$ is the known (not estimated) error variance for the $i^{th}$ observation. Such a model can arise in meta-analysis as the marginal distribution of a hierarchical structure; there, the $Y_i$ are studies' point estimates.
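Explicitly, the hierarchical structure is the standard random-effects model $\theta_i \sim N(\mu, \tau^2)$ with $Y_i \mid \theta_i \sim N(\theta_i, \sigma^2_i)$; integrating out the study-level mean $\theta_i$ yields the marginal model above.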

Now, I'm interested specifically in modeling only observations with $Y_i/\sigma_i < 1.96$. Call these truncated observations $Y_i^*$. That is:

$$Y_i^* \sim N_{<1.96\,\sigma_i}(\mu, \tau^2 + \sigma^2_i)$$

where the subscript means this is a right-truncated normal (RTN) with an upper truncation threshold of $1.96\,\sigma_i$. The joint log-likelihood is:

$$\ell = \sum_i \left[ -\log \left( \sqrt{\sigma^2_i + \tau^2} \sqrt{2 \pi} \right) - \frac{(Y_i^* - \mu)^2}{2 \left(\sigma^2_i + \tau^2\right)} - \log \Phi(\tilde{c}_i) \right]$$

where $\tilde{c}_i = \frac{1.96\,\sigma_i - \mu}{\sqrt{\sigma^2_i + \tau^2}}$ (i.e., the marginally standardized truncation threshold for the $i^{th}$ observation) and $r_i = \frac{ \phi\left(\tilde{c}_i\right) }{ \Phi\left(\tilde{c}_i\right)}$ (i.e., the inverse Mills ratio).

Parameterizing the joint log-likelihood in terms of $(\mu, \tau)$, I calculated the expected Fisher information for the sample as the sum of each observation's Fisher information (the observations are independent but not identically distributed, given the different $\sigma_i$). However, it turns out that this expected Fisher information is not positive definite for some values of $(\mu, \tau)$ and some vectors of $\sigma_i$. I know that the observed Fisher information can, in general, fail to be positive definite, but the expected Fisher information?
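
Concretely, writing $\ell_i$ for the $i^{th}$ observation's contribution to the log-likelihood, the matrix in question is

$$\mathcal{I}(\mu, \tau) = \sum_{i=1}^{k} \mathcal{I}_i(\mu, \tau), \qquad \mathcal{I}_i(\mu, \tau) = -E \begin{pmatrix} \frac{\partial^2 \ell_i}{\partial \mu^2} & \frac{\partial^2 \ell_i}{\partial \mu \, \partial \tau} \\ \frac{\partial^2 \ell_i}{\partial \mu \, \partial \tau} & \frac{\partial^2 \ell_i}{\partial \tau^2} \end{pmatrix}$$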

To avoid mistakes when taking all those derivatives, I obtained them symbolically in R (via the Deriv package) and then replaced the terms $Y_i^* - \mu$ and $(Y_i^* - \mu)^2$ with their expectations:

$$E[Y_i^* - \mu] = -r_i \sqrt{\sigma^2_i + \tau^2}$$
$$E[(Y_i^* - \mu)^2] = \left( \sigma^2_i + \tau^2 \right) \left( 1 - \tilde{c}_i \, r_i \right)$$
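
These moment formulas can be spot-checked against the truncnorm package for a single observation (the parameter values below are arbitrary):

library(truncnorm)

mu = 0.5; tau = 0.3; sei = 0.3; crit = 1.96
S  = sqrt(tau^2 + sei^2)     # marginal SD
ci = (crit*sei - mu)/S       # standardized truncation threshold
r  = dnorm(ci)/pnorm(ci)     # inverse Mills ratio

# E[Y* - mu] should equal -r * S:
etruncnorm(a = -Inf, b = crit*sei, mean = mu, sd = S) - mu
-r * S

# E[(Y* - mu)^2] = Var(Y*) + (E[Y*] - mu)^2 should equal S^2 * (1 - ci * r):
vtruncnorm(a = -Inf, b = crit*sei, mean = mu, sd = S) + (r * S)^2
S^2 * (1 - ci * r)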

My questions: How is it possible that the expected Fisher information isn't positive definite for this model? If this is not the correct way to express the Fisher information for this model, what is?

Interesting and suggestive tidbits

  • The expected Fisher information for an untruncated version of the same model does not exhibit this problem; it seems to always be positive definite.

  • If one instead takes derivatives of the truncated likelihood with respect to the marginal standard deviation ($\sqrt{\sigma^2_i + \tau^2}$) rather than $\tau$, then the expected Fisher information is once again positive definite (cf. this equivalent question). But when the $\sigma_i$ differ across observations, that approach of course won't work for estimating $\tau$; see the chain-rule sketch just after this list.
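
For a single observation, writing $S_i = \sqrt{\sigma^2_i + \tau^2}$ for the marginal standard deviation, the chain rule ($\partial S_i / \partial \tau = \tau / S_i$) relates the two parameterizations' information matrices by

$$\mathcal{I}_i(\mu, \tau) = J^\top \, \mathcal{I}_i(\mu, S_i) \, J, \qquad J = \begin{pmatrix} 1 & 0 \\ 0 & \tau / S_i \end{pmatrix}$$

so whenever $\mathcal{I}_i(\mu, S_i)$ is positive definite and $\tau \neq 0$, $\mathcal{I}_i(\mu, \tau)$ must be positive definite as well. This already hints that the non-definiteness below is a computational artifact rather than a property of the model.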

MWE in R

Here's an MWE in which I calculated the derivatives for the Fisher information symbolically (via Deriv, to avoid math mistakes) and then modified those analytical expressions to turn them into expectations. I also checked the derivatives against Mathematica, so I'm quite confident they are correct. Then I demonstrate a simple case in which the Fisher information isn't positive definite.

library(truncnorm)
library(Deriv)
library(testthat)

# log-likelihood of right-truncated normal
# .yi ~ RTN_{< .crit*sei}(.mu, .tau^2 + .sei^2)
simple_lli = function(.yi,
                      .sei,
                      .mu,
                      .tau,  
                      .crit ) {
  
  Si = sqrt(.tau^2 + .sei^2)
  
  # standardized truncation threshold
  alphaU = (.crit*.sei - .mu)/Si
  
  -log( Si* sqrt(2*pi) ) - 0.5 * Si^(-2) * (.yi-.mu)^2 - log( pnorm(alphaU) )
}

# # show it agrees with dtruncnorm:
# mine = simple_lli(.yi = c(0.2, -1),
#                   .sei = c(1, 0.5),
#                   .mu = 0.3,
#                   .tau = 0.1,
#                   .crit = 1.96)
# 
# expect_equal( log( dtruncnorm(x = c(0.2, -1),
#                               a = -Inf,
#                               b = 1.96*c(1, 0.5),
#                               mean = 0.3,
#                               sd = sqrt( 0.1^2 + c(1, 0.5)^2 ) ) ),
#               mine )


# take partial derivatives for Fisher info
# dl / dmu
get_D1_num = Deriv(simple_lli, ".mu")

# d^2 l / d mu^2
get_D11_num = Deriv(get_D1_num, ".mu")


# d^2 l / d mu d tau
get_D12_num = Deriv(get_D1_num, ".tau")

# dl / d tau
get_D2_num = Deriv(simple_lli, ".tau")

# d^2l / d tau^2
get_D22_num = Deriv(get_D2_num, ".tau") 

# the body of each of these functions shows the analytical form of the derivative
get_D12_num

# Get expected Fisher information by replacing (.yi-.mu) and (.yi-.mu)^2 terms 
# in the derivatives with their expectations.
# The expressions from Deriv() are otherwise not simplified at all to avoid 
# introducing mistakes.

# The kmm, kms, and kss terms below are the expectations of the second partial derivatives.
fisher_1 = function(mu, tau, k, sei, tcrit) {
  
  # this will be the TOTALS for all observations
  fishinfototal = matrix( 0, nrow = 2, ncol = 2 )
  
  # build a Fisher info matrix for each observation
  for (i in 1:k) {
    
    # for this observation
    fishinfo = matrix( NA, nrow = 2, ncol = 2 )
    
    # from body of R's get_D11_num above:
    e2 = sei[i]^2 + tau^2
    e3 = sqrt(e2)
    e5 = sei[i] * tcrit[i] - mu
    e6 = e5/e3
    e7 = dnorm(e6, 0, 1)
    e8 = pnorm(e6)
    kmm = -(1/e2 - (e5/(e2 * e8) + e7 * e3/(e8 * e3)^2) * e7/e3)
    
    # from body of R's get_D12_num:
    e2 = sei[i]^2 + tau^2
    e3 = sqrt(e2)
    e5 = sei[i] * tcrit[i] - mu
    # e6 is scaled critical value:
    e6 = e5/e3
    e7 = pnorm(e6)
    e8 = e2^2
    e9 = dnorm(e6, 0, 1)
    
    # replace .yi - .mu in derivative with its expectation:
    expectation1 = -sqrt(sei[i]^2 + tau^2) * e9/e7
    kms = -(tau * (((e7/e3 - e5 * e9/e2)/(e7 * e3)^2 -
                      e5^2/(e8 * e7 * e3)) * e9 +
                     2 * (expectation1/e8)))
    
    
    # from body of R's get_D22_num:
    e1 = tau^2
    e3 = sei[i]^2 + e1
    e5 = sei[i] * tcrit[i] - mu
    e6 = sqrt(e3)
    # e7 is scaled crit value:
    e7 = e5/e6
    e8 = pnorm(e7)
    e9 = dnorm(e7, 0, 1)
    e10 = e5 * e9
    e11 = e8 * e6
    e13 = e10/e11
    # replace this one with its expectation:
    # e15 = (.yi - .mu)^2/e3
    # expectation of (.yi - .mu)^2:
    expectation2 = (sei[i]^2 + tau^2)*(1 - e7 * e9/e8)
    e15 = expectation2^2/e3
    
    kss = (e13 + e15 - (e1 * (e5 * ((e8/e6 - e10/e3)/e11^2 -
                                      e5^2/(e3^2 * e8 * e6)) * e9 +
                                2 * ((e13 + 2 * e15 - 1)/e3)) + 1))/e3
    
    
    fishinfo[1,1] = -kmm
    fishinfo[1,2] = -kms
    fishinfo[2,1] = -kms
    fishinfo[2,2] = -kss
    
    # add the new fisher info to the total one
    fishinfototal = fishinfototal + fishinfo
  }
  
  return(fishinfototal)
}

# a case where it's not positive definite
fish = fisher_1(mu = 0.5, 
                tau = 0.3,
                k = 1,
                sei = 0.3,
                tcrit = 1.96)
eigen(fish)

$values
[1]  4.844817 -1.504037

$vectors
           [,1]       [,2]
[1,] -0.7723908 -0.6351476
[2,]  0.6351476 -0.7723908

Best Answer

The expected Fisher information is positive semidefinite by construction (it is the covariance matrix of the score vector), so there must be some mistake in your code.

Here is exactly the same calculation using MATLAB's symbolic math toolbox, which produces a positive-definite matrix with the same inputs:

(it's enough to check the case $k=1$, since a sum of positive definite matrices is also positive definite)

syms mu tau sigma_i Y
z = 1.96 ;
c_i = (z*sigma_i - mu)/sqrt(sigma_i^2 + tau^2) ;
r = normpdf(c_i) / normcdf(c_i) ;

% the log-likelihood (the additive constant -log(2*pi)/2 is dropped; it does not affect the derivatives)
logl  = -log(sigma_i^2 + tau^2)/2  - (Y - mu)^2/(sigma_i^2 + tau^2)/2 - log( normcdf( c_i ) );

%Fisher matrix from second derivatives
I = -[diff(logl,'mu',2) , diff(logl,'mu','tau') ; 
      diff(logl,'mu','tau') , diff(logl,'tau',2)] ;

% substitute Y and (Y-mu)^2 with their expectations
I(2,2) = subs( I(2,2), (Y-mu)^2 , (tau^2 + sigma_i^2)*(1 - c_i*r) ) ;  %I(2,2) has only quadratic term
I(1,2) = subs( I(1,2) , Y , mu - r*sqrt(tau^2  +sigma_i^2) ) ;         %I(1,2) has only linear term
I(2,1) = I(1,2) ;

%evaluate at some values
>> eval( subs(I , {'mu','tau','sigma_i'} , [0.5,0.3,0.3] ))

ans =

    2.2836   -3.1146
   -3.1146    5.0987

>> eig(ans)

ans =

    0.2733
    7.1091
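
For completeness, here is a Monte Carlo cross-check in R (a sketch that reuses the question's simple_lli() and assumes the numDeriv package is available), based on the identity that the expected information equals the covariance of the score at the true parameters:

library(truncnorm)
library(numDeriv)

mu = 0.5; tau = 0.3; sei = 0.3; crit = 1.96
S = sqrt(tau^2 + sei^2)

set.seed(1)
# draws from the right-truncated marginal model
y = rtruncnorm(1e5, a = -Inf, b = crit*sei, mean = mu, sd = S)

# n x 2 matrix of per-observation numerical scores at the true (mu, tau)
scores = jacobian(function(th) simple_lli(y, sei, th[1], th[2], crit),
                  c(mu, tau))

# averaging the outer products of the scores estimates the expected information
I_mc = crossprod(scores) / length(y)
eigen(I_mc)$values  # both eigenvalues positive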
