Solved – PyMC3 Implementation of Probabilistic Matrix Factorization (PMF): MAP produces all 0s

bayesian, probabilistic-programming, probability, pymc, python

I've started working with pymc3 over the past few days, and after getting a feel for the basics, I've tried implementing the Probabilistic Matrix Factorization model.
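For reference, the generative model (from Salakhutdinov and Mnih's PMF paper, and what the code below implements) is

$$U_i \sim \mathcal{N}(0, \alpha_U^{-1} I), \qquad V_j \sim \mathcal{N}(0, \alpha_V^{-1} I), \qquad R_{ij} \sim \mathcal{N}(U_i V_j^\top, \alpha^{-1}),$$

with fixed precisions $\alpha_U = \alpha_V$ on the priors and $\alpha$ on the likelihood.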

For validation, I use a subset of the Jester dataset: the first 100 users who rated all 100 jokes, restricted to the first 20 jokes, with the ratings left unchanged in their original range of [-10, 10]. For ease of reference, I've made this subset available here.
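If you want to rebuild that subset from the raw data yourself, something like the following should work. The file name and column layout here are assumptions about the raw Jester download (first column counts how many jokes a user rated; 99 marks a missing rating):

import pandas as pd

# Sketch only: assumes the raw file jester-data-1.xls, where column 0 holds
# the number of jokes each user rated and 99 encodes "not rated".
raw = pd.read_excel('jester-data-1.xls', header=None)
dense = raw[raw[0] == 100]         # users who rated all 100 jokes
subset = dense.iloc[:100, 1:21]    # first 100 such users, first 20 jokes
subset.to_csv('jester-dense-subset-100x20.csv', index=False)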

import pymc3 as pm
import numpy as np
import pandas as pd
import theano

data = pd.read_csv('jester-dense-subset-100x20.csv')    
n, m = data.shape
test_size = m // 10  # integer division: m / 10 is a float in Python 3
train_size = m - test_size

train = data.copy()
train.iloc[:, train_size:] = np.nan  # remove test set data
train[train.isnull()] = train.mean().mean()  # mean value imputation
train = train.values

test = data.copy()
test.iloc[:, :train_size] = np.nan  # remove train set data
test = test.values

# Low precision reflects uncertainty; prevents overfitting
alpha_u = alpha_v = 1/np.var(train)
alpha = np.ones((n,m)) * 2  # fixed precision for likelihood function
dim = 10  # dimensionality

# Specify the model.
with pm.Model() as pmf:
    pmf_U = pm.MvNormal('U', mu=0, tau=alpha_u * np.eye(dim),
                        shape=(n, dim))
    pmf_V = pm.MvNormal('V', mu=0, tau=alpha_v * np.eye(dim),
                        shape=(m, dim))
    pmf_R = pm.Normal('R', mu=theano.tensor.dot(pmf_U, pmf_V.T),
                      tau=alpha, observed=train)

    # Find mode of posterior using optimization
    start = pm.find_MAP()  # Find starting values by optimization

This all appears to work splendidly, but the values produced by find_MAP end up being all 0s for both U and V, as can be seen by running:

(start['U'] == 0).all()  # True
(start['V'] == 0).all()  # True

I am relatively new to both Bayesian modeling and pymc, so I could easily be missing something obvious here. Why am I getting all 0s?

Best Answer

I did two things to fix your code: initialize the model away from zero, and use a non-gradient-based optimizer. At U = V = 0 the gradient of the log-posterior is exactly zero (all-zeros is a stationary point of this model), so a gradient-based optimizer that starts there never moves; random test values and Powell's derivative-free method both avoid that trap:

import pymc3 as pm
import numpy as np
import pandas as pd
import theano
from scipy import optimize

data = pd.read_csv('jester-dense-subset-100x20.csv')    
n, m = data.shape
test_size = m // 10  # integer division: m / 10 is a float in Python 3
train_size = m - test_size

train = data.copy()
train.iloc[:, train_size:] = np.nan  # remove test set data
train[train.isnull()] = train.mean().mean()  # mean value imputation
train = train.values

test = data.copy()
test.iloc[:, :train_size] = np.nan  # remove train set data
test = test.values

# Low precision reflects uncertainty; prevents overfitting
alpha_u = alpha_v = 1/np.var(train)
alpha = np.ones((n,m)) * 2  # fixed precision for likelihood function
dim = 10  # dimensionality

# Specify the model.
with pm.Model() as pmf:
    pmf_U = pm.MvNormal('U', mu=0, tau=alpha_u * np.eye(dim),
                        shape=(n, dim), testval=np.random.randn(n, dim)*.01)
    pmf_V = pm.MvNormal('V', mu=0, tau=alpha_v * np.eye(dim),
                        shape=(m, dim), testval=np.random.randn(m, dim)*.01)
    pmf_R = pm.Normal('R', mu=theano.tensor.dot(pmf_U, pmf_V.T),
                      tau=alpha, observed=train)

    # Find mode of posterior using optimization
    start = pm.find_MAP(fmin=optimize.fmin_powell)  # find posterior mode with Powell's derivative-free method

    step = pm.NUTS(scaling=start)  # scale NUTS using the Hessian at the MAP point
    trace = pm.sample(500, step, start=start)

This is an interesting model that would make a great contribution. Once you're certain it works as desired, please consider adding it to the examples folder and submitting a pull request.
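One quick way to check that it works: score the MAP point estimate against the held-out ratings. This is only a sketch using the variables defined above (test is np.nan everywhere except the held-out jokes):

predicted = np.dot(start['U'], start['V'].T)
mask = ~np.isnan(test)  # True only for held-out entries
rmse = np.sqrt(np.mean((predicted[mask] - test[mask]) ** 2))
print('held-out RMSE: %.3f' % rmse)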
