Solved – Generating functions for Poisson regression using MATLAB

MATLAB, multivariate analysis, nonlinear regression, Poisson distribution, self-study

Alright, I'm having an insane amount of difficulty for what seems like a simple concept. I need to generate a bunch of basis functions for a curve that underlies some simulated Poisson-distributed data and then run Newton's method on it to maximize the log-likelihood (Poisson regression). I understand the math behind Newton's method enough to code that up and fit the data. However, I'm having a lot of difficulty just trying to generate a set of basis functions for which the equation $-\mathbf{X}^{\mathrm{T}}\cdot\mathrm{diag}\left(\exp\left(\mathbf{X}\cdot\boldsymbol\theta\right)\right)\cdot\mathbf{X}$ results in a tridiagonal matrix, where $$\mathbf{X}\cdot\boldsymbol\theta= \begin{bmatrix}
f_{1}\left(\mathbf{t}_1\right) & f_{2}\left(\mathbf{t}_1\right) & \cdots & f_{d}\left(\mathbf{t}_1\right)\\
f_{1}\left(\mathbf{t}_2\right) & f_{2}\left(\mathbf{t}_2\right) & \cdots & f_{d}\left(\mathbf{t}_2\right) \\
\vdots & \vdots & \ddots & \vdots\\
f_{1}\left(\mathbf{t}_n\right) & f_{2}\left(\mathbf{t}_n\right) & \cdots & f_{d}\left(\mathbf{t}_n\right)
\end{bmatrix}
\begin{bmatrix}
\theta_1\\
\theta_2\\
\vdots\\
\theta_d
\end{bmatrix},$$ and $\mathrm{diag}$ results in a matrix with the elements of its argument on the diagonal of an otherwise zero-filled matrix.
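
For intuition, here's a minimal sketch (using spcol and augknt from the same Curve Fitting/Spline Toolbox that provides spmak; the sizes are just illustrative) showing that the bandwidth of $-\mathbf{X}^{\mathrm{T}}\cdot\mathrm{diag}\left(\exp\left(\mathbf{X}\cdot\boldsymbol\theta\right)\right)\cdot\mathbf{X}$ is set entirely by how many neighboring B-splines overlap, i.e. by the spline order:

% Count the nonzero diagonals of the Hessian for B-spline bases of
% order 2 and order 3 (requires the Curve Fitting/Spline Toolbox)
n = 200; d = 10;
t = linspace(-10, 10, n)';
for k = [2, 3]
    knots = augknt(linspace(-10, 10, d - k + 2), k); % knots for d basis funcs
    X = spcol(knots, k, t);                          % n-by-d design matrix
    H = -X' * diag(exp(X * rand(d, 1))) * X;         % the Hessian above
    fprintf('order %d: %d nonzero diagonals\n', k, size(spdiags(sparse(H)), 2));
end
% order 2 prints 3 (tridiagonal); order 3 prints 5, because an order-k
% B-spline overlaps its k - 1 nearest neighbors on each side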

Here's what I've got so far:

function ddl = fastnewton(d, varargin)
    %% boilerplate for optional function arguments
    p = inputParser;

    % validator for positive integer scalars (avoids the toolbox-dependent
    % isposintscalar)
    isposint = @(x) isscalar(x) && isnumeric(x) && x > 0 && x == floor(x);

    % required arguments
    p.addRequired('d', isposint)

    % optional arguments
    p.addOptional('n', 1000, isposint)

    % keyword arguments
    p.addParamValue('order', 3, isposint)
    p.addParamValue('tminmax', [-10, 10], ...
            @(x) (isvector(x) || iscell(x)) && length(x) == 2)
    p.addParamValue('weightfunc', @(varargin) abs(randn(varargin{:})), ...
            @(x) isa(x, 'function_handle'))

    % parse the args and get the results
    p.parse(d, varargin{:})

    r = p.Results;
    d = r.d;
    n = r.n;
    order = r.order;

    if iscell(r.tminmax)
        r.tminmax = cell2mat(r.tminmax);
    end

    tminmax = r.tminmax;
    weightfunc = r.weightfunc;

    %% make a time vector between tmin and tmax, inclusive
    % tminmax == [-10, 10]
    tmin = tminmax(1);
    tmax = tminmax(2);
    t = linspace(tmin, tmax, n); % n == 1000

    %% make some random weights using the 'weightfunc' function
    % weightfunc == @(varargin) abs(randn(varargin{:}))
    theta = weightfunc(1, d);

    %% make the bump funcs: d B-splines of the given order; spmak needs
    % length(knots) == (number of coefficients) + order, and diag(coeffs)
    % gives one weighted spline per row
    breaks = linspace(tmin, tmax, d + order);
    coeffs = theta;
    pp = spmak(breaks, diag(coeffs));

    %% compute the value of the function to be estimated
    Fbig = spval(pp, t); % fnval(pp, t) in newer Curve Fitting Toolbox versions
    F = sum(Fbig);

    %% draw the Poisson-distributed samples with rate exp(F)
    y = poissrnd(exp(F));

    subplot(211)
    hold on
    hy = plot(t, y, 'LineWidth', 1);
    plot(t, Fbig, 'LineWidth', 2)
    hF = plot(t, F, 'r', 'LineWidth', 4);
    hold off
    legend([hy, hF], {'y', 'sum(F)'}) % label only the data and the summed curve

    %% plotting
    lw = 3;
    opts = {'LineWidth', lw};
    subplot(212)
    hold on
    plot(t, y, 'b', opts{:})
    plot(t, F, 'g', opts{:})
    hold off
    axis tight

    % the Hessian from the question: -X' * diag(exp(X*theta)) * X, with
    % X*theta already evaluated as F
    X = Fbig';
    ddl = sparse(-X' * diag(exp(F)) * X);

    figure
    hold on
    plot(t, y, 'b', 'LineWidth', 1)
    plot(t, F, 'r', 'LineWidth', 2);
    hold off

    figure
    spy(ddl)
end

You can run this code as follows: fastnewton(30, 'order', 2). That will give you 30 basis functions of order 2, which produce a tridiagonal matrix. However, if you run fastnewton(30, 'order', 3) you get nice, smooth basis functions but a matrix with a bandwidth of 3 instead, which is not tridiagonal. MATLAB's cute little spy function gives you a graphical representation of the sparsity pattern of a matrix.

All I really need is some way to adjust the basis functions so that they are smooth AND the equation above results in a tridiagonal matrix, then I can do the rest myself. Thanks!

Best Answer

Not quite what you're asking for, but the Statistics Toolbox has a handy little function called glmfit.

I'm attaching some code that I've used to demo Poisson regression in the past...

%%  Generalized Linear Model Demo

% Generalized Linear Models encompass techniques like Poisson 
% regression and Logistic regression. These techniques are used to model
% counts and estimate odds.

% Formally, the technique was developed to model data where the error terms
% don't exhibit constant variance.

% Specify the average relationship between X and Y

clear all
clc

% Create a set of 100 X values between 0 and 15

X = linspace(0,15,100);
X = X';

% Define a function that describes the average relationship between X and
% Y.  

mu = @(x) exp(-1.5 +.25*x);
Y = mu(X);

plot(X,Y); 
xlabel('Time');
ylabel('Count');

%%  Use the Y = f(X) to generate a dataset with known characteristics

%  At each "X", the observed Y is represented as a draw from a Poisson
%  distribution with mean (lambda) = Y

Noisy_Y = poissrnd(Y,100,1);

% Create figure
figure1 = figure;

% Create axes
axes1 = axes('Parent',figure1,'YTick',[0:max(Noisy_Y)], 'YGrid','on');
hold(axes1,'all');

% Create scatter
scatter(X,Noisy_Y, 'b');

xlabel('Time')
ylabel('Count')

% Note the following:

% The mean of a Poisson process is also the variance of a Poisson process.
% The scatter around the blue curve doesn't exhibit constant variance
% (this violates one of the key assumptions underlying Ordinary Least
% Squares).

% The output from a Poisson process is always a non-negative integer. If
% lambda is close to zero, and observations can never go negative, the
% shape of the distribution is bounded in one direction. As lambda gets
% larger, this boundary has less and less impact on the distribution of
% the errors.
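
% Aside (not part of the original demo): a quick numerical check that the
% sample mean and sample variance of Poisson draws track each other
check_draws = poissrnd(5, 1e5, 1);
[mean(check_draws), var(check_draws)]   % both come out near 5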

hold on
plot(X,Y); 

%%  Create a fit using nonlinear regression

% The model used for the nonlinear regression is the same as "mu"

foo = fit(X, Noisy_Y, 'exp(B1 + B2*x)');
plot(foo)

xlabel('Time')
ylabel('Count')

SSE_OLS = sum((foo(X) - Noisy_Y).^2)

% Create textbox
annotation(figure1,'textbox',...
    [0.147443514644351 0.802739726027397 0.255903765690377 0.0931506849315069],...
    'String',{['SSE OLS = ' num2str(SSE_OLS)]},...
    'FitBoxToText','off',...
    'BackgroundColor',[1 1 1]);


%% Create a fit using a Generalized Linear Model

[b, dev, stats] = glmfit(X, Noisy_Y, 'poisson');

Predicted_Y = exp(b(1) + b(2)*X);
plot(X, Predicted_Y, 'k');

SSE_GLM = sum(stats.resid.^2)
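
% Aside (not part of the original demo): glmval, also in the Statistics
% Toolbox, applies the inverse link for you, so the prediction above
% could equally be written as
%   Predicted_Y = glmval(b, X, 'log');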

% Create textbox
annotation(figure1,'textbox',...
    [0.147443514644351 0.802739726027397 0.255903765690377 0.0931506849315069],...
    'String',{['SSE OLS = ' num2str(SSE_OLS)], ['SSE GLM = ' num2str(SSE_GLM)]},...
    'FitBoxToText','off',...
    'BackgroundColor',[1 1 1]);

%  The two curves (should) look almost the same
%  The SSE is (essentially) the same.
% What gives?  Why are we bothering with this GLM stuff?

%% Perform a Monte Carlo simulation

% Generate 1000 data sets that are consistent with the assumed
% relationship between Count and Time

% Preallocate the sliced output arrays used inside the parfor loop
GLM_Prediction = zeros(100, 1000);
OLS_Prediction = zeros(100, 1000);

parfor i = 1:1000

     Noisy_Y = poissrnd(Y, 100,1);

     b = glmfit(X,Noisy_Y, 'poisson');

     GLM_Prediction(:,i) = exp(b(1) + b(2)*X); 

     % Provide optimal starting conditions (make sure that any
     % differences are related to the algorithms rather than
     % convergence)

     foo = fit(X, Noisy_Y, 'exp(B1 + B2*x)', 'StartPoint', [-1.5, .25]);

     B = coeffvalues(foo);

     OLS_Prediction(:,i) = exp(B(1) + B(2)*X);

end

figure

True = plot(X, Y, 'b');
hold on
GLM_Mean = plot(X, mean(GLM_Prediction, 2), 'k');
OLS_Mean = plot(X, mean(OLS_Prediction, 2), 'r');
xlabel('Time')
ylabel('Count')

% The means of the two techniques are, for all intents and purposes,
% identical.

%%

% The standard deviation from the GLM is MUCH smaller.  Also note that 
% I provided the nonlinear regression with optimal starting conditions.

% Formally, the advantage to using a GLM has to do with the scope of the
% technique (The range of conditions over which the GLM will generate a
% good fit).  GLMs have a much broader scope.  Conversely, using a
% nonlinear regression to fit the inverse of the link function can (badly)
% misfire on occasion.

% glmfit is more robust
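
% Aside (a minimal sketch, not part of the original demo): one way to see
% why is that glmfit iterates reweighted least squares (IRLS) with weights
% equal to the current mean -- which is exactly the Poisson variance -- so
% the non-constant variance is handled by construction. A bare-bones
% version for the log link (uses implicit expansion, R2016b+):

Xd = [ones(size(X)) X];          % design matrix with an intercept column
y_chk = poissrnd(Y, 100, 1);     % a fresh Poisson draw for this aside
muhat = y_chk + 0.5;             % standard safe start (avoids log(0))
eta = log(muhat);
for iter = 1:25
    z = eta + (y_chk - muhat) ./ muhat;           % working response
    W = muhat;                                    % IRLS weights = Poisson variance
    bhat = (Xd' * (W .* Xd)) \ (Xd' * (W .* z));  % weighted least squares step
    eta = Xd * bhat;
    muhat = exp(eta);
end
bhat   % should land close to [-1.5; .25], like glmfit's b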

GLM_std = std(GLM_Prediction, [], 2);
OLS_std = std(OLS_Prediction, [], 2);

figure
plot(X, OLS_std./GLM_std)
xlabel('Time')
ylabel('Ratio of OLS std:GLM std')