MATLAB: Reshape a column vector containing several stocks returns into a matrix format

column to matrixcrspreshapetransformation

Hello everyone,

I have a column vector containing dates and the respective stock prices for a large set of stocks with different starting and ending dates (as some stocks were only listed after the start of the sample or were delisted during the sample period). Hence, the first 100 rows of the vector might containg the dates and stock returns of stock 1, rows 101 to 350 the dates and prices of stock 2, rows 351 to 400 the same data of stock 3 and so on. I obtained the data set from the Wharton CRSP database.

Now, i would like to transform this column vector into a matrix which contains the dates for the largest available time period in the a first column and the respective stock prices in the subsequent columns such that column 2 contains the stock prices of stock 1, column 3 the stock prices of stock 2 and so on.

I already tried several merging or reshaping commands without any success. Therefore, I would be very thankful for any advice.

Kind regards, Maximilian

Best Answer

Here is one way to process your data. You'll have to understand and check that it is correct (at least, check a dozen numbers in the array of prices/returns). I attached the M file, but I copy/pasted the code below as well, to facilitate the discussion. The output of block 3 is the plot below

Note 1 - you may want to update the terminology. What I named "prices" initially is the "returns" that you mention in your last comment.

Note 2 - base functions in MATLAB don't compute with NaNs:

 >> sum(prices(1,:))
 ans =
   NaN

If you have the Financial toolbox, you can use special function for that purpose, e.g.

 >> nansum(prices(1,:))
 ans =
  326.5000

If not, you can extract non-nan elements and aply base operations on these

 >> id = ~isnan( prices(1,:) ) ;
 >> sum( prices(1,id) )
 ans =
  326.5000

----[ CODE ]---------------------------------------------------------------------------------------------------

 %%--- 1. Read data.
 fprintf( 'Reading data..    ' ) ;
 tic ;
 data = xlsread( 'demo_data.xlsx' ) ;
 fprintf( ': done in %.2es\n', toc ) ;
 %%--- 2. Process data.
 fprintf( 'Processing data.. ' ) ;
 tic ;
 % - 2a. Build list of unique PERMCOs. This corresponds to the horizontal
 %  header of the table of prices defined below.

 permcos = unique( data(:,1) ).' ;
 % - 2b. Build list of unique dates. This corresponds the the vertical
 %  header of the table of prices defined below.
 dates   = unique( data(:,2) ) ;
 minDate = min( dates ) ;
 % - 2c. This is the complicated part. Let's start with an example. We have
 % a vector of dates
 %                    1]  19260102
 %                    2]  19260104
 %                    3]  19260105
 %                    4]  19260106
 %                    ..  ...

 %                23283]  20131231
 %



 % where the number behind ] is the index of the element that follows. We
 % will need a table which makes the inverse correspondance, which means
 % making the association 19260102 -> 1, 19260104 -> 2, etc. We could build
 % a giant array where element 19260102 is 1, etc.
 %
 %                    19260102]  1
 %                    19260104]  2
 %                    19260105]  3
 %                    19260106]  4
 %                          ...  ..
 %                    20131231]  23283
 %
 % but it would be quite large in memory (~162MB). Instead, we build a
 % mechanism which makes the correspondance between (date - min(dates) + 1)
 % and indices
 %                    1]  1
 %                    3]  2            % Observe the discont. (*)
 %                    4]  3
 %                    5]  4
 %                    ..  ...
 %               871129]  23283
 %
 % (*) which corresponds to the gap between 19260102 and 19260104. This
 % array takes less than 7MB RAM, which is reasonable. We call that a lookup
 % table; it associates a series of numbers/indices to another series of
 % numbers.
 shiftedDates = dates - minDate + 1 ;
 dates_LT = zeros( max(shiftedDates), 1 ) ;
 dates_LT(shiftedDates) = 1 : numel( shiftedDates ) ;
 % If you display dates_LT(1:10), you'll see that it worked: element 1 is 1,
 % element 3 is 2, etc.
 % - 2c. Build array of prices per date/PERMCO.
 % Prealloc an array of NaNs.
 prices = nan( numel(dates), numel(permcos) ) ;
 % Iterate through PERMCOs and define corresponding columns.
 for pId = 1 : numel(permcos)
    % Identify (build a logical index) rows of the data table associated
    % with current PERMCO. The logical index will then be used to extract
    % relevant dates/prices.
    lId = data(:,1) == permcos(pId) ;
    % Extract relevant dates.
    currentDates = data(lId,2) ;
    % Build the index, in the table of prices, of entries corresponding to
    % these dates: shift them and then use lookup table.
    dId = dates_LT(currentDates - minDate + 1) ;
    % Store current prices at relevant "places" in the array of prices. The
    % column is defined by the PERMCO ID (pId) and rows are defined by date
    % IDs that we just computed.
    prices(dId,pId) = data(lId,3) ;
 end
 fprintf( ': done in %.2es\n', toc ) ;
 %%--- 3. Plot time series.
 % - 3a. Plot, grid, ylabel.
 plot( dates, nansum(prices, 2), 'Color', 0.85*[1,1,1], 'LineWidth', 2 ) ;
 hold on ;  grid on ;
 plot( dates, prices ) ;
 ylabel( 'Return [$]' ) ;
 % - 3b. Legend. 
 legenItems = arrayfun( @(pId) sprintf( 'PERMCO %d', pId), permcos, ...
    'UniformOutput', false ) ;
 legend( [{'Sum portfolio (?)'}, legenItems{:}], 2 ) ;

Related Solutions

MATLAB: How to reshape efficiently Panel Data

You can replace the loop with the following:

Matrix_Prices = zeros(numel(historical_dates), numel(tickers));
Matrix_Returns = zeros(numel(historical_dates), numel(tickers));
[~, rows] = ismember(lower(historical_returns(:, 1)), lower(historical_dates)); %use lower case to make case insensitive comparison
[~, cols] = ismember(lower(historical_returns(:, 2)), lower(tickers)); 
indices = sub2ind(size(Matrix_Prices), rows, cols);
Matrix_Prices(indices) = cell2mat(historical_returns(:, 3));
Matrix_Returns(indices) = cell2mat(historical_returns(:, 4));

MATLAB: Re-arrange CRSP Data

I would elect to do this using a table. However, column headers in a table are variable names, and MATLAB variables cannot be numbers. That means a little modification will be needed. You can set the codes as the rownames and the modified dates as the variableNames (column headers).

This is a bit of a hack, and can likely be improved. Still, this should get you started

file = 'US_500_Book_SRT.xlsx';
opts = detectImportOptions(file);
% Your dates are in a numeric 'yyyymmdd' format that can't be recognized by readtable.
% I import them as numbers here and convert them later.
opts = setvartype(opts,1,'categorical');
% load data
data = readtable(file,opts);
% convert dates from numeric 'yyyymmdd' to datetime 'yyyyMM'
datetimeYY = @(D)datetime(D,'ConvertFrom','yyyymmdd','Format','yyyyMM');
data = convertvars(data,2,datetimeYY);
% I want to use dates as a grouping mechanism, so convert to categorical as well.
data = convertvars(data,2,'categorical')
% extract the unique stock codes and yyyyMM dates
stocks = categories(data.Var1);
dates = categories(data.Var2);
% Find groups, using indeces to assign to row, column of matrix
Gs = findgroups(data.Var1);
Gd = findgroups(data.Var2);
mat2 = zeros(length(stocks),length(dates));
% Didn't find a more elegant way to do this. Loop through each row of data
% Assign to table using stocks group for the row number and dates group for the column
for loop = 1:length(Gs)
    mat2(Gs(loop),Gd(loop)) = data.Var3(loop);
end
% Convert mat2 to a table
tbl2 = table(mat2);
tbl2 = splitvars(tbl2);
% assign stock codes as rownames
tbl2.Properties.RowNames = stocks;
% make column names dates (prepend with 'd_' so it is a valid MATLAB variable name)
tbl2.Properties.VariableNames = 'd_' + string(dates);

Best Answer

Related Solutions

MATLAB: How to reshape efficiently Panel Data

MATLAB: Re-arrange CRSP Data

Related Question