[Math] Finding linearly independent columns of a large sparse rectangular matrix

convex optimizationlinear algebralinear programming

I have a problem that necessitates solving a large non-negative least-squares
problem. My matrix A is large, sparse, highly rectangular (num rows >> num cols)
and nearly binary. However, A is not necessarily of full column-rank, causing my
non-negative least-squares solver (http://www.jasoncantarella.com/webpage/index.php?title=Tsnnls) to fail. Is there an efficient algorithm that will allow me to
select a maximum cardinality set of linearly-independent columns from A so
that I can solve my least-squares problem?

Best Answer

By nearly binary do you mean most entries are $0$ or $1$ but a few are not? I will assume that all entries are $0,1.$ My main comments involve merely noting zero and non-zero.

To be sure of having maximum rank you will need to look at each column. You can start with any independent set (such as the set consisting of a single column) and then examine the rest of the columns in some order. If the next column is independent of the current set adjoin it as a new member otherwise discard it. This will result in a maximum size set (a basis of the column space). Depending on how sparse it is, it could be pretty easy to decide when the next column can be added, at least for a while.

Here are a few simple ideas to boost efficiency:

  • If the matrix is very sparse then use efficient sparse matrix algorithms and representations, these might involve storing it as a set of ordered tuples $(i,j)$ or $(i,j,a_{ij})$ showing the location (and value) of the non-negative entries. There is bound to be highly efficent and well validated existing code which will provide you with a basis for the column space of a (sparse) matrix.

  • Let the depth of a column be the first row with a non-zero entry. Sort the columns according to depth (don't bother to break ties by looking further.) So far this might not require you to look at all the entries, just those down to the first non-zero entry. If the matrix is quite sparse then one might expect many different heights.

  • Pick one column from each depth class, This will give you a starting independent set which might include a relatively large number of columns. Assume that this has been done.

  • Go through the current set of independent columns, look for any rows which are all zero (for the selected column vectors) Now look through the undiscarded columns and try to find one which is non-zero in that position. If one is found then add it to the set. This will enlarge the set of columns.

Related Question