chol factoring for sparse matrices

Hello everybody

Can anyone advise me how to make Octave's "chol" function work according to common sense?

I have a sparse matrix A of size 3565x119 with density 0.016955 (just an example), and I need to factor A'*A, which is 119x119 and 100% dense.
Nevertheless, "chol" factored it very nicely in 0.00199413 seconds.

However, the fully dense character of A'*A is caused by a single dense row of A. So I deleted that row and obtained a sparse matrix AS of size 3564x119 with density 0.016679 (slightly less), whose product AS'*AS is a 119x119 sparse matrix with only 7139 nonzeros (density 0.50413). Still rather dense, but nevertheless about half the memory consumption. Since I am looking forward to matrices with 10^6 rows, this is essential.
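
(For concreteness, the row removal I describe is roughly the following sketch; the 0.5 threshold for calling a row "dense" and the random stand-in matrix are just for illustration, so the numbers will differ from my real data.)

## A is a random stand-in for my real 3565x119 sparse matrix.
A = sprand (3565, 119, 0.017);

## Count nonzeros per row; rows above an (arbitrary) threshold count as dense.
row_nnz = full (sum (A != 0, 2));
dense_rows = find (row_nnz > 0.5 * columns (A));

## Drop the dense rows and compare the densities of the normal products.
AS = A;
AS(dense_rows, :) = [];
density_A  = nnz (A' * A)   / numel (A' * A)
density_AS = nnz (AS' * AS) / numel (AS' * AS)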

However, in this case "chol" factored the sparse product in 0.350877 seconds, about 180 times slower! OK, if you do it as

chol(full(AS'*AS)),

you obtain the expected 0.000941992 seconds, but I suspect that Octave
will not like the extra memory consumption.

So my question is: is there a sparse Cholesky factorization hiding somewhere in a dark Octave corner? I do not particularly like writing
anything like

sparse(chol(full(AS'*AS)));

This finally saves me about 60% of the memory, but it will probably not work for huge matrices. Or is there any trick to avoid the explicit conversion
to a full matrix for "chol" without sacrificing too much runtime?
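
(For what it is worth, the comparison I am making is roughly the following; a random stand-in matrix is used so it runs on its own, and the exact timings and byte counts will of course differ.)

## AS stands in for my 3564x119 matrix with the dense row removed.
AS = sprand (3564, 119, 0.0167);
B  = AS' * AS;                          # 119x119, dense-ish but stored sparse

tic; R1 = chol (B);                  t_sparse = toc   # sparse chol
tic; R2 = chol (full (B));           t_full   = toc   # dense chol
tic; R3 = sparse (chol (full (B)));  t_mixed  = toc   # the workaround above

whos B R1 R2 R3                      # compare memory footprints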

Octave uses the CHOLMOD library, which is part of SuiteSparse, to calculate the Cholesky factorization of sparse matrices.
Which platform are you using?

I couldn’t reproduce this with a random sparse matrix in Octave 6.3 on Windows 10:

>> A = sprand (3565, 119, 0.017);
>> B = A'*A;
>> nnz(B)/numel(B)
ans = 0.6547
>> tic, R = chol(B); toc
Elapsed time is 0.000714064 seconds.
>> tic, R = chol(B); toc
Elapsed time is 0.000676155 seconds.
>> tic, R = chol(full(B)); toc
Elapsed time is 0.000345945 seconds.
>> tic, R = chol(full(B)); toc
Elapsed time is 0.000313997 seconds.

The sparse matrix might be somewhat slower, but not by the orders of magnitude you are seeing.

A matrix with density above 0.5 is not sparse; it just has many zero entries. Trying to store such a matrix as sparse will be disappointing:

For example, a 10x10 dense matrix takes

  • 10 * 10 * 8 bytes (double) = 800 bytes

while a 10x10 sparse matrix with density 0.5 (i.e. 50 stored entries) takes

  • 50 * 8 bytes (data, double) + 50 * 8 bytes (ridx, int64_t) + 11 * 8 bytes (cidx, int64_t) = 888 bytes

You can verify this yourself:

A = rand (10);              # dense 10x10
B = sprand (10, 10, 0.5);   # sparse 10x10, density 0.5
whos A B                    # compare the reported "Bytes" of both

In Octave, sparse matrices are stored in the compressed sparse column (CSC) format. As the density of the matrix grows above roughly 0.3, the bookkeeping of indices within the specialized sparse algorithms dominates the calculation, to the point where the implicitly given indices of a dense matrix can be exploited better by dense algorithms.
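
As a rough back-of-the-envelope sketch (assuming 8-byte values and 8-byte indices, as in the 10x10 example above), you can estimate the two storage costs for the 119x119 product from the original question:

## Estimated storage for an n-by-n double matrix, dense vs. CSC sparse.
n = 119;                    # size of AS'*AS from the question
density = 0.50413;          # its reported density
nz = round (density * n^2); # number of stored nonzeros

bytes_dense  = n * n * 8
bytes_sparse = nz * 8 + nz * 8 + (n + 1) * 8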

As @mmuetzel said, Octave uses SuiteSparse internally, which is very mature software and really fast for sparse matrices (density < 0.4).

@mmuetzel I obtain similar figures.

Sparse matrix algorithms do not magically solve all performance issues. If it was easy to just remove a row in your problem, you might study the structure of your problem further to see where sparsity can be exploited. However, for the given problem, dense computations and matrix factorizations up to dimension 10^4 (under 1 GB of storage) are no big deal for Octave on common laptops these days. Depending on your needs, dimension 10^5 (around 75 GB) might require the final computation to be performed on a more capable computing system.
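
Those figures follow from the dense storage estimate n^2 * 8 bytes for a double matrix, for example:

## Dense double-precision storage for an n-by-n matrix.
for n = [1e4, 1e5]
  printf ("n = %g: %.1f GiB\n", n, n^2 * 8 / 2^30);
endfor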