Matrice grosse piece puzzle

Comment

Author: Admin | 2025-04-28

I am trying to port to the GPU an existing piece of software that uses hand-tuned sparse multiplication of special CSC matrices that have exactly k nonzero elements per column. I decided to use cuSPARSE for the job, but unfortunately the matrix multiplication takes over 7 seconds in some cases, which is much slower than the CPU version of the code. (The largest sparse matrix concerned is 19871x1000, the largest dense matrix is 1000x150, and nnz = 101000.)

When trying to reproduce the problem in a self-contained example, I always encounter an "illegal memory access" error when nnz != sparse_cols. After some investigation, it turns out that if I increase the size of the matrices tenfold, the problem disappears. If I make the matrices small enough, I don't experience crashes. However, with large matrices the sparse matrix must not exceed a certain density; otherwise the multiplication produces a bunch of illegal memory accesses.

Here is the code that exhibits the problem:

#include #include #include #include
#define CALL_CUDA( err ) \
{ if (err != cudaSuccess) \
 {std::cout [...]

I believe I am generating a valid sparse matrix, because I can convert it to a dense one using the appropriate cuSPARSE function without triggering any invalid memory accesses.

When running the above code under cuda-memcheck, you can see many illegal accesses from within cusparseScsrmm. Running without cuda-memcheck, you would see an error in the first CUDA operation after the matrix multiplication.

Any ideas what I am doing wrong? I hope that if I can solve this problem, I will be able to diagnose (or at least isolate in a self-contained example) the painfully slow matrix multiplications.

EDIT: Using smaller matrices I don't experience the problem. A 50x200 sparse matrix works fine for NNZ up to about 1000, but takes forever with NNZ = 5000 (I killed it after half a minute). Increasing the matrix size to 200x500 performs instantaneously with NNZ = 5000... Strange.

EDIT2: The original number of nnz works if I increase the size of the matrices tenfold.
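Since the claim that the generated matrix is valid rests only on the dense conversion not crashing, a host-side structural check of the index arrays may be a useful extra step before calling the multiplication. The following is a minimal sketch, not part of cuSPARSE: the function name check_compressed and its parameters are hypothetical, zero-based indexing is assumed, and the requirement that indices be strictly increasing within each slice is a conservative assumption that may be stricter than a given routine needs.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (not a cuSPARSE function): checks the structural
// invariants of a zero-based compressed sparse layout on the host.
// For CSR, "major" means rows and "minor" means columns; for CSC the
// roles are swapped. Returns an empty string when the layout is
// consistent, otherwise a description of the first problem found.
std::string check_compressed(const std::vector<int>& ptr,
                             const std::vector<int>& idx,
                             int n_major, int n_minor)
{
    if (n_major < 0 || n_minor < 0)
        return "negative dimension";
    if (ptr.size() != static_cast<std::size_t>(n_major) + 1)
        return "pointer array must have n_major + 1 entries";
    if (ptr.front() != 0)
        return "pointer array must start at 0";
    if (ptr.back() < 0 || static_cast<std::size_t>(ptr.back()) != idx.size())
        return "last pointer must equal nnz";

    // First pass: pointers must never decrease. Together with the two
    // end-point checks above, this keeps every slice inside idx.
    for (int m = 0; m < n_major; ++m) {
        if (ptr[m] > ptr[m + 1])
            return "pointer array decreases at slice " + std::to_string(m);
    }

    // Second pass: every index must be in range, and (assumption) strictly
    // increasing within its slice, which also rules out duplicates.
    for (int m = 0; m < n_major; ++m) {
        for (int p = ptr[m]; p < ptr[m + 1]; ++p) {
            if (idx[p] < 0 || idx[p] >= n_minor)
                return "index out of range at position " + std::to_string(p);
            if (p > ptr[m] && idx[p] <= idx[p - 1])
                return "indices not strictly increasing in slice " + std::to_string(m);
        }
    }
    return "";
}
```

For the CSC matrices described here, the call would use n_major = number of columns and n_minor = number of rows (for example 1000 and 19871 for the largest case); a non-empty result would suggest that the generator, rather than the multiplication itself, is worth looking at first.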
