Sparse general matrix-matrix multiplication (SpGEMM) is a core primitive for numerous scientific applications. Traditional hash-based approaches fail to strike a balance between reducing hash collisions and efficiently utilizing fast shared memory, which significantly undermines the performance of SpGEMM on GPUs. To address this issue, this paper introduces a novel accumulator design that achieves high shared memory utilization on modern GPUs. For the proposed high-shared-memory-utilization algorithm, HSMU-SpGEMM,¹ we further optimize its different symbolic stages. Our evaluations against four state-of-the-art hash-based SpGEMM libraries (Nsparse, spECK, OpSparse, and NVIDIA’s cuSPARSE) on three NVIDIA GPUs (Ampere, Ada Lovelace, and Turing) demonstrate significant performance benefits from HSMU-SpGEMM.

¹ HSMU-SpGEMM is available at https://github.com/wuminqaq/HSMUSpGEMM