Design of CUDA-based Louvain algorithm for community detection.
Community detection involves identifying natural divisions in networks, a crucial task for many large-scale applications. This report presents GVE-Louvain, one of the most efficient multicore implementations of the Louvain algorithm, a high-quality method for community detection. Running on a dual 16-core Intel Xeon Gold 6226R server, GVE-Louvain outperforms Vite, Grappolo, NetworKit Louvain, and cuGraph Louvain (on an NVIDIA A100 GPU) by factors of 50x
, 22x
, 20x
, and 5.8x
, respectively, achieving a processing rate of 560M
edges per second on a 3.8B
-edge graph. Additionally, it scales efficiently, improving performance by 1.6x
for every thread doubling. The paper also presents ν-Louvain, a GPU-based implementation. When evaluated on an NVIDIA A100 GPU, ν-Louvain performs only on par with GVE-Louvain, largely due to reduced workload and parallelism in later algorithmic passes. These results suggest that CPUs, with their flexibility in handling irregular workloads, may be better suited for community detection tasks.
Below we plot the time taken by Grappolo (Louvain), NetworKit Louvain, Nido (Louvain), cuGraph Louvain, and ν-Louvain on 13 different graphs. ν-Louvain surpasses Grappolo, NetworKit Louvain, Nido, and cuGraph Louvain by 20×
, 17×
, 61×
, and 5.0×
respectively, achieving a processing rate of 405M
edges/s on a 2.2𝐵
edge graph.
Below we plot the speedup of ν-Louvain wrt Grappolo, NetworKit Louvain, Nido, and cuGraph Louvain.
Finally, we plot the modularity of communities identified by Grappolo, NetworKit Louvain, Nido, cuGraph Louvain, and ν-Louvain. ν-Louvain on average obtains 1.1%
, 1.2%
, and 1.3%
lower modularity than Grappolo, NetworKit
Louvain, and cuGraph Louvain (where cuGraph Louvain runs), but 45%
higher modularity than Nido.
Refer to our technical report for more details:
CPU vs. GPU for Community Detection: Performance Insights from GVE-Louvain and ν-Louvain.
Note
You can just copy main.sh
to your system and run it.
For the code, refer to main.cxx
.
The code structure of ν-Louvain is as follows:
- inc/_*.hxx: Utility functions
- inc/**.hxx: Common graph functions
- inc/csr.hxx: Compressed Sparse Row (CSR) data structure functions
- inc/Graph.hxx: Graph data structure
- inc/louvian.hxx: OpenMP-based Louvian algorithm
- inc/louvainCuda.hxx: CUDA-based Louvian algorithm
- inc/hashtableCuda.hxx: CUDA-based hybrid quadratic-double hashtable
- inc/mtx.hxx: Graph (MTX format) file reader
- inc/properties.hxx: Graph property functions (e.g., modularity)
- inc/symmetrize.hxx: Make graph symmetric
- inc/update.hxx: Graph update functions
- inc/main.hxx: Main header
- main.cxx: Experimentation code
- main.sh: Experimentation script
- process.js: Node.js script for processing output logs
Note that each branch in this repository contains code for a specific experiment. The main
branch contains code for the final experiment. If the intention of a branch in unclear, or if you have comments on our technical report, feel free to open an issue.
- Fast unfolding of communities in large networks; Vincent D. Blondel et al. (2008)
- Community Detection on the GPU; Md. Naim et al. (2017)
- Scalable Static and Dynamic Community Detection Using Grappolo; Mahantesh Halappanavar et al. (2017)
- From Louvain to Leiden: guaranteeing well-connected communities; V.A. Traag et al. (2019)
- CS224W: Machine Learning with Graphs | Louvain Algorithm; Jure Leskovec (2021)
- The University of Florida Sparse Matrix Collection; Timothy A. Davis et al. (2011)
- Fetch-and-add using OpenMP atomic operations