Louvain

SQL function: cugraph_louvain

Detect communities with Louvain modularity optimization.

Signature

cugraph_louvain(table_name [, src_col, dst_col [, weight_col [, options_json]]])

Allowed argument counts: 1, 3, 4, 5.
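The four accepted call forms can be sketched as follows. This is an illustration only: 'target_edges' is the placeholder table from the quickstart, 'src' and 'dst' are the documented defaults, and the options value is an arbitrary example.

```sql
-- 1 argument: table only; src_col and dst_col fall back to 'src' and 'dst'.
SELECT * FROM cugraph_louvain('target_edges');

-- 3 arguments: explicit source and destination columns.
SELECT * FROM cugraph_louvain('target_edges', 'src', 'dst');

-- 4 arguments: adds weight_col; NULL means no weight binding.
SELECT * FROM cugraph_louvain('target_edges', 'src', 'dst', NULL);

-- 5 arguments: adds options_json (illustrative value).
SELECT * FROM cugraph_louvain('target_edges', 'src', 'dst', NULL, '{"max_level": 50}');
```

A two-argument call is not in the allowed list, so src_col and dst_col have to be supplied together.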

Quickstart

SELECT * FROM cugraph_louvain('target_edges');

Positional arguments

| Argument | Type | Required | Default | Notes |
|---|---|---|---|---|
| table_name | Utf8 | yes | | |
| src_col | Utf8 | no | src | |
| dst_col | Utf8 | no | dst | |
| weight_col | Utf8 or null | no | | Accepted as an edge-column binding; native algorithm execution does not consume weights. Semantic effect: none for this algorithm. |
| options_json | Utf8 | no | | |

JSON options

| Option | Type | Default | Constraints | Description |
|---|---|---|---|---|
| max_level | UInt32 | 100 | min 1 | |
| resolution | Float64 | 1 | > 0 | |
| threshold | Float64 | 1e-7 | min 0 | |
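As a minimal sketch, the options above would be passed as a JSON string in the fifth positional argument. The table and column names are the quickstart placeholders, and the option values are illustrative, chosen only to satisfy the listed constraints.

```sql
-- Sketch: override all three Louvain options in one call.
-- Values are examples, not recommendations.
SELECT *
FROM cugraph_louvain(
    'target_edges',
    'src',
    'dst',
    NULL,  -- weight_col: no weight binding (weights are not consumed by this algorithm)
    '{"max_level": 50, "resolution": 1.5, "threshold": 1e-6}'
);
```

Options left out of the JSON object presumably keep the defaults listed in the table.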

Graph construction options

Shared by all cuGraph functions, shown here with this function's defaults. The construction_policy option controls whether Nexus requests Python cuGraph-compatible edge normalization or bypasses it for raw libcugraph-style construction; see graph construction options for the full policy guide.

| Option | Type | Default | Constraints | Description |
|---|---|---|---|---|
| construction_policy | Utf8 | "python_cugraph" | one of "python_cugraph", "raw_libcugraph" | Edge-list construction semantics used before calling libcugraph. |
| directed | Boolean | true | | Whether graph construction treats edges as directed. |
| renumber | Boolean | true | | Whether graph construction may renumber external vertex identifiers internally. |
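A hedged sketch of setting construction options, assuming they travel in the same options_json object as the algorithm options (this page does not state that explicitly; the graph construction options guide is the authority). Table and column names are the quickstart placeholders.

```sql
-- Assumption: construction options are keys in options_json alongside algorithm options.
-- Here the edge list is treated as undirected and the default policy is spelled out.
SELECT *
FROM cugraph_louvain(
    'target_edges',
    'src',
    'dst',
    NULL,
    '{"directed": false, "construction_policy": "python_cugraph"}'
);
```

Dry-running such a call with cugraph_validate_call first is a cheap way to confirm that the option names are accepted.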

Output schema

| Column | Type | Nullable | Description |
|---|---|---|---|
| vertex | Int64 | no | Vertex assigned to a Louvain community. |
| partition | Int64 | no | Community identifier assigned by Louvain. |
Note: These are the generic registry schemas. Run cugraph_validate_call for the concrete, table-specific output schema of a particular call.

Examples

This example runs on the citation network demo dataset.

Do citation communities match the field labels?

Two views carve out the 2010s AI literature — nodes by field-of-study label, edges where both endpoints qualify — and Louvain partitions it purely by citation structure. Cross-tabulating each community against primary_fos (with a window function to keep the top 3 labels per community) shows how well the labels survive contact with the actual graph:

CREATE VIEW ai_nodes AS
SELECT paper_id
FROM papers
WHERE year >= 2010
  AND primary_fos IN (
    'Deep learning', 'Artificial neural network', 'Convolutional neural network',
    'Recurrent neural network', 'Natural language processing',
    'Reinforcement learning', 'Image segmentation', 'Feature extraction',
    'Object detection', 'Speech recognition');

CREATE VIEW ai_edges AS
SELECT e.src, e.dst
FROM citation_edges e
JOIN ai_nodes a ON a.paper_id = e.src
JOIN ai_nodes b ON b.paper_id = e.dst;

WITH community_fos AS (
    SELECT c."partition" AS community, p.primary_fos, COUNT(*) AS n
    FROM cugraph_louvain('ai_edges', 'src', 'dst') c
    JOIN papers p ON p.paper_id = c.vertex
    GROUP BY c."partition", p.primary_fos),
ranked AS (
    SELECT SUM(n) OVER (PARTITION BY community) AS members,
           ROW_NUMBER() OVER (PARTITION BY community ORDER BY n DESC) AS rn,
           primary_fos,
           n
    FROM community_fos)
SELECT members, rn, primary_fos, n
FROM ranked
WHERE members > 2500 AND rn <= 3
ORDER BY members DESC, rn;

| members | rn | primary_fos | n |
|---|---|---|---|
| 10,684 | 1 | Convolutional neural network | 3,002 |
| 10,684 | 2 | Object detection | 2,467 |
| 10,684 | 3 | Deep learning | 2,247 |
| 6,707 | 1 | Deep learning | 2,168 |
| 6,707 | 2 | Convolutional neural network | 2,061 |
| 6,707 | 3 | Feature extraction | 778 |
| 5,118 | 1 | Deep learning | 1,591 |
| 5,118 | 2 | Recurrent neural network | 1,289 |
| 5,118 | 3 | Convolutional neural network | 714 |
| 3,960 | 1 | Reinforcement learning | 3,157 |
| 3,960 | 2 | Artificial neural network | 337 |
| 3,960 | 3 | Deep learning | 204 |

Louvain (1,960 communities over 38k papers) recovers real subfield borders: a computer-vision community, a sequence-modeling community, and a reinforcement-learning community that is 80% one label (3,157 of 3,960 papers). Note the quoted "partition": the output column name is a SQL keyword, so it has to be double-quoted wherever it is referenced. cugraph_leiden is a drop-in replacement with the same shape.
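For a quicker look at the partition before cross-tabulating, a community-size query along these lines may help. It is a sketch that reuses the ai_edges view defined above; the LIMIT of 10 is an arbitrary choice.

```sql
-- Sketch: size of the ten largest communities.
-- "partition" is double-quoted because the column name is a SQL keyword.
SELECT "partition" AS community, COUNT(*) AS members
FROM cugraph_louvain('ai_edges', 'src', 'dst')
GROUP BY "partition"
ORDER BY members DESC
LIMIT 10;
```

The query counts rows per partition value, which works because the output schema has one row per vertex.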

Limitations & notes

  • dry-run validates table resolution, column presence, static dtypes, and options only
  • dry-run does not scan edge data, construct a graph, or prove source-vertex existence

Validate before running

Always dry-run a call before executing it. Validation checks the function, table, columns, dtypes, and options without touching the GPU:

SELECT * FROM cugraph_validate_call(
'cugraph_louvain',
'your_edges_table',
'{"src_col":"src","dst_col":"dst"}'
);

See Discovery & validation for the full cugraph_validate_call contract.