Strongly Connected Components

SQL function: cugraph_strongly_connected_components

Compute strongly connected components.

Signature

cugraph_strongly_connected_components(table_name [, src_col, dst_col [, weight_col [, options_json]]])

Allowed argument counts: 1, 3, 4, 5.

Quickstart

SELECT * FROM cugraph_strongly_connected_components('target_edges');

Positional arguments

| Argument | Type | Required | Default | Notes |
| --- | --- | --- | --- | --- |
| table_name | Utf8 | yes | | |
| src_col | Utf8 | no | src | |
| dst_col | Utf8 | no | dst | |
| weight_col | Utf8\|null | no | | Accepted as an edge-column binding; native algorithm execution does not consume weights; semantic effect: none for this algorithm. |
| options_json | Utf8 | no | | |
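
The allowed counts follow from the bracket nesting in the signature: src_col and dst_col are supplied together, so a two-argument call has no valid reading. A minimal Python sketch of that binding rule, for illustration only. `bind_positional` is a hypothetical helper, not a Nexus or cuGraph API, and using None to mean "not supplied" for weight_col and options_json is an assumption, since the table documents no default for them.

```python
# Illustrative only: mirrors the documented positional-argument rule.
# bind_positional is a hypothetical helper, not a Nexus or cuGraph API.
ALLOWED_COUNTS = {1, 3, 4, 5}
NAMES = ["table_name", "src_col", "dst_col", "weight_col", "options_json"]


def bind_positional(*args):
    """Map positional arguments to named bindings using the documented defaults."""
    if len(args) not in ALLOWED_COUNTS:
        raise ValueError(
            f"got {len(args)} arguments; allowed counts are {sorted(ALLOWED_COUNTS)}"
        )
    # Documented defaults: src_col -> 'src', dst_col -> 'dst'.
    # None for weight_col / options_json is an assumption meaning "not supplied".
    bound = {
        "table_name": None,
        "src_col": "src",
        "dst_col": "dst",
        "weight_col": None,
        "options_json": None,
    }
    bound.update(zip(NAMES, args))
    return bound


one_arg = bind_positional("target_edges")
five_args = bind_positional("citation_edges", "src", "dst", None, '{"directed":true}')
```

Because the arguments are positional, passing options_json means filling every earlier slot, which is why the five-argument example on this page passes NULL for weight_col.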

JSON options

This algorithm has no algorithm-specific options.

Graph construction options

Shared by all cuGraph functions, shown here with this function's defaults. The construction_policy option controls whether Nexus requests Python cuGraph-compatible edge normalization or bypasses it for raw libcugraph-style construction; see graph construction options for the full policy guide.

| Option | Type | Default | Constraints | Description |
| --- | --- | --- | --- | --- |
| construction_policy | Utf8 | "python_cugraph" | one of "python_cugraph", "raw_libcugraph" | Edge-list construction semantics used before calling libcugraph. |
| directed | Boolean | true | | Whether graph construction treats edges as directed. |
| renumber | Boolean | true | | Whether graph construction may renumber external vertex identifiers internally. |
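
Since options_json is passed as a string, it can help to build it programmatically and check it against the constraints listed above before it reaches SQL. The sketch below is one way to do that in Python. `build_options` is a hypothetical helper, and it assumes the options are top-level keys of a single JSON object, as in the '{"directed":true}' literal used in the examples on this page.

```python
import json

# Defaults and constraints copied from the graph construction options table.
DEFAULTS = {"construction_policy": "python_cugraph", "directed": True, "renumber": True}
POLICIES = {"python_cugraph", "raw_libcugraph"}


def build_options(**overrides):
    """Return an options_json string (hypothetical helper, not a Nexus API)."""
    unknown = set(overrides) - set(DEFAULTS)
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    opts = {**DEFAULTS, **overrides}
    if opts["construction_policy"] not in POLICIES:
        raise ValueError(
            "construction_policy must be one of " + ", ".join(sorted(POLICIES))
        )
    for key in ("directed", "renumber"):
        if not isinstance(opts[key], bool):
            raise ValueError(f"{key} must be a JSON boolean")
    # Compact separators match the style of the literals used in the examples.
    return json.dumps(opts, separators=(",", ":"))


options_json = build_options(construction_policy="raw_libcugraph")
```

The resulting string uses only double quotes, so it can be placed inside a single-quoted SQL literal as the fifth argument.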

Output schema

| Column | Type | Nullable | Description |
| --- | --- | --- | --- |
| vertex | Int64 | no | Vertex assigned to a strongly connected component. |
| label | Int64 | no | Strongly connected component identifier for the vertex. |

Note: These are the generic registry schemas. Run cugraph_validate_call for the concrete, table-specific output schema of a particular call.
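
To make the vertex and label columns concrete, here is a small pure-Python sketch that produces rows of the same shape from a directed edge list. It uses Kosaraju's two-pass algorithm as a stand-in; it is not how cuGraph computes components on the GPU, and the label values it assigns are arbitrary choices of this sketch, not the identifiers the SQL function returns.

```python
from collections import defaultdict


def scc_rows(edges):
    """Return sorted (vertex, label) rows for a directed edge list (Kosaraju)."""
    fwd, rev = defaultdict(list), defaultdict(list)
    vertices = set()
    for src, dst in edges:
        fwd[src].append(dst)
        rev[dst].append(src)
        vertices.update((src, dst))

    # Pass 1: iterative DFS on the forward graph, recording finish order.
    seen, order = set(), []
    for root in sorted(vertices):
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(fwd[root]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(fwd[nxt])))
                    break
            else:
                # All neighbours explored: the node is finished.
                order.append(node)
                stack.pop()

    # Pass 2: sweep the reversed graph in decreasing finish order;
    # each sweep collects exactly one strongly connected component.
    label_of = {}
    for root in reversed(order):
        if root in label_of:
            continue
        label_of[root] = root  # arbitrary label: the sweep's starting vertex
        stack = [root]
        while stack:
            node = stack.pop()
            for prev in rev[node]:
                if prev not in label_of:
                    label_of[prev] = root
                    stack.append(prev)
    return sorted(label_of.items())


# Two cycles (1-2 and 3-4-5) plus vertex 6, which sits on no cycle.
rows = scc_rows([(1, 2), (2, 1), (2, 3), (3, 4), (4, 5), (5, 3), (5, 6)])
```

In this sketch every vertex that appears in the edge list gets exactly one row, and vertices share a label only when each can reach the other. A vertex on no cycle, such as 6 here, forms a component of size one.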

Examples

These examples run on the citation network demo dataset.

Citation loops should not exist — count them anyway

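A citation points backward in time, so a directed cycle of citations is a small anomaly: two or more papers that each reach all the others by following citations. SCC groups every such set in one pass, and any component with more than one member is a loop. Snapshot the run in the mutable datafusion.public workspace (labels are per-execution, so later queries should read the snapshot rather than rerun the function), then take a census of the loops: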

-- Local workspace snapshot; this does not write to lake.citation_network.
CREATE TABLE scc_snapshot AS
SELECT vertex, label
FROM cugraph_strongly_connected_components(
    'citation_edges', 'src', 'dst', NULL,
    '{"directed":true}');

WITH loops AS (
    SELECT label, COUNT(*) AS members
    FROM scc_snapshot
    GROUP BY label
    HAVING COUNT(*) > 1)
SELECT COUNT(*) AS loops, MAX(members) AS biggest, SUM(members) AS papers_in_loops
FROM loops;

| loops | biggest | papers_in_loops |
| --- | --- | --- |
| 14,613 | 1,370,029 | 1,404,311 |
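
The HAVING COUNT(*) > 1 filter is what separates loops from single-paper components. The sketch below reruns the same census query on a toy snapshot so each output column can be checked by hand. It uses Python's built-in sqlite3 as a stand-in engine, which is an assumption made for portability; the toy rows and label values are invented for illustration and are not taken from the citation dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scc_snapshot (vertex INTEGER, label INTEGER)")
# Toy labels: one 3-paper loop (label 7), one 2-paper loop (label 9),
# and two single-paper components (labels 11 and 12) that are not loops.
conn.executemany(
    "INSERT INTO scc_snapshot VALUES (?, ?)",
    [(1, 7), (2, 7), (3, 7), (4, 9), (5, 9), (6, 11), (7, 12)],
)

# Same census query as above: keep only components with more than one member.
census = conn.execute(
    """
    WITH loops AS (
        SELECT label, COUNT(*) AS members
        FROM scc_snapshot
        GROUP BY label
        HAVING COUNT(*) > 1)
    SELECT COUNT(*) AS loops, MAX(members) AS biggest, SUM(members) AS papers_in_loops
    FROM loops
    """
).fetchone()
print(census)
```

On the toy data the two single-paper components drop out, leaving two loops, the larger with three members, covering five papers in total.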

The surprise is the giant: 1.37M papers sit inside one strongly connected component. Preprint/journal double versions, simultaneous publication, and plain metadata errors create enough forward-dated citations to glue a third of the graph into one component. "Time travel" is structural in bibliographic data.
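
A giant component needs surprisingly few forward-dated citations, because a single one can close a cycle through an entire citation chain. The sketch below shows this on a five-paper toy chain. It finds the component of one vertex as the intersection of forward and backward reachability, a simple method chosen here for clarity rather than speed; the chain and the extra edge are invented for illustration.

```python
def reachable(start, adjacency):
    """Return every vertex reachable from start, including start itself."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in adjacency.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def component_of(vertex, edges):
    """SCC containing vertex: the vertices it can reach that can also reach it."""
    fwd, rev = {}, {}
    for src, dst in edges:
        fwd.setdefault(src, []).append(dst)
        rev.setdefault(dst, []).append(src)
    return reachable(vertex, fwd) & reachable(vertex, rev)


# Toy chain: paper 5 cites 4, 4 cites 3, and so on back to paper 1.
chain = [(5, 4), (4, 3), (3, 2), (2, 1)]
before = component_of(3, chain)            # acyclic: paper 3 is alone

# One forward-dated citation: the oldest paper cites the newest.
after = component_of(3, chain + [(1, 5)])  # the whole chain is one component
```

With only backward-pointing citations every paper is its own component. Adding the single edge from paper 1 to paper 5 puts all five papers in the same component.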

Zoom into the two-paper loops

WITH pairs AS (
    SELECT label
    FROM scc_snapshot
    GROUP BY label
    HAVING COUNT(*) = 2)
SELECT c.label, p.year, p.title
FROM pairs x
JOIN scc_snapshot c ON c.label = x.label
JOIN papers p ON p.paper_id = c.vertex
ORDER BY c.label
LIMIT 6;

| label | year | title |
| --- | --- | --- |
| 57 | 2017 | Index Modulation Techniques for Next-Generation Wireless Networks |
| 57 | 2017 | Multidimensional index modulation in wireless communications |
| 210 | 2017 | A Survey of Research into Mixed Criticality Systems |
| 210 | 2017 | Probabilistic analysis for mixed criticality systems using fixed priority… |
| 265 | 2018 | DC programming and DCA: thirty years of developments |
| 265 | 2018 | Convergence Analysis of Difference-of-Convex Algorithm with Subanalytic Data |

12,015 of the 14,613 loops contain exactly two papers: same-year companion papers and surveys citing each other, which is what mutual citation looks like in practice.

Limitations & notes

  • dry-run validates table resolution, column presence, static dtypes, and options only
  • dry-run does not scan edge data, construct a graph, or prove source-vertex existence

Validate before running

Always dry-run a call before executing it. Validation checks the function, table, columns, dtypes, and options without touching the GPU:

SELECT * FROM cugraph_validate_call(
'cugraph_strongly_connected_components',
'your_edges_table',
'{"src_col":"src","dst_col":"dst"}'
);

See Discovery & validation for the full cugraph_validate_call contract.