PageRank

SQL function: cugraph_pagerank

Compute a PageRank score for each vertex of a graph built from an edge table.

Signature

cugraph_pagerank(table_name [, src_col, dst_col [, weight_col [, options_json]]])

Allowed argument counts: 1, 3, 4, 5.

Quickstart

SELECT * FROM cugraph_pagerank('target_edges');

Positional arguments

Argument	Type	Required	Default	Notes
`table_name`	`Utf8`	yes
`src_col`	`Utf8`	no	`src`
`dst_col`	`Utf8`	no	`dst`
`weight_col`	`Utf8|null`	no		Optional edge-weight column used during graph construction. When provided, edge weights affect the PageRank results.
`options_json`	`Utf8`	no
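
As a sketch of the four-argument form, the call below passes a weight column. The table `weighted_edges` and its `weight` column are hypothetical names used only for illustration; they are not part of the demo dataset.

```sql
-- Hypothetical table weighted_edges(src, dst, weight); these names are assumptions.
-- Four-argument form: table_name, src_col, dst_col, weight_col.
SELECT vertex, value
FROM cugraph_pagerank('weighted_edges', 'src', 'dst', 'weight')
ORDER BY value DESC
LIMIT 10;
```

The two-argument form is not accepted (allowed counts are 1, 3, 4, 5), so `src_col` and `dst_col` are always supplied together.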

JSON options

Option	Type	Default	Constraints
`alpha`	`Float64`	`0.85`	min 0; max 1
`epsilon`	`Float64`	`0.00001`	> 0
`max_iterations`	`UInt32`	`100`	min 1
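
A minimal sketch of overriding these options through the five-argument form. It assumes that a SQL NULL is accepted for `weight_col` (its type is listed as `Utf8|null`) and that the options are passed as top-level keys of the `options_json` object. In PageRank, `alpha` is conventionally the damping factor and `epsilon` the convergence tolerance; this page does not spell that out, so treat that reading as an assumption too.

```sql
-- Five-argument form: NULL weight_col (unweighted graph), then options_json.
-- Assumption: option names from the table above are top-level JSON keys.
SELECT vertex, value
FROM cugraph_pagerank(
  'citation_edges', 'src', 'dst', NULL,
  '{"alpha": 0.9, "epsilon": 0.000001, "max_iterations": 200}'
)
ORDER BY value DESC
LIMIT 5;
```

Options left out of the JSON object should fall back to the defaults listed above.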

Graph construction options

These options are shared by all cuGraph functions and are shown here with this function's defaults. The construction_policy option controls whether Nexus requests Python cuGraph-compatible edge normalization or bypasses it for raw libcugraph-style construction. See graph construction options for the full policy guide.

Option	Type	Default	Constraints	Description
`construction_policy`	`Utf8`	`"python_cugraph"`	one of `"python_cugraph"`, `"raw_libcugraph"`	Edge-list construction semantics used before calling libcugraph.
`directed`	`Boolean`	`true`		Whether graph construction treats edges as directed.
`renumber`	`Boolean`	`true`		Whether graph construction may renumber external vertex identifiers internally.
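
A sketch of setting a construction option, under the assumption that graph construction options travel in the same `options_json` argument as the algorithm options (this page does not state where they are passed). Here the citation edges are treated as undirected, which changes the scores because rank can then flow in both directions along an edge.

```sql
-- Assumption: construction options are top-level keys of options_json.
-- "directed": false asks graph construction to treat edges as undirected.
SELECT vertex, value
FROM cugraph_pagerank(
  'citation_edges', 'src', 'dst', NULL,
  '{"directed": false}'
)
ORDER BY value DESC
LIMIT 5;
```

Running cugraph_validate_call first is a cheap way to check whether a given option is accepted before any graph is built.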

Output schema

Column	Type	Nullable	Description
`vertex`	`Int64`	no	Vertex receiving the PageRank score.
`value`	`Float64`	no	PageRank score for the vertex.

Note: These are the generic registry schemas. Run cugraph_validate_call for the concrete, table-specific output schema of a particular call.

Examples

These examples run on the citation network demo dataset (4.9M papers, 45.6M src-cites-dst edges).

What raw citation counts miss

PageRank over the full graph, joined back to paper metadata. Importance flows through citations: a paper cited by foundational papers outranks a paper with more, but shallower, citations.

SELECT p.title, p.year, p.n_citation, r.value AS pagerank
FROM cugraph_pagerank('citation_edges', 'src', 'dst') r
JOIN papers p ON p.paper_id = r.vertex
ORDER BY r.value DESC
LIMIT 5;

title	year	n_citation	pagerank
Finite automata and their decision problems	1959	1,401	0.00139
The Mathematical Theory of Communication	1949	48,327	0.00134
The reduction of two-way automata to one-way automata	1959	224	0.00126
The complexity of theorem-proving procedures	1971	4,592	0.00073
A mathematical theory of communication	1948	22,122	0.00066

The #1 and #3 papers have modest raw counts (1,401 and 224 citations), but the papers citing them are themselves the roots of computer science, and PageRank propagates exactly that importance. The full call (45.6M edges, GPU graph build, 4.1M scores, join, sort) returns in about 1.5 s.

A window function over the result: PageRank ranks its own paper

The output of a cugraph_* function is a plain relation, so ROW_NUMBER() works directly on it. Where does the paper that introduced PageRank land, by its own algorithm, among 4.9 million papers?

WITH ranked AS (
  SELECT vertex, value, ROW_NUMBER() OVER (ORDER BY value DESC) AS rank
  FROM cugraph_pagerank('citation_edges', 'src', 'dst'))
SELECT r.rank, p.title, p.year, r.value
FROM ranked r JOIN papers p ON p.paper_id = r.vertex
WHERE r.vertex = 2066636486;

rank	title	year	value
115	The anatomy of a large-scale hypertextual Web search engine	1998	0.000143

Rank #115 of 4,894,081.

SQL decides which graph the GPU sees: the pre-2000 canon

The first argument is any relation name, including a view. Joining the edge list to papers on both endpoints restricts the graph to citations that stay within an era (here, papers dated 1901 through 2000, inclusive), and PageRank then answers "what was the canon up to 2000?".

CREATE VIEW edges_pre2000 AS
SELECT e.src, e.dst
FROM citation_edges e
JOIN papers ps ON ps.paper_id = e.src
JOIN papers pd ON pd.paper_id = e.dst
WHERE ps.year BETWEEN 1901 AND 2000 AND pd.year BETWEEN 1901 AND 2000;

SELECT p.year, p.title
FROM cugraph_pagerank('edges_pre2000', 'src', 'dst') r
JOIN papers p ON p.paper_id = r.vertex
ORDER BY r.value DESC
LIMIT 6;

year	title
1959	Finite automata and their decision problems
1959	The reduction of two-way automata to one-way automata
1949	The Mathematical Theory of Communication
1974	The Design and Analysis of Computer Algorithms
1958	Preliminary report: international algebraic language
1963	Machine perception of three-dimensional solids

Automata theory, Shannon, Aho–Hopcroft–Ullman, the ALGOL report: the view's WHERE clause rewinds the clock and the algorithm re-ranks the field.

Limitations & notes

Dry-run validation checks table resolution, column presence, static dtypes, and options only.
It does not scan edge data, construct a graph, or prove source-vertex existence.

Validate before running

Always dry-run a call before executing it. Validation checks the function, table, columns, dtypes, and options without touching the GPU:

SELECT * FROM cugraph_validate_call(
  'cugraph_pagerank',
  'your_edges_table',
  '{"src_col":"src","dst_col":"dst"}'
);

See Discovery & validation for the full cugraph_validate_call contract.
