PageRank
SQL function: cugraph_pagerank
Compute PageRank scores.
Signature
cugraph_pagerank(table_name [, src_col, dst_col [, weight_col [, options_json]]])
Allowed argument counts: 1, 3, 4, 5.
Quickstart
SELECT * FROM cugraph_pagerank('target_edges');
Positional arguments
| Argument | Type | Required | Default | Notes |
|---|---|---|---|---|
| table_name | Utf8 | yes | | |
| src_col | Utf8 | no | src | |
| dst_col | Utf8 | no | dst | |
| weight_col | Utf8 or null | no | | Optional edge weight column used for graph construction when the algorithm supports it. When provided, edge weights affect the algorithm's results. |
| options_json | Utf8 | no | | |
JSON options
| Option | Type | Default | Constraints | Description |
|---|---|---|---|---|
| alpha | Float64 | 0.85 | min 0; max 1 | |
| epsilon | Float64 | 0.00001 | > 0 | |
| max_iterations | UInt32 | 100 | min 1 | |
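In the usual power-iteration formulation of PageRank, alpha is the damping factor, epsilon is the convergence tolerance, and max_iterations caps the number of update rounds. The sketch below is a plain-Python illustration of that textbook formulation, not the libcugraph implementation. The function name, the uniform redistribution of score from vertices without outgoing edges, and the stopping rule (total absolute change below epsilon) are assumptions made for the example.

```python
def pagerank(edges, alpha=0.85, epsilon=1e-5, max_iterations=100):
    """Textbook power iteration over a directed list of (src, dst) pairs.

    Returns (scores, iterations_used). Illustrative only: the stopping rule
    and dangling-vertex handling are assumptions, not libcugraph behavior.
    """
    vertices = sorted({v for edge in edges for v in edge})
    n = len(vertices)
    out_neighbors = {v: [] for v in vertices}
    for src, dst in edges:
        out_neighbors[src].append(dst)

    scores = {v: 1.0 / n for v in vertices}  # start from a uniform distribution
    iteration = 0
    for iteration in range(1, max_iterations + 1):
        # Score held by vertices with no outgoing edges is spread evenly.
        dangling = sum(scores[v] for v in vertices if not out_neighbors[v])
        updated = {v: (1.0 - alpha + alpha * dangling) / n for v in vertices}
        # Each vertex passes alpha times its score, split across its out-edges.
        for src in vertices:
            targets = out_neighbors[src]
            for dst in targets:
                updated[dst] += alpha * scores[src] / len(targets)
        # Assumed stopping rule: total absolute change falls below epsilon.
        change = sum(abs(updated[v] - scores[v]) for v in vertices)
        scores = updated
        if change < epsilon:
            break
    return scores, iteration


# Toy graph: a three-vertex cycle plus vertex 4, which only points at vertex 1.
edges = [(1, 2), (2, 3), (3, 1), (4, 1)]
scores, iterations = pagerank(edges)
capped_scores, capped_iterations = pagerank(edges, max_iterations=3)
```

A smaller epsilon or a larger alpha generally needs more rounds to converge; when max_iterations is reached first, the loop returns the scores from the last completed round, as the capped call shows.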
Graph construction options
Shared by all cuGraph functions, shown here with this function's defaults. The construction_policy option controls whether Nexus requests Python cuGraph-compatible edge normalization or bypasses it for raw libcugraph-style construction; see graph construction options for the full policy guide.
| Option | Type | Default | Constraints | Description |
|---|---|---|---|---|
| construction_policy | Utf8 | "python_cugraph" | one of "python_cugraph", "raw_libcugraph" | Edge-list construction semantics used before calling libcugraph. |
| directed | Boolean | true | | Whether graph construction treats edges as directed. |
| renumber | Boolean | true | | Whether graph construction may renumber external vertex identifiers internally. |
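As a rough mental model of what directed and renumber usually mean for an edge list, the sketch below stores each undirected edge in both directions and maps sparse external vertex identifiers to contiguous internal ones. It is an illustration under assumptions, not the behavior of Nexus or libcugraph: the function name normalize_edges and the exact normalization steps are hypothetical.

```python
def normalize_edges(edges, directed=True, renumber=True):
    """Return (edge_list, external_ids) for a list of (src, dst) pairs.

    Hypothetical helper for illustration; real construction policies may
    differ in deduplication, ordering, and identifier mapping.
    """
    edges = list(edges)
    if not directed:
        # Undirected: keep each edge in both directions, without duplicates.
        edges = sorted(set(edges) | {(dst, src) for src, dst in edges})
    if not renumber:
        return edges, None
    # Map external identifiers to 0..n-1 and remember how to map back.
    external_ids = sorted({v for edge in edges for v in edge})
    internal = {ext: i for i, ext in enumerate(external_ids)}
    return [(internal[s], internal[d]) for s, d in edges], external_ids


raw_edges = [(2066636486, 10), (10, 500)]
internal_edges, external_ids = normalize_edges(raw_edges)
undirected_edges, _ = normalize_edges(raw_edges, directed=False, renumber=False)
```

With renumbering, an internal vertex i corresponds to external_ids[i], which is how results can be reported against the original identifiers.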
Output schema
| Column | Type | Nullable | Description |
|---|---|---|---|
| vertex | Int64 | no | Vertex receiving the PageRank score. |
| value | Float64 | no | PageRank score for the vertex. |
These are the generic registry schemas. Run cugraph_validate_call for the concrete, table-specific output schema of a particular call.
Examples
These examples run on the citation network demo dataset
(4.9M papers, 45.6M src-cites-dst edges).
What raw citation counts miss
PageRank over the full graph, joined back to paper metadata. Importance flows through citations: a paper cited by foundational papers outranks a paper with more, but shallower, citations.
SELECT p.title, p.year, p.n_citation, r.value AS pagerank
FROM cugraph_pagerank('citation_edges', 'src', 'dst') r
JOIN papers p ON p.paper_id = r.vertex
ORDER BY r.value DESC
LIMIT 5;
| title | year | n_citation | pagerank |
|---|---|---|---|
| Finite automata and their decision problems | 1959 | 1,401 | 0.00139 |
| The Mathematical Theory of Communication | 1949 | 48,327 | 0.00134 |
| The reduction of two-way automata to one-way automata | 1959 | 224 | 0.00126 |
| The complexity of theorem-proving procedures | 1971 | 4,592 | 0.00073 |
| A mathematical theory of communication | 1948 | 22,122 | 0.00066 |
The #1 and #3 papers have modest raw counts (1,401 and 224 citations) — but the papers citing them are themselves the roots of computer science, and PageRank propagates exactly that. The full call — 45.6M edges, GPU graph build, 4.1M scores, join, sort — returns in about 1.5 s.
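The effect can be reproduced on a toy graph. In the sketch below, paper A has a single citation, but it comes from F, which is itself cited by five papers; paper B has three citations, all from papers nobody cites. The code uses the textbook PageRank iteration with a fixed number of rounds. The graph, the function name, and the dangling-vertex handling are made up for the example and do not reflect how cugraph_pagerank is implemented.

```python
def pagerank_scores(edges, alpha=0.85, rounds=50):
    """Fixed-round textbook PageRank over (citing, cited) pairs (illustrative)."""
    vertices = sorted({v for edge in edges for v in edge})
    n = len(vertices)
    cites = {v: [] for v in vertices}
    for src, dst in edges:
        cites[src].append(dst)
    scores = {v: 1.0 / n for v in vertices}
    for _ in range(rounds):
        # Papers that cite nothing spread their score evenly over all papers.
        dangling = sum(scores[v] for v in vertices if not cites[v])
        updated = {v: (1.0 - alpha + alpha * dangling) / n for v in vertices}
        for src, targets in cites.items():
            for dst in targets:
                updated[dst] += alpha * scores[src] / len(targets)
        scores = updated
    return scores


# Five papers cite F, F cites A, and three uncited papers cite B.
toy_edges = [(f"f{i}", "F") for i in range(1, 6)]
toy_edges += [("F", "A")]
toy_edges += [(f"b{i}", "B") for i in range(1, 4)]

citation_count = {}
for _, cited in toy_edges:
    citation_count[cited] = citation_count.get(cited, 0) + 1

toy_scores = pagerank_scores(toy_edges)
a_outranks_b = toy_scores["A"] > toy_scores["B"]
```

A ends up above B despite having one citation to B's three, because F passes on the score it collected from its own five citations.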
A window function over the result: PageRank ranks its own paper
The output of a cugraph_* function is a plain relation, so ROW_NUMBER()
works directly on it. Where does the paper that introduced PageRank land, by
its own algorithm, among 4.9 million papers?
WITH ranked AS (
SELECT vertex, value, ROW_NUMBER() OVER (ORDER BY value DESC) AS rank
FROM cugraph_pagerank('citation_edges', 'src', 'dst'))
SELECT r.rank, p.title, p.year, r.value
FROM ranked r JOIN papers p ON p.paper_id = r.vertex
WHERE r.vertex = 2066636486;
| rank | title | year | value |
|---|---|---|---|
| 115 | The anatomy of a large-scale hypertextual Web search engine | 1998 | 0.000143 |
Rank #115 of 4,894,081.
SQL decides which graph the GPU sees: the pre-2000 canon
The first argument is any relation name, including a view. Joining the edge
list to papers on both endpoints restricts the graph to citations whose
citing and cited papers both fall within an era (here 1901 through 2000,
since BETWEEN is inclusive), and PageRank then answers "what was the canon
through 2000?".
CREATE VIEW edges_pre2000 AS
SELECT e.src, e.dst
FROM citation_edges e
JOIN papers ps ON ps.paper_id = e.src
JOIN papers pd ON pd.paper_id = e.dst
WHERE ps.year BETWEEN 1901 AND 2000 AND pd.year BETWEEN 1901 AND 2000;
SELECT p.year, p.title
FROM cugraph_pagerank('edges_pre2000', 'src', 'dst') r
JOIN papers p ON p.paper_id = r.vertex
ORDER BY r.value DESC
LIMIT 6;
| year | title |
|---|---|
| 1959 | Finite automata and their decision problems |
| 1959 | The reduction of two-way automata to one-way automata |
| 1949 | The Mathematical Theory of Communication |
| 1974 | The Design and Analysis of Computer Algorithms |
| 1958 | Preliminary report: international algebraic language |
| 1963 | Machine perception of three-dimensional solids |
Automata theory, Shannon, Aho–Hopcroft–Ullman, the ALGOL report: the view's
WHERE clause rewinds the clock and the algorithm re-ranks the field.
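The filtering the view performs can be stated compactly outside SQL: an edge survives only if both its citing and its cited paper have a known year inside the window. The sketch below mirrors that logic on in-memory data; the function name and the sample data are hypothetical and only illustrate the two inner joins plus the inclusive BETWEEN condition.

```python
def edges_within_era(edges, year_of, first_year, last_year):
    """Keep (src, dst) pairs whose endpoints both fall in [first_year, last_year].

    Papers missing from year_of are dropped, as an inner join would drop them.
    """
    def in_era(paper_id):
        year = year_of.get(paper_id)
        return year is not None and first_year <= year <= last_year

    return [(src, dst) for src, dst in edges if in_era(src) and in_era(dst)]


# Hypothetical sample: paper 3 is from 2005, paper 4 has no metadata row.
sample_years = {1: 1959, 2: 1998, 3: 2005}
sample_edges = [(2, 1), (3, 1), (3, 2), (2, 4)]
kept_edges = edges_within_era(sample_edges, sample_years, 1901, 2000)
```

Only the citation from paper 2 to paper 1 remains: the others involve a paper outside the window or one without metadata.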
Limitations & notes
- dry-run validates table resolution, column presence, static dtypes, and options only
- dry-run does not scan edge data, construct a graph, or prove source-vertex existence
Validate before running
Always dry-run a call before executing it. Validation checks the function, table, columns, dtypes, and options without touching the GPU:
SELECT * FROM cugraph_validate_call(
'cugraph_pagerank',
'your_edges_table',
'{"src_col":"src","dst_col":"dst"}'
);
See Discovery & validation for the full cugraph_validate_call contract.