Design
DataFusion Nexus is designed as a DataFusion-native integration layer, not a separate SQL engine boundary. DataFusion keeps SQL parsing, logical planning, catalog APIs, and CPU execution. Backend services can embed Nexus directly, or run the provided Flight SQL server, to add cuGraph table functions, Iceberg/lakehouse sources, structured diagnostics, and optional RAPIDS-backed execution for supported relational fragments.
The design center is differentiation through integration:
- cuGraph as SQL relations. Graph algorithms are ordinary cugraph_* table functions with discoverable signatures, validation, and result schemas.
- Embeddable backend API. Applications install Nexus on a DataFusion SessionStateBuilder and wrap it with their own service API, tenancy, auth, and domain-specific workflow.
- Lakehouse-aware source semantics. Iceberg and object-store sources remain source catalogs; session-local views and edge relations live in a mutable DataFusion workspace.
- Structured contracts. Planning reports, fallback reasons, metric rows, ErrorCodes, and error Facts are public diagnostic surfaces.
- RAPIDS execution where it fits. Supported relational fragments lower into a DataFusion-free IR and execute through cuDF; unsupported fragments keep the DataFusion CPU path with a structured reason.
Nexus does not currently accept Substrait plans, and it does not implement single-query cross-GPU communication. The public design therefore avoids positioning Nexus as a generic GPU SQL engine. The strongest boundary is the DataFusion service/backend layer around RAPIDS, cuGraph, Iceberg, and typed runtime evidence.
Integration surfaces
Embedded DataFusion backend
The library surface is the primary extension point for custom backend services.
CudfSessionStateBuilderExt installs Nexus behavior into a caller-owned
DataFusion session:
    let state = SessionStateBuilder::new()
        .with_default_features()
        .with_cudf_native(CudfOptimizerConfig::new(GpuFallbackPolicy::GpuPreferred))
        .with_cugraph_sql(CugraphSqlConfig::default())
        .build();
That shape lets a backend keep its own API, authorization, state model, and domain-specific validation while delegating SQL planning and supported RAPIDS execution to the DataFusion session.
Flight SQL service
The server packages the same session extensions behind Arrow Flight SQL. Its configuration controls cuGraph enablement, Iceberg catalogs, workspace overlay, memory policy, source credentials, and runtime diagnostics. This is the operable deployment shape for notebooks, BI tools, agents, and remote clients.
cuGraph SQL
With --features cugraph, graph algorithms surface as SQL table functions:
SELECT * FROM cugraph_pagerank('edges', 'src', 'dst')
ORDER BY value DESC;
The output is a plain relation — join it, filter it, window over it like any
other table. Discovery and validation functions (cugraph_list_algorithms,
cugraph_describe_algorithm, cugraph_validate_call) make the graph API
self-describing for humans and agents.
Iceberg and workspace semantics
Iceberg catalogs are source catalogs. Interactive DDL such as CREATE VIEW
belongs in a mutable DataFusion workspace, and an explicit workspace overlay can
expose short source names without teaching a read-only Iceberg catalog to accept
views. This keeps remote source metadata, session-local objects, and graph edge
views separate.
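The separation described above can be sketched with the standard library alone. This is a minimal illustration, not Nexus code: `SourceCatalog`, `Workspace`, `Resolved`, `expose`, and `resolve` are hypothetical names, and the lookup order (workspace views first, then overlay short names, then the source catalog) is an assumption made for the example.

```rust
use std::collections::{HashMap, HashSet};

/// Where a name resolved to. Hypothetical type for illustration only.
#[derive(Debug, PartialEq)]
pub enum Resolved {
    /// SQL text of a session-local view held by the workspace.
    WorkspaceView(String),
    /// Fully qualified name of a table in the read-only source catalog.
    SourceTable(String),
}

/// Read-only stand-in for an Iceberg source catalog: it never stores views.
pub struct SourceCatalog {
    tables: HashSet<String>,
}

impl SourceCatalog {
    pub fn new(tables: &[&str]) -> Self {
        Self { tables: tables.iter().map(|t| t.to_string()).collect() }
    }
}

/// Mutable session-local workspace plus an overlay of short source names.
pub struct Workspace {
    views: HashMap<String, String>,
    overlay: HashMap<String, String>, // short name -> qualified source name
}

impl Workspace {
    pub fn new() -> Self {
        Self { views: HashMap::new(), overlay: HashMap::new() }
    }

    /// CREATE VIEW lands here, never in the source catalog.
    pub fn create_view(&mut self, name: &str, sql: &str) {
        self.views.insert(name.to_string(), sql.to_string());
    }

    /// Expose a short alias for a qualified source table.
    pub fn expose(&mut self, short: &str, qualified: &str) {
        self.overlay.insert(short.to_string(), qualified.to_string());
    }

    /// Assumed lookup order: workspace views, then overlay aliases, then source.
    pub fn resolve(&self, name: &str, source: &SourceCatalog) -> Option<Resolved> {
        if let Some(sql) = self.views.get(name) {
            return Some(Resolved::WorkspaceView(sql.clone()));
        }
        let qualified = self.overlay.get(name).map(String::as_str).unwrap_or(name);
        if source.tables.contains(qualified) {
            Some(Resolved::SourceTable(qualified.to_string()))
        } else {
            None
        }
    }
}
```

In this sketch the source catalog has no mutating method at all, which is one way to make "read-only" a property of the type rather than a runtime check.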
Structured diagnostics
Every important boundary returns typed evidence: native candidate reports, strict/adaptive admission reports, source-read metric rows, graph reservation and runtime rows, and adapter errors with stable codes and facts. These contracts are how tests, tools, and backend services explain decisions; printed plans and logs are secondary.
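To make "stable codes and facts" concrete, here is a minimal standard-library sketch of an error that keeps a closed code plus typed key/value facts and is only flattened to text at the edge. The type names, variant names, and fact keys are hypothetical and are not the adapter's actual `ErrorCode` definitions.

```rust
use std::collections::BTreeMap;
use std::fmt;

/// Closed set of stable codes. Variant names are hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ErrorCode {
    UnsupportedExpression,
    ReservationDenied,
}

impl ErrorCode {
    /// Stable string form that tests and tools can match on.
    pub fn as_str(self) -> &'static str {
        match self {
            ErrorCode::UnsupportedExpression => "unsupported_expression",
            ErrorCode::ReservationDenied => "reservation_denied",
        }
    }
}

/// An error that carries its evidence as data instead of a formatted message.
#[derive(Debug, Clone, PartialEq)]
pub struct StructuredError {
    pub code: ErrorCode,
    /// Sorted map so that rendered output has a deterministic order.
    pub facts: BTreeMap<String, String>,
}

impl StructuredError {
    pub fn new(code: ErrorCode) -> Self {
        Self { code, facts: BTreeMap::new() }
    }

    /// Attach one fact; returns self so calls can be chained.
    pub fn with_fact(mut self, key: &str, value: impl ToString) -> Self {
        self.facts.insert(key.to_string(), value.to_string());
        self
    }
}

/// Text rendering is a last step for humans; callers should assert on
/// `code` and `facts` directly.
impl fmt::Display for StructuredError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}", self.code.as_str())?;
        for (key, value) in &self.facts {
            write!(f, " {key}={value}")?;
        }
        Ok(())
    }
}
```

A caller would build such a value with something like `StructuredError::new(ErrorCode::ReservationDenied).with_fact("requested_bytes", 4096)` and compare the `code` field in tests rather than parsing the rendered string.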
Relational acceleration path
The relational GPU path is deliberately narrow and reportable. After DataFusion
physical planning, Nexus recognizes supported subtrees, lowers them into
nexus-query-engine::QueryPlan, and executes them through cuDF. Rejected
subtrees stay on DataFusion's CPU path and carry a structured fallback reason.
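The recognize-and-lower step can be pictured as a tree rewrite. The sketch below is a toy model under stated assumptions: `Plan`, `Fallback`, `supported`, and `rewrite` are hypothetical, the operator set is invented, and `Sort` is treated as unsupported purely to produce a rejection. It does not reflect which operators Nexus actually admits.

```rust
/// Toy physical plan. `Gpu` marks a subtree handed to the native engine.
#[derive(Debug, Clone, PartialEq)]
pub enum Plan {
    Scan,
    Filter(Box<Plan>),
    Sort(Box<Plan>), // assumed unsupported in this sketch
    Gpu(Box<Plan>),
}

/// Structured reason recorded for every operator left on the CPU path.
#[derive(Debug, Clone, PartialEq)]
pub struct Fallback {
    pub operator: &'static str,
    pub reason: &'static str,
}

/// A subtree is supported only if every operator inside it is supported.
fn supported(plan: &Plan) -> bool {
    match plan {
        Plan::Scan => true,
        Plan::Filter(input) => supported(input),
        Plan::Sort(_) | Plan::Gpu(_) => false,
    }
}

/// Wrap the largest supported subtrees in `Gpu`; keep everything else as-is
/// and record why it stayed on the CPU path.
pub fn rewrite(plan: Plan, fallbacks: &mut Vec<Fallback>) -> Plan {
    if supported(&plan) {
        return Plan::Gpu(Box::new(plan));
    }
    match plan {
        Plan::Sort(input) => {
            fallbacks.push(Fallback { operator: "Sort", reason: "operator_not_supported" });
            Plan::Sort(Box::new(rewrite(*input, fallbacks)))
        }
        Plan::Filter(input) => {
            fallbacks.push(Fallback { operator: "Filter", reason: "input_not_native" });
            Plan::Filter(Box::new(rewrite(*input, fallbacks)))
        }
        // Scan is always supported and Gpu is already rewritten.
        other => other,
    }
}
```

For a `Sort` over a `Filter` over a `Scan`, this leaves the `Sort` on the CPU with one recorded reason and wraps the `Filter`/`Scan` subtree in a single `Gpu` node.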
Nexus installs on a DataFusion SessionStateBuilder (src/session.rs) and leaves everything else untouched: any DataFusion client (SQL, DataFrame API, the Flight SQL server) gets the same Nexus admission and fallback behavior.
One optimizer rule, one decision point
There is a single entry point for relational cuDF rewrites:
CudfNativePhysicalOptimizerRule (public rule name
datafusion_nexus_native), registered as the last physical optimizer rule
before DataFusion's sanity check. Three properties of its decision are
load-bearing:
- Admission is capability-first. A subtree is admitted only if every operator, expression, Arrow type, edge contract, and reservation contract it needs is individually supported. Rejections are local and structured: "this expression is missing", not "this query family is unsupported".
- Every decision is reported. The rule emits structured PlanningReports with per-candidate Rewritten/Rejected outcomes and FallbackReason codes. Tests and tooling assert against these reports, never against printed plans.
- Two execution modes, kept apart. Strict bounded execution is gate-first: every memory ReservationRequirement must be satisfiable before it runs. Adaptive execution admits more and accounts for memory actions at runtime. They have separate admission, evidence, and report surfaces (StrictBoundedContractReport vs AdaptivePipelineReport); never merge them or project one mode's evidence into the other.
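The difference between the two modes can be sketched as two separate functions with separate result types. Everything here is an assumption made for illustration: the struct fields, the byte-budget model, and the use of "spill" as the runtime memory action are invented, and only the name `ReservationRequirement` is borrowed from the text above.

```rust
/// One memory reservation a plan needs. Fields are hypothetical.
#[derive(Debug, Clone, PartialEq)]
pub struct ReservationRequirement {
    pub label: &'static str,
    pub bytes: u64,
}

/// Outcome of the strict gate, decided before anything executes.
#[derive(Debug, PartialEq)]
pub enum StrictOutcome {
    Admitted { reserved_bytes: u64 },
    Rejected { label: &'static str, needed: u64, available: u64 },
}

/// Gate-first: every requirement must fit in the budget up front.
pub fn admit_strict(reqs: &[ReservationRequirement], budget: u64) -> StrictOutcome {
    let mut reserved = 0u64;
    for req in reqs {
        // `reserved` never exceeds `budget`, so this cannot underflow.
        let available = budget - reserved;
        if req.bytes > available {
            return StrictOutcome::Rejected { label: req.label, needed: req.bytes, available };
        }
        reserved += req.bytes;
    }
    StrictOutcome::Admitted { reserved_bytes: reserved }
}

/// Separate evidence type for the adaptive mode; never merged with the above.
#[derive(Debug, PartialEq)]
pub struct AdaptiveReport {
    pub spilled: Vec<&'static str>,
}

/// Adaptive: always admits, and records a memory action (here an assumed
/// "spill") whenever a requirement would push usage over the budget.
pub fn run_adaptive(reqs: &[ReservationRequirement], budget: u64) -> AdaptiveReport {
    let mut in_use = 0u64;
    let mut spilled = Vec::new();
    for req in reqs {
        if in_use.saturating_add(req.bytes) > budget {
            spilled.push(req.label);
        } else {
            in_use += req.bytes;
        }
    }
    AdaptiveReport { spilled }
}
```

Keeping `StrictOutcome` and `AdaptiveReport` as unrelated types means the compiler rejects any attempt to read one mode's evidence as the other's.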
The layering
Nexus plugs into DataFusion through the ExecutionPlan APIs and registers one optimizer rule; SQL, catalogs, and CPU operators are stock DataFusion.
The layering carries a strict placement rule:
- DataFusion-specific code lives in the adapter (src/): plan recognition, lowering, ExecutionPlan wrappers, report surfaces.
- Reusable native-engine code lives in nexus-query-engine: the QueryPlan IR, expression and source types, admission, and the executor. It compiles with no DataFusion dependency.
- Missing GPU capability is fixed in the binding crates. cudf-rs and cugraph-rs are first-class foundations (pinned as sibling path checkouts, not crates.io dependencies). If native execution needs a better API, ownership model, or memory-control surface, extend the binding crate; don't paper over the gap in the adapter.
How cuGraph plugs in
As the cuGraph SQL section above shows, graph algorithms surface as SQL table functions whose output is a plain relation. Under the hood, one ExecutionPlan bridges relational and graph execution on a single shared GPU runtime:
Shared runtime: one cudf::Runtime, with one CUDA stream and one RMM allocator. Only the results take a copy back.
register_cugraph_udtfs wires 24 graph algorithms onto the SessionContext as table functions: cugraph_pagerank, cugraph_bfs, cugraph_louvain, …. Arguments are the edge table plus src/dst (and optional weight) columns and per-algorithm options.
CugraphAlgoExec (src/graph_sql/exec.rs) is the DataFusion ExecutionPlan behind them: it executes the edge-source plan, collects its Arrow batches, and drives the GPU pipeline.
The shared-runtime region is the load-bearing invariant: Nexus carries a
single Arc<cudf::Runtime> across the whole session, so GPU buffers move
between dataframe operators and graph algorithms without copies or
cross-stream synchronization hazards. The exchange itself goes through
rapids-interop, a small ownership-contract crate shared by cudf-rs and
cugraph-rs — the two binding crates stay loosely coupled while still proving
lifetime and stream safety at compile time.
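The "one runtime per session" invariant can be illustrated with plain `Arc` sharing. This is a sketch with stand-in types: `Runtime`, `Session`, `DataframeOp`, and `GraphOp` are hypothetical and are not the cudf-rs or cugraph-rs types; the only real API relied on is `std::sync::Arc`.

```rust
use std::sync::Arc;

/// Stand-in for a GPU runtime handle (one stream, one allocator).
#[derive(Debug)]
pub struct Runtime {
    pub stream_id: u32,
}

/// Stand-in for a relational operator holding the shared runtime.
pub struct DataframeOp {
    runtime: Arc<Runtime>,
}

/// Stand-in for a graph algorithm holding the same shared runtime.
pub struct GraphOp {
    runtime: Arc<Runtime>,
}

/// The session owns the single runtime and hands out clones of the Arc.
pub struct Session {
    runtime: Arc<Runtime>,
}

impl Session {
    pub fn new(stream_id: u32) -> Self {
        Self { runtime: Arc::new(Runtime { stream_id }) }
    }

    /// Cloning the Arc bumps a reference count; no new runtime is created.
    pub fn dataframe_op(&self) -> DataframeOp {
        DataframeOp { runtime: Arc::clone(&self.runtime) }
    }

    pub fn graph_op(&self) -> GraphOp {
        GraphOp { runtime: Arc::clone(&self.runtime) }
    }
}

/// True only when both operators point at the very same runtime allocation.
pub fn same_runtime(a: &DataframeOp, b: &GraphOp) -> bool {
    Arc::ptr_eq(&a.runtime, &b.runtime)
}
```

A pointer-identity check like `same_runtime` is one way a test could assert the invariant: two operators from one session share a runtime, while operators from different sessions do not, even if their stream ids happen to match.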
What every developer must know
- Reports are the contract. Report row schemas, TSV column ordering, NATIVE_*_TSV_HEADER / NATIVE_*_TSV_SCHEMA_VERSION, and NATIVE_CUDF_DATAFUSION_METRIC_NAMES are public surface; don't change them casually, and assert tests against reports, not printed plans.
- The GPU is one external resource. Anything touching CUDA (tests, benchmarks, the server) must be serialized against other GPU users with flock /tmp/cudf-gpu.lock. Within one test invocation, the gpu nextest group caps GPU-test parallelism; see Build & Test.
- Aggregate state is explicit. Aggregates with partial/final semantics model their state in the engine: AVG is partial SUM + COUNT, merged, then a final projection, never an average of averages. Lookalikes with unmodeled state stay rejected.
- Errors stay structured. The adapter's closed Error/ErrorCode categories carry typed facts from the FFI boundary up to the reports; convert to DataFusionError only at the DataFusion API boundary.
- Sibling checkouts are required. ../cudf-rs, ../datafusion, ../iceberg-rust (and ../cugraph-rs for --features cugraph) are pinned path dependencies; the workspace does not build without them.
- Optional surfaces are feature-gated. Iceberg/Glue/S3 support is behind --features iceberg, the Flight SQL server behind --features server, and graph SQL behind --features cugraph.
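The AVG rule in the list above is easy to get wrong, so here is a small worked sketch of partial SUM + COUNT state, a merge, and a final projection. `AvgState` and its methods are hypothetical names for illustration, not the engine's types.

```rust
/// Partial state for AVG: a running sum and a row count.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct AvgState {
    pub sum: f64,
    pub count: u64,
}

impl AvgState {
    /// Partial phase: fold one partition's values into SUM and COUNT.
    pub fn partial(values: &[f64]) -> Self {
        Self { sum: values.iter().sum(), count: values.len() as u64 }
    }

    /// Merge phase: states combine by adding sums and adding counts.
    pub fn merge(self, other: Self) -> Self {
        Self { sum: self.sum + other.sum, count: self.count + other.count }
    }

    /// Final projection: divide once, at the end. Empty input yields None.
    pub fn finalize(self) -> Option<f64> {
        if self.count == 0 {
            None
        } else {
            Some(self.sum / self.count as f64)
        }
    }
}
```

With partitions `[1.0, 2.0, 3.0]` and `[10.0]`, the merged state is sum 16 over count 4, so the result is 4.0. Averaging the two partition averages (2.0 and 10.0) would give 6.0, which is wrong because the partitions have different sizes.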
Where to go next
- Build & Test: commands, toolchain, GPU test lanes.
- cuGraph SQL API: the full cugraph_* function reference.
- Guides: benchmark recipes and local E2E flows.
- Public surface: the contracts listed above, in detail.