Skip to main content

Design

DataFusion Nexus is designed as a DataFusion-native integration layer, not a separate SQL engine boundary. DataFusion keeps SQL parsing, logical planning, catalog APIs, and CPU execution. Backend services can embed Nexus directly, or run the provided Flight SQL server, to add cuGraph table functions, Iceberg/lakehouse sources, structured diagnostics, and optional RAPIDS-backed execution for supported relational fragments.

The design center is differentiation through integration:

  • cuGraph as SQL relations. Graph algorithms are ordinary cugraph_* table functions with discoverable signatures, validation, and result schemas.
  • Embeddable backend API. Applications install Nexus on a DataFusion SessionStateBuilder and wrap it with their own service API, tenancy, auth, and domain-specific workflow.
  • Lakehouse-aware source semantics. Iceberg and object-store sources remain source catalogs; session-local views and edge relations live in a mutable DataFusion workspace.
  • Structured contracts. Planning reports, fallback reasons, metric rows, ErrorCodes, and error Facts are public diagnostic surfaces.
  • RAPIDS execution where it fits. Supported relational fragments lower into a DataFusion-free IR and execute through cuDF; unsupported fragments keep the DataFusion CPU path with a structured reason.

Nexus does not currently accept Substrait plans, and it does not implement single-query cross-GPU communication. The public design therefore avoids positioning Nexus as a generic GPU SQL engine. The strongest boundary is the DataFusion service/backend layer around RAPIDS, cuGraph, Iceberg, and typed runtime evidence.

Integration surfaces

Embedded DataFusion backend

The library surface is the primary extension point for custom backend services. CudfSessionStateBuilderExt installs Nexus behavior into a caller-owned DataFusion session:

let state = SessionStateBuilder::new()
.with_default_features()
.with_cudf_native(CudfOptimizerConfig::new(GpuFallbackPolicy::GpuPreferred))
.with_cugraph_sql(CugraphSqlConfig::default())
.build();

That shape lets a backend keep its own API, authorization, state model, and domain-specific validation while delegating SQL planning and supported RAPIDS execution to the DataFusion session.

Flight SQL service

The server packages the same session extensions behind Arrow Flight SQL. Its configuration controls cuGraph enablement, Iceberg catalogs, workspace overlay, memory policy, source credentials, and runtime diagnostics. This is the operable deployment shape for notebooks, BI tools, agents, and remote clients.

cuGraph SQL

With --features cugraph, graph algorithms surface as SQL table functions:

SELECT * FROM cugraph_pagerank('edges', 'src', 'dst')
ORDER BY value DESC;

The output is a plain relation — join it, filter it, window over it like any other table. Discovery and validation functions (cugraph_list_algorithms, cugraph_describe_algorithm, cugraph_validate_call) make the graph API self-describing for humans and agents.

Iceberg and workspace semantics

Iceberg catalogs are source catalogs. Interactive DDL such as CREATE VIEW belongs in a mutable DataFusion workspace, and an explicit workspace overlay can expose short source names without teaching a read-only Iceberg catalog to accept views. This keeps remote source metadata, session-local objects, and graph edge views separate.

Structured diagnostics

Every important boundary returns typed evidence: native candidate reports, strict/adaptive admission reports, source-read metric rows, graph reservation and runtime rows, and adapter errors with stable codes and facts. These contracts are how tests, tools, and backend services explain decisions; printed plans and logs are secondary.

Relational acceleration path

The relational GPU path is deliberately narrow and reportable. After DataFusion physical planning, Nexus recognizes supported subtrees, lowers them into nexus-query-engine::QueryPlan, and executes them through cuDF. Rejected subtrees stay on DataFusion's CPU path and carry a structured fallback reason.

click a stage for details
HOSTGPUadmittedrejectedUPSTREAMDataFusionSQL → physical planNEXUSoptimizer ruledatafusion_nexus_nativecapability admissionGPUNativeCudfExecQueryPlan on cuDFCPUoriginal operatorsfallback pathOUTPUTArrow batchesto caller
Relational acceleration path. The optimizer rule makes one gate-first decision per candidate subtree; admitted subtrees dip into the RAPIDS lane, rejected ones stay on the host — and both merge back into the same Arrow output.
DataFusion../datafusion (sibling checkout)
Standard DataFusion does all the front-end work: SQL parsing, logical planning, physical planning, and its own optimizer passes. Nexus registers one extra physical optimizer rule via SessionStateBuilder (src/session.rs) and leaves everything else untouched — any DataFusion client (SQL, DataFrame API, the Flight SQL server) gets the same Nexus admission and fallback behavior.

One optimizer rule, one decision point

There is a single entry point for relational cuDF rewrites: CudfNativePhysicalOptimizerRule (public rule name datafusion_nexus_native), registered as the last physical optimizer rule before DataFusion's sanity check. Three properties of its decision are load-bearing:

  • Admission is capability-first. A subtree is admitted only if every operator, expression, Arrow type, edge contract, and reservation contract it needs is individually supported. Rejections are local and structured — "this expression is missing", not "this query family is unsupported".
  • Every decision is reported. The rule emits structured PlanningReports with per-candidate Rewritten / Rejected outcomes and FallbackReason codes. Tests and tooling assert against these reports, never against printed plans.
  • Two execution modes, kept apart. Strict bounded execution is gate-first: every memory ReservationRequirement must be satisfiable before it runs. Adaptive execution admits more and accounts for memory actions at runtime. They have separate admission, evidence, and report surfaces (StrictBoundedContractReport vs AdaptivePipelineReport) — never merge them or project one mode's evidence into the other.

The layering

click a layer for details
UPSTREAMDataFusionSQL · planner · CPU operatorsADAPTERdatafusion-nexusoptimizer rule · lowering · exec wrappers · reportsENGINEnexus-query-engineQueryPlan IR · admission · executor — no DataFusion depBINDINGScudf-rs / cugraph-rssafe Rust over libcudf & libcugraph · one RuntimeFFI / C ABI boundaryNATIVElibcudf + libcugraph + RMMC++ / CUDA kernels · GPU memory pool
Everything above the dashed line is Rust; everything below is NVIDIA's C++/CUDA, reached only through the binding crates.
DataFusion../datafusion (sibling checkout)
The upstream query engine, pinned as a sibling path checkout. Nexus consumes its physical-plan and ExecutionPlan APIs and registers one optimizer rule; SQL, catalogs, and CPU operators are stock DataFusion.

The layering carries a strict placement rule:

  • DataFusion-specific code lives in the adapter (src/): plan recognition, lowering, ExecutionPlan wrappers, report surfaces.
  • Reusable native-engine code lives in nexus-query-engine: the QueryPlan IR, expression and source types, admission, and the executor. It compiles with no DataFusion dependency.
  • Missing GPU capability is fixed in the binding crates. cudf-rs and cugraph-rs are first-class foundations (pinned as sibling path checkouts, not crates.io dependencies). If native execution needs a better API, ownership model, or memory-control surface, extend the binding crate — don't paper over the gap in the adapter.

How cuGraph plugs in

With --features cugraph, graph algorithms surface as SQL table functions:

SELECT * FROM cugraph_pagerank('edges', 'src', 'dst')
ORDER BY value DESC;

The output is a plain relation — join it, filter it, window over it like any other table. Under the hood, one ExecutionPlan bridges relational and graph execution on a single shared GPU runtime:

click a stage for details
HOSTGPUone cudf::Runtime · one stream · one RMM poolborrowsSQLcugraph_*()table functionCugraphAlgoExecGPUcudf::DataFrameedge columns on deviceGRAPHcugraph Graphzero-copy buildalgorithm on GPUOUTPUTresults → Arrowone D2D copy
Edge data is handed to cuGraph zero-copy because dataframe execution and graph algorithms share one cudf::Runtime — one CUDA stream, one RMM allocator. Only the results take a copy back.
cugraph_*()src/graph_sql/
register_cugraph_udtfs wires 24 graph algorithms onto the SessionContext as table functions — cugraph_pagerank, cugraph_bfs, cugraph_louvain, …. Arguments are the edge table plus src/dst (and optional weight) columns and per-algorithm options. CugraphAlgoExec (src/graph_sql/exec.rs) is the DataFusion ExecutionPlan behind them: it executes the edge-source plan, collects its Arrow batches, and drives the GPU pipeline to the right.

The shared-runtime region is the load-bearing invariant: Nexus carries a single Arc<cudf::Runtime> across the whole session, so GPU buffers move between dataframe operators and graph algorithms without copies or cross-stream synchronization hazards. The exchange itself goes through rapids-interop, a small ownership-contract crate shared by cudf-rs and cugraph-rs — the two binding crates stay loosely coupled while still proving lifetime and stream safety at compile time.

What every developer must know

  • Reports are the contract. Report row schemas, TSV column ordering, NATIVE_*_TSV_HEADER / NATIVE_*_TSV_SCHEMA_VERSION, and NATIVE_CUDF_DATAFUSION_METRIC_NAMES are public surface — don't change them casually, and assert tests against reports, not printed plans.
  • The GPU is one external resource. Anything touching CUDA — tests, benchmarks, the server — must be serialized against other GPU users with flock /tmp/cudf-gpu.lock. Within one test invocation, the gpu nextest group caps GPU-test parallelism; see Build & Test.
  • Aggregate state is explicit. Aggregates with partial/final semantics model their state in the engine: AVG is partial SUM + COUNT, merged, then a final projection — never an average of averages. Lookalikes with unmodeled state stay rejected.
  • Errors stay structured. The adapter's closed Error / ErrorCode categories carry typed facts from the FFI boundary up to the reports; convert to DataFusionError only at the DataFusion API boundary.
  • Sibling checkouts are required. ../cudf-rs, ../datafusion, ../iceberg-rust (and ../cugraph-rs for --features cugraph) are pinned path dependencies; the workspace does not build without them.
  • Optional surfaces are feature-gated. Iceberg/Glue/S3 support is behind --features iceberg, the Flight SQL server behind --features server, and graph SQL behind --features cugraph.

Where to go next