By Boge Liu
Compiled query execution is a solved problem. HyPer proved in 2011 that compiling query plans to native code beats interpretation by an order of magnitude on analytical workloads. SingleStore productionized it. SAP HANA built a business on it. The question for a new database engine isn't whether to compile SQL — it's what IR to build the compiler on.
Every major compiled database made the same choice: go straight to LLVM IR. HyPer does it. Umbra does it. NoisePage does it. LLVM IR is mature, well-optimized, and gets you to x86-64 with minimal effort. So why does PhoebeDB use MLIR instead, layering three custom IR levels between the query plan and LLVM?
Because LLVM IR is the wrong abstraction level for a query compiler's optimization work — even if it's the right level for code generation.
The problem with going straight to LLVM IR
A query plan is a data-flow graph of typed tuple streams: a scan feeds a filter, a filter feeds an aggregation, an aggregation feeds a sort. Optimization decisions — which operators can be fused into a single loop, where pipeline boundaries must be placed, whether to use row-at-a-time or batch processing — all require reasoning about that graph structure.
When you compile a query plan directly to LLVM IR, that structure disappears. LLVM IR is flat and scalar: there are no tuple streams, no operator boundaries, no way to express "this op must consume a batch." Every optimization pass has to happen before LLVM IR, in ad-hoc C++ that cannot be verified, composed, or reused across queries. The IR becomes pure output — write-only, opaque to further structured analysis.
The consequence is that adding a new optimization or a new operator type requires touching the entire code-generation path. There is no intermediate representation where you can insert a rewrite, no type system that enforces "a batch-producing op may not feed a row-at-a-time sink," no structured way to test that a lowering pass is correct.
MLIR was built to solve exactly this. It lets you define your abstraction levels as first-class dialects, with their own types, ops, and verifiers. Passes reason about one level at a time. Lowering from one dialect to the next is a structured, composable transformation. The plan-level structure is preserved in IR form all the way down until you choose to discard it.
Evolution, not revolution — why PhoebeDB starts from a PostgreSQL query plan
Before the MLIR pipeline runs, PhoebeDB makes one more consequential design decision: it lets PostgreSQL do the parsing, analysis, and planning. The compiler only takes over once there is already a finished query plan in hand.
This is deliberate. Building a production DBMS end-to-end is a multi-decade project, and two pieces are notoriously expensive to build from scratch:
- Full SQL-standard coverage. PostgreSQL has spent 30+ years growing support for the long tail of SQL — CTEs, lateral joins, window functions and frames, subquery decorrelation, array and JSON types, triggers, rules, foreign data wrappers, PL/pgSQL, inheritance, MERGE, RETURNING, every corner of three-valued-logic NULL semantics. Reimplementing that surface is not a side project.
- A cost-based optimizer. A good optimizer is not just an algorithm — it is years of accumulated domain judgment about selectivity estimation, join-order heuristics, the statistical shape of real data, plan stability under parameter changes, and thousands of regression cases. That institutional knowledge lives in the PostgreSQL planner and cannot be rebuilt cheaply.
So PhoebeDB treats those layers as infrastructure to reuse rather than reinvent. The query plan is the contract between PostgreSQL and the compiler. Everything this post describes — the three IR layers, the pipeline slicer, the compilation worker — sits downstream of that contract. The payoff is PostgreSQL compatibility almost for free, with all engineering budget spent on the one differentiator: turning that plan into native code.
Three IR layers — what each one buys
With MLIR, PhoebeDB defines three custom IR layers, each capturing one semantic level:
| Layer | Abstraction |
|---|---|
| High-level IR | Query plan data flow — tuple streams, column streams, logical operators |
| Mid-level IR | Executor operators with explicit loops, iteration state, and batch structures |
| Low-level IR | Concrete runtime calls, MVCC logic, and LLVM-compatible types |
The key property: each layer has its own type system and verifier. Optimization passes written at the high level cannot accidentally produce mid-level constructs, and the verifier rejects them if they try. This is what MLIR provides that going straight to LLVM IR does not.
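To make the verifier property concrete, here is a minimal Python sketch — not PhoebeDB's actual code, just the shape of the idea: every op is tagged with the dialect level it belongs to, and a module-level verifier rejects any op that leaked in from another level.

```python
from dataclasses import dataclass, field

HIGH, MID, LOW = "high", "mid", "low"

@dataclass
class Op:
    name: str                 # e.g. "seqScan", "filter" (names illustrative)
    level: str                # which dialect level the op belongs to
    operands: list = field(default_factory=list)

def verify(ops, expected_level):
    """Reject any op that leaked in from another IR level."""
    for op in ops:
        if op.level != expected_level:
            raise TypeError(
                f"op '{op.name}' is {op.level}-level, "
                f"but this module is {expected_level}-level")
    return True

# A high-level pass that (buggily) emits a mid-level loop op:
scan = Op("seqScan", HIGH)
loop = Op("batchLoop", MID, [scan])   # wrong level -- the verifier catches it

verify([scan], HIGH)                  # passes
try:
    verify([scan, loop], HIGH)
except TypeError as e:
    print("verifier rejected:", e)
```

MLIR gives you this check for free per dialect, with real type constraints rather than a string tag; the point is that the mistake is caught structurally, at verification time, instead of surfacing as miscompiled code.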
High-level IR — the plan as a data-flow graph
The top layer mirrors the logical query plan. There are no loops, no iterators, no runtime structures — just "this stream flows into that operator." The core abstraction is the tuple stream; everything else is either a parameter or a handle into the runtime request context.
This is the layer where plan-shape decisions live: identifying pipelines, fusing projections, deciding where breakers must be placed. Because the plan structure is still present in IR form, these passes can be written, tested, and composed independently — no different in principle from writing an LLVM optimization pass, but operating at query-plan granularity instead of scalar instruction granularity.
To make this concrete, here is what pipeline 1 of a GROUP BY query looks like at this level (dialect prefix anonymized):
```
// SELECT k, SUM(v) FROM t WHERE v > 0 GROUP BY k ORDER BY k LIMIT 10
// pipeline 1: scan → filter → hash-aggregate build
func.func @pipeline_1(%ctx: !pb.query_ctx) -> i1 {
  %rows = pb.seqScan %ctx { table = "t", cols = ["k", "v"] } : !pb.stream
  %filt = pb.filter %rows { predicate = "v > 0" } : !pb.stream
  %done = pb.hashAggSink %filt { group_by = ["k"], aggregates = ["sum(v)"] } : i1
  return %done : i1
}
```
Each SSA value is a stream — %rows, %filt — and each op consumes the previous stream and produces the next. There are no loops and no memory writes visible here; those only appear after lowering to the mid level. The hash-aggregate sink is explicit as a pipeline-terminating op, which is exactly where the slicer placed the breaker boundary. Critically, a verifier at this level can confirm that pb.hashAggSink receives a !pb.stream — no stranded values, no type mismatches — before a single line of loop code is generated.
Mid-level IR — loops and iterators
Lowering to the mid level opens the streams into explicit loops. Every tuple stream becomes a scan cursor with a batch buffer. Executor operators — scan begin/next, tuple construction, update, aggregation — operate on runtime-shaped structures. The MVCC retry-on-conflict loop is visible at this level but still abstract: there are no concrete function pointers yet, and the type system still enforces that batch-producing operators feed batch-consuming sinks and not row-at-a-time ones.
This separation matters precisely because it lets loop-level optimizations — loop unrolling hints, predicate hoisting, batch-size decisions — be written as passes that understand executor semantics without needing to know anything about the storage engine below.
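As a rough Python analogy (all names hypothetical, and Python iteration standing in for the mid-level dialect's explicit cursor and batch ops), pipeline 1 from the earlier example lowers into something shaped like this:

```python
def scan_open(table):
    # Stand-in for a mid-level scan cursor yielding batches of (k, v) rows.
    data = {"t": [[(1, 5), (2, -3)], [(1, 7), (3, 0)]]}
    return iter(data[table])

def pipeline_1_mid(groups):
    """Mid-level shape of scan -> filter -> hashAggSink: an explicit
    batch loop, the filter inlined, aggregation state in `groups`.
    No concrete runtime function pointers yet -- that is the low level."""
    cursor = scan_open("t")
    for batch in cursor:              # scan: one batch per iteration
        for k, v in batch:            # row-wise work inside the batch
            if v > 0:                 # filter(v > 0), inlined
                groups[k] = groups.get(k, 0) + v   # sum(v) GROUP BY k
    return True                       # pipeline-done status

groups = {}
pipeline_1_mid(groups)
print(groups)   # {1: 12}
```

The stream abstraction is gone; loops, cursors, and aggregation state are explicit, which is exactly the granularity loop-level passes need.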
Low-level IR — runtime calls
The low level is the last stop before LLVM. Generic operators split into type-specialized ones (integer, double, varchar variants). Table handles become LLVM pointers. Scan steps turn into concrete external calls into PhoebeDB's storage engine. From here, a lowering pass hands off to LLVM IR, and MLIR's ExecutionEngine JITs it to x86-64 via LLVM ORC.
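The specialization step can be caricatured in a few lines of Python (functions here stand in for external runtime symbols; the names are invented): a generic comparison op resolves to a per-type variant once, during lowering, so the JITed loop body dispatches nothing per tuple.

```python
# Hypothetical low-level lowering: one generic `lt` op splits into
# type-specialized runtime calls, chosen at compile time.
def lt_i64(a, b): return a < b            # integer variant
def lt_f64(a, b): return a < b            # double variant
def lt_varchar(a, b): return a < b        # would be collation-aware in reality

SPECIALIZED = {"i64": lt_i64, "f64": lt_f64, "varchar": lt_varchar}

def lower_lt(col_type):
    """Resolve the concrete symbol once, during lowering, not per tuple."""
    return SPECIALIZED[col_type]

cmp = lower_lt("i64")        # the generated code calls `cmp` directly
assert cmp(3, 7) and not cmp(7, 3)
```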
The deliberate tradeoff: PhoebeDB does not inline storage-engine internals into MLIR. Tuple construction, buffer allocation, MVCC visibility — all still go through C++ external calls, because the storage engine uses C++ allocators and containers that MLIR cannot model cleanly. That is an explicit current limitation, and it is where most of the remaining interpreter cost lives.
Pipeline slicing — compile one tight loop per pipeline
Before any IR is emitted, PhoebeDB slices the query plan into pipelines, each compiled into its own function. This is where the high-level IR's plan-shape visibility pays off directly: the slicer operates on the structured IR, not on raw LLVM bitcode.
What a pipeline is
A pipeline is a maximal chain of operators through which a single tuple (or a small batch) can flow from its source to a sink without ever being written to main memory. Scans, filters, projections, index lookups, and hash-join probe sides are pipelineable — they consume an input tuple, do their work, and hand the result immediately to the next operator.
A pipeline breaker is an operator that cannot emit a single output tuple until it has fully consumed its input: Sort, the build side of a hash join, the partial-aggregation step of a hash aggregate. The input has to land somewhere — a sort run, a hash table, a spill file — before downstream work can start.
The motivation is cache locality. Inside a pipeline, each operator consumes a value the previous operator just produced, while that value is still in registers or L1. Once a breaker forces materialization, the tuple falls back to main memory. The unit of "keep hot" is the pipeline.
How the plan is sliced
The slicer walks the query plan and every time it crosses a breaker, it cuts. Each cut inserts two synthetic nodes: a sink at the tail of the upstream pipeline, and a source at the head of the downstream pipeline. The sink and source share a reference to the same intermediate structure, allocated once and handed to both.
Take:
```sql
SELECT k, SUM(v) FROM t WHERE v > 0 GROUP BY k ORDER BY k LIMIT 10;
```
Slicing splits this into three pipelines around the two breakers (hash-aggregate build, sort):
```
pipeline 1: scan(t) ── filter(v>0) ── hashAggSink(by k)
pipeline 2: hashAggSource ── sortSink(by k)
pipeline 3: sortSource ── limit(10) ── output
```
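The slicing walk itself is simple; here is a Python sketch over a linearized plan (operator names invented, and the shared intermediate structure elided): cut at each breaker, closing the upstream pipeline with a sink and opening the downstream one with a matching source.

```python
BREAKERS = {"hashAgg", "sort"}   # ops that must consume all input first

def slice_plan(ops):
    """Walk a linear plan source-to-sink; at each breaker, cut the chain,
    appending a sink to the upstream pipeline and starting the downstream
    pipeline with the matching source."""
    pipelines, current = [], []
    for op in ops:
        if op in BREAKERS:
            current.append(f"{op}Sink")        # tail of upstream pipeline
            pipelines.append(current)
            current = [f"{op}Source"]          # head of downstream pipeline
        else:
            current.append(op)
    pipelines.append(current)
    return pipelines

plan = ["scan(t)", "filter(v>0)", "hashAgg", "sort", "limit(10)", "output"]
for i, p in enumerate(slice_plan(plan), 1):
    print(f"pipeline {i}:", " -- ".join(p))
```

Run on the example query's plan, this reproduces the three pipelines above. The real slicer walks a tree (joins have two children) and threads the shared intermediate structure between each sink/source pair, but the cut rule is the same.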
Fusion inside each pipeline
Once a pipeline has its own function, lowering fuses all of that pipeline's operators into a single loop. A chain like scan → filter → project → hashAggSink becomes one loop body that executes the filter predicate, the projection, and the sink write back-to-back on the same tuple — no next() boundary, no virtual dispatch, no intermediate buffer between operators.
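The difference is easiest to see side by side. This is a Python caricature, not real executor code: the interpreted model pulls each tuple through a chain of operator boundaries, while the fused model runs filter, projection, and sink write in one loop body.

```python
rows = [(1, 5), (2, -3), (1, 7)]

def interpreted_sum(rows):
    """Volcano-style: every tuple crosses an operator boundary per step
    (generators standing in for next() calls and virtual dispatch)."""
    def scan():       yield from rows
    def filt(src):    return (r for r in src if r[1] > 0)
    def project(src): return (r[1] for r in src)    # keep v only
    total = 0
    for v in project(filt(scan())):   # three hops per tuple
        total += v
    return total

def fused_sum(rows):
    """Fused: one loop body, the tuple stays hot the whole way."""
    total = 0
    for k, v in rows:
        if v > 0:          # filter, inlined
            total += v     # projection + sink write, same iteration
    return total

assert interpreted_sum(rows) == fused_sum(rows) == 12
```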
Two consequences fall out of this design:
- Materialization is explicit and localized. The only places tuples land in memory between operators are the breakers we sliced on, and the compiler knows exactly where those are.
- Adding operators is local. A new pipelineable operator is another shape inside the fused loop. A new pipeline-breaking operator is a sink/source pair plus a lowering pattern; the slicer handles the plan surgery.
End-to-end: from psql to native code
The full lowering sequence is: query plan → high-level IR → mid-level IR → low-level IR → LLVM IR → JITed machine code. The compiled function has one uniform signature regardless of query shape: a single pointer to a runtime request struct (holding parameters, result buffer, MVCC snapshot, and anything else the compiled code needs from its caller) in, and a boolean status out. That uniform entry point is what makes caching and dispatch tractable.
The compilation worker — compile once, call many
JIT only pays off if the same query runs more than once, so PhoebeDB runs compilation asynchronously on a dedicated worker thread:
- A lock-protected task queue holds pending compilation requests keyed by query ID.
- The worker pops tasks, runs the full lowering pipeline, and registers the resulting function pointer in a code cache.
- Subsequent executions of the same query ID short-circuit straight to the cached binary.
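The control flow of those three bullets fits in a short sketch. This is Python with `threading` standing in for PhoebeDB's C++ worker, and `compile_query` and `interpret` as stand-ins for the full lowering pipeline and PostgreSQL's executor respectively:

```python
import queue
import threading

code_cache = {}          # query_id -> compiled callable (ctx -> bool)
tasks = queue.Queue()    # internally lock-protected task queue

def compile_query(plan):
    # Stand-in for plan -> high -> mid -> low -> LLVM -> machine code.
    # Every compiled entry point shares one signature: ctx in, bool out.
    return lambda ctx: True

def worker():
    while True:
        query_id, plan = tasks.get()
        if query_id is None:                  # shutdown sentinel
            break
        code_cache[query_id] = compile_query(plan)
        tasks.task_done()

def interpret(plan, ctx):
    return True                               # stand-in for the interpreter

def execute(query_id, plan, ctx):
    fn = code_cache.get(query_id)
    if fn is not None:
        return fn(ctx)                        # hot path: cached binary
    tasks.put((query_id, plan))               # queue compilation for later
    return interpret(plan, ctx)               # fall back this execution

threading.Thread(target=worker, daemon=True).start()
execute("q1", "plan-for-q1", {})   # first run: interpreted, compile queued
tasks.join()                       # demo only: wait for the worker
execute("q1", "plan-for-q1", {})   # second run: straight to the cache
```

The `tasks.join()` is for the demo; in the real system the first execution simply does not wait, which is exactly the fallback behavior described next.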
Dispatch happens through a PostgreSQL CustomScan node. On the non-prepared path, the planner runs, submits the plan to the compilation queue, and returns a CustomScan plan node. When the executor runs it, it calls the compiled function if the binary is ready, or falls back to PostgreSQL's interpreter for that execution if compilation is still in progress. The next execution hits the cache.
Prepared statements get a cleaner loop. PostgreSQL already caches the plan, so on the hot path the planner is skipped entirely — the query ID is resolved, the cached binary is located, and execution goes straight to native code.
Where this sits in the DBMS landscape
Compiled execution is not new — HyPer pioneered it, SingleStore productionized it, SAP HANA and Umbra (HyPer's successor) extended it. What is less settled is the IR design question. Every production compiled database that preceded PhoebeDB generates LLVM IR directly from the query plan, accepting the loss of structure as a necessary cost. PhoebeDB's bet is that MLIR's multi-level approach makes that cost avoidable.
Architectural positioning
| Dimension | PhoebeDB position |
|---|---|
| Compilation approach | HyPer-class native compilation |
| Procedural model | PostgreSQL-compatible (PL/pgSQL) |
| IR design | Multi-level (MLIR) — not flat LLVM IR |
| Execution engine | Unified HTAP |
| Optimization scope | SQL + procedural together |
| Runtime model | In-process, compiled |
Closest existing systems
- Very close — compiled HTAP with procedural compilation: HyPer / SAP HANA, SingleStore, Umbra, NoisePage.
- Partially similar — either no procedural compiler or only expression-level JIT: PostgreSQL + LLVM JIT, DuckDB.
- Feature-specific overlap — SQL Server Hekaton's native compiled procedures.
All of them generate LLVM IR directly. None use MLIR's structured multi-level approach for the full query compilation pipeline.
Conclusion
The interesting design choice in PhoebeDB isn't "use JIT" — every serious analytical engine does that. It's choosing MLIR over direct LLVM IR, and accepting the cost of building and maintaining three custom dialects in exchange for structured, verifiable, composable optimization passes at each abstraction level.
Plan shape lives in the high-level IR. Executor control flow lives in the mid-level IR. Runtime glue lives in the low-level IR. Each pass reasons about one thing. Adding a new operator is a local change; adding a new backend target is a lowering change. Bugs introduced at one level cannot silently corrupt another.
For a database compiler that needs to grow — to cover PL/pgSQL compilation, vectorized operators, new storage backends — that modularity is worth more than the simplicity of going straight to LLVM. That is the bet PhoebeDB is making, and it is a bet the rest of the compiled-database field has not yet taken.
