Building a catalog

Functions are half of what a worker serves. The other half is the catalog: the schemas, tables, views, and macros a SQL user expects to find. None of these extend a base class. You build small descriptor objects and register them on the Worker builder, and the engine surfaces them like any local catalog object.

Here is a whole catalog — a schema with a table, a view, and two macros:

java
// VGI-Java example: constructing a catalog (schema · table · view · macros).
//
// Functions are one half of what a worker can serve; the other is the catalog —
// the schemas, tables, views, and macros SQL users expect. None of these extend
// a base class. You build small descriptor objects and register them on the
// Worker builder, and DuckDB surfaces them like any local catalog object.
//
// This builds a `catalog` schema with:
//   - a table `first_five`, whose rows come from the `numbers` table function
//   - a view `evens` over that table
//   - a scalar macro `triple` and a table macro `series`
//
//   ATTACH 'demo' AS demo (TYPE vgi, LOCATION 'launch:/abs/path/bin/runCatalog');
//   SELECT * FROM demo.catalog.first_five;     -- 0,1,2,3,4
//   SELECT * FROM demo.catalog.evens;          -- 0,2,4
package farm.query.vgi.examples;

import farm.query.vgi.Worker;
import farm.query.vgi.catalog.CatalogTable;
import farm.query.vgi.catalog.Macro;
import farm.query.vgi.catalog.MacroType;
import farm.query.vgi.catalog.View;
import farm.query.vgi.internal.SchemaUtil;
import farm.query.vgi.types.Schemas;
import org.apache.arrow.vector.types.pojo.Schema;

import java.util.List;
import java.util.Map;

public final class CatalogExample {

    // A table's columns are an Arrow schema, serialized to IPC bytes.
    private static final byte[] FIRST_FIVE_COLUMNS = SchemaUtil.serializeSchema(
            new Schema(List.of(Schemas.nullable("n", Schemas.INT64))));

    /**
     * Register the catalog objects onto an existing worker. The worker must also
     * register the {@code numbers} table function ({@link TableExample}) that
     * backs the table.
     */
    public static Worker register(Worker w) {
        return w
                .schemaComment("catalog", "Tables, views, and macros built by hand")

                // A table whose rows are produced by a function. `scanFunction`
                // names the backing function and its arguments — here `numbers(5)`,
                // which the worker itself serves. Its one INT64 output column maps
                // (by position) onto the table's declared `n` column.
                .registerCatalogTable(CatalogTable.builder("catalog", "first_five", FIRST_FIVE_COLUMNS)
                        .comment("The integers 0..4")
                        .scanFunction("numbers", List.of((Object) 5L), Map.of())
                        .cardinality(5, 5)
                        .build())

                // A view is SQL the engine expands at query time.
                .registerView(new View("catalog", "evens",
                        "SELECT n FROM catalog.first_five WHERE n % 2 = 0",
                        "Even values from first_five"))

                // Macros are parameterized SQL the engine inlines at the call site.
                .registerMacro(new Macro("catalog", "triple", MacroType.SCALAR,
                        List.of("x"), "x * 3", "Triple a value"))
                .registerMacro(new Macro("catalog", "series", MacroType.TABLE,
                        List.of("k"), "SELECT n FROM range(k) t(n)",
                        "The integers 0..k-1"));
    }

    public static void main(String[] args) {
        Worker w = Worker.builder()
                .catalogName("demo")
                .registerTable(new TableExample());   // the `numbers` function the table is backed by
        register(w);
        w.runFromArgs(args);
    }
}

sql
SELECT * FROM demo.catalog.first_five;     -- 0,1,2,3,4
SELECT * FROM demo.catalog.evens;          -- 0,2,4
SELECT demo.catalog.triple(4);             -- 12
SELECT * FROM demo.catalog.series(3);      -- 0,1,2

Schemas

You don't create a schema explicitly. Registering a table, view, or macro under a schema name creates that schema. Use schemaComment to attach a comment, which also declares the schema name up front:

java
w.schemaComment("catalog", "Tables, views, and macros built by hand");

The default schema is main; change it with Worker.defaultSchema(...).
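
To make the implicit creation concrete, here is a toy registry in plain Java. It is a sketch of the idea only, not the Worker's implementation: SchemaRegistry, register, declare, and the reports/daily names are hypothetical and exist just for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model only: SchemaRegistry is a hypothetical class, not part of the VGI API.
final class SchemaRegistry {
    private final Map<String, List<String>> objectsBySchema = new TreeMap<>();

    /** Registers an object; the schema entry is created on first use. */
    SchemaRegistry register(String schema, String objectName) {
        objectsBySchema.computeIfAbsent(schema, s -> new ArrayList<>()).add(objectName);
        return this;
    }

    /** Declares a schema name without adding any object to it (like a comment-only declaration). */
    SchemaRegistry declare(String schema) {
        objectsBySchema.computeIfAbsent(schema, s -> new ArrayList<>());
        return this;
    }

    /** Returns the known schema names in sorted order. */
    List<String> schemas() {
        return new ArrayList<>(objectsBySchema.keySet());
    }

    public static void main(String[] args) {
        SchemaRegistry r = new SchemaRegistry()
                .declare("catalog")
                .register("catalog", "first_five")
                .register("reports", "daily");
        System.out.println(r.schemas());   // prints [catalog, reports]
    }
}
```

Neither call requires the schema to exist beforehand, which mirrors how the page describes schemas: they appear as a side effect of naming them.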

Tables

A table needs two things: its columns and something to produce its rows.

Columns are an Arrow schema serialized to IPC bytes:

java
byte[] columns = SchemaUtil.serializeSchema(
        new Schema(List.of(Schemas.nullable("n", Schemas.INT64))));

Rows come from a scan function — a table function the engine calls when it scans the table. It can be one of your own registered table functions or a built-in. The example backs first_five with the worker's own numbers function, bound to the argument 5:

java
CatalogTable.builder("catalog", "first_five", columns)
        .comment("The integers 0..4")
        .scanFunction("numbers", List.of((Object) 5L), Map.of())
        .cardinality(5, 5)
        .build()

The scan function's output columns map onto the table's declared columns by position, so the function's n column becomes the table's n. Give the optimizer a cardinality when you know it, and richer hints when you have them:

java
CatalogTable.builder("catalog", "products", columns)
        .scanFunction("read_products")
        .statistics(List.of(
                ColumnStatistics.ofInt64("id", 1, 100, false, 100L),
                ColumnStatistics.ofUtf8("name", "Anvil", "Zebra", false, 100L, false, 30L)))
        .primaryKey(List.of(List.of(0)))   // column 0 is the PK
        .build()

See the rff_* and products tables in vgi-example-worker for statistics, constraints, and foreign keys in full.
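
The positional mapping between a scan function's output and a table's declared columns can be pictured with a small standalone sketch. This is not how the engine is implemented; PositionalMapping and relabel are hypothetical names, and rejecting a column-count mismatch is an assumption of the toy, not documented behavior.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model only: not part of the VGI API.
final class PositionalMapping {

    /**
     * Relabels one row of scan-function output with the table's declared
     * column names. Only position matters; the function's own column names
     * are never consulted.
     */
    static Map<String, Object> relabel(List<String> declaredColumns, List<Object> functionRow) {
        if (declaredColumns.size() != functionRow.size()) {
            // Assumption of this sketch: a count mismatch is an error.
            throw new IllegalArgumentException(
                    "expected " + declaredColumns.size() + " columns, got " + functionRow.size());
        }
        Map<String, Object> row = new LinkedHashMap<>();
        for (int i = 0; i < declaredColumns.size(); i++) {
            row.put(declaredColumns.get(i), functionRow.get(i));
        }
        return row;
    }

    public static void main(String[] args) {
        // One row from numbers(5), relabeled with the table's single column `n`.
        System.out.println(relabel(List.of("n"), List.of(3L)));   // prints {n=3}
    }
}
```

The practical consequence is that the order of columns in the Arrow schema you declare has to match the order the scan function emits them in.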

Views

A view is SQL the engine expands at query time. It's bound to its own catalog and schema, so its body can reference sibling tables by their plain schema-qualified name:

java
new View("catalog", "evens",
        "SELECT n FROM catalog.first_five WHERE n % 2 = 0",
        "Even values from first_five")

Macros

A macro is parameterized SQL the engine inlines at the call site. Scalar macros return a value; table macros return a relation.

java
new Macro("catalog", "triple", MacroType.SCALAR,
        List.of("x"), "x * 3", "Triple a value")

new Macro("catalog", "series", MacroType.TABLE,
        List.of("k"), "SELECT n FROM range(k) t(n)", "The integers 0..k-1")
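
As a rough mental model of inlining, the sketch below substitutes arguments into a macro body as text. It is deliberately naive and is not the engine's algorithm: MacroInlining and inline are hypothetical names, the whole-word regex substitution is an assumption chosen for brevity, and an argument that happens to contain a later parameter's name would be substituted again.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Toy model only: a naive textual stand-in for macro expansion.
final class MacroInlining {

    /** Replaces each whole-word occurrence of a parameter with its parenthesized argument. */
    static String inline(String body, List<String> params, List<String> args) {
        if (params.size() != args.size()) {
            throw new IllegalArgumentException("expected " + params.size() + " arguments");
        }
        String result = body;
        for (int i = 0; i < params.size(); i++) {
            String wholeWord = "\\b" + Pattern.quote(params.get(i)) + "\\b";
            String replacement = Matcher.quoteReplacement("(" + args.get(i) + ")");
            result = result.replaceAll(wholeWord, replacement);
        }
        return result;
    }

    public static void main(String[] args) {
        // triple(4): the scalar body becomes an expression at the call site.
        System.out.println(inline("x * 3", List.of("x"), List.of("4")));
        // prints (4) * 3

        // series(3): the table body becomes a subquery at the call site.
        System.out.println(inline("SELECT n FROM range(k) t(n)", List.of("k"), List.of("3")));
        // prints SELECT n FROM range((3)) t(n)
    }
}
```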

Macro bodies can't reference your catalog's tables

Unlike a view, a macro is pure text expanded in the caller's context, not the macro's. A body like SELECT … FROM catalog.first_five fails, because the engine resolves catalog.first_five against whatever catalog the caller is in — not your worker's. And you can't fully qualify it (demo.catalog.…), because you don't know the name the user will ATTACH under. So keep macro bodies self-contained (built-ins, or the macro's own parameters), as series does, and use a view when you need to wrap one of your own tables.
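
The difference between the two resolution contexts can be shown with a toy lookup. This is only a sketch under stated assumptions: MacroResolution and resolves are hypothetical names, the caller is assumed to be working in a catalog called memory, and real name resolution has more rules than a set lookup.

```java
import java.util.Map;
import java.util.Set;

// Toy model only: name resolution reduced to a set lookup per catalog.
final class MacroResolution {

    // Which schema-qualified names each attached catalog contains (toy data).
    static final Map<String, Set<String>> CATALOGS = Map.of(
            "demo", Set.of("catalog.first_five"),
            "memory", Set.of("main.orders"));

    /** True if the schema-qualified name exists in the catalog used as resolution context. */
    static boolean resolves(String contextCatalog, String schemaQualifiedName) {
        return CATALOGS.getOrDefault(contextCatalog, Set.of()).contains(schemaQualifiedName);
    }

    public static void main(String[] args) {
        // A view body is resolved in the view's own catalog, so the table is found.
        System.out.println(resolves("demo", "catalog.first_five"));     // prints true

        // A macro body is resolved wherever the caller is, so the same name is missing.
        System.out.println(resolves("memory", "catalog.first_five"));   // prints false
    }
}
```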

More than one catalog

A single worker can serve several catalogs, each attaching under its own name. Register the extras with registerExtraCatalog; functions whose names start with the catalog's prefix belong to it:

java
w.registerExtraCatalog(new Worker.ExtraCatalog(
        "reports",          // ATTACH 'reports' AS …
        "1.0.0",            // implementation version
        "1.0.0",            // data version
        "Reporting catalog",
        "reports_"))        // owns functions named reports_*
 .registerTable(new ReportsScanFunction());   // e.g. reports_scan
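
The prefix rule can be sketched as a simple lookup. This is an illustration, not the Worker's routing code: PrefixRouting and owningCatalog are hypothetical names, and two behaviors are assumptions of the sketch, namely that a function matching no prefix belongs to the primary catalog and that the first matching prefix wins.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model only: not part of the VGI API.
final class PrefixRouting {

    /**
     * Returns the catalog that owns a function name. Prefixes are checked in
     * insertion order; a name with no matching prefix falls back to the
     * primary catalog (an assumption of this sketch).
     */
    static String owningCatalog(String functionName,
                                Map<String, String> prefixToCatalog,
                                String primaryCatalog) {
        for (Map.Entry<String, String> entry : prefixToCatalog.entrySet()) {
            if (functionName.startsWith(entry.getKey())) {
                return entry.getValue();
            }
        }
        return primaryCatalog;
    }

    public static void main(String[] args) {
        Map<String, String> prefixes = new LinkedHashMap<>();
        prefixes.put("reports_", "reports");

        System.out.println(owningCatalog("reports_scan", prefixes, "demo"));   // prints reports
        System.out.println(owningCatalog("numbers", prefixes, "demo"));        // prints demo
    }
}
```

Whatever the real lookup does, the naming convention is the contract: choose prefixes that cannot collide with function names in the primary catalog.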

It all shows up in the catalog views

Everything a worker exposes lands in the engine's catalog views, so existing tools and queries find it without changes:

sql
SELECT table_name FROM information_schema.tables WHERE table_catalog = 'demo';
-- evens, first_five

The whole catalog on this page is exercised end-to-end in examples/test/examples.test; the fuller surface (statistics, constraints, multi-branch scans, versioned catalogs) lives in the vgi-example-worker module.