Building a catalog
Functions are half of what a worker serves. The other half is the catalog: the schemas, tables, views, and macros a SQL user expects to find. None of these extend a base class. You build small descriptor objects and register them on the Worker builder, and the engine surfaces them like any local catalog object.
Here is a whole catalog — a schema with a table, a view, and two macros:
// VGI-Java example: constructing a catalog (schema · table · view · macros).
//
// Functions are one half of what a worker can serve; the other is the catalog —
// the schemas, tables, views, and macros SQL users expect. None of these extend
// a base class. You build small descriptor objects and register them on the
// Worker builder, and DuckDB surfaces them like any local catalog object.
//
// This builds a `catalog` schema with:
// - a table `first_five`, whose rows come from the `numbers` table function
// - a view `evens` over that table
// - a scalar macro `triple` and a table macro `series`
//
// ATTACH 'demo' AS demo (TYPE vgi, LOCATION 'launch:/abs/path/bin/runCatalog');
// SELECT * FROM demo.catalog.first_five; -- 0,1,2,3,4
// SELECT * FROM demo.catalog.evens; -- 0,2,4
package farm.query.vgi.examples;

import farm.query.vgi.Worker;
import farm.query.vgi.catalog.CatalogTable;
import farm.query.vgi.catalog.Macro;
import farm.query.vgi.catalog.MacroType;
import farm.query.vgi.catalog.View;
import farm.query.vgi.internal.SchemaUtil;
import farm.query.vgi.types.Schemas;
import org.apache.arrow.vector.types.pojo.Schema;

import java.util.List;
import java.util.Map;

public final class CatalogExample {

    // A table's columns are an Arrow schema, serialized to IPC bytes.
    private static final byte[] FIRST_FIVE_COLUMNS = SchemaUtil.serializeSchema(
            new Schema(List.of(Schemas.nullable("n", Schemas.INT64))));

    /**
     * Register the catalog objects onto an existing worker. The worker must also
     * register the {@code numbers} table function ({@link TableExample}) that
     * backs the table.
     */
    public static Worker register(Worker w) {
        return w
                .schemaComment("catalog", "Tables, views, and macros built by hand")
                // A table whose rows are produced by a function. `scanFunction`
                // names the backing function and its arguments, here `numbers(5)`,
                // which the worker itself serves. Its one INT64 output column maps
                // (by position) onto the table's declared `n` column.
                .registerCatalogTable(CatalogTable.builder("catalog", "first_five", FIRST_FIVE_COLUMNS)
                        .comment("The integers 0..4")
                        .scanFunction("numbers", List.of((Object) 5L), Map.of())
                        .cardinality(5, 5)
                        .build())
                // A view is SQL the engine expands at query time.
                .registerView(new View("catalog", "evens",
                        "SELECT n FROM catalog.first_five WHERE n % 2 = 0",
                        "Even values from first_five"))
                // Macros are parameterized SQL the engine inlines at the call site.
                .registerMacro(new Macro("catalog", "triple", MacroType.SCALAR,
                        List.of("x"), "x * 3", "Triple a value"))
                .registerMacro(new Macro("catalog", "series", MacroType.TABLE,
                        List.of("k"), "SELECT n FROM range(k) t(n)",
                        "The integers 0..k-1"));
    }

    public static void main(String[] args) {
        Worker w = Worker.builder()
                .catalogName("demo")
                .registerTable(new TableExample()); // the `numbers` function the table is backed by
        register(w);
        w.runFromArgs(args);
    }
}

SELECT * FROM demo.catalog.first_five; -- 0,1,2,3,4
SELECT * FROM demo.catalog.evens; -- 0,2,4
SELECT demo.catalog.triple(4); -- 12
SELECT * FROM demo.catalog.series(3); -- 0,1,2

Schemas
You don't create a schema explicitly. Registering a table, view, or macro into a schema name brings it into being. schemaComment is how you attach a comment (and implicitly declare the name up front):
w.schemaComment("catalog", "Tables, views, and macros built by hand");

The default schema is main; change it with Worker.defaultSchema(...).
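The "schemas come into being on first use" behaviour can be pictured with a small, self-contained model. This is only a sketch of the idea, not VGI-Java's implementation: `SchemaRegistrySketch` and its methods are hypothetical names, and the one thing it assumes is what the text states, that registering an object or a comment under a schema name is enough to declare that schema.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model, NOT part of VGI-Java: a registry in which a schema
// exists as soon as anything is registered under its name.
final class SchemaRegistrySketch {
    private final Map<String, List<String>> objectsBySchema = new TreeMap<>();
    private final Map<String, String> comments = new TreeMap<>();

    // Registering an object creates its schema entry if it is not there yet.
    SchemaRegistrySketch register(String schema, String objectName) {
        objectsBySchema.computeIfAbsent(schema, s -> new ArrayList<>()).add(objectName);
        return this;
    }

    // Attaching a comment also declares the schema name up front.
    SchemaRegistrySketch schemaComment(String schema, String comment) {
        objectsBySchema.computeIfAbsent(schema, s -> new ArrayList<>());
        comments.put(schema, comment);
        return this;
    }

    // Schema names in sorted order (TreeMap iteration order).
    List<String> schemas() {
        return List.copyOf(objectsBySchema.keySet());
    }

    public static void main(String[] args) {
        SchemaRegistrySketch registry = new SchemaRegistrySketch()
                .schemaComment("catalog", "Tables, views, and macros built by hand")
                .register("catalog", "first_five");
        System.out.println(registry.schemas()); // prints [catalog]
    }
}
```

Neither call needs a separate "create schema" step, which mirrors why the example above never declares `catalog` explicitly.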
Tables
A table needs two things: its columns and something to produce its rows.
Columns are an Arrow schema serialized to IPC bytes:
byte[] columns = SchemaUtil.serializeSchema(
        new Schema(List.of(Schemas.nullable("n", Schemas.INT64))));

Rows come from a scan function: a table function the engine calls when it scans the table. It can be one of your own registered table functions or a built-in. The example backs first_five with the worker's own numbers function, bound to the argument 5:
CatalogTable.builder("catalog", "first_five", columns)
        .comment("The integers 0..4")
        .scanFunction("numbers", List.of((Object) 5L), Map.of())
        .cardinality(5, 5)
        .build()

The scan function's output columns map onto the table's declared columns by position, so the function's n column becomes the table's n. Give the optimizer a cardinality when you know it, and richer hints when you have them:
CatalogTable.builder("catalog", "products", columns)
        .scanFunction("read_products")
        .statistics(List.of(
                ColumnStatistics.ofInt64("id", 1, 100, false, 100L),
                ColumnStatistics.ofUtf8("name", "Anvil", "Zebra", false, 100L, false, 30L)))
        .primaryKey(List.of(List.of(0))) // column 0 is the PK
        .build()

See the rff_* and products tables in vgi-example-worker for statistics, constraints, and foreign keys in full.
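The by-position mapping between a scan function's output and a table's declared columns can be illustrated with a small stand-alone model. It does not use the VGI-Java API: `PositionalMapping` and `relabel` are hypothetical names, and the column-count check is an assumption made for the sketch, not documented library behaviour.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model, NOT part of VGI-Java: relabels one row produced by a
// scan function with the table's declared column names, purely by position.
final class PositionalMapping {
    static Map<String, Object> relabel(List<String> declaredColumns, List<Object> scanRow) {
        // Assumption for this sketch: a column-count mismatch is an error.
        if (declaredColumns.size() != scanRow.size()) {
            throw new IllegalArgumentException("scan function produced " + scanRow.size()
                    + " columns, table declares " + declaredColumns.size());
        }
        Map<String, Object> row = new LinkedHashMap<>();
        for (int i = 0; i < declaredColumns.size(); i++) {
            // The i-th output value takes the i-th declared name; whatever the
            // function called its own column plays no part.
            row.put(declaredColumns.get(i), scanRow.get(i));
        }
        return row;
    }

    public static void main(String[] args) {
        System.out.println(relabel(List.of("n"), List.of(3L))); // prints {n=3}
    }
}
```

Under this model, the order of the declared columns matters more than their names: swapping two declared columns would silently swap which values they receive.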
Views
A view is SQL the engine expands at query time. It's bound to its own catalog and schema, so its body can reference sibling tables by their plain schema-qualified name:
new View("catalog", "evens",
        "SELECT n FROM catalog.first_five WHERE n % 2 = 0",
        "Even values from first_five")

Macros
A macro is parameterized SQL the engine inlines at the call site. Scalar macros return a value; table macros return a relation.
new Macro("catalog", "triple", MacroType.SCALAR,
        List.of("x"), "x * 3", "Triple a value")

new Macro("catalog", "series", MacroType.TABLE,
        List.of("k"), "SELECT n FROM range(k) t(n)", "The integers 0..k-1")

Macro bodies can't reference your catalog's tables
Unlike a view, a macro is pure text expanded in the caller's context, not the macro's. A body like SELECT … FROM catalog.first_five fails, because the engine resolves catalog.first_five against whatever catalog the caller is in — not your worker's. And you can't fully qualify it (demo.catalog.…), because you don't know the name the user will ATTACH under. So keep macro bodies self-contained (built-ins, or the macro's own parameters), as series does, and use a view when you need to wrap one of your own tables.
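A toy model of inlining makes the restriction easier to see. The sketch below is not how the engine or VGI-Java expands macros; `MacroSketch.expand` is a hypothetical helper that substitutes arguments for parameter names in the body text and ignores string literals and quoting. What it shows is that, after expansion, only the body's text is left in the caller's query, so any table name in it is resolved wherever the caller happens to be.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical toy model, NOT the engine's macro expander: replaces each
// parameter name in the body with the parenthesized argument text.
final class MacroSketch {
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z_][A-Za-z_0-9]*");

    static String expand(String body, List<String> params, List<String> args) {
        if (params.size() != args.size()) {
            throw new IllegalArgumentException("expected " + params.size() + " arguments");
        }
        Map<String, String> binding = new HashMap<>();
        for (int i = 0; i < params.size(); i++) {
            binding.put(params.get(i), args.get(i));
        }
        // Single pass over the body, so argument text is never re-scanned.
        Matcher m = IDENTIFIER.matcher(body);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String word = m.group();
            String replacement = binding.containsKey(word) ? "(" + binding.get(word) + ")" : word;
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Self-contained body: nothing left for the caller's catalog to resolve.
        System.out.println(expand("x * 3", List.of("x"), List.of("4"))); // prints (4) * 3
        // The table name survives as plain text; the caller's session must resolve it.
        System.out.println(expand("SELECT n FROM catalog.first_five WHERE n < k",
                List.of("k"), List.of("3")));
    }
}
```

The second call leaves `catalog.first_five` untouched in the output, which is the name the text above says gets resolved against the caller's catalog rather than your worker's.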
More than one catalog
A single worker can serve several catalogs, each attaching under its own name. Register the extras with registerExtraCatalog; functions whose names start with the catalog's prefix belong to it:
w.registerExtraCatalog(new Worker.ExtraCatalog(
                "reports",           // ATTACH 'reports' AS …
                "1.0.0",             // implementation version
                "1.0.0",             // data version
                "Reporting catalog",
                "reports_"))         // owns functions named reports_*
        .registerTable(new ReportsScanFunction()); // e.g. reports_scan

It all shows up in the catalog views
Everything a worker exposes lands in the engine's catalog views, so existing tools and queries find it without changes:
SELECT table_name FROM information_schema.tables WHERE table_catalog = 'demo';
-- evens, first_five

The whole catalog in this page is exercised end-to-end in examples/test/examples.test; the fuller surface (statistics, constraints, multi-branch scans, versioned catalogs) lives in the vgi-example-worker module.
