# Semantic Analyzer
The semantic analyzer is the stage between parsing and execution. The parser produces
an AST where every column reference has col_idx = 0 as a placeholder. The analyzer:
- Validates all table and column names against the catalog.
- Resolves each `col_idx` to the correct position in the combined row produced by the FROM and JOIN clauses.
- Reports structured errors for unknown tables, unknown columns, and ambiguous unqualified column names.
- Applies the current database + schema defaults before unqualified table resolution.
The public compatibility entry point is:
```rust
analyze(stmt, storage, snapshot) -> Result<Stmt, DbError>
```
Internally, the multi-database-aware entry point is:
```rust
analyze_with_defaults(stmt, storage, snapshot, default_database, default_schema)
```
The compatibility wrapper currently uses `("axiomdb", "public")`.
## BindContext — Resolution State

`BindContext` is built from the FROM and JOIN clauses of a SELECT before any column reference is resolved.
```rust
struct BindContext {
    tables: Vec<BoundTable>,
}

struct BoundTable {
    alias: Option<String>,   // FROM users AS u → alias = Some("u")
    name: String,            // real table name in the catalog
    columns: Vec<ColumnDef>, // columns in declaration order (from CatalogReader)
    col_offset: usize,       // start position in the combined row
}
```
## Building the BindContext

Each table in the FROM clause is added in left-to-right order. The `col_offset` of each table is the sum of the column counts of all tables added before it.
```sql
FROM users u JOIN orders o ON u.id = o.user_id
```

- Table 1: `users` (4 columns: id, name, age, email) → `col_offset = 0`
- Table 2: `orders` (4 columns: id, user_id, total, status) → `col_offset = 4`

Combined row layout:

```
col 0  u.id
col 1  u.name
col 2  u.age
col 3  u.email
col 4  o.id
col 5  o.user_id
col 6  o.total
col 7  o.status
```
## Database-Scoped Resolution

Table lookup is now keyed by the triple:

```
(database, schema, table)
```
The analyzer threads `default_database` into every catalog lookup and recursive subquery analysis. For session-driven execution, that default comes from `SessionContext::effective_database()`.
Legacy compatibility rule:

```
if a table has no explicit database binding:
    it belongs to axiomdb
```
So identical SQL text can resolve differently depending on the selected database:
```sql
USE analytics;
SELECT * FROM users;   -- resolves in analytics

USE axiomdb;
SELECT * FROM users;   -- resolves in axiomdb
```
Clients expect `DATABASE()` to be NULL before any explicit selection, but AxiomDB still had to keep legacy unqualified table names working. The analyzer therefore resolves against an effective database with fallback `axiomdb`, while the session separately tracks whether the user explicitly selected a database.
## Column Resolution Algorithm

Given a column reference `(qualifier, name)` from the AST:
### Qualified Reference (`u.email`)

- Find the `BoundTable` whose alias or name matches `qualifier`.
  - If no table matches: `DbError::TableNotFound { name: qualifier }`.
- Within that table's `columns`, find the column whose name matches `name`.
  - If not found: `DbError::ColumnNotFound { table: qualifier, column: name }`.
- Return `col_offset + column_position_within_table`.
```
u.email → users.col_offset (0) + position of "email" in users (3) = 3
o.total → orders.col_offset (4) + position of "total" in orders (2) = 6
```
### Unqualified Reference (name only)

- Search all tables in `BindContext` for a column named `name`, collecting all matches across all tables.
- If 0 matches: `DbError::ColumnNotFound`.
- If 1 match: return the resolved `col_idx`.
- If 2+ matches: `DbError::AmbiguousColumn { column: name, candidates: [...] }`.
```sql
-- Unambiguous: only users has 'name'
SELECT name FROM users JOIN orders ON ...

-- Ambiguous: both users and orders have 'id'
SELECT id FROM users JOIN orders ON ...
-- ERROR 42702: column reference "id" is ambiguous
--   (appears in: users.id, orders.id)

-- Fix: qualify the reference
SELECT users.id FROM users JOIN orders ON ...
```
## Subqueries in FROM

Subqueries in the FROM clause (derived tables) are analyzed recursively:
```sql
SELECT outer.total
FROM (
    SELECT user_id, SUM(total) AS total
    FROM orders
    WHERE status = 'paid'
    GROUP BY user_id
) AS outer
WHERE outer.total > 1000
```
The inner SELECT is analyzed first, producing a virtual `BoundTable` whose columns are the output columns of the subquery (`user_id`, `total`). The outer `BindContext` then treats this virtual table exactly like a real catalog table.
## What the Analyzer Validates per Statement Type

### SELECT

- FROM clause: every table reference exists in the catalog (or is a valid subquery).
- JOIN conditions: every column in the `ON` expression resolves correctly against the BindContext.
- SELECT list: every column reference resolves; computed expressions type-check.
- WHERE clause: every column reference resolves.
- GROUP BY: every expression resolves.
- HAVING: every column reference resolves (and must be either in GROUP BY or inside an aggregate).
- ORDER BY: every expression resolves.
### INSERT

- Target table exists in the catalog.
- Each named column in the column list exists in the table.
- For `INSERT ... SELECT`, the inner SELECT is analyzed.
- Column count in VALUES must match the column list (or all non-DEFAULT columns if no column list is given).
### UPDATE
- Target table exists in the catalog.
- Every column in SET assignments exists in the table.
- WHERE clause column references resolve against the target table.
### DELETE
- Target table exists in the catalog.
- WHERE clause column references resolve against the target table.
### CREATE TABLE

- No table with the same name exists (unless `IF NOT EXISTS`).
- Each `REFERENCES table(col)` in a foreign key names a table that exists and a column that exists in that table and is a primary key or unique column.
- CHECK expressions are parsed and type-checked (must evaluate to boolean).
### DROP TABLE

- Target table exists (unless `IF EXISTS`).
- No other table has a foreign key pointing to the target (unless `CASCADE`).
### CREATE INDEX

- Target table exists in the catalog.
- Every indexed column exists in the table.
- No index with the same name already exists (unless `IF NOT EXISTS`).
### CREATE DATABASE / DROP DATABASE / USE / SHOW DATABASES

These statements are mostly pass-through at the analyzer layer:

- `CREATE DATABASE` and `DROP DATABASE` carry names but no column bindings.
- `USE` is validated against the database catalog at execution/wire time.
- `SHOW DATABASES` produces a computed rowset and needs no name resolution.
## Error Types

| Error | SQLSTATE | When it occurs |
|---|---|---|
| `TableNotFound` | 42P01 | FROM, JOIN, or REFERENCES points to an unknown table |
| `ColumnNotFound` | 42703 | Column name not in any in-scope table |
| `AmbiguousColumn` | 42702 | Unqualified column matches in multiple tables |
| `DuplicateTable` | 42P07 | CREATE TABLE for an existing table |
| `TypeMismatch` | 42804 | Expression type incompatible with column type |
## Snapshot Isolation in the Analyzer

The analyzer calls `CatalogReader::list_tables` and `CatalogReader::list_columns` with the caller's `TransactionSnapshot`. This means the analyzer sees the schema as it appeared at the start of the current transaction, not the latest committed schema.
This ensures that:
- A concurrent DDL (e.g. `CREATE TABLE`) that commits after the current transaction began is invisible to the current transaction's analyzer.
- Schema changes made within the same transaction are visible to subsequent statements in that same transaction.
## Post-Analysis AST

After analysis, every `Expr::Column` in the AST has its `col_idx` set to the correct position in the combined row. The executor uses `col_idx` to index directly into the row array — no name lookup occurs at execution time.
```rust
// Before analysis (from parser):
Expr::Column { name: "total".to_string(), table: Some("o".to_string()), col_idx: 0 }

// After analysis (from analyzer):
Expr::Column { name: "total".to_string(), table: Some("o".to_string()), col_idx: 6 }

// col_idx = orders.col_offset (4) + position of "total" in orders (2)
```
This separation of concerns means the executor is a pure interpreter over the analyzed AST — it never touches the catalog and never performs name resolution. All validation errors are caught before any I/O begins.