Indexes
Indexes are B+ Tree data structures that allow AxiomDB to find rows matching a
condition without scanning the entire table. Every index is a Copy-on-Write B+ Tree
stored in the same .db file as the table data.
Current Storage Model
Today AxiomDB exposes two SQL-visible table layouts:
- tables without an explicit PRIMARY KEY still use the classic heap + index path
- tables with an explicit PRIMARY KEY now bootstrap clustered storage at CREATE TABLE time
That clustered SQL boundary has widened through 39.18:
- the table root is clustered from day one
- PRIMARY KEY catalog metadata points at that clustered root
- INSERT on clustered tables now works through the clustered PK tree
- SELECT on clustered tables now works through the clustered PK tree and clustered secondary bookmarks
- UPDATE on clustered tables now rewrites rows directly in the clustered PK tree
- DELETE on clustered tables now applies a clustered delete-mark through the clustered PK tree
- VACUUM table_name on clustered tables now physically purges safe dead rows, frees overflow chains, and cleans dead secondary bookmarks
- ALTER TABLE legacy_table REBUILD now migrates legacy heap+PRIMARY KEY tables into clustered layout and rebuilds secondary indexes as PK-bookmark indexes
Internally, the storage rewrite already has clustered insert, point lookup,
range scan, same-leaf update, delete-mark, structural rebalance / relocate-update,
secondary PK bookmarks, and overflow-backed clustered rows for large payloads,
and explicit-PK CREATE TABLE now records that layout in SQL metadata.
Phase 39.14 made the first executor-visible clustered write cut, 39.15
opened the read side, 39.16 brought UPDATE onto that same clustered path,
and 39.17 now does the same for logical clustered DELETE: PK lookups/ranges,
clustered secondary bookmark probes, in-place delete-mark, and rollback-safe
WAL all stay on clustered storage. 39.18 closes the first clustered
maintenance slice too: VACUUM now purges physically dead clustered cells and
their overflow/secondary debris instead of leaving clustered cleanup as a
future-only promise.
That internal rewrite is still honest about its current boundary:
- relocate-update rewrites only the current inline version
- clustered delete is still delete-mark first, with physical removal deferred to a later VACUUM
- large clustered rows can already spill to overflow pages internally, but at the SQL level only explicit-PK tables expose clustered layout at DDL time
- clustered covering reads still degrade to fetching the clustered row body; a true clustered index-only optimization is still future work
- clustered child-table foreign-key enforcement still remains future work
SQLite's WITHOUT ROWID tables and MySQL's InnoDB both treat the clustered key as the row-storage identity. AxiomDB now does the same for SQL-visible clustered INSERT: no heap fallback row is created, and non-primary indexes store PK bookmarks instead of heap-era RecordId payloads.
Index Statistics and Query Planner
AxiomDB maintains per-column statistics to help the query planner choose between an index scan and a full table scan.
How it works
When you create an index, AxiomDB automatically computes:
- row_count → total visible rows in the table
- ndv (number of distinct values) → exact count of distinct non-NULL values
The planner uses selectivity = 1 / ndv for equality predicates. If the
estimated selectivity means more than 20% of the rows would be returned, a full
table scan is cheaper than an index scan, so the planner chooses the table scan.
ndv = 3, rows = 10,000 → selectivity = 33% > 20% → Table Scan
ndv = 100, rows = 10,000 → selectivity = 1% < 20% → Index Scan
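As a hedged illustration (the function name and threshold parameter are mine, not AxiomDB's internal API), the decision rule above can be sketched in a few lines of Python:

```python
def choose_access_path(row_count: int, ndv: int, threshold: float = 0.20) -> str:
    """Pick an access path for an equality predicate, per the 20% rule above."""
    if ndv <= 0:
        return "table_scan"  # no usable statistics: stay conservative
    selectivity = 1.0 / ndv
    # If more than `threshold` of the rows would match, reading the whole
    # table sequentially beats chasing index entries row by row.
    return "table_scan" if selectivity > threshold else "index_scan"

print(choose_access_path(10_000, ndv=3))    # selectivity 33% -> "table_scan"
print(choose_access_path(10_000, ndv=100))  # selectivity 1%  -> "index_scan"
```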
ANALYZE command
Run ANALYZE to refresh statistics after bulk inserts or deletes:
-- Analyze a specific table (all indexed columns)
ANALYZE TABLE users;
-- Analyze a specific column only
ANALYZE TABLE orders (status);
Statistics are automatically computed at CREATE INDEX time. Run ANALYZE when:
- Significant data was added after the index was created
- Query plans seem wrong (e.g., full scan when index would be faster)
Automatic staleness detection
After enough row changes (>20% of the analyzed row count), the planner
automatically uses conservative defaults (ndv = 200) until the next ANALYZE.
This prevents stale statistics from causing poor query plans.
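A minimal sketch of that staleness rule, with assumed names (the real implementation may track row churn differently):

```python
DEFAULT_NDV = 200  # conservative fallback from the text above

def effective_ndv(analyzed_rows: int, changes_since_analyze: int,
                  measured_ndv: int) -> int:
    """Return the ndv the planner should trust right now."""
    # More than 20% of the analyzed rows have changed: stats are stale.
    if analyzed_rows > 0 and changes_since_analyze > 0.20 * analyzed_rows:
        return DEFAULT_NDV
    return measured_ndv

print(effective_ndv(10_000, 500, measured_ndv=3))    # fresh stats -> 3
print(effective_ndv(10_000, 2_500, measured_ndv=3))  # stale -> 200
```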
Composite Indexes
A composite index covers two or more columns. The query planner uses it when the WHERE clause contains equality conditions on the leading columns.
CREATE INDEX idx_user_status ON orders(user_id, status);
-- Uses composite index: both leading columns matched
SELECT * FROM orders WHERE user_id = 42 AND status = 'active';
-- Also uses index via prefix scan: leading column only
SELECT * FROM orders WHERE user_id = 42;
-- Does NOT use index: leading column absent from WHERE
SELECT * FROM orders WHERE status = 'active';
Fill Factor
Fill factor controls how full a B-Tree leaf page is allowed to get before it splits. A lower fill factor leaves intentional free space on each page, reducing split frequency for workloads that add rows after index creation.
-- Append-heavy time-series table: pages fill to 70% before splitting.
CREATE INDEX idx_ts ON events(created_at) WITH (fillfactor = 70);
-- Compact read-only index: fill pages completely.
CREATE UNIQUE INDEX uq_email ON users(email) WITH (fillfactor = 100);
-- Default (90%): equivalent to omitting WITH.
CREATE INDEX idx_x ON t(x);
Range and default
Valid range: 10–100. Default: 90 (matches PostgreSQL's
BTREE_DEFAULT_FILLFACTOR). fillfactor = 100 reproduces the previous behavior
exactly: pages fill completely before splitting.
Effect on splits
With fillfactor = F:
- A leaf page splits when it reaches ⌊F × ORDER_LEAF / 100⌋ entries (instead of at full capacity).
- After a split, both new pages hold roughly F/2 % of capacity, leaving room for future inserts without triggering another split.
- Internal pages always fill to capacity (not user-configurable).
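The split threshold can be checked numerically (ORDER_LEAF = 217 comes from the B+ Tree section of this document; the helper name is illustrative):

```python
ORDER_LEAF = 217  # max (key, RecordId) pairs per leaf, from the section below

def leaf_split_threshold(fillfactor: int) -> int:
    """Entries at which a leaf page splits: floor(F * ORDER_LEAF / 100)."""
    assert 10 <= fillfactor <= 100, "valid fillfactor range is 10-100"
    return fillfactor * ORDER_LEAF // 100

print(leaf_split_threshold(100))  # 217: split only when completely full
print(leaf_split_threshold(90))   # 195: the default
print(leaf_split_threshold(70))   # 151: append-heavy time-series setting
```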
Automatic Indexes
AxiomDB automatically creates a unique B+ Tree index for:
- every PRIMARY KEY declaration
- every UNIQUE column constraint or UNIQUE table constraint
For clustered tables, the automatically created PRIMARY KEY metadata row reuses
the clustered table root instead of allocating a second heap-era PK tree.
UNIQUE secondary indexes still allocate ordinary B+ Tree roots, but 39.14
now maintains their entries as secondary_key ++ pk_suffix bookmarks during
SQL-visible clustered INSERT.
Multi-row INSERT on Indexed Tables
Multi-row INSERT ... VALUES (...), (...) statements now stay on a grouped
heap/index path even when the target table already has a PRIMARY KEY or
secondary indexes.
INSERT INTO users VALUES
(1, 'a@example.com'),
(2, 'b@example.com'),
(3, 'c@example.com');
This matters because indexed tables used to fall back to per-row maintenance on this workload. The grouped path keeps the same SQL-visible behavior:
- duplicate PRIMARY KEY / UNIQUE values inside the same statement still fail
- a failed multi-row statement does not leak partially committed rows
- partial indexes still include only rows whose predicate matches
Startup Integrity Verification
When a database opens, AxiomDB verifies every catalog-visible index against the heap-visible rows reconstructed after WAL recovery.
- If the tree is readable but its contents diverge from the heap, AxiomDB rebuilds the index automatically from table contents before serving traffic.
- If the tree cannot be traversed safely, open fails with IndexIntegrityFailure instead of guessing.
This check applies to both embedded mode and server mode because both call the same startup verifier.
This policy combines amcheck's "never trust an unreadable B-Tree" rule
with SQLite's REINDEX-style rebuild-from-table approach. Readable divergence is
healed automatically from heap data; unreadable trees still block open.
Creating Indexes Manually
CREATE [UNIQUE] INDEX index_name ON table_name (col1 [ASC|DESC], col2 ...);
CREATE INDEX idx_users_name ON users (name);
CREATE INDEX idx_orders_user ON orders (user_id, placed_at DESC);
CREATE UNIQUE INDEX uq_sku ON products (sku);
See DDL β CREATE INDEX for the full syntax.
When Indexes Help
The query planner considers an index when:
- The leading column(s) of the index appear in a WHERE equality or range condition.
- The index columns match the ORDER BY direction and order (avoids a sort step).
- The index is selective enough that scanning it is cheaper than a full table scan.
-- Good: leading column (user_id) used in WHERE
CREATE INDEX idx_orders_user ON orders (user_id, placed_at DESC);
SELECT * FROM orders WHERE user_id = 42 ORDER BY placed_at DESC;
-- Bad: leading column not in WHERE, so the index is not used
SELECT * FROM orders WHERE placed_at > '2026-01-01';
-- Solution: create a separate index on placed_at
CREATE INDEX idx_orders_date ON orders (placed_at);
Composite Index Column Order
The order of columns in a composite index determines which query patterns it
accelerates. The B+ Tree is sorted by the concatenated key (col1, col2, ...).
CREATE INDEX idx_orders_user_status ON orders (user_id, status);
This index accelerates:
- WHERE user_id = 42
- WHERE user_id = 42 AND status = 'paid'
This index does NOT accelerate:
- WHERE status = 'paid' (leading column not constrained)
Rule of thumb: put the highest-selectivity, most frequently filtered column first.
Partial Indexes
A partial index covers only the rows matching a WHERE predicate. This reduces index
size and maintenance cost.
-- Index only pending orders (the common access pattern)
CREATE INDEX idx_pending_orders ON orders (user_id)
WHERE status = 'pending';
-- Index only non-deleted users
CREATE INDEX idx_active_users ON users (email)
WHERE deleted_at IS NULL;
The query planner uses a partial index only when the query's WHERE clause implies the index's predicate.
Index Key Size Limit
The B+ Tree stores encoded keys up to 768 bytes. For most column types this is never an issue:
- INT, BIGINT, UUID, TIMESTAMP → fixed-size, always well under the limit.
- TEXT, VARCHAR → a 760-character value will just fit.
If you index a column with very long strings (> 750 characters), rows exceeding the limit are silently skipped at CREATE INDEX time, and later INSERTs of over-limit values return IndexKeyTooLong.
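As a rough illustration of the limit check (raw UTF-8 length is a stand-in here; AxiomDB's real key encoding adds a few bytes of overhead, which is why 760 characters "just fit" under 768 bytes):

```python
MAX_INDEX_KEY_BYTES = 768  # encoded-key limit from the text above

def key_fits(value: str) -> bool:
    """Approximate the limit check using raw UTF-8 length as the encoding."""
    return len(value.encode("utf-8")) <= MAX_INDEX_KEY_BYTES

print(key_fits("a" * 760))  # True: just fits
print(key_fits("a" * 800))  # False: INSERT would fail with IndexKeyTooLong
```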
Query Planner (Phase 6.3)
The planner rewrites the execution plan before running the scan. Currently recognized patterns:
Equality lookup: exact match on the leading indexed column.
-- Uses B-Tree point lookup (O(log n) instead of O(n))
SELECT * FROM users WHERE email = 'alice@example.com';
SELECT * FROM orders WHERE id = 42;
This includes the PRIMARY KEY. A query like WHERE id = 42 does not need a
redundant secondary index on id.
Range scan: upper and lower bounds on the leading indexed column.
-- Uses B-Tree range scan
SELECT * FROM orders WHERE created_at > '2024-01-01' AND created_at < '2025-01-01';
SELECT * FROM products WHERE price >= 10.0 AND price <= 50.0;
Full scan fallback: any pattern not recognized above.
-- Falls back to full table scan (no index for OR, function, or non-leading column)
SELECT * FROM users WHERE email LIKE '%gmail.com';
SELECT * FROM orders WHERE status = 'paid' OR total > 1000;
Partial Indexes
A partial index covers only the rows matching a WHERE predicate. This reduces index size, speeds up maintenance, and, for UNIQUE indexes, restricts uniqueness enforcement to the matching subset.
-- Only active users need unique emails.
CREATE UNIQUE INDEX uq_active_email ON users(email) WHERE deleted_at IS NULL;
-- Index only pending orders for fast user lookups.
CREATE INDEX idx_pending ON orders(user_id) WHERE status = 'pending';
Partial UNIQUE indexes
The uniqueness constraint applies only among rows satisfying the predicate. Rows that do not satisfy the predicate are never inserted into the index.
-- alice deleted, then re-created: no conflict.
INSERT INTO users VALUES (1, 'alice@x.com', '2025-01-01'); -- deleted
INSERT INTO users VALUES (2, 'alice@x.com', NULL);         -- active ✓
INSERT INTO users VALUES (3, 'alice@x.com', NULL);         -- ✗ UniqueViolation
INSERT INTO users VALUES (4, 'alice@x.com', '2025-06-01'); -- deleted ✓
Planner support
The planner uses a partial index only when the query's WHERE clause implies the index predicate. If the implication cannot be verified, the planner falls back to a full scan or a full index, which is always correct.
-- Uses partial index (WHERE contains `deleted_at IS NULL`):
SELECT * FROM users WHERE email = 'alice@x.com' AND deleted_at IS NULL;
-- Falls back to full scan (predicate not in WHERE):
SELECT * FROM users WHERE email = 'alice@x.com';
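A toy model of that implication test (real planners compare expression trees, not strings; this only shows the shape of the check):

```python
def implies(query_conjuncts: set, index_predicate: set) -> bool:
    """The partial index is usable iff every conjunct of the index predicate
    also appears among the query's WHERE conjuncts."""
    return index_predicate <= query_conjuncts

predicate = {"deleted_at IS NULL"}
print(implies({"email = 'alice@x.com'", "deleted_at IS NULL"}, predicate))  # True
print(implies({"email = 'alice@x.com'"}, predicate))                        # False
```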
A partial unique index (WHERE deleted_at IS NULL) is typically 10–100× smaller
than a full unique index, since most rows in high-churn tables are in the
deleted state. This reduces build time, per-INSERT maintenance cost, and Bloom
filter memory. MySQL InnoDB does not support partial indexes, so this
optimization is not available there.
Foreign Key Constraints
Foreign key constraints ensure referential integrity between tables. Every non-NULL value in the FK column of the child table must reference an existing row in the parent table.
-- Inline REFERENCES syntax
CREATE TABLE orders (
id INT PRIMARY KEY,
user_id INT REFERENCES users(id) ON DELETE CASCADE
);
-- Table-level FOREIGN KEY syntax
CREATE TABLE order_items (
id INT PRIMARY KEY,
order_id INT,
product_id INT,
CONSTRAINT fk_order FOREIGN KEY (order_id) REFERENCES orders(id) ON DELETE CASCADE,
CONSTRAINT fk_product FOREIGN KEY (product_id) REFERENCES products(id) ON DELETE RESTRICT
);
-- Add FK after the fact
ALTER TABLE orders
ADD CONSTRAINT fk_user FOREIGN KEY (user_id) REFERENCES users(id);
-- Remove a FK constraint
ALTER TABLE orders DROP CONSTRAINT fk_user;
ON DELETE Actions
| Action | Behavior |
|---|---|
| RESTRICT / NO ACTION (default) | Error if child rows reference the deleted parent row |
| CASCADE | Automatically delete all child rows (recursive, max depth 10) |
| SET NULL | Set the child FK column to NULL (the column must be nullable) |
Enforcement Examples
CREATE TABLE users (id INT PRIMARY KEY, email TEXT);
CREATE TABLE orders (id INT PRIMARY KEY, user_id INT REFERENCES users(id) ON DELETE CASCADE);
INSERT INTO users VALUES (1, 'alice@x.com');
INSERT INTO orders VALUES (10, 1); -- ✓ user 1 exists
-- INSERT with missing parent β error
INSERT INTO orders VALUES (20, 999);
-- ERROR 23503: Foreign key constraint fails: 'orders.user_id' = '999'
-- DELETE parent with CASCADE β child rows automatically deleted
DELETE FROM users WHERE id = 1;
SELECT COUNT(*) FROM orders; -- β 0 (orders were cascaded)
-- DELETE parent with RESTRICT (default) β blocked if children exist
CREATE TABLE invoices (id INT PRIMARY KEY, order_id INT REFERENCES orders(id));
INSERT INTO users VALUES (2, 'bob@x.com');
INSERT INTO orders VALUES (30, 2);
INSERT INTO invoices VALUES (1, 30);
DELETE FROM orders WHERE id = 30;
-- ERROR 23503: foreign key constraint "fk_invoices_order_id": invoices.order_id references this row
NULL FK Values
A NULL value in a FK column is always allowed: it does not reference any parent row. This follows SQL standard MATCH SIMPLE semantics.
INSERT INTO orders VALUES (99, NULL); -- ✓ NULL user_id is always allowed
ON UPDATE
Only ON UPDATE RESTRICT (the default) is enforced. Updating a parent key while
child rows reference it is rejected. ON UPDATE CASCADE and ON UPDATE SET NULL
are planned for Phase 6.10.
Current Limitations
- Only single-column FKs are supported. Composite FKs (FOREIGN KEY (a, b) REFERENCES t(x, y)) are planned for Phase 6.10.
- ON UPDATE CASCADE / ON UPDATE SET NULL are planned for Phase 6.10.
- FK validation uses B-Tree range scans via the FK auto-index (Phase 6.9), falling back to a full table scan for pre-6.9 FKs.
Bloom Filter Optimization
AxiomDB maintains an in-memory Bloom filter for each secondary index. The filter allows the query executor to skip B-Tree page reads entirely when a lookup key is definitively absent from the index.
How It Works
When the planner chooses an index lookup for a WHERE col = value condition,
the executor checks the Bloom filter before touching the B-Tree:
- Filter says no → the key is definitely absent. Zero B-Tree pages are read; an empty result is returned immediately.
- Filter says maybe → a normal B-Tree lookup proceeds.
The filter is a probabilistic data structure: it never produces false negatives (a key that exists always gets a "maybe"), but it can produce false positives (a key that does not exist may occasionally get a "maybe" instead of "no"). The false positive rate is tuned to 1%: at most 1 in 100 absent-key lookups will still read the B-Tree.
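The 1% target maps onto standard Bloom filter sizing math (these are the textbook formulas; AxiomDB's actual internal parameters are not specified here):

```python
import math

def bloom_parameters(n_keys: int, fp_rate: float) -> tuple:
    """Return (total bits, hash count) from the standard Bloom filter formulas:
    m = -n * ln(p) / (ln 2)^2  and  k = (m / n) * ln 2."""
    m = math.ceil(-n_keys * math.log(fp_rate) / (math.log(2) ** 2))
    k = max(1, round((m / n_keys) * math.log(2)))
    return m, k

bits, hashes = bloom_parameters(1_000_000, fp_rate=0.01)
print(round(bits / 1_000_000, 2))  # ~9.59 bits per key for a 1% FP rate
print(hashes)                      # 7 hash functions
```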
Lifecycle
| Event | Effect on Bloom filter |
|---|---|
| CREATE INDEX | Filter created and populated with all existing keys |
| INSERT | New key added to the filter |
| UPDATE | Old key marks the filter dirty; new key added |
| DELETE | Filter marked dirty (deleted keys cannot be removed from a standard Bloom filter) |
| DROP INDEX | Filter removed from memory |
| Server restart | Filters start empty; might_exist returns true (conservative) until CREATE INDEX is run again |
Dirty Filters
After a DELETE or UPDATE, the filter is marked dirty: it may still
return "maybe" for keys that were deleted. This does not affect correctness;
the B-Tree lookup simply finds no matching row. It only means that some absent
keys may not benefit from the zero-I/O shortcut until the filter is rebuilt via
ANALYZE TABLE (available since Phase 6.12).
Run ANALYZE TABLE t periodically to rebuild the filter and restore
optimal miss performance.
Dropping an Index
-- MySQL syntax (required when the server is in MySQL wire protocol mode)
DROP INDEX index_name ON table_name;
DROP INDEX IF EXISTS idx_old ON table_name;
Dropping an index frees all B-Tree pages, reclaiming disk space immediately.
Dropping an index that backs a PRIMARY KEY or UNIQUE constraint requires dropping the
constraint first (via ALTER TABLE DROP CONSTRAINT).
Index Introspection
-- All indexes on a table
SELECT index_name, is_unique, is_primary, columns
FROM axiom_indexes
WHERE table_name = 'orders'
ORDER BY is_primary DESC, index_name;
-- Root page of each index (useful for storage analysis)
SELECT index_name, root_page_id
FROM axiom_indexes;
Index-Only Scans (Covering Indexes)
When every column referenced by a SELECT is already stored as a key column of
the chosen index, AxiomDB can satisfy the query entirely from the B-Tree; no
heap page read is needed. This is called an index-only scan.
Example
CREATE INDEX idx_age ON users (age);
-- Index-only scan: only column needed (age) is the index key.
SELECT age FROM users WHERE age = 25;
The executor reads the matching B-Tree leaf entries, extracts the age value
from the encoded key bytes, and returns the rows without ever touching the heap.
INCLUDE syntax β declaring covering intent
You can declare additional columns as part of a covering index using the
INCLUDE clause:
CREATE INDEX idx_name_dept ON employees (name) INCLUDE (department, salary);
INCLUDE columns are recorded in the catalog metadata so the planner knows
the index covers those columns. Note: physical storage of INCLUDE column
values in B-Tree leaf nodes is deferred to a future covering-index phase. Until then, the planner
uses INCLUDE to correctly identify IndexOnlyScan opportunities, but the
values are read from the key portion of the B-Tree entry.
MVCC and the 24-byte header read
Index-only scans still perform a lightweight visibility check per row. For each
B-Tree entry, the executor reads only the 24-byte RowHeader (the slot header
containing txn_id_created, txn_id_deleted, and sequence number) to determine
whether the row is visible to the current transaction snapshot. The full row
payload is never decoded.
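A hedged sketch of that visibility check (field semantics follow the paragraph above; the exact rules, e.g. around in-progress transactions, are simplified, and the header layout is assumed):

```python
from dataclasses import dataclass

@dataclass
class RowHeader:
    """The three fields read from the 24-byte slot header (layout assumed)."""
    txn_id_created: int
    txn_id_deleted: int  # 0 means "not deleted"
    sequence: int

def visible(header: RowHeader, snapshot_txn: int) -> bool:
    if header.txn_id_created > snapshot_txn:
        return False  # row created after our snapshot was taken
    if header.txn_id_deleted and header.txn_id_deleted <= snapshot_txn:
        return False  # row already deleted as of our snapshot
    return True

print(visible(RowHeader(5, 0, 1), snapshot_txn=10))  # True: live row
print(visible(RowHeader(5, 8, 1), snapshot_txn=10))  # False: deleted at txn 8
```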
Non-Unique Secondary Index Key Format
Non-unique secondary indexes store the indexed column values together with the
row's RecordId as the B-Tree key:
key = encode_index_key(col_vals) || encode_rid(rid) // 10-byte RecordId suffix
This ensures every B-Tree entry is globally unique even when multiple rows share
the same indexed value β making INSERT safe without a DuplicateKey error.
When looking up all rows with a given indexed value, the executor performs a range scan with synthetic bounds:
lo = encode_index_key(val) || [0x00; 10] // smallest possible RecordId
hi = encode_index_key(val) || [0xFF; 10] // largest possible RecordId
The suffix is the physical RecordId (page_id + slot_id + sequence number)
rather than a separate primary key column, keeping it at a fixed 10 bytes
regardless of the table's key type.
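An illustrative byte layout (the encoders are stand-ins; the real order-preserving encoding and the 4+2+4 RecordId split are assumptions) showing why the 10-byte suffix keeps duplicate indexed values globally unique and range-scannable:

```python
def encode_index_key(val: str) -> bytes:
    return val.encode("utf-8")  # stand-in for the real order-preserving encoding

def encode_rid(page_id: int, slot_id: int, seq: int) -> bytes:
    # 10 bytes total; the 4-byte page / 2-byte slot / 4-byte seq split is assumed.
    return (page_id.to_bytes(4, "big")
            + slot_id.to_bytes(2, "big")
            + seq.to_bytes(4, "big"))

k1 = encode_index_key("paid") + encode_rid(7, 3, 1)
k2 = encode_index_key("paid") + encode_rid(7, 4, 1)  # same value, different row
lo = encode_index_key("paid") + b"\x00" * 10  # smallest possible suffix
hi = encode_index_key("paid") + b"\xff" * 10  # largest possible suffix

print(k1 != k2)                           # True: entries stay unique
print(lo <= k1 <= hi and lo <= k2 <= hi)  # True: both fall inside the range
```

Because the keys sort lexicographically, the synthetic lo/hi bounds turn an equality lookup on a non-unique value into one contiguous B-Tree range scan.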
Phase 39.9 adds a second, internal-only secondary-key path for the clustered
rewrite: there the physical entry is secondary_key ++ missing_primary_key_columns
so a future clustered executor can jump from a secondary entry to the owning
PRIMARY KEY row without depending on a heap slot. Phases 39.11 and 39.12
already add internal WAL/rollback and crash recovery for clustered rows by
primary key and exact row image, but that path is not yet SQL-visible.
B+ Tree Implementation Details
AxiomDB's B+ Tree is a Copy-on-Write structure backed by the StorageEngine trait.
Key properties:
- ORDER_INTERNAL = 223: up to 223 separator keys and 224 child pointers per internal node
- ORDER_LEAF = 217: up to 217 (key, RecordId) pairs per leaf node
- 16 KB pages: both internal and leaf nodes fit exactly in one page
- AtomicU64 root: the root page pointer is swapped atomically, so readers are lock-free
- CoW semantics: writes copy the path from root to the modified leaf; old versions remain visible to concurrent readers until they finish
See B+ Tree Internals for the on-disk format and the derivation of the ORDER constants.