Observability

DB9 includes built-in observability that is always on. Query performance is sampled automatically, slow queries are always captured, and the data is available through the CLI, REST API, and direct SQL — no extensions to install.

This page covers what you can observe, how to access it, and where DB9 currently differs from standard PostgreSQL.

What You Can See

Metric	Available	Access
QPS and TPS	Yes	CLI, API, SQL
Latency (avg, p99)	Yes	CLI, API, SQL
Active connections (count)	Yes	CLI, API, SQL
Query samples with latency	Yes	CLI, API, SQL
Slow queries (p99-sorted)	Yes	CLI
Error count and failed queries	Yes	CLI, API, SQL
Write-conflict retries (TiKV)	Yes	SQL
HNSW index build metrics	Yes	SQL
EXPLAIN query plans	Yes	SQL
Schema, tables, indexes	Yes	CLI, SQL
Per-connection details	No	—
Index usage stats	No	—
Memory/cache stats	No	—
Prometheus/OpenTelemetry export	No	—

CLI: db9 db inspect

The primary observability tool is db9 db inspect:

# Summary dashboard (QPS, TPS, latency, connections, errors)
db9 db inspect <database>

# Query samples with latency breakdown
db9 db inspect <database> queries

# Combined summary + queries
db9 db inspect <database> report

# Top slow queries sorted by p99 latency
db9 db inspect <database> slow-queries

Schema introspection

# List schemas, tables, and indexes
db9 db inspect <database> schemas
db9 db inspect <database> tables
db9 db inspect <database> indexes

All commands support --json and --output csv for programmatic use.
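
For scripting, the JSON output can be consumed from any language. The following is a minimal Python sketch, assuming the db9 binary is on PATH and that --json can be appended after the subcommand. The helper name inspect_json and its injectable run parameter are illustrative only, and no particular field names in the JSON are assumed.

```python
import json
import subprocess


def inspect_json(database, subcommand=None, run=subprocess.run):
    """Run `db9 db inspect <database> [subcommand] --json` and parse stdout.

    Hypothetical helper: the flag placement is an assumption, and the shape
    of the returned JSON is whatever the CLI emits.
    """
    cmd = ["db9", "db", "inspect", database]
    if subcommand:
        cmd.append(subcommand)  # e.g. "queries" or "slow-queries"
    cmd.append("--json")
    # check=True raises CalledProcessError on a non-zero exit code.
    result = run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

Passing a different run callable makes the helper easy to exercise without the CLI installed.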

Example output

db9 db inspect mydb

Summary (60-minute window)
  QPS:         41.7
  TPS:         20.8
  Latency avg: 12.5 ms
  Latency p99: 45.2 ms
  Connections:  8
  Statements:  150,000
  Commits:     75,000
  Errors:      3
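
The sample figures are internally consistent if QPS and TPS are read as window totals divided by the window length in seconds. That reading is an inference from the numbers above, not a documented formula. A quick check in Python:

```python
WINDOW_SECONDS = 60 * 60  # 60-minute window from the summary header


def per_second(total, window_seconds=WINDOW_SECONDS):
    """Average rate over the window, rounded to one decimal place."""
    return round(total / window_seconds, 1)


qps = per_second(150_000)  # Statements in the window
tps = per_second(75_000)   # Commits in the window
print(qps, tps)  # 41.7 20.8
```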

SQL: System Functions

Two built-in table functions provide observability data directly in SQL:

Summary metrics

SELECT qps, tps, latency_avg_ms, latency_p99_ms, active_connections,
       statement_count, txn_commit_count, error_count
FROM _db9_sys_observability();

Returns a single row with the rolling 60-minute summary. Additional columns include write-conflict retry counts (retry_attempts, retry_budget_exhausted, retry_timeout_aborts) and HNSW index metrics (hnsw_graph_bytes_written, hnsw_serialize_duration_us).

Query samples

SELECT query, sample_count, error_count,
       latency_avg_ms, latency_p99_ms, latency_max_ms,
       last_seen_ms_ago
FROM _db9_sys_query_samples()
ORDER BY latency_p99_ms DESC
LIMIT 10;

Returns per-query aggregates for sampled queries in the current window. Up to 50 unique query groups are tracked.

Find failing queries

SELECT query, error_count, sample_count
FROM _db9_sys_query_samples()
WHERE error_count > 0
ORDER BY error_count DESC;
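
The same query can be run from application code. This is a minimal sketch that works with any Python DB-API 2.0 cursor; it assumes a PostgreSQL driver can connect to DB9 like any other PostgreSQL server, and the function name failing_queries is illustrative.

```python
FAILING_QUERIES_SQL = """
SELECT query, error_count, sample_count
FROM _db9_sys_query_samples()
WHERE error_count > 0
ORDER BY error_count DESC
"""


def failing_queries(cursor):
    """Return failing query groups as a list of dicts keyed by column name.

    `cursor` is any DB-API 2.0 cursor; how it was obtained (driver,
    connection string) is left to the caller.
    """
    cursor.execute(FAILING_QUERIES_SQL)
    # cursor.description holds one sequence per column; item 0 is the name.
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]
```

Because errors are always captured while ordinary queries are only sampled (see How Sampling Works), the ratio of error_count to sample_count may overstate the true error rate, so treat it as a ranking signal rather than a percentage.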

REST API

For automation, call the observability endpoint directly:

curl -s "https://api.db9.ai/customer/databases/<database-id>/observability" \
  -H "Authorization: Bearer $TOKEN" | jq .

Returns JSON with summary (QPS, TPS, latency, connections) and samples (per-query metrics).
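
A minimal Python sketch of the same call using only the standard library. The top-level samples key comes from the description above; the per-sample field name latency_p99_ms is an assumption that mirrors the SQL column name, and both function names are illustrative.

```python
import json
import urllib.request

API_BASE = "https://api.db9.ai/customer/databases"


def fetch_observability(database_id, token, opener=urllib.request.urlopen):
    """GET the observability endpoint and return the decoded JSON body."""
    request = urllib.request.Request(
        f"{API_BASE}/{database_id}/observability",
        headers={"Authorization": f"Bearer {token}"},
    )
    with opener(request) as response:
        return json.load(response)


def slowest_samples(payload, limit=5):
    """Return the `limit` samples with the highest p99 latency.

    Assumes payload["samples"] is a list of objects that carry a
    `latency_p99_ms` field; samples without it sort last.
    """
    samples = payload.get("samples", [])
    ranked = sorted(samples, key=lambda s: s.get("latency_p99_ms", 0), reverse=True)
    return ranked[:limit]
```

Keeping the ranking separate from the HTTP call means the parsing logic can be checked against a saved response.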

EXPLAIN

DB9 supports EXPLAIN for query plan inspection:

EXPLAIN SELECT * FROM users WHERE email = 'test@example.com';
EXPLAIN (FORMAT JSON) SELECT * FROM users WHERE id = 1;

Plan nodes include SeqScan, IndexScan, HnswScan, NestedLoop, HashJoin, Sort, Limit, Aggregate, and more.

EXPLAIN ANALYZE is accepted but currently shows estimated costs only — per-operator runtime statistics are not yet available.

How Sampling Works

DB9 collects observability data using in-memory sampling:

Window: 60-minute rolling window
Default rate: 1 in 1,000 queries sampled (0.1%)
Always captured: errors and queries exceeding the slow threshold (200 ms default)
Max query groups: 50 unique query fingerprints
Max sample events: 20,000 in memory
SQL normalization: whitespace reduced, text truncated to 512 characters
Redaction: PASSWORD literals replaced with '***'

This is process-level, in-memory data. It is not persisted to TiKV and resets when the server restarts.
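
The capture rules above can be sketched in a few lines of Python. This is an illustration of the documented behavior, not DB9's implementation: the function names, the regular expressions, and the order of normalization steps are assumptions, and the 50-group and 20,000-event caps are not modeled.

```python
import random
import re

SAMPLE_RATE = 1 / 1000     # default rate: 1 in 1,000 queries
SLOW_THRESHOLD_MS = 200    # default slow-query threshold
MAX_QUERY_CHARS = 512      # normalized text is truncated to this length


def should_capture(latency_ms, is_error, rng=random.random):
    """Errors and slow queries are always kept; the rest are sampled."""
    if is_error or latency_ms > SLOW_THRESHOLD_MS:
        return True
    return rng() < SAMPLE_RATE


def normalize(sql):
    """Collapse whitespace runs, redact PASSWORD literals, then truncate."""
    text = re.sub(r"\s+", " ", sql).strip()
    # Replace the quoted literal that follows the PASSWORD keyword.
    text = re.sub(r"(?i)(\bPASSWORD\s+)'(?:[^']|'')*'", r"\1'***'", text)
    return text[:MAX_QUERY_CHARS]
```

For example, normalize("ALTER USER u  PASSWORD 'secret'") yields ALTER USER u PASSWORD '***', and a 250 ms query is captured regardless of the random draw.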

Database Status

For database metadata (not performance metrics), use:

db9 db status <database>

Shows database name, ID, state (ACTIVE, CLONING, CREATE_FAILED), region, creation time, endpoints, and connection string.

PostgreSQL Compatibility

DB9 provides some PostgreSQL catalog and information_schema views for tool compatibility. Some are functional, others are stubs, and some are not available:

View	Status	Notes
`information_schema.tables`	Functional	Standard schema introspection
`pg_indexes`	Functional	Index definitions
`pg_stat_user_tables`	Stub (zeros)	All counters return 0
`pg_statistic_ext`	Stub (empty)	No extended statistics
`pg_stat_statements`	Not available	Use `_db9_sys_query_samples()`
`pg_stat_activity`	Not available	Use `_db9_sys_observability()` for connection count

Tools like Prisma, Drizzle, and psql that query information_schema and pg_indexes work normally. Tools that depend on pg_stat_statements or pg_stat_activity for monitoring need to use DB9’s native functions instead.

What Is Not Available Today

Per-connection activity: only an aggregate connection count, no per-session details
Index usage statistics: pg_stat_user_tables.idx_scan returns 0
Memory and cache metrics: shared buffers and work memory usage are not exposed
Slow query log to file: observability data is held in memory and read through the CLI, API, or SQL; nothing is written to disk
Prometheus/OpenTelemetry export: no /metrics endpoint or trace export
Full query text with parameters: only normalized SQL is stored (bind values are not captured)
EXPLAIN ANALYZE runtime stats: the plan shows estimated costs, not actual per-operator timing
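
Until a native export exists, one possible workaround is to poll the summary and re-expose it yourself. The sketch below renders a summary dict in the Prometheus text exposition format. It assumes the dict is keyed by the SQL column names from _db9_sys_observability(); the db9_ metric prefix and the function name are invented for illustration.

```python
GAUGES = ("qps", "tps", "latency_avg_ms", "latency_p99_ms", "active_connections")


def to_prometheus_text(summary, prefix="db9_"):
    """Render selected summary fields as Prometheus gauge lines."""
    lines = []
    for name in GAUGES:
        if name in summary:
            lines.append(f"# TYPE {prefix}{name} gauge")
            lines.append(f"{prefix}{name} {float(summary[name])}")
    return "\n".join(lines) + "\n"
```

Since the underlying data is a rolling in-memory window that resets on restart, gauges fit better than counters here.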

Related Pages

Limits and Quotas: all operational limits in one place
Security and Auth: token types and role model
Production Checklist: verify observability before going live
CLI Reference: db9 db inspect command reference
SQL Reference: SQL engine compatibility