Recovery and Branch Lifecycle
DB9 databases move through a defined set of states from creation to deletion. Branches follow a similar lifecycle with additional clone-specific phases. This page explains the state machine, what happens when things fail, how automatic recovery works, and where the boundaries are for backup and disaster recovery.
Database States
Every database is in exactly one of these states:
| State | Meaning |
|---|---|
| CREATING | Keyspace is being provisioned in TiKV and admin user is being bootstrapped |
| ACTIVE | Database is ready for connections |
| CLONING | Branch clone is in progress (from a parent database) |
| DISABLING | Deletion is in progress; keyspace is being disabled |
| DISABLED | Deleted; keyspace has been removed from the cluster |
| CREATE_FAILED | Provisioning or clone failed; terminal state |
Normal transitions
    Creation:  CREATING → ACTIVE
    Branching: CLONING → ACTIVE
    Deletion:  ACTIVE → DISABLING → DISABLED
    Failure:   CREATING → CREATE_FAILED
               CLONING → CREATE_FAILED

DISABLED and CREATE_FAILED are terminal states. A database in either state cannot be recovered or reactivated.
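The normal transitions can be written down as a small lookup table, which is handy for client-side validation or tests. This is a minimal sketch transcribed from the list above, not DB9's internal implementation. The names `DbState`, `TRANSITIONS`, `isTerminal`, and `canTransition` are illustrative, and the table deliberately leaves out deletion of CLONING or CREATE_FAILED databases and the rollback from a failed keyspace disable.

```typescript
// Database states as listed in the table above.
type DbState =
  | "CREATING" | "ACTIVE" | "CLONING"
  | "DISABLING" | "DISABLED" | "CREATE_FAILED";

// Allowed next states, transcribed from the documented normal transitions.
// Terminal states map to an empty list. (Hypothetical helper, not a DB9 API.)
const TRANSITIONS: Record<DbState, DbState[]> = {
  CREATING: ["ACTIVE", "CREATE_FAILED"],
  CLONING: ["ACTIVE", "CREATE_FAILED"],
  ACTIVE: ["DISABLING"],
  DISABLING: ["DISABLED"],
  DISABLED: [],
  CREATE_FAILED: [],
};

// A state is terminal when no normal transition leads out of it.
function isTerminal(state: DbState): boolean {
  return TRANSITIONS[state].length === 0;
}

// True when `to` is a documented normal successor of `from`.
function canTransition(from: DbState, to: DbState): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}
```

For example, `canTransition("ACTIVE", "DISABLING")` holds, while nothing leads out of `DISABLED`.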
Check database state
    db9 db status <database>

Returns the database name, ID, state, region, creation time, endpoints, and connection string. For branches, also shows the parent database ID.
Branch Lifecycle
Branches are independent databases created from a point-in-time snapshot of a parent. Once created, a branch has its own keyspace, credentials, and lifecycle — it does not share storage with the parent.
How branch creation works
- Concurrency check — at most 2 branches can be created concurrently (across your account). Additional requests are rejected with HTTP 429.
- Snapshot capture — the current state of the parent database is captured as a timestamp.
- Keyspace creation — a new TiKV keyspace is provisioned for the branch.
- Data transfer — data is copied from the parent to the new keyspace using one of two methods:
  - TiKV restore: point-in-time snapshot restore at the storage level (used when the TiKV restore API is available)
  - Logical clone: SQL-level dump and restore (fallback)
- Finalization — admin credentials are set, extensions are verified, and the branch moves to ACTIVE.
During this process, the branch is in CLONING state.
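Because the concurrency check rejects extra requests with HTTP 429, callers that create branches in bulk may want to retry with a delay. The sketch below shows one way to do that with exponential backoff. The `Attempt` shape, the function name `withBackoffOn429`, and the retry count and delays are all assumptions for illustration; the source does not specify how the SDK surfaces a 429.

```typescript
// Hypothetical shape: any function that attempts a branch creation and
// reports the HTTP status it received, plus a value on success.
type Attempt<T> = () => Promise<{ status: number; value?: T }>;

// Retry `attempt` while it reports HTTP 429, doubling the delay each time.
async function withBackoffOn429<T>(
  attempt: Attempt<T>,
  maxAttempts = 5,
  baseDelayMs = 1000,
): Promise<T> {
  for (let i = 0; i < maxAttempts; i++) {
    const res = await attempt();
    if (res.status !== 429) {
      // Any other outcome is final: return the value or surface the failure.
      if (res.value === undefined) {
        throw new Error(`request failed with HTTP ${res.status}`);
      }
      return res.value;
    }
    // Rate limited: wait baseDelayMs, then 2x, 4x, ... before the next try.
    if (i < maxAttempts - 1) {
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** i));
    }
  }
  throw new Error(`still rate limited after ${maxAttempts} attempts`);
}
```

Only 429 responses are retried here; other failures are raised immediately so that permanent errors are not masked by the retry loop.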
Branch phases
While in CLONING, the branch progresses through internal phases visible in the API response:
| Phase | Description |
|---|---|
| PREPARING | Setting up source connection and metadata |
| KEYSPACE_READY | Target keyspace created (logical clone path) |
| RESTORE_SUBMITTING | Submitting restore task to TiKV |
| RESTORE_RUNNING | TiKV restore in progress |
| FINALIZING | Post-restore cleanup and credential setup |
| VERIFYING | Data integrity checks |
| SUCCEEDED | Clone completed; database transitions to ACTIVE |
| FAILED | Clone failed; database transitions to CREATE_FAILED |
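When logging clone progress, it can help to map a phase to the database state it implies. The helper below is a small sketch based only on the table above. The names `ClonePhase` and `stateForPhase` are hypothetical, and the name of the API response field that carries the phase is not given in this page.

```typescript
// Clone phases as listed in the table above.
type ClonePhase =
  | "PREPARING" | "KEYSPACE_READY" | "RESTORE_SUBMITTING" | "RESTORE_RUNNING"
  | "FINALIZING" | "VERIFYING" | "SUCCEEDED" | "FAILED";

// Database state implied by a clone phase: only the two final phases
// move the database out of CLONING. (Hypothetical helper.)
function stateForPhase(phase: ClonePhase): "CLONING" | "ACTIVE" | "CREATE_FAILED" {
  switch (phase) {
    case "SUCCEEDED":
      return "ACTIVE";
    case "FAILED":
      return "CREATE_FAILED";
    default:
      return "CLONING";
  }
}
```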
Poll for completion
Branch creation is asynchronous. Poll the database status until the state changes from CLONING:
    # Check branch status
    db9 db status <branch-id>

    // SDK polling
    const branch = await client.databases.branch(parentId, { name: 'feature-test' });
    let status = await client.databases.get(branch.id);
    while (status.state === 'CLONING') {
      await new Promise(r => setTimeout(r, 2000));
      status = await client.databases.get(branch.id);
    }
    // status.state is now ACTIVE or CREATE_FAILED

Parent-branch relationship
- Branches store a reference to their parent database ID
- The parent can be deleted while branches still exist — branches become orphaned but continue to function normally
- Orphaned branches retain all their data and can be used and deleted independently
- There is no automatic cascade delete; deleting a parent does not delete its branches
Database Deletion
Deletion is permanent and cannot be undone.
    # Interactive confirmation
    db9 delete <database>

    # Skip confirmation (CI/automation)
    db9 delete <database> --yes

What happens during deletion
- The database state changes to DISABLING
- The TiKV keyspace is disabled via the PD API
- The database state changes to DISABLED
- The database remains in the metadata store (for audit trail) but is no longer accessible
If the keyspace disable fails, the database rolls back to its previous state. You can retry the deletion.
Deleting databases in non-terminal states
| State | Can delete? | Notes |
|---|---|---|
| ACTIVE | Yes | Standard deletion |
| CLONING | Yes | Cancels the in-progress branch job, then deletes |
| CREATE_FAILED | Yes | Cleans up the failed provisioning |
| DISABLING | No | Already being deleted |
| DISABLED | No | Already deleted |
Automatic Recovery
DB9 runs a background reconciler that detects and recovers databases stuck in intermediate states. This handles cases like process crashes during provisioning or network interruptions during branch creation.
What the reconciler does
| Stuck state | Recovery action |
|---|---|
| CREATING for >10 minutes | Checks if keyspace exists in TiKV; transitions to ACTIVE if it does, CREATE_FAILED if it does not |
| CLONING for >10 minutes (no active branch job) | Disables the orphaned keyspace; transitions to CREATE_FAILED |
| DISABLING for >10 minutes | Force-disables the keyspace; transitions to DISABLED |
The reconciler runs every 5 minutes. It also cleans up orphaned branch keyspaces in TiKV that belong to databases in terminal states (CREATE_FAILED or DISABLED).
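The recovery table can be read as a pure decision function. The sketch below models it under stated assumptions: the type and field names (`Observation`, `Action`, `reconcile`, `ageMs`, and so on) are hypothetical, and the real reconciler's inputs are not documented here. With a 10-minute threshold and a 5-minute cycle, a stuck database would be acted on somewhere between 10 and roughly 15 minutes after entering the state, assuming cycles run on schedule.

```typescript
type StuckState = "CREATING" | "CLONING" | "DISABLING";

type Action =
  | { kind: "none" }
  | {
      kind: "transition";
      to: "ACTIVE" | "CREATE_FAILED" | "DISABLED";
      disableKeyspace: boolean;
    };

// Stuck-state threshold from the table above: more than 10 minutes.
const STUCK_THRESHOLD_MS = 10 * 60 * 1000;

// Hypothetical inputs for one reconciler check.
interface Observation {
  state: StuckState;
  ageMs: number;               // time spent in the current state
  keyspaceExists: boolean;     // only consulted for CREATING
  hasActiveBranchJob: boolean; // only consulted for CLONING
}

// Decide what one reconciler pass would do for a single database.
function reconcile(o: Observation): Action {
  if (o.ageMs <= STUCK_THRESHOLD_MS) return { kind: "none" };
  switch (o.state) {
    case "CREATING":
      return {
        kind: "transition",
        to: o.keyspaceExists ? "ACTIVE" : "CREATE_FAILED",
        disableKeyspace: false,
      };
    case "CLONING":
      // A clone with a live branch job is left alone, whatever its age.
      if (o.hasActiveBranchJob) return { kind: "none" };
      return { kind: "transition", to: "CREATE_FAILED", disableKeyspace: true };
    case "DISABLING":
      return { kind: "transition", to: "DISABLED", disableKeyspace: true };
  }
}
```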
Branch job recovery
Branch clone jobs use a distributed lease system for coordination across multiple backend workers:
- Each job holds a lease that must be periodically renewed
- If a worker crashes, the lease expires and another worker picks up the job
- Transient errors (network timeouts, temporary unavailability) are retried automatically
- Permanent errors move the job to FAILED and the branch to CREATE_FAILED
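The lease behaviour described above can be illustrated with a small in-memory model. This is a sketch of the general lease pattern, not DB9's implementation: the class `LeaseTable`, its method names, and the TTL value are hypothetical, and a real system would keep leases in a shared store rather than in process memory. Time is passed in explicitly so the expiry logic is easy to follow.

```typescript
interface Lease {
  owner: string;     // worker currently holding the job
  expiresAt: number; // timestamp (ms) after which the lease is free
}

class LeaseTable {
  private leases = new Map<string, Lease>();
  private ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  // Acquire or renew the lease for jobId. Succeeds when the job has no
  // lease, the existing lease has expired, or the caller already owns it.
  tryAcquire(jobId: string, worker: string, now: number): boolean {
    const current = this.leases.get(jobId);
    if (current && current.owner !== worker && current.expiresAt > now) {
      return false; // another worker holds a live lease
    }
    this.leases.set(jobId, { owner: worker, expiresAt: now + this.ttlMs });
    return true;
  }

  // Current owner of a live lease, or undefined when the lease is free.
  ownerOf(jobId: string, now: number): string | undefined {
    const current = this.leases.get(jobId);
    return current && current.expiresAt > now ? current.owner : undefined;
  }
}
```

A worker that keeps calling `tryAcquire` before its lease expires retains the job. If it crashes and stops renewing, the next call from a different worker after the expiry time succeeds and takes over.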
Backup and Disaster Recovery
What DB9 provides
- Point-in-time branching — you can create a branch from any active database, capturing its current state. This is the closest equivalent to a snapshot or backup.
- TiKV durability — data is stored in TiKV with replication across the cluster. Individual node failures do not cause data loss.
- Automatic stuck-state recovery — the reconciler handles process-level failures during provisioning and deletion.
What DB9 does not provide
- User-initiated backups — there is no backup API or CLI command. Branching is the only way to capture a point-in-time copy.
- Point-in-time recovery (PITR) — you cannot restore a database to an arbitrary past timestamp. Branches capture the state at creation time only.
- Cross-region replication — databases exist in a single region. There is no built-in geo-replication.
- Backup export — there is no way to export a consistent backup file (the equivalent of a pg_dump snapshot) from DB9. db9 db dump produces a logical SQL export, but it is a live dump, not a consistent snapshot.
- Retention policies — deleted databases cannot be recovered. There is no soft-delete window or trash/recycle bin.
- WAL archiving — TiKV does not expose a WAL archive interface. Standard PostgreSQL backup tools (pg_basebackup, pgBackRest) are not compatible.
Practical recovery strategies
- Regular branching — create periodic branches as checkpoints. Each branch is a full, independent copy of the database at the time of creation.
- Logical export — run db9 db dump <database> to export SQL that can be used to recreate the schema and data in another database.
- Application-level backup — for critical data, write periodic exports to an external system (S3, another database) using the HTTP extension or application code.
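As an illustration of the regular-branching strategy, the sketch below creates a timestamped checkpoint branch. It relies on the `client.databases.branch(parentId, { name })` call shown in the polling example on this page. Everything else is an assumption: the `BranchClient` interface, the helper names, and the naming scheme are hypothetical, and DB9's rules for valid branch names are not covered here.

```typescript
// Minimal view of the SDK surface used here, modelled on the polling example.
interface BranchClient {
  databases: {
    branch(parentId: string, opts: { name: string }): Promise<{ id: string }>;
  };
}

// Build a sortable UTC name in the form <prefix>-YYYYMMDD-HHMM.
// (Hypothetical naming scheme.)
function checkpointName(now: Date, prefix = "checkpoint"): string {
  const pad = (n: number) => String(n).padStart(2, "0");
  const date =
    `${now.getUTCFullYear()}${pad(now.getUTCMonth() + 1)}${pad(now.getUTCDate())}`;
  const time = `${pad(now.getUTCHours())}${pad(now.getUTCMinutes())}`;
  return `${prefix}-${date}-${time}`;
}

// Request a checkpoint branch and return its ID. The branch is still
// CLONING when this resolves, so callers should poll before relying on it.
async function createCheckpoint(
  client: BranchClient,
  parentId: string,
  now: Date = new Date(),
): Promise<string> {
  const branch = await client.databases.branch(parentId, {
    name: checkpointName(now),
  });
  return branch.id;
}
```

Calling `createCheckpoint` from a scheduled job gives a series of independent copies. Old checkpoints are not removed automatically, so a schedule like this also needs a cleanup step, and it counts toward the limit of 2 concurrent branch creations.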
Lifecycle Limits
| Limit | Value |
|---|---|
| Max concurrent branch creations | 2 |
| Branch clone timeout (per step) | 5 minutes |
| Branch clone max runtime | 10 minutes |
| Reconciler cycle interval | 5 minutes |
| Stuck-state recovery threshold | 10 minutes |
| Audit log retention | 90 days |
See Limits and Quotas for the complete list.
Next Pages
- Provisioning — database creation and fleet management
- Branching Workflows — practical branch patterns for CI, preview, and isolation
- Limits and Quotas — all operational limits
- Observability — monitoring database health and performance
- Production Checklist — verify recovery strategy before going live