Recovery and Branch Lifecycle

DB9 databases move through a defined set of states from creation to deletion. Branches follow a similar lifecycle with additional clone-specific phases. This page explains the state machine, what happens when things fail, how automatic recovery works, and where the boundaries are for backup and disaster recovery.

Every database is in exactly one of these states:

| State | Meaning |
| --- | --- |
| CREATING | Keyspace is being provisioned in TiKV and the admin user is being bootstrapped |
| ACTIVE | Database is ready for connections |
| CLONING | Branch clone is in progress (from a parent database) |
| DISABLING | Deletion is in progress; keyspace is being disabled |
| DISABLED | Deleted; keyspace has been removed from the cluster |
| CREATE_FAILED | Provisioning or clone failed; terminal state |
Output
Creation:  CREATING → ACTIVE
Branching: CLONING → ACTIVE
Deletion:  ACTIVE → DISABLING → DISABLED
Failure:   CREATING → CREATE_FAILED
           CLONING → CREATE_FAILED

DISABLED and CREATE_FAILED are terminal states: a database in either state cannot be recovered or reactivated. A CREATE_FAILED database can still be deleted to clean up the failed provisioning.
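The transition list above can be written down as a small lookup table. This is a minimal sketch for client-side reasoning, not DB9's implementation: the names `TRANSITIONS`, `canTransition`, and `isTerminal` are illustrative, and the table covers only the transitions listed on this page (it does not model deleting a CLONING or CREATE_FAILED database, or a rollback after a failed delete).

```typescript
// States as listed on this page.
type DatabaseState =
  | "CREATING"
  | "ACTIVE"
  | "CLONING"
  | "DISABLING"
  | "DISABLED"
  | "CREATE_FAILED";

// Allowed next states, copied from the transition list above.
const TRANSITIONS: Record<DatabaseState, DatabaseState[]> = {
  CREATING: ["ACTIVE", "CREATE_FAILED"],
  CLONING: ["ACTIVE", "CREATE_FAILED"],
  ACTIVE: ["DISABLING"],
  DISABLING: ["DISABLED"],
  DISABLED: [],
  CREATE_FAILED: [],
};

// True if the listed transitions allow moving from `from` to `to`.
function canTransition(from: DatabaseState, to: DatabaseState): boolean {
  return TRANSITIONS[from].indexOf(to) !== -1;
}

// True for the two states this page names as terminal.
function isTerminal(state: DatabaseState): boolean {
  return state === "DISABLED" || state === "CREATE_FAILED";
}
```

A client could use `isTerminal` to decide when to stop polling a database that will never become ACTIVE.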

Terminal
db9 db status <database>

This command returns the database name, ID, state, region, creation time, endpoints, and connection string. For branches, it also shows the parent database ID.

Branches are independent databases created from a point-in-time snapshot of a parent. Once created, a branch has its own keyspace, credentials, and lifecycle — it does not share storage with the parent.

  1. Concurrency check — at most 2 branches can be created concurrently (across your account). Additional requests are rejected with HTTP 429.
  2. Snapshot capture — the current state of the parent database is captured as a timestamp.
  3. Keyspace creation — a new TiKV keyspace is provisioned for the branch.
  4. Data transfer — data is copied from the parent to the new keyspace using one of two methods:
    • TiKV restore: point-in-time snapshot restore at the storage level (used when the TiKV restore API is available)
    • Logical clone: SQL-level dump and restore (fallback)
  5. Finalization — admin credentials are set, extensions are verified, and the branch moves to ACTIVE.

During this process, the branch is in CLONING state.
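Because the concurrency check in step 1 rejects extra requests with HTTP 429, a caller may want to retry with a growing delay. The sketch below assumes a hypothetical `requestBranch` callback that performs the branch request and reports the HTTP status; the `BranchResponse` shape, the attempt count, and the delays are illustrative and not part of the DB9 SDK.

```typescript
// Hypothetical response shape; not the DB9 SDK's actual type.
interface BranchResponse {
  status: number; // HTTP status code
  branchId?: string;
}

// Retry a branch request while the server answers 429, doubling the wait each time.
async function createBranchWithRetry(
  requestBranch: () => Promise<BranchResponse>,
  maxAttempts: number = 5,
  baseDelayMs: number = 1000,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<BranchResponse> {
  let response = await requestBranch();
  for (let attempt = 1; attempt < maxAttempts && response.status === 429; attempt++) {
    // Wait baseDelayMs, then 2x, then 4x, ... before the next attempt.
    await sleep(baseDelayMs * 2 ** (attempt - 1));
    response = await requestBranch();
  }
  // May still be a 429 response if every attempt was rejected.
  return response;
}
```

The `sleep` parameter is injectable so the retry schedule can be exercised without real waiting.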

While in CLONING, the branch progresses through internal phases visible in the API response:

| Phase | Description |
| --- | --- |
| PREPARING | Setting up source connection and metadata |
| KEYSPACE_READY | Target keyspace created (logical clone path) |
| RESTORE_SUBMITTING | Submitting restore task to TiKV |
| RESTORE_RUNNING | TiKV restore in progress |
| FINALIZING | Post-restore cleanup and credential setup |
| VERIFYING | Data integrity checks |
| SUCCEEDED | Clone completed; database transitions to ACTIVE |
| FAILED | Clone failed; database transitions to CREATE_FAILED |
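The relationship between clone phases and the database state can be sketched as a simple mapping. The phase names come from the table above; the type and function names are illustrative, and how the phase is exposed in the API response is not specified here.

```typescript
// Clone phases as listed in the table above.
type ClonePhase =
  | "PREPARING"
  | "KEYSPACE_READY"
  | "RESTORE_SUBMITTING"
  | "RESTORE_RUNNING"
  | "FINALIZING"
  | "VERIFYING"
  | "SUCCEEDED"
  | "FAILED";

// Database state implied by a clone phase: only the last two phases leave CLONING.
function databaseStateForPhase(phase: ClonePhase): "CLONING" | "ACTIVE" | "CREATE_FAILED" {
  switch (phase) {
    case "SUCCEEDED":
      return "ACTIVE";
    case "FAILED":
      return "CREATE_FAILED";
    default:
      return "CLONING";
  }
}
```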

Branch creation is asynchronous. Poll the database status until the state changes from CLONING:

Terminal
# Check branch status
db9 db status <branch-id>
TypeScript
// SDK polling: create the branch, then poll until it leaves CLONING
const branch = await client.databases.branch(parentId, { name: 'feature-test' });
let status = await client.databases.get(branch.id);
while (status.state === 'CLONING') {
  // Wait 2 seconds between polls
  await new Promise(r => setTimeout(r, 2000));
  status = await client.databases.get(branch.id);
}
// status.state is now ACTIVE or CREATE_FAILED
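The polling loop above has no upper bound on how long it waits. A minimal sketch of the same loop with a deadline follows. It assumes a `getState` callback standing in for `client.databases.get(branch.id)` and returning the state string; the 12-minute default is an assumption chosen to sit slightly above the 10-minute clone max runtime listed under limits, not a documented value.

```typescript
// Poll until the branch leaves CLONING, or throw once the deadline passes.
async function waitForBranch(
  getState: () => Promise<string>,
  timeoutMs: number = 12 * 60 * 1000, // assumption: a little above the 10-minute clone max runtime
  intervalMs: number = 2000,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
  now: () => number = () => Date.now(),
): Promise<string> {
  const deadline = now() + timeoutMs;
  let state = await getState();
  while (state === "CLONING") {
    if (now() >= deadline) {
      throw new Error("Timed out waiting for branch to leave CLONING");
    }
    await sleep(intervalMs);
    state = await getState();
  }
  return state; // ACTIVE or CREATE_FAILED
}
```

With the SDK call from the example above, usage might look like `await waitForBranch(async () => (await client.databases.get(branch.id)).state)`.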

How a branch relates to its parent:

  • Branches store a reference to their parent database ID
  • The parent can be deleted while branches still exist — branches become orphaned but continue to function normally
  • Orphaned branches retain all their data and can be used and deleted independently
  • There is no automatic cascade delete; deleting a parent does not delete its branches

Deletion is permanent and cannot be undone.

Terminal
# Interactive confirmation
db9 delete <database>
# Skip confirmation (CI/automation)
db9 delete <database> --yes

When you delete a database:

  1. The database state changes to DISABLING
  2. The TiKV keyspace is disabled via the PD API
  3. The database state changes to DISABLED
  4. The database remains in the metadata store (for audit trail) but is no longer accessible

If the keyspace disable fails, the database rolls back to its previous state. You can retry the deletion.

| State | Can delete? | Notes |
| --- | --- | --- |
| ACTIVE | Yes | Standard deletion |
| CLONING | Yes | Cancels the in-progress branch job, then deletes |
| CREATE_FAILED | Yes | Cleans up the failed provisioning |
| DISABLING | No | Already being deleted |
| DISABLED | No | Already deleted |
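A client can mirror the table above to avoid issuing deletes that will be refused. This is a sketch with illustrative names (`CAN_DELETE`, `canDelete`); the table does not list CREATING, so the helper returns `undefined` for it rather than guessing.

```typescript
// Deletability per state, copied from the table above. CREATING is not listed there.
const CAN_DELETE: { [state: string]: boolean } = {
  ACTIVE: true,
  CLONING: true,
  CREATE_FAILED: true,
  DISABLING: false,
  DISABLED: false,
};

// Returns true/false for listed states, undefined for states the table does not cover.
function canDelete(state: string): boolean | undefined {
  return Object.prototype.hasOwnProperty.call(CAN_DELETE, state)
    ? CAN_DELETE[state]
    : undefined;
}
```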

DB9 runs a background reconciler that detects and recovers databases stuck in intermediate states. This handles cases like process crashes during provisioning or network interruptions during branch creation.

| Stuck state | Recovery action |
| --- | --- |
| CREATING for >10 minutes | Checks if keyspace exists in TiKV; transitions to ACTIVE if it does, CREATE_FAILED if it does not |
| CLONING for >10 minutes (no active branch job) | Disables the orphaned keyspace; transitions to CREATE_FAILED |
| DISABLING for >10 minutes | Force-disables the keyspace; transitions to DISABLED |

The reconciler runs every 5 minutes. It also cleans up orphaned branch keyspaces in TiKV that belong to databases in terminal states (CREATE_FAILED or DISABLED).
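The recovery rules above can be read as a pure decision function. The sketch below is an interpretation of the table, not the reconciler's actual code: `DatabaseSnapshot`, `RecoveryAction`, and `decideRecovery` are hypothetical names, and the inputs (`keyspaceExists`, `hasActiveBranchJob`) stand in for checks the real reconciler would make against TiKV and the job store.

```typescript
// Hypothetical view of a database as seen by one reconciler pass.
interface DatabaseSnapshot {
  state: string;
  stateAgeMs: number; // time spent in the current state
  keyspaceExists: boolean; // result of checking TiKV
  hasActiveBranchJob: boolean;
}

type RecoveryAction =
  | { kind: "none" }
  | {
      kind: "transition";
      to: "ACTIVE" | "CREATE_FAILED" | "DISABLED";
      disableKeyspace: boolean;
    };

const STUCK_THRESHOLD_MS = 10 * 60 * 1000; // 10 minutes, per the table above

function decideRecovery(db: DatabaseSnapshot): RecoveryAction {
  // Only databases stuck for more than the threshold are touched.
  if (db.stateAgeMs <= STUCK_THRESHOLD_MS) return { kind: "none" };
  switch (db.state) {
    case "CREATING":
      return {
        kind: "transition",
        to: db.keyspaceExists ? "ACTIVE" : "CREATE_FAILED",
        disableKeyspace: false,
      };
    case "CLONING":
      // A clone with a live branch job is still making progress; leave it alone.
      return db.hasActiveBranchJob
        ? { kind: "none" }
        : { kind: "transition", to: "CREATE_FAILED", disableKeyspace: true };
    case "DISABLING":
      return { kind: "transition", to: "DISABLED", disableKeyspace: true };
    default:
      return { kind: "none" };
  }
}
```

Keeping the decision separate from the side effects (state updates, keyspace calls) makes each rule easy to check in isolation.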

Branch clone jobs use a distributed lease system for coordination across multiple backend workers:

  • Each job holds a lease that must be periodically renewed
  • If a worker crashes, the lease expires and another worker picks up the job
  • Transient errors (network timeouts, temporary unavailability) are retried automatically
  • Permanent errors move the job to FAILED and the branch to CREATE_FAILED
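
The lease behaviour described above (periodic renewal, takeover after expiry) can be illustrated with a single-process, in-memory sketch. DB9's lease system is distributed and its implementation is not documented here; `LeaseTable`, `acquireOrRenew`, and the TTL are hypothetical and only show the general technique.

```typescript
interface Lease {
  owner: string;
  expiresAt: number; // epoch milliseconds
}

// In-memory illustration of lease-based job ownership.
class LeaseTable {
  private leases = new Map<string, Lease>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Succeeds if the job is unleased, the lease has expired, or the caller already owns it.
  // On success the lease is (re)set to expire ttlMs from now.
  acquireOrRenew(jobId: string, worker: string): boolean {
    const t = this.now();
    const current = this.leases.get(jobId);
    if (current !== undefined && current.owner !== worker && current.expiresAt > t) {
      return false; // another worker holds a live lease
    }
    this.leases.set(jobId, { owner: worker, expiresAt: t + this.ttlMs });
    return true;
  }
}
```

If the owning worker stops renewing (for example, after a crash), the lease expires and the next `acquireOrRenew` call from another worker succeeds.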

DB9 provides:

  • Point-in-time branching: you can create a branch from any active database, capturing its current state. This is the closest equivalent to a snapshot or backup.
  • TiKV durability: data is stored in TiKV with replication across the cluster. Individual node failures do not cause data loss.
  • Automatic stuck-state recovery: the reconciler handles process-level failures during provisioning and deletion.

DB9 does not provide:

  • User-initiated backups: there is no backup API or CLI command. Branching is the only way to capture a point-in-time copy.
  • Point-in-time recovery (PITR): you cannot restore a database to an arbitrary past timestamp. Branches capture the state at creation time only.
  • Cross-region replication: databases exist in a single region. There is no built-in geo-replication.
  • Backup export: there is no way to export a backup file (pg_dump equivalent) from DB9. db9 db dump produces a logical SQL export, but it is a live dump, not a consistent snapshot.
  • Retention policies: deleted databases cannot be recovered. There is no soft-delete window or trash/recycle bin.
  • WAL archiving: TiKV does not expose a WAL archive interface. Standard PostgreSQL backup tools (pg_basebackup, pgBackRest) are not compatible.

To protect data within these limits, consider:

  1. Regular branching: create periodic branches as checkpoints. Each branch is a full, independent copy of the database at the time of creation.
  2. Logical export: run db9 db dump <database> to export SQL that can be used to recreate the schema and data in another database.
  3. Application-level backup: for critical data, write periodic exports to an external system (S3, another database) using the HTTP extension or application code.
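
Periodic checkpoint branches could be created with a small helper such as the one below. It is a sketch under assumptions: `BranchClient` is a hypothetical minimal interface mirroring the `client.databases.branch(parentId, { name })` call used in the polling example on this page, and the timestamped naming scheme is illustrative. Whether branch names accept this format should be checked against the product's naming rules.

```typescript
// Hypothetical minimal client shape, mirroring the branch call shown earlier on this page.
interface BranchClient {
  databases: {
    branch(parentId: string, opts: { name: string }): Promise<{ id: string }>;
  };
}

// Build a name like "checkpoint-2024-01-31-09-30-00" from a UTC timestamp.
function checkpointName(prefix: string, at: Date): string {
  const stamp = at.toISOString().slice(0, 19).replace(/[:T]/g, "-");
  return `${prefix}-${stamp}`;
}

// Create one checkpoint branch of `parentId` and return the new branch ID.
async function createCheckpoint(
  client: BranchClient,
  parentId: string,
  at: Date = new Date(),
): Promise<string> {
  const branch = await client.databases.branch(parentId, {
    name: checkpointName("checkpoint", at),
  });
  return branch.id;
}
```

Calling `createCheckpoint` from a scheduler (cron, CI job) yields a series of independent copies. Old checkpoints have to be deleted explicitly, since there is no retention policy.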

Limits relevant to branching and recovery:

| Limit | Value |
| --- | --- |
| Max concurrent branch creations | 2 |
| Branch clone timeout (per step) | 5 minutes |
| Branch clone max runtime | 10 minutes |
| Reconciler cycle interval | 5 minutes |
| Stuck-state recovery threshold | 10 minutes |
| Audit log retention | 90 days |

See Limits and Quotas for the complete list.