When is a write durable?

An append is acknowledged only after a majority of nodes have persisted it to local disk (fsync). In a 3-node cluster, that means at least 2 nodes. A single node failure cannot lose an acknowledged write. No write is ever acknowledged based on a single copy.
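A minimal sketch of this acknowledgment rule, in Go; the function and channel names are illustrative and not Tonbo Stream's actual replication code:

```go
// Sketch: an append is acknowledged only after a majority of nodes report
// that the record has been fsynced to their local disk.
// Names and structure are illustrative, not Tonbo Stream's implementation.
package main

import "fmt"

// ackAfterMajority waits for per-node fsync confirmations and returns true
// as soon as a majority (2 of 3 in a 3-node cluster) has persisted the append.
func ackAfterMajority(clusterSize int, fsyncDone <-chan string) bool {
	quorum := clusterSize/2 + 1 // 2 in a 3-node cluster
	persisted := 0
	for node := range fsyncDone {
		persisted++
		fmt.Printf("fsync confirmed by %s (%d/%d)\n", node, persisted, quorum)
		if persisted >= quorum {
			return true // safe to acknowledge: survives any single-node failure
		}
	}
	return false // channel closed before a majority persisted the write
}

func main() {
	done := make(chan string, 3)
	done <- "node-a"
	done <- "node-b"
	close(done)
	fmt.Println("acknowledged:", ackAfterMajority(3, done))
}
```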

Consistency model

Tonbo Stream provides linearizable writes. All appends are serialized through a single leader node, which assigns a total order. If an append has been acknowledged, any subsequent read on any node will reflect that write. Reads can be served by any node. Follower reads may lag slightly behind the leader (typically sub-millisecond), but they will never see data out of order or observe a write that was later rolled back.
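One common way a follower avoids exposing a write that could later be rolled back is to serve reads only up to its committed index, i.e. entries already replicated to a majority. The sketch below illustrates that idea under this assumption; it is not Tonbo Stream's code:

```go
// Sketch: a follower only exposes entries up to its committed index, so a
// reader never observes an entry that could still be discarded after a
// leader change. Illustrative only.
package main

import "fmt"

type follower struct {
	log            []string
	committedIndex int // highest index known to be persisted on a majority
}

// read returns only committed entries; the uncommitted tail stays invisible.
func (f *follower) read() []string {
	return f.log[:f.committedIndex]
}

func main() {
	f := follower{log: []string{"a", "b", "c"}, committedIndex: 2}
	fmt.Println(f.read()) // [a b] -- "c" is not yet majority-replicated
}
```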

Storage tiers

Data lives in two tiers:
  • Hot tier — each node’s local disk. Recent writes are served directly from here. This is the fast path for both writes and reads.
  • Cold tier — object storage (S3). A background process flushes older data to cold storage. Snapshot blobs are content-addressed by SHA-256.
The hot tier provides low-latency access; the cold tier provides long-term durability and allows nodes to recover without replaying the full history.
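A short sketch of content addressing for cold-tier snapshot blobs; only the "addressed by SHA-256" part comes from the docs, and the object key format is an assumption for illustration:

```go
// Sketch: derive the object-storage key for a snapshot blob from the
// SHA-256 of its content, so identical blobs dedupe and a corrupted
// download is detectable by re-hashing.
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// snapshotKey returns a content-addressed key for a snapshot blob.
// The "snapshots/" prefix is illustrative, not a documented layout.
func snapshotKey(blob []byte) string {
	sum := sha256.Sum256(blob)
	return "snapshots/" + hex.EncodeToString(sum[:])
}

func main() {
	blob := []byte("segment data flushed from the hot tier")
	fmt.Println(snapshotKey(blob))
}
```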

Availability and durability

In a 3-node cluster deployed across AWS availability zones:
  • Write availability: Tolerates 1 AZ failure. Based on the AWS EC2 SLA (99.99% per instance), a 3-node majority quorum provides approximately 99.9999% write availability.
  • Read availability: Any surviving node can serve reads, so read availability equals the probability that at least 1 node is up, effectively 99.9999% or better.
  • Durability (at acknowledgment): Acknowledged writes exist on at least 2 local disks across independent availability zones.
  • Durability (after cold flush): Data reaches S3, which is designed for 99.999999999% (11 nines) durability.
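The write-availability figure follows from a simple calculation, assuming independent node failures at the 99.99% per-instance SLA:

```go
// Sketch: where the ~99.9999% write-availability estimate comes from.
// Writes stay available while at least 2 of 3 nodes are up.
package main

import "fmt"

func main() {
	p := 0.9999 // probability a single node is up (per-instance SLA)
	q := 1 - p  // probability a single node is down

	// P(at least 2 of 3 up) = p^3 + 3*p^2*q
	writeAvail := p*p*p + 3*p*p*q
	fmt.Printf("write availability ≈ %.8f\n", writeAvail) // ≈ 0.99999997
}
```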

What happens when nodes fail

  • 1 node down (of 3): No impact. Reads and writes continue. The failed node catches up when it rejoins.
  • 2 nodes down (of 3): Writes stop (no majority). Reads from the surviving node may still succeed for already-replicated data.
  • All nodes down: Service unavailable. No data is lost; each node's local storage and the S3 cold tier retain all acknowledged writes.
Leader failure triggers automatic re-election (typically within seconds). Writes sent to a follower are transparently proxied to the current leader.