Advanced Use Cases

This guide covers two setups built on ZeroFS block devices: ZFS pools mirrored across regions, and tiered storage that puts local flash in front of an S3-backed pool. It then composes them into a PostgreSQL reference architecture.

These setups span multiple machines and failure domains. Test them before relying on them. NBD and NFS carry no authentication; keep the bind addresses below on a private network.

Geo-Distributed Storage

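Run one ZeroFS instance in each of three regions and mirror their NBD devices in a single ZFS pool. The configuration below is for the us-east instance; the instances on 10.0.2.5 and 10.0.3.5 differ in bucket URL, AWS region, and bind addresses: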

# zerofs-us-east.toml, runs on 10.0.1.5
[cache]
dir = "/var/cache/zerofs"
disk_size_gb = 50.0

[storage]
url = "s3://zerofs-us-east/db"
encryption_password = "shared-key"

[aws]
region = "us-east-1"

[servers.nfs]
addresses = ["10.0.1.5:2049"]

[servers.nbd]
addresses = ["10.0.1.5:10809"]

# Start: zerofs run -c zerofs-us-east.toml
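The other two instances use the same file with a different bucket, AWS region, and bind address. A minimal sketch that writes all three configs from one template; the us-west and eu-west buckets, the us-west-2 and eu-west-1 regions, and their mapping to 10.0.2.5 and 10.0.3.5 are hypothetical placeholders, and write_zerofs_config is an illustrative helper, not part of ZeroFS:

```shell
#!/usr/bin/env bash
# Sketch: write one ZeroFS config per region from the us-east template above.
# Hypothetical: the us-west/eu-west buckets and regions, and which host
# (10.0.2.5 / 10.0.3.5) serves which region. Substitute your own.
set -euo pipefail

# write_zerofs_config NAME BUCKET_URL AWS_REGION BIND_IP
write_zerofs_config() {
  local name=$1 url=$2 region=$3 ip=$4
  cat > "zerofs-${name}.toml" <<EOF
[cache]
dir = "/var/cache/zerofs"
disk_size_gb = 50.0

[storage]
url = "${url}"
# placeholder from the example above; use a real secret
encryption_password = "shared-key"

[aws]
region = "${region}"

[servers.nfs]
addresses = ["${ip}:2049"]

[servers.nbd]
addresses = ["${ip}:10809"]
EOF
}

write_zerofs_config us-east s3://zerofs-us-east/db us-east-1 10.0.1.5
write_zerofs_config us-west s3://zerofs-us-west/db us-west-2 10.0.2.5  # hypothetical
write_zerofs_config eu-west s3://zerofs-eu-west/db eu-west-1 10.0.3.5  # hypothetical
```

Copy each file to its host and start it with zerofs run -c, as shown for us-east.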

Benefits of Geo-Distribution

Redundancy: ZFS writes every block to all three mirror members, so each region holds a full copy.
Failure handling: The pool stays online, reported as DEGRADED, while a member is unreachable; when the member returns, ZFS resilvers it with the writes it missed.
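Because a lost member shows up as a DEGRADED pool rather than an outage, it is worth alerting on. A minimal sketch of a health check, for example run from cron; check_pool_health is a hypothetical helper built on zpool list -H -o health:

```shell
#!/usr/bin/env bash
# Sketch: report a mirrored pool whose health is anything but ONLINE.
# A DEGRADED mirror still serves I/O (e.g. one region unreachable) but has
# lost redundancy until the missing member returns and resilvers.
# check_pool_health is a hypothetical helper; pass your own pool name.

check_pool_health() {
  local pool=${1:?usage: check_pool_health POOL}
  local health
  # -H: scripted mode (no header); -o health: print only the health column
  health=$(zpool list -H -o health "$pool") || return 2
  case $health in
    ONLINE)
      echo "$pool: ONLINE"
      return 0 ;;
    DEGRADED)
      echo "$pool: DEGRADED (serving I/O with reduced redundancy)"
      return 1 ;;
    *)
      echo "$pool: $health (check 'zpool status -v $pool')"
      return 2 ;;
  esac
}
```

check_pool_health pgpool exits 0 when the pool is ONLINE, 1 when it is DEGRADED, and 2 for any other state or if zpool itself fails; zpool status -v then shows which member is missing and how far a resilver has progressed.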

Tiered Storage Architecture

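Put local devices in front of an S3-backed pool. An L2ARC device extends ZFS's in-memory read cache (the ARC) onto local flash; a SLOG device holds the ZFS intent log, so synchronous writes are acknowledged once they reach local flash, before the transaction group carrying them is flushed to S3: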

# Create S3-backed main pool (a stripe across three ZeroFS NBD devices)
zpool create datapool /dev/nbd0 /dev/nbd1 /dev/nbd2

# Add local NVMe as L2ARC cache
zpool add datapool cache /dev/nvme0n1

# Add local SSD as SLOG for synchronous writes
zpool add datapool log /dev/ssd0

# Monitor cache effectiveness
zpool iostat -v datapool 5
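zpool iostat -v shows traffic to the cache device but not how often reads actually hit it. On Linux, OpenZFS exposes cumulative L2ARC counters in /proc/spl/kstat/zfs/arcstats (the arc_summary and arcstat tools read the same data). A minimal sketch that turns them into a hit ratio; l2arc_hit_ratio is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Sketch: L2ARC hit ratio (percent) from OpenZFS kstats on Linux.
# Counters are cumulative since the module loaded; compare two samples
# to see a recent window. l2arc_hit_ratio is a hypothetical helper.

l2arc_hit_ratio() {
  local stats=${1:-/proc/spl/kstat/zfs/arcstats}
  # arcstats rows are "name type data"; pick out the two L2ARC counters.
  awk '$1 == "l2_hits"   { h = $3 }
       $1 == "l2_misses" { m = $3 }
       END {
         if (h + m == 0) { print "n/a"; exit }
         printf "%.1f\n", 100 * h / (h + m)
       }' "$stats"
}
```

A ratio that stays low after the cache device has warmed up suggests the working set does not fit, or that reads rarely repeat.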

PostgreSQL

The two setups above compose into a storage layer for PostgreSQL: each database node keeps its data directory on a ZFS pool mirrored across regions, with local flash in front.

Reference Topology

Clients connect through a proxy (HAProxy or PgBouncer) to a primary/standby pair that uses PostgreSQL synchronous replication. Each node owns its own pool: a 3-way ZFS mirror of NBD devices, each served by a ZeroFS instance backed by S3 in a different region, for six instances in six regions across the pair. The two layers cover different failures. Synchronous replication covers node failure, with the proxy redirecting connections when the primary fails. The mirror covers region and object-store failure: it stays online with one member unreachable and resilvers the device when it returns.

Build one node's pool from three ZeroFS instances (10.0.1.5, 10.0.2.5, and 10.0.3.5) configured as in Geo-Distributed Storage above, then add local devices as in the tiered setup:

Per-Node Pool and Data Directory

# Each address is a ZeroFS instance configured as in "Geo-Distributed Storage" above
modprobe nbd
nbd-client 10.0.1.5 10809 /dev/nbd0 -N storage -persist -timeout 600 -connections 8
nbd-client 10.0.2.5 10809 /dev/nbd1 -N storage -persist -timeout 600 -connections 8
nbd-client 10.0.3.5 10809 /dev/nbd2 -N storage -persist -timeout 600 -connections 8

zpool create pgpool mirror /dev/nbd0 /dev/nbd1 /dev/nbd2
zpool set autotrim=on pgpool

# Local flash, as in "Tiered Storage Architecture" above
zpool add pgpool cache /dev/nvme0n1   # L2ARC: caches reads
zpool add pgpool log /dev/ssd0        # SLOG: absorbs synchronous writes

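zfs create -o mountpoint=/var/lib/postgresql pgpool/pgdata
chown postgres:postgres /var/lib/postgresql   # the new mountpoint is owned by root
# On Debian/Ubuntu, initdb lives under /usr/lib/postgresql/16/bin
sudo -u postgres initdb -D /var/lib/postgresql/16/main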

The standby repeats this with three ZeroFS instances in three other regions. Device creation and nbd-client flags are covered in NBD Block Devices.
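Only the three endpoints differ between the primary and the standby. A minimal sketch that captures that step as a reusable function; attach_mirror is hypothetical, it assumes the export name, port, and nbd-client flags shown above and that /dev/nbd0 through /dev/nbd2 are free, and DRY_RUN=1 prints the commands instead of running them:

```shell
#!/usr/bin/env bash
# Sketch: attach three ZeroFS NBD exports and mirror them into one pool.
# Hypothetical helper; export name "storage", port 10809, and nbd-client
# flags follow the per-node example. Run as root unless DRY_RUN=1.
set -euo pipefail

run() {
  # DRY_RUN=1 prints each command instead of executing it.
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# attach_mirror POOL HOST1 HOST2 HOST3
attach_mirror() {
  if [ $# -ne 4 ]; then
    echo "usage: attach_mirror POOL HOST1 HOST2 HOST3" >&2
    return 2
  fi
  local pool=$1 host i=0
  local devs=()
  shift
  run modprobe nbd
  for host in "$@"; do
    # One NBD device per region; assumes /dev/nbd0..2 are not already in use.
    run nbd-client "$host" 10809 "/dev/nbd$i" -N storage -persist -timeout 600 -connections 8
    devs+=("/dev/nbd$i")
    i=$((i + 1))
  done
  run zpool create "$pool" mirror "${devs[@]}"
}
```

On the primary, call attach_mirror pgpool 10.0.1.5 10.0.2.5 10.0.3.5; on the standby, pass its own three hosts. Both nodes then continue with autotrim, the cache and log devices, the dataset, and initdb as above.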

Durability Path

Commit flow: PostgreSQL fsyncs its write-ahead log on commit. ZFS writes that fsync to the intent log on the SLOG and returns once it is durable there; the data reaches the pool's devices at the next transaction-group sync, when ZFS also issues flush commands to them.
Flush semantics: ZeroFS advertises NBD flush and FUA support. A flush command and a FUA-flagged write both return only after buffered data is durably persisted, through the same flush path that serves fsync over 9P. See Durability & Consistency.
Flush latency: Extents are compressed and encrypted into frames as they are written, so a flush only has to seal the open in-memory segment by appending its segment directory and footer, upload it as one immutable segment object, and then flush metadata. A flush also waits for any in-flight background segment uploads. Flush latency therefore tracks the main object store's PUT latency. The SLOG above keeps that latency off the commit path: commits wait only on the local log device, and the pool flushes the ZeroFS devices at transaction-group sync.
Storage-level read replicas: A ZeroFS read replica exposes the same filesystem read-only from another instance. It does not replace a PostgreSQL standby, which needs a writable data directory and receives changes through PostgreSQL replication.
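The commit path above only holds if ZFS honors fsync. With the dataset property sync=disabled, ZFS treats synchronous writes as asynchronous, so commits return before anything reaches the SLOG. A minimal check, sketched as a hypothetical check_sync helper around zfs get:

```shell
#!/usr/bin/env bash
# Sketch: confirm ZFS is not short-circuiting fsync on the data directory.
# sync=disabled acknowledges fsync without writing the intent log, which
# breaks the durability path described above. check_sync is hypothetical;
# the default dataset name follows the per-node example.

check_sync() {
  local ds=${1:-pgpool/pgdata} sync
  # -H: no header; -o value: print only the property value
  sync=$(zfs get -H -o value sync "$ds") || return 2
  if [ "$sync" = disabled ]; then
    echo "$ds: sync=disabled, fsync is not durable"
    return 1
  fi
  echo "$ds: sync=$sync"
}
```

On the PostgreSQL side, fsync should be on and synchronous_commit should not be off for commits to take this path. PostgreSQL's pg_test_fsync, pointed with -f at a file on the pool, measures sync-write latency there; with the SLOG in place it should reflect the local log device rather than S3 PUT latency.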
