NBD Block Devices

ZeroFS runs a Network Block Device (NBD) server that exposes S3 storage as raw block devices. These devices can hold ext4 filesystems, ZFS pools, databases, or VM boot disks. TRIM/discard is supported.


NBD Features

  • Raw Block Access - S3 storage appears as standard block devices (/dev/nbd*)
  • Dynamic Device Management - Create and delete devices through the filesystem; new devices are picked up at runtime without a server restart
  • Multiple Devices - One NBD server exposes every device in the .nbd/ directory
  • Unix Socket Support - Connect over TCP or a Unix socket
  • TRIM Support - Discard deletes the corresponding chunks from the LSM tree; compaction then reclaims the space in S3
  • Shared Caches - NBD reads and writes go through the same memory and disk caches as NFS and 9P
  • Any Filesystem - Format with ext4, XFS, ZFS, or any other filesystem

Configuration

Configure NBD support in your ZeroFS configuration file:

# Choose ONE of the three variants below; a TOML file can define [servers.nbd] only once.

# TCP mode (default port 10809)
[servers.nbd]
addresses = ["127.0.0.1:10809"]

# Unix socket mode (better performance for local access)
[servers.nbd]
unix_socket = "/tmp/zerofs-nbd.sock"

# Both TCP and Unix socket
[servers.nbd]
addresses = ["127.0.0.1:10809"]
unix_socket = "/tmp/zerofs-nbd.sock"

Then start ZeroFS:

zerofs run --config zerofs.toml

Configuration options:

  • addresses - Array of bind addresses for NBD TCP server (default: ["127.0.0.1:10809"])
  • unix_socket - Unix socket path for NBD (optional)
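
The variants above are alternatives, since a TOML file cannot define the same table twice. As a minimal sketch, the helper below prints exactly one [servers.nbd] table for a chosen mode. The name nbd_config is hypothetical and not part of ZeroFS; the address and socket path are the example values from this page.

```shell
#!/bin/sh
# Hypothetical helper, not part of ZeroFS: print exactly one [servers.nbd]
# table for the requested mode (tcp, unix, or both).
nbd_config() {
    # Validate the mode first so nothing is printed on bad input.
    case "$1" in
        tcp|unix|both) ;;
        *) echo "usage: nbd_config tcp|unix|both" >&2; return 1 ;;
    esac
    printf '[servers.nbd]\n'
    case "$1" in
        tcp|both) printf 'addresses = ["127.0.0.1:10809"]\n' ;;
    esac
    case "$1" in
        unix|both) printf 'unix_socket = "/tmp/zerofs-nbd.sock"\n' ;;
    esac
}

# Example: print the combined TCP + Unix socket table.
nbd_config both
```

Append the output to zerofs.toml only if the file does not already contain a [servers.nbd] table.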

Device Management

NBD devices are managed as regular files in the .nbd/ directory. First, mount ZeroFS via NFS or 9P:

# Mount via NFS
mount -t nfs 127.0.0.1:/ /mnt/zerofs

# Or mount via 9P
mount -t 9p -o trans=tcp,port=5564 127.0.0.1 /mnt/zerofs

# Create devices dynamically
mkdir -p /mnt/zerofs/.nbd
truncate -s 1G /mnt/zerofs/.nbd/database
truncate -s 5G /mnt/zerofs/.nbd/storage
truncate -s 10G /mnt/zerofs/.nbd/backup

# List devices
ls -lh /mnt/zerofs/.nbd/
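
Because a device's size is fixed once the file exists, a provisioning script may want to refuse to touch a device that is already there. The following is a minimal sketch under that assumption. nbd_create_device is a hypothetical helper name, and it relies on GNU truncate.

```shell
#!/bin/sh
# Hypothetical helper: create a sparse device file under <mount>/.nbd/,
# refusing to modify a device that already exists.
nbd_create_device() {
    mount_dir=$1
    name=$2
    size=$3
    mkdir -p "$mount_dir/.nbd" || return 1
    if [ -e "$mount_dir/.nbd/$name" ]; then
        echo "device '$name' already exists; leaving it untouched" >&2
        return 1
    fi
    # truncate allocates no data blocks; it only sets the file length.
    truncate -s "$size" "$mount_dir/.nbd/$name"
}

# Usage (with ZeroFS mounted at /mnt/zerofs):
#   nbd_create_device /mnt/zerofs database 1G
```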

Connecting to NBD Devices

Connect to devices using nbd-client with the device name:

# Connect via TCP (recommended settings for optimal performance)
nbd-client 127.0.0.1 10809 /dev/nbd0 -N database -persist -timeout 600 -connections 4
nbd-client 127.0.0.1 10809 /dev/nbd1 -N storage -persist -timeout 600 -connections 4
nbd-client 127.0.0.1 10809 /dev/nbd2 -N backup -persist -timeout 600 -connections 4

# Or connect via Unix socket (better performance for local access)
nbd-client -unix /tmp/zerofs-nbd.sock /dev/nbd0 -N database -persist -timeout 600 -connections 4
nbd-client -unix /tmp/zerofs-nbd.sock /dev/nbd1 -N storage -persist -timeout 600 -connections 4

# Verify devices are connected
nbd-client -check /dev/nbd0
lsblk | grep nbd

Important Parameters

  • -N <name> - Device name from .nbd/ directory (required)
  • -unix <path> - Use Unix socket instead of TCP
  • -persist - Automatically reconnect if connection drops (recommended)
  • -timeout 600 - Use high timeout for S3 latency (recommended: 600 seconds)
  • -connections 4 - Number of parallel connections (recommended: 4-8)
  • -readonly - Connect the device as read-only
  • -block-size <size> - Block size (512, 1024, 2048, or 4096)
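
To keep the recommended flags consistent across devices, the command line can be generated in one place. This is a sketch only: nbd_connect_cmd and the NBD_HOST, NBD_PORT, and NBD_UNIX_SOCKET variables are hypothetical names. The function prints the command instead of running it, so it can be reviewed first.

```shell
#!/bin/sh
# Hypothetical wrapper: print (not run) an nbd-client command that uses the
# recommended settings. Set NBD_UNIX_SOCKET to connect over a Unix socket.
nbd_connect_cmd() {
    export_name=$1
    device=$2
    connections=${3:-4}
    if [ -n "${NBD_UNIX_SOCKET:-}" ]; then
        target="-unix $NBD_UNIX_SOCKET"
    else
        target="${NBD_HOST:-127.0.0.1} ${NBD_PORT:-10809}"
    fi
    echo "nbd-client $target $device -N $export_name -persist -timeout 600 -connections $connections"
}

# Example: the TCP command for the "database" export.
nbd_connect_cmd database /dev/nbd0
```

Pipe the output to sh (as root) once it looks right.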

Durability Semantics

The handshake advertises NBD_FLAG_SEND_FLUSH, NBD_FLAG_SEND_FUA, and NBD_FLAG_CAN_MULTI_CONN. Command behavior:

  • FLUSH - Flushes the entire filesystem database: memtable to object storage, or a WAL append when the WAL is enabled. The reply is sent only after the flush completes. Concurrent FLUSH and FUA requests coalesce into a single database flush.
  • FUA - A WRITE, TRIM, or WRITE_ZEROES with the FUA flag set blocks until the data is durable, through the same flush path as FLUSH.
  • CACHE - Accepted as a no-op.
  • Structured replies - Not negotiated; clients fall back to simple replies.

See Durability for what durable means under each configuration.
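
On Linux, an fsync on the block device, or on a file in a filesystem mounted from it, is what normally makes the kernel send FLUSH to the NBD server. The sketch below uses dd with conv=fsync to write one 4 KiB block and wait for that fsync. durable_write is a hypothetical helper name, and the exact kernel behavior is assumed rather than taken from the ZeroFS documentation.

```shell
#!/bin/sh
# Hypothetical helper: write one 4 KiB block of zeros at offset 0 of the
# target and return only after fsync has completed.
# WARNING: this overwrites the first 4 KiB of the target.
durable_write() {
    target=$1
    # conv=fsync   : fsync the output once before dd exits
    # conv=notrunc : do not truncate an existing file
    dd if=/dev/zero of="$target" bs=4096 count=1 conv=fsync,notrunc 2>/dev/null
}

# Usage (on a scratch file inside a filesystem mounted from /dev/nbd0):
#   durable_write /mnt/block/flush-test
```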

Multiple Connections

Every connection to the NBD server shares one filesystem instance and one flush path, so a FLUSH on any connection covers writes completed on every connection. This satisfies the NBD_FLAG_CAN_MULTI_CONN contract, so running nbd-client with -connections set to anything from 4 to 8 is safe for ZFS pools and databases that rely on write barriers.

Write Barriers

Write barriers hold under the default configuration. With sync_writes = false (the default), individual writes are buffered, but FLUSH and FUA still force durability at the barrier. ZFS transaction syncs and database WAL fsyncs issue exactly these commands.

Using Block Devices

Creating Filesystems

# Format with ext4
mkfs.ext4 /dev/nbd0
mount /dev/nbd0 /mnt/block

# Format with XFS
mkfs.xfs /dev/nbd1
mount /dev/nbd1 /mnt/xfs

ZFS on S3

Create ZFS pools backed by S3 storage:

# Create a ZFS pool
zpool create mypool /dev/nbd0 /dev/nbd1 /dev/nbd2

# Create datasets
zfs create mypool/data
zfs create mypool/backups

# Enable compression
zfs set compression=lz4 mypool

TRIM/Discard Support

ZeroFS NBD devices accept TRIM from any filesystem or zpool:

# Manual TRIM
fstrim /mnt/block

# Enable automatic discard for filesystems
mount -o discard /dev/nbd0 /mnt/block

# ZFS automatic TRIM
zpool set autotrim=on mypool

# ZFS manual, one-off TRIM
zpool trim mypool

When blocks are trimmed:

  1. ZeroFS deletes the corresponding chunks from the LSM tree
  2. Compaction reclaims the space in S3
  3. The freed space no longer counts toward the S3 storage bill

See Garbage Collection for the full reclamation pipeline.
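
Before relying on fstrim or the discard mount option, it can help to confirm that the kernel will actually issue discards for the device. Linux exposes this in sysfs: a discard_max_bytes value of 0 means discards are not sent. A minimal sketch follows; nbd_discard_enabled and the SYSFS_ROOT override are hypothetical names used for illustration.

```shell
#!/bin/sh
# Hypothetical check: succeed if the kernel will issue discards for a device.
# Returns 0 if enabled, 1 if disabled, 2 if the sysfs attribute is missing.
# SYSFS_ROOT can be overridden to exercise the function without a real device.
nbd_discard_enabled() {
    dev=$1
    attr="${SYSFS_ROOT:-/sys}/block/$dev/queue/discard_max_bytes"
    [ -r "$attr" ] || return 2
    [ "$(cat "$attr")" -gt 0 ]
}

# Usage:
#   nbd_discard_enabled nbd0 && fstrim /mnt/block
```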

WRITE_ZEROES vs TRIM

WRITE_ZEROES writes physical zeros in 1 MiB chunks. It overwrites chunks and never deletes them, so blkdiscard --zeroout does not free object-store space. TRIM is the only command that deletes chunks and lets compaction reclaim the space. To free space, use plain blkdiscard, fstrim, or zpool trim / autotrim.

Managing Device Files

NBD devices are regular files in the .nbd directory:

# View NBD device files
ls -lh /mnt/zerofs/.nbd/
# -rw-r--r--  1 root root 1.0G  database
# -rw-r--r--  1 root root 5.0G  storage
# -rw-r--r--  1 root root 10G   backup

# Add a new device (picked up at runtime, no restart)
truncate -s 20G /mnt/zerofs/.nbd/new-device

# Remove a device (disconnect NBD client first)
nbd-client -d /dev/nbd3
rm /mnt/zerofs/.nbd/old-device

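Important: Device sizes cannot be changed after creation. To get a different size, delete the device file and recreate it. This discards everything stored on the device, so copy the data elsewhere first: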

# Disconnect the NBD client
nbd-client -d /dev/nbd0
# Delete and recreate with new size
rm /mnt/zerofs/.nbd/database
truncate -s 2G /mnt/zerofs/.nbd/database
# Reconnect NBD client with optimal settings
nbd-client 127.0.0.1 10809 /dev/nbd0 -N database -persist -timeout 600 -connections 4

Advanced Use Cases

Geo-Distributed ZFS

Mirror a ZFS pool across regions:

# Machine 1 - US East (10.0.1.5)
# zerofs-us-east.toml
[storage]
url = "s3://my-bucket/us-east-db"
encryption_password = "shared-key"

[aws]
region = "us-east-1"

[servers.nbd]
addresses = ["0.0.0.0:10809"]

# Start: zerofs run -c zerofs-us-east.toml

# Machine 2 - EU West (10.0.2.5)
# zerofs-eu-west.toml
[storage]
url = "s3://my-bucket/eu-west-db"
encryption_password = "shared-key"

[aws]
region = "eu-west-1"

[servers.nbd]
addresses = ["0.0.0.0:10809"]

# Start: zerofs run -c zerofs-eu-west.toml

# Create devices on both machines
mount -t nfs 10.0.1.5:/ /mnt/zerofs
truncate -s 100G /mnt/zerofs/.nbd/storage
umount /mnt/zerofs

mount -t nfs 10.0.2.5:/ /mnt/zerofs
truncate -s 100G /mnt/zerofs/.nbd/storage
umount /mnt/zerofs

# From a client machine, connect to both regions with optimal settings
nbd-client 10.0.1.5 10809 /dev/nbd0 -N storage -persist -timeout 600 -connections 8
nbd-client 10.0.2.5 10809 /dev/nbd1 -N storage -persist -timeout 600 -connections 8

# Create mirrored pool across continents
zpool create global-pool mirror /dev/nbd0 /dev/nbd1

ZFS mirrors every block across both regions. The pool stays available if either region fails.
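
When one side of the mirror is unreachable, ZFS reports the pool as DEGRADED rather than ONLINE while it continues to serve I/O. A monitoring job can test for that with zpool list. This is a minimal sketch; pool_health_ok is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical check: succeed only when the pool reports ONLINE.
# Returns 0 for ONLINE, 1 for any other state, 2 if zpool itself fails.
pool_health_ok() {
    # -H drops the header line; -o health prints only the health column.
    health=$(zpool list -H -o health "$1") || return 2
    [ "$health" = "ONLINE" ]
}

# Usage:
#   pool_health_ok global-pool || echo "global-pool needs attention" >&2
```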

ZFS L2ARC Tiering

Use local NVMe as cache for S3-backed storage:

# Create S3-backed pool
zpool create mypool /dev/nbd0 /dev/nbd1

# Add local NVMe as L2ARC cache
zpool add mypool cache /dev/nvme0n1

# Monitor cache performance
zpool iostat -v mypool 1

Storage tiers:

  1. NVMe L2ARC - Hot data on local flash
  2. ZeroFS Caches - Recently used blocks in the configured memory and disk caches
  3. S3 Storage - Cold data in the bucket

Database Storage

Run databases on NBD devices:

# Connect the database device created in .nbd/
nbd-client 127.0.0.1 10809 /dev/nbd0 \
  -N database \
  -persist \
  -timeout 600 \
  -connections 4 \
  -block-size 4096

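mkfs.ext4 /dev/nbd0
mount /dev/nbd0 /var/lib/postgresql
# The root of the new filesystem is owned by root; hand it to the postgres user
chown postgres:postgres /var/lib/postgresql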

# Initialize database
sudo -u postgres initdb -D /var/lib/postgresql/16/main

Virtual Machine Storage

Boot VMs from NBD devices:

# Create VM disk (the NBD device must be at least 50G)
qemu-img create -f raw /dev/nbd0 50G

# Boot VM using NBD device
qemu-system-x86_64 \
  -drive file=/dev/nbd0,format=raw,cache=writeback \
  -m 4G -enable-kvm

Performance Considerations

Network Optimization

# Multiple parallel connections
nbd-client 127.0.0.1 10809 /dev/nbd0 \
  -N database \
  -persist \
  -timeout 600 \
  -connections 4 \
  -block-size 4096

# For high-latency connections (e.g., a server in another region)
nbd-client 10.0.2.5 10809 /dev/nbd0 \
  -N storage \
  -persist \
  -timeout 600 \
  -connections 8

Monitoring

Device Status

# Check if device is connected
nbd-client -check /dev/nbd0

# List all NBD exports from server
nbd-client -list 127.0.0.1

# View device statistics
cat /sys/block/nbd0/stat

# Monitor I/O performance
iostat -x 1 /dev/nbd0

# Disconnect device safely
nbd-client -disconnect /dev/nbd0
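
The raw stat file is a single line of counters. In the kernel's block-layer stat format, field 3 is sectors read and field 7 is sectors written, always in 512-byte units. The sketch below converts those two fields to bytes; nbd_io_bytes is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: print total bytes read and written for a block device
# from its sysfs stat file (defaults to nbd0).
nbd_io_bytes() {
    stat_file=${1:-/sys/block/nbd0/stat}
    # Field 3 = sectors read, field 7 = sectors written. The stat file counts
    # 512-byte sectors regardless of the device's block size.
    awk '{ printf "read_bytes=%.0f written_bytes=%.0f\n", $3 * 512, $7 * 512 }' "$stat_file"
}

# Usage:
#   nbd_io_bytes /sys/block/nbd0/stat
```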

ZFS Monitoring

# Pool status
zpool status

# I/O statistics
zpool iostat -v 1

Troubleshooting

Connection Issues

# If connection fails, check:
# 1. ZeroFS is running and NBD ports are configured
pgrep -a zerofs

# 2. NBD module is loaded
sudo modprobe nbd

# 3. Export name matches a file in the .nbd/ directory
nbd-client -list 127.0.0.1

# 4. Try with explicit parameters
nbd-client 127.0.0.1 10809 /dev/nbd0 \
  -N database \
  -nofork  # Stay in foreground for debugging
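
Check 2 can also be done without root: modprobe needs privileges, but a loaded (or built-in) nbd driver shows up as a directory under /sys/module. A minimal sketch follows; nbd_module_loaded and the SYSFS_ROOT override are hypothetical names.

```shell
#!/bin/sh
# Hypothetical check: succeed if the nbd kernel driver is present.
# SYSFS_ROOT can be overridden to exercise the function on any machine.
nbd_module_loaded() {
    [ -d "${SYSFS_ROOT:-/sys}/module/nbd" ]
}

# Usage:
#   nbd_module_loaded || echo "run: sudo modprobe nbd" >&2
```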

Performance Issues

# Use multiple connections for better throughput
nbd-client 127.0.0.1 10809 /dev/nbd0 \
  -N database \
  -connections 8 \
  -persist \
  -timeout 600

# For large sequential workloads, increase block size
nbd-client 127.0.0.1 10809 /dev/nbd0 \
  -N database \
  -block-size 4096 \
  -persist

Persistent Mount Configuration

# Create a systemd service that connects the NBD device at boot
cat > /etc/systemd/system/zerofs-nbd.service << 'EOF'
[Unit]
Description=ZeroFS NBD Client
After=network.target

[Service]
Type=forking
ExecStart=/usr/sbin/nbd-client 127.0.0.1 10809 /dev/nbd0 -N database -persist -timeout 600 -block-size 4096 -connections 4
ExecStop=/usr/sbin/nbd-client -d /dev/nbd0
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF

systemctl enable zerofs-nbd
systemctl start zerofs-nbd
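
The service above only connects the device; mounting a filesystem from it at boot is a separate step. One possible approach, sketched here under the assumption that systemd generates mount units from /etc/fstab, is an fstab entry that waits for the service through the x-systemd.requires option. nbd_fstab_line is a hypothetical helper that prints such an entry for review.

```shell
#!/bin/sh
# Hypothetical helper: print an /etc/fstab entry that mounts an NBD-backed
# filesystem only after the given systemd unit has started.
nbd_fstab_line() {
    device=$1
    mountpoint=$2
    fstype=$3
    unit=${4:-zerofs-nbd.service}
    # _netdev : treat as a network device (mount after the network is up)
    # nofail  : do not block boot if the device is unavailable
    printf '%s %s %s _netdev,nofail,x-systemd.requires=%s 0 0\n' \
        "$device" "$mountpoint" "$fstype" "$unit"
}

# Example: entry for the ext4 filesystem on /dev/nbd0.
nbd_fstab_line /dev/nbd0 /mnt/block ext4
```

Review the printed line before appending it to /etc/fstab, then run systemctl daemon-reload.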
