Kubernetes CSI Driver

zerofs-csi exposes a ZeroFS filesystem to Kubernetes as dynamically provisioned persistent volumes. One ZeroFS instance — the gateway — backs a StorageClass; each volume is a directory on that filesystem, and pods mount their directory over 9P.

How It Works

ZeroFS is single-writer per bucket/prefix, so the driver shares one gateway across all volumes of a StorageClass rather than running an instance per volume:

  • A single-replica StatefulSet runs zerofs run against one bucket, serving 9P to the nodes and a gRPC admin API to the controller. Conditional writes fence the single writer, so a rescheduled gateway cannot split-brain against a stale one.
  • The controller plugin translates CreateVolume into a CreateDirectory call for /volumes/pvc-<uuid> on the admin API, and DeleteVolume into RemoveDirectory. Removal renames the directory into a trash area and returns immediately; a background task deletes the contents and the garbage collector reclaims the chunks.
  • The node plugin publishes a volume by spawning zerofs mount <gateway> <target> --aname /volumes/pvc-<uuid> — the userspace 9P client described in 9P File Access, attached directly to the volume's directory. The attach is rooted at that directory: the mount cannot name anything above it, and .. at its root resolves to itself.

Each published volume is its own zerofs mount process, so mounts inherit the client's reconnect behavior: when the gateway pod restarts or is rescheduled, operations block until it returns, then resume on the same open file handles. A gateway restart does not invalidate existing mounts.

Kubernetes access mode    Supported
ReadWriteOnce             yes
ReadOnlyMany              yes
ReadWriteMany             yes
ReadWriteOncePod          no

ReadWriteMany works because any number of pods on any number of nodes each run their own client against the same directory; the gateway arbitrates, including POSIX byte-range locks.

Installation

Prerequisites: an object store bucket with credentials for the gateway, and /dev/fuse on the nodes (the node plugin runs privileged and mounts it from the host).

1. The gateway

The example manifest creates the zerofs namespace, a config Secret, a Service (9P on 5564, admin RPC on 7000), and the StatefulSet:

curl -fsSLO https://raw.githubusercontent.com/Barre/ZeroFS/main/zerofs/zerofs-csi/deploy/gateway-example.yaml
# Edit the Secret: storage url, encryption_password, credentials.
kubectl apply -f gateway-example.yaml

The gateway is configured like any other ZeroFS instance (see Configuration); sizing its cache is the main throughput lever for everything mounted through it.

2. The driver

DEPLOY=https://raw.githubusercontent.com/Barre/ZeroFS/main/zerofs/zerofs-csi/deploy
kubectl apply -f $DEPLOY/csidriver.yaml -f $DEPLOY/rbac.yaml \
              -f $DEPLOY/controller.yaml -f $DEPLOY/node.yaml

The controller is a Deployment running the CSI controller service next to the external-provisioner sidecar. The node plugin is a DaemonSet on every node (it tolerates all taints, since any node can run a pod with a volume). The ghcr.io/barre/zerofs-csi image is published for amd64 and arm64 and carries the same zerofs binary as the standalone image.
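
csidriver.yaml registers the driver with the cluster. The following is a rough sketch of what such an object contains. It assumes the driver name zerofs.csi.example, which is a placeholder; the real name is whatever the shipped csidriver.yaml declares, and the StorageClass provisioner field must match it.

```yaml
# Sketch only: the driver name is a hypothetical placeholder.
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: zerofs.csi.example   # hypothetical; must match the StorageClass provisioner
spec:
  attachRequired: false      # no attach step: the node plugin mounts directly over 9P
```

With attachRequired set to false, Kubernetes creates no VolumeAttachment objects for these volumes. This matches the "no attach step" behavior described under Using Volumes.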

3. The StorageClass

kubectl apply -f $DEPLOY/storageclass-example.yaml

Parameter        Required   Meaning
adminEndpoint    yes        Gateway admin RPC URL, e.g. http://zerofs-gateway.zerofs.svc:7000
gateway          yes        9P address nodes mount from, e.g. zerofs-gateway.zerofs.svc:5564
volumesRoot      no         Directory holding the volume directories (default /volumes)

DeleteVolume receives no StorageClass parameters (the CSI specification passes it only the volume ID and secrets), so the StorageClass must also reference a Secret carrying adminEndpoint through the csi.storage.k8s.io/provisioner-secret-name and csi.storage.k8s.io/provisioner-secret-namespace parameters. The example includes it.

One StorageClass per gateway. Several StorageClasses, each with its own gateway, bucket, and encryption password, can coexist in one cluster.
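
The pieces above fit together roughly as follows. This is a sketch under assumptions: the provisioner name, the Secret name zerofs-admin, and the Secret key adminEndpoint are placeholders chosen for illustration. storageclass-example.yaml in the deploy bundle is the authoritative version.

```yaml
# Sketch under assumptions: provisioner name, Secret name, and Secret key
# are hypothetical placeholders.
apiVersion: v1
kind: Secret
metadata:
  name: zerofs-admin                 # hypothetical name
  namespace: zerofs
stringData:
  adminEndpoint: http://zerofs-gateway.zerofs.svc:7000   # assumed key name
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: zerofs
provisioner: zerofs.csi.example      # hypothetical; must match the CSIDriver name
reclaimPolicy: Delete                # DeleteVolume moves the directory to the trash area
volumeBindingMode: Immediate
parameters:
  adminEndpoint: http://zerofs-gateway.zerofs.svc:7000
  gateway: zerofs-gateway.zerofs.svc:5564
  volumesRoot: /volumes              # optional; /volumes is the default
  # DeleteVolume sees only secrets, so the admin endpoint is also carried there.
  csi.storage.k8s.io/provisioner-secret-name: zerofs-admin
  csi.storage.k8s.io/provisioner-secret-namespace: zerofs
```

The sketch deliberately sets no mountOptions, because the driver rejects them (see Limitations).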

Using Volumes

Nothing ZeroFS-specific appears in workloads:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
spec:
  storageClassName: zerofs
  accessModes: ["ReadWriteMany"]
  resources:
    requests:
      storage: 10Gi

Creating the claim provisions the directory; scheduling a pod that uses it mounts the directory at the pod's mountPath. There is no attach step (attachRequired: false), so pods start as soon as the node plugin has the mount up.
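
A workload then references the claim by name. In the illustrative Deployment below, two replicas share the data claim through ReadWriteMany. The Deployment name, labels, image, and mount path are arbitrary examples, not anything the driver requires.

```yaml
# Illustrative workload: two replicas mount the same "data" claim.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: writer
spec:
  replicas: 2
  selector:
    matchLabels:
      app: writer
  template:
    metadata:
      labels:
        app: writer
    spec:
      containers:
        - name: writer
          image: busybox:1.36
          # Each replica appends its hostname to a shared file once a minute.
          command: ["sh", "-c", "while true; do hostname >> /data/log.txt; sleep 60; done"]
          volumeMounts:
            - name: data
              mountPath: /data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: data
```

Each replica gets its own zerofs mount process on its node, and both see the same directory on the gateway.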

Failure Behavior

  • Gateway restart or rescheduling — published mounts block and then resume, including across open file handles; see the reconnect description in 9P File Access. Pods observe a stall, not errors. Conditional-write fencing prevents a stale gateway from writing after its replacement takes over.
  • Node plugin restart — the FUSE clients are children of the node plugin container, so a restart breaks the published mounts on that node until the affected pods are recreated. The DaemonSet therefore uses an OnDelete update strategy, so upgrades happen deliberately, one node at a time, coordinated with draining.
  • Node loss — volume directories live on the gateway, so rescheduled pods mount the same data from any other node.
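
The update strategy mentioned above is a single field on the DaemonSet. The excerpt below is shown for orientation only; node.yaml in the deploy bundle is the real manifest. With OnDelete, applying a new node.yaml changes nothing on a node until its node plugin pod is deleted, which fits naturally after that node has been drained.

```yaml
# Excerpt of a DaemonSet spec, not a complete manifest.
spec:
  updateStrategy:
    type: OnDelete   # new pods roll out only when an existing pod is deleted
```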

Security

Neither gateway port is authenticated; the trust boundary is network reachability, the same model as NFS with AUTH_SYS. Both must be restricted to their CSI clients:

  • 9P (5564) is the data plane. The attach aname is namespacing, not authentication: any client that reaches this port can attach to any directory, including another volume's or the filesystem root. Pods never speak 9P themselves (they see only a kernel mountpoint), so this port needs to be reachable only from the node plugins.
  • Admin RPC (7000) is a root-equivalent control plane. It runs every call as uid 0 and can trash any volume (RemoveDirectory). Its only client is the controller, so it is locked to the controller pod alone — the tighter of the two rules.

networkpolicy-example.yaml ships in the deploy bundle and enforces both. Apply it alongside the gateway (adapt the labels to your manifests):

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: zerofs-gateway
  namespace: zerofs
spec:
  podSelector:
    matchLabels:
      app: zerofs-gateway
  policyTypes: [Ingress]
  ingress:
    - ports: [{ port: 5564 }] # data plane, from every node plugin
      from:
        - podSelector:
            matchLabels:
              app: zerofs-csi-node
    - ports: [{ port: 7000 }] # control plane, from the controller only
      from:
        - podSelector:
            matchLabels:
              app: zerofs-csi-controller
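
A bare podSelector in an ingress rule matches only pods in the policy's own namespace, zerofs here. If the driver pods run in a different namespace, each rule needs a namespaceSelector in the same from entry. The sketch below shows the data-plane rule for that case. It assumes the driver runs in a namespace named zerofs-csi (a hypothetical name) and relies on the kubernetes.io/metadata.name label that Kubernetes sets on every namespace. The control-plane rule changes the same way.

```yaml
# Replacement for the port-5564 rule when the node plugins live in another
# namespace. "zerofs-csi" is a hypothetical namespace name.
- ports: [{ port: 5564 }]
  from:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: zerofs-csi
      podSelector:              # same list item, so both selectors must match
        matchLabels:
          app: zerofs-csi-node
```

Putting namespaceSelector and podSelector in one from entry ANDs them. Listing them as two separate entries would instead admit every pod in that namespace, plus matching pods in zerofs.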

NetworkPolicies are no-ops on a CNI that does not enforce them (plain flannel without a policy controller, for example). There, neither port has in-cluster protection, so do not expose the gateway to untrusted pods. Data is still encrypted before it reaches the object store; the encryption password is per gateway, i.e. per StorageClass.

Limitations

  • Capacity is not enforced. PVC sizes are labels; the gateway's max_size_gb caps the filesystem as a whole.
  • No per-volume snapshots. Checkpoints cover the whole gateway filesystem, not one volume directory.
  • StorageClass mountOptions are rejected. The mount takes no pass-through options; a capability carrying them fails with InvalidArgument rather than being silently ignored.
  • Mount volumes only. volumeMode: Block is not supported and not planned. A workload that wants a fixed-size disk image can create one inside its filesystem volume — truncate -s 10G /data/disk.img — and consume it directly (e.g. a file-backed VM disk) or, given sufficient privilege, attach it as a loop device. A first-class block volume would only add a raw device in exchange for giving up the filesystem's ReadWriteMany and reconnect-tolerant recovery.
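
The disk-image workaround can run from an init container. In this sketch, the init container creates the image inside the filesystem volume only if it does not already exist. The pod name, image, and file path are illustrative choices, and attaching the image as a loop device is left out because it needs extra privilege.

```yaml
# Sketch: create a 10G disk image inside the "data" filesystem volume once.
# Names and images are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: disk-image-user
spec:
  initContainers:
    - name: make-image
      image: debian:bookworm-slim
      # truncate -s sets the file's length to 10G without writing data.
      command: ["sh", "-c", "test -e /data/disk.img || truncate -s 10G /data/disk.img"]
      volumeMounts:
        - name: data
          mountPath: /data
  containers:
    - name: app
      image: debian:bookworm-slim
      command: ["sh", "-c", "ls -ls /data/disk.img && sleep infinity"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: data
```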
