Kubernetes CSI Driver
zerofs-csi exposes a ZeroFS filesystem to Kubernetes as dynamically provisioned persistent volumes. One ZeroFS instance — the gateway — backs a StorageClass; each volume is a directory on that filesystem, and pods mount their directory over 9P.
The driver registers as csi.zerofs.net. It provisions mount volumes only (no block mode) and passes the CSI conformance suite (csi-sanity v5.4.0). Manifests live in zerofs/zerofs-csi/deploy/ and are applied unmodified by the project's k3s end-to-end CI.
How It Works
ZeroFS is single-writer per bucket/prefix, so the driver shares one gateway across all volumes of a StorageClass rather than running an instance per volume:
- A single-replica StatefulSet runs `zerofs run` against one bucket, serving 9P to the nodes and a gRPC admin API to the controller. Conditional writes fence the single writer, so a rescheduled gateway cannot split-brain against a stale one.
- The controller plugin translates `CreateVolume` into a `CreateDirectory` call for `/volumes/pvc-<uuid>` on the admin API, and `DeleteVolume` into `RemoveDirectory`. Removal renames the directory into a trash area and returns immediately; a background task deletes the contents and the garbage collector reclaims the chunks.
- The node plugin publishes a volume by spawning `zerofs mount <gateway> <target> --aname /volumes/pvc-<uuid>`, the userspace 9P client described in 9P File Access, attached directly to the volume's directory. The attach is rooted at that directory: the mount cannot name anything above it, and `..` at its root resolves to itself.
Each published volume is its own zerofs mount process, so mounts inherit the client's reconnect behavior: when the gateway pod restarts or is rescheduled, operations block until it returns, then resume on the same open file handles. A gateway restart does not invalidate existing mounts.
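The publish step can be sketched in a few lines of Python. This only illustrates the command shape quoted above and is not the driver's code: the helper names `volume_aname` and `mount_argv` and the name validation are assumptions, and `/volumes` is the documented default for `volumesRoot`.

```python
import posixpath


def volume_aname(volume_name: str, volumes_root: str = "/volumes") -> str:
    """Return the 9P attach name for one volume directory under volumes_root."""
    # Assumed validation: a volume name is a single path component.
    if not volume_name or "/" in volume_name or volume_name in (".", ".."):
        raise ValueError(f"invalid volume name: {volume_name!r}")
    return posixpath.join(volumes_root, volume_name)


def mount_argv(gateway: str, target: str, volume_name: str,
               volumes_root: str = "/volumes") -> list[str]:
    """Build the argv for: zerofs mount <gateway> <target> --aname <dir>."""
    return ["zerofs", "mount", gateway, target,
            "--aname", volume_aname(volume_name, volumes_root)]
```

One such command line per published volume matches the one-process-per-mount model described above.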
| Kubernetes access mode | Supported |
|---|---|
| ReadWriteOnce | yes |
| ReadOnlyMany | yes |
| ReadWriteMany | yes |
| ReadWriteOncePod | no |
ReadWriteMany works because any number of pods on any number of nodes each run their own client against the same directory; the gateway arbitrates, including POSIX byte-range locks.
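Workloads that share a ReadWriteMany volume can coordinate writers with those byte-range locks. The sketch below uses Python's standard `fcntl.lockf`; the function names are hypothetical, and it assumes a Unix host and a file that already exists on the shared mount.

```python
import fcntl
import os


def try_lock_range(fd: int, start: int, length: int) -> bool:
    """Try to take an exclusive lock on bytes [start, start + length) without blocking."""
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB, length, start, os.SEEK_SET)
        return True
    except OSError:
        # Typically EAGAIN or EACCES: another process holds a conflicting lock.
        return False


def unlock_range(fd: int, start: int, length: int) -> None:
    """Release a lock previously taken on the same byte range."""
    fcntl.lockf(fd, fcntl.LOCK_UN, length, start, os.SEEK_SET)


def update_record(path: str, offset: int, payload: bytes) -> bool:
    """Overwrite one record while holding a lock on just that record's bytes."""
    fd = os.open(path, os.O_RDWR)
    try:
        if not try_lock_range(fd, offset, len(payload)):
            return False  # someone else is updating this record
        try:
            os.pwrite(fd, payload, offset)
        finally:
            unlock_range(fd, offset, len(payload))
        return True
    finally:
        os.close(fd)
```

Locking only the record being written lets pods on different nodes update disjoint parts of the same file concurrently.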
Installation
Prerequisites: an object store bucket with credentials for the gateway, and /dev/fuse on the nodes (the node plugin runs privileged and mounts it from the host).
1. The gateway
The example manifest creates the zerofs namespace, a config Secret, a Service (9P on 5564, admin RPC on 7000), and the StatefulSet:
curl -fsSLO https://raw.githubusercontent.com/Barre/ZeroFS/main/zerofs/zerofs-csi/deploy/gateway-example.yaml
# Edit the Secret: storage url, encryption_password, credentials.
kubectl apply -f gateway-example.yaml
The gateway is configured like any other ZeroFS instance (see Configuration); sizing its cache is the main throughput lever for everything mounted through it.
2. The driver
DEPLOY=https://raw.githubusercontent.com/Barre/ZeroFS/main/zerofs/zerofs-csi/deploy
kubectl apply -f $DEPLOY/csidriver.yaml -f $DEPLOY/rbac.yaml \
-f $DEPLOY/controller.yaml -f $DEPLOY/node.yaml
The controller is a Deployment running the CSI controller service next to the external-provisioner sidecar. The node plugin is a DaemonSet on every node (it tolerates all taints, since any node can run a pod with a volume). The ghcr.io/barre/zerofs-csi image is published for amd64 and arm64 and carries the same zerofs binary as the standalone image.
3. The StorageClass
kubectl apply -f $DEPLOY/storageclass-example.yaml
| Parameter | Required | Meaning |
|---|---|---|
| adminEndpoint | yes | Gateway admin RPC URL, e.g. http://zerofs-gateway.zerofs.svc:7000 |
| gateway | yes | 9P address nodes mount from, e.g. zerofs-gateway.zerofs.svc:5564 |
| volumesRoot | no | Directory holding the volume directories (default /volumes) |
DeleteVolume receives no StorageClass parameters — the CSI specification gives it only the volume id and secrets — so the StorageClass must also reference a Secret carrying adminEndpoint through csi.storage.k8s.io/provisioner-secret-name and -namespace. The example includes it.
Define one StorageClass per gateway. Several StorageClasses, each with its own gateway, bucket, and encryption password, can coexist in one cluster.
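To make the parameter layout concrete, the following Python sketch assembles a StorageClass manifest as JSON, which `kubectl apply -f` also accepts. It is an approximation, not a copy of storageclass-example.yaml: the function name and the Secret name `zerofs-admin` are hypothetical, and the shipped example may set additional fields.

```python
import json

PROVISIONER = "csi.zerofs.net"  # driver name registered by zerofs-csi


def storage_class(name, admin_endpoint, gateway,
                  secret_name, secret_namespace, volumes_root=None):
    """Return a StorageClass manifest (as a dict) for one gateway."""
    if not admin_endpoint or not gateway:
        raise ValueError("adminEndpoint and gateway are required")
    params = {
        "adminEndpoint": admin_endpoint,
        "gateway": gateway,
        # DeleteVolume sees only secrets, so the Secret carries adminEndpoint too.
        "csi.storage.k8s.io/provisioner-secret-name": secret_name,
        "csi.storage.k8s.io/provisioner-secret-namespace": secret_namespace,
    }
    if volumes_root is not None:
        params["volumesRoot"] = volumes_root  # otherwise the default /volumes applies
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": PROVISIONER,
        "parameters": params,
    }


if __name__ == "__main__":
    manifest = storage_class(
        "zerofs",
        "http://zerofs-gateway.zerofs.svc:7000",
        "zerofs-gateway.zerofs.svc:5564",
        secret_name="zerofs-admin",  # hypothetical Secret name
        secret_namespace="zerofs",
    )
    print(json.dumps(manifest, indent=2))
```

A second gateway would get its own call with a different name, endpoints, and Secret.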
Using Volumes
Nothing ZeroFS-specific appears in workloads:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data
spec:
  storageClassName: zerofs
  accessModes: ["ReadWriteMany"]
  resources:
    requests:
      storage: 10Gi
Creating the claim provisions the directory; scheduling a pod that uses it mounts the directory at the pod's mountPath. There is no attach step (attachRequired: false), so pods start as soon as the node plugin has the mount up.
Requested capacity is recorded on the PV but not enforced — every volume shares the gateway filesystem. Cap total usage with the gateway's max_size_gb (Configuration).
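Because requests are not enforced, it can help to compare the sum of PVC requests against the gateway's cap yourself. The Python sketch below does that arithmetic under stated assumptions: the helper names are hypothetical, the parser handles only integer quantities with common suffixes, and the cap is passed in bytes because this page does not define how `max_size_gb` converts to bytes.

```python
# Subset of Kubernetes quantity suffixes (integer values only).
_SUFFIXES = {
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
    "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12,
}


def parse_quantity(quantity: str) -> int:
    """Convert a quantity such as '10Gi' or '500M' to bytes."""
    for suffix, factor in _SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain byte count


def total_requested(requests: list[str]) -> int:
    """Sum the storage requests of all PVCs backed by one gateway."""
    return sum(parse_quantity(q) for q in requests)


def is_overcommitted(requests: list[str], cap_bytes: int) -> bool:
    """True if the claims together ask for more than the gateway may hold."""
    return total_requested(requests) > cap_bytes
```

For example, three claims of `10Gi` each request 30 GiB in total, so a cap of 20 GiB is overcommitted even though every claim binds successfully.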
Failure Behavior
- Gateway restart or rescheduling: published mounts block and then resume, including across open file handles; see the reconnect description in 9P File Access. Pods observe a stall, not errors. Conditional-write fencing prevents a stale gateway from writing after its replacement takes over.
- Node plugin restart: the FUSE clients are children of the node plugin container, so a restart breaks the published mounts on that node until the affected pods are recreated. The DaemonSet therefore uses an `OnDelete` update strategy: upgrades are deliberate, per-node, coordinated with draining.
- Node loss: volume directories live on the gateway, so rescheduled pods mount the same data from any other node.
Security
Neither gateway port is authenticated; the trust boundary is network reachability, the same model as NFS with AUTH_SYS. Both must be restricted to their CSI clients:
- 9P (5564) is the data plane. The attach aname is namespacing, not authentication: any client that reaches this port can attach to any directory, including another volume's or the filesystem root. Pods never speak 9P themselves (they see only a kernel mountpoint), so this port is open to the node plugins.
- Admin RPC (7000) is a root-equivalent control plane. It runs every call as uid 0 and can trash any volume (`RemoveDirectory`). Its only client is the controller, so it is locked to the controller pod alone, the tighter of the two rules.
networkpolicy-example.yaml ships in the deploy bundle and enforces both. Apply it alongside the gateway (adapt the labels to your manifests):
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: zerofs-gateway
  namespace: zerofs
spec:
  podSelector:
    matchLabels:
      app: zerofs-gateway
  policyTypes: [Ingress]
  ingress:
    - ports: [{ port: 5564 }]  # data plane, from every node plugin
      from:
        - podSelector:
            matchLabels:
              app: zerofs-csi-node
    - ports: [{ port: 7000 }]  # control plane, from the controller only
      from:
        - podSelector:
            matchLabels:
              app: zerofs-csi-controller
A NetworkPolicy is a no-op on a CNI that does not enforce them (vanilla flannel, default k3s); there, neither port has in-cluster protection, so do not expose the gateway to untrusted pods. Data is still encrypted before it reaches the object store; the encryption password is per gateway, i.e. per StorageClass.
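One way to check whether the policy is actually enforced is to attempt TCP connections to both ports from a pod that should be blocked. The Python probe below is a minimal sketch with hypothetical function names; the host `zerofs-gateway.zerofs.svc` is taken from the examples on this page and may differ in your cluster.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unresolvable: treat all as unreachable.
        return False


def probe_gateway(host: str, ports=(5564, 7000)) -> dict[int, bool]:
    """Probe the 9P and admin RPC ports; map each port to its reachability."""
    return {port: can_connect(host, port) for port in ports}


if __name__ == "__main__":
    for port, reachable in probe_gateway("zerofs-gateway.zerofs.svc").items():
        print(f"port {port}: {'REACHABLE' if reachable else 'blocked'}")
```

Run from an ordinary workload pod, both ports should report blocked. A reachable port suggests the CNI is not enforcing the policy or the labels do not match.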
Limitations
- Capacity is not enforced. PVC sizes are labels; the gateway's `max_size_gb` caps the filesystem as a whole.
- No per-volume snapshots. Checkpoints cover the whole gateway filesystem, not one volume directory.
- StorageClass `mountOptions` are rejected. The mount takes no pass-through options; a capability carrying them fails with `InvalidArgument` rather than being silently ignored.
- Mount volumes only. `volumeMode: Block` is not supported and not planned. A workload that wants a fixed-size disk image can create one inside its filesystem volume (`truncate -s 10G /data/disk.img`) and consume it directly (e.g. a file-backed VM disk) or, given sufficient privilege, attach it as a loop device. A first-class block volume would only add a raw device in exchange for giving up the filesystem's `ReadWriteMany` and reconnect-tolerant recovery.
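A workload can also create the disk image from code instead of calling `truncate`. The Python sketch below mirrors `truncate -s`: it sets the file's size without writing data and keeps any existing content within that size. The function name is hypothetical and `/data/disk.img` is the path from the example above.

```python
import os


def create_disk_image(path: str, size_bytes: int) -> int:
    """Create the file if needed, set its size, and return the resulting size."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, size_bytes)  # extends with zeros or shrinks, like truncate -s
    finally:
        os.close(fd)
    return os.stat(path).st_size


if __name__ == "__main__":
    # GNU truncate reads "10G" as 10 * 1024**3 bytes.
    print(create_disk_image("/data/disk.img", 10 * 1024**3))
```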