Longhorn: Distributed Storage for Kubernetes - Complete Guide
Storage is one of the most complex problems in Kubernetes. Containers are ephemeral, but data must persist. Longhorn, a CNCF incubating project, solves this problem by offering distributed, resilient, and easy-to-manage storage. In this guide, we'll see how to install and configure Longhorn for your Kubernetes cluster.
The Storage Problem in Kubernetes
Ephemeral Storage
By default, Pod storage is ephemeral:
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
  - name: app
    image: my-app:latest
    # Anything written to the container filesystem (e.g. /data) is lost on restart!
Traditional Solutions
| Solution | Pros | Cons |
|---|---|---|
| hostPath | Simple | No portability, no HA |
| NFS | Shared | Single point of failure |
| Cloud Provider | Managed | Vendor lock-in, costs |
| Ceph | Powerful | High complexity |
| Longhorn | Simple + Distributed | Requires resources |
What is Longhorn
Longhorn is a distributed block storage system for Kubernetes that:
- Uses local node disks
- Replicates data across multiple nodes
- Manages snapshots and backups
- Provides integrated web UI
- Supports DR and migration
Architecture
Main Components
| Component | Function |
|---|---|
| Longhorn Manager | Orchestration, API, UI |
| Longhorn Engine | iSCSI target, manages replicas |
| Replica | Data copy on local disk |
| CSI Driver | Kubernetes integration |
Requirements
Hardware
| Resource | Minimum | Recommended |
|---|---|---|
| Nodes | 3 | 3+ |
| CPU per node | 2 cores | 4+ cores |
| RAM per node | 4 GB | 8+ GB |
| Disk | SSD 50 GB | SSD/NVMe 200+ GB |
Software
- Kubernetes: 1.25+
- OS: Ubuntu 20.04+, RHEL 8+, SLES 15+
- Filesystem: ext4 or XFS
Node Prerequisites
# Each node must have open-iscsi
# Ubuntu/Debian
sudo apt install open-iscsi
sudo systemctl enable iscsid
sudo systemctl start iscsid
# RHEL/CentOS
sudo yum install iscsi-initiator-utils
sudo systemctl enable iscsid
sudo systemctl start iscsid
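Before installing, you can also run the environment check script shipped in the Longhorn repository; it verifies that open-iscsi, required kernel modules, and other dependencies are present on every node. A sketch (adjust the version tag to the release you plan to install; the script needs jq and kubectl access to the cluster):
# Longhorn environment check script
curl -sSfL https://raw.githubusercontent.com/longhorn/longhorn/v1.6.0/scripts/environment_check.sh | bash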
Installation
Method 1: Helm (Recommended)
# Add repo
helm repo add longhorn https://charts.longhorn.io
helm repo update
# Install
helm install longhorn longhorn/longhorn \
  --namespace longhorn-system \
  --create-namespace \
  --set defaultSettings.defaultDataPath="/var/lib/longhorn" \
  --set defaultSettings.defaultReplicaCount=3
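If you prefer to keep the configuration in version control, the same settings can live in a values file; the keys below simply mirror the --set flags above (check them against the chart's values.yaml for your chart version):
# values.yaml (sketch)
defaultSettings:
  defaultDataPath: "/var/lib/longhorn"
  defaultReplicaCount: 3
# Install with:
# helm install longhorn longhorn/longhorn -n longhorn-system --create-namespace -f values.yaml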
Method 2: Kubectl
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.6.0/deploy/longhorn.yaml
Verify Installation
# Check pods
kubectl -n longhorn-system get pods
# Expected output
NAME READY STATUS RESTARTS
longhorn-manager-xxxxx 1/1 Running 0
longhorn-driver-deployer-xxxxx 1/1 Running 0
longhorn-ui-xxxxx 1/1 Running 0
engine-image-ei-xxxxx 1/1 Running 0
instance-manager-xxxxx 1/1 Running 0
UI Access
# Port forward
kubectl -n longhorn-system port-forward svc/longhorn-frontend 8080:80
# Access http://localhost:8080
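For longer-term access you can expose the UI through an Ingress instead of a port-forward. A minimal sketch assuming ingress-nginx is installed; the host name is an example, and in production the UI should be protected (basic auth annotations or an auth proxy), since it has no authentication of its own:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: longhorn-ui
  namespace: longhorn-system
spec:
  ingressClassName: nginx          # assumes ingress-nginx
  rules:
  - host: longhorn.example.com     # example host name
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: longhorn-frontend
            port:
              number: 80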
StorageClass Configuration
Default StorageClass
Longhorn automatically creates a StorageClass:
kubectl get storageclass
# NAME PROVISIONER RECLAIMPOLICY
# longhorn (default) driver.longhorn.io Delete
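If your cluster already ships another default StorageClass (for example local-path on K3s), make sure only one class carries the default annotation, otherwise PVCs without an explicit storageClassName become ambiguous. A sketch using kubectl patch (the class name local-path is just an example):
# Remove the default flag from the old class and set it on longhorn
kubectl patch storageclass local-path -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"false"}}}'
kubectl patch storageclass longhorn -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'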
Custom StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-ssd
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Retain
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "2880"
  fromBackup: ""
  fsType: "ext4"
  dataLocality: "best-effort"
Important Parameters
| Parameter | Description | Default |
|---|---|---|
| numberOfReplicas | Number of replicas | 3 |
| dataLocality | Data locality (disabled, best-effort, strict-local) | disabled |
| diskSelector | Schedule replicas only on disks with matching tags | - |
| nodeSelector | Schedule replicas only on nodes with matching tags | - |
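As an illustration of the selector parameters above, here is a sketch of a StorageClass that only places replicas on tagged disks and nodes; the tags ssd and storage are assumptions and must first be assigned to disks/nodes via the Longhorn UI (or the Longhorn Node CRs):
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-fast
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "2"        # fewer replicas: faster writes, less redundancy
  diskSelector: "ssd"          # assumed disk tag
  nodeSelector: "storage"      # assumed node tag
  dataLocality: "best-effort"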
Creating and Using Volumes
PersistentVolumeClaim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi
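Once applied, the claim should become Bound within a few seconds and a matching Longhorn volume appears in longhorn-system (output abridged):
kubectl get pvc my-data
# NAME      STATUS   VOLUME     CAPACITY   STORAGECLASS
# my-data   Bound    pvc-...    10Gi       longhorn
kubectl -n longhorn-system get volumes.longhorn.io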
Using the Volume in a Pod
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
  - name: app
    image: my-app:latest
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: my-data
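A quick way to convince yourself that data survives pod restarts (a minimal test, assuming the pod above is running and its image has a shell; the manifest file name is an example):
# Write a file to the Longhorn-backed mount, recreate the pod, read the file back
kubectl exec my-app -- sh -c 'echo hello > /data/test.txt'
kubectl delete pod my-app
kubectl apply -f my-app-pod.yaml   # the Pod manifest above
kubectl exec my-app -- cat /data/test.txt   # prints "hello"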
StatefulSet with Longhorn
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
      - name: postgres
        image: postgres:15
        env:
        - name: POSTGRES_PASSWORD
          value: "password"  # use a Secret in production
        - name: PGDATA
          value: /var/lib/postgresql/data/pgdata
        volumeMounts:
        - name: data
          mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: longhorn
      resources:
        requests:
          storage: 20Gi
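Each StatefulSet replica gets its own PVC from the volumeClaimTemplates, named <template>-<statefulset>-<ordinal>; you can check that it was provisioned by Longhorn (output abridged):
kubectl get pvc data-postgres-0
# NAME              STATUS   VOLUME    CAPACITY   STORAGECLASS
# data-postgres-0   Bound    pvc-...   20Gi       longhorn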
Backup and Disaster Recovery
Configure Backup Target
Longhorn supports backup to S3 or NFS.
S3 Backup:
# Secret for S3 credentials
apiVersion: v1
kind: Secret
metadata:
  name: s3-secret
  namespace: longhorn-system
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: "your-access-key"
  AWS_SECRET_ACCESS_KEY: "your-secret-key"
# Configure via UI or CLI
# Settings > Backup Target
# s3://bucket-name@region/path
NFS Backup:
nfs://server-ip:/path/to/backup
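Instead of clicking through the UI, the backup target can also be set declaratively at install time through the Helm chart's defaultSettings. The keys below are my understanding of the chart values and should be verified against the values.yaml of your chart version:
# values.yaml sketch
defaultSettings:
  backupTarget: "s3://bucket-name@region/path"
  backupTargetCredentialSecret: "s3-secret"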
Create Manual Backup
apiVersion: longhorn.io/v1beta2
kind: Backup
metadata:
  name: my-data-backup
  namespace: longhorn-system
spec:
  snapshotName: my-data-snapshot
  labels:
    app: my-app
    type: manual
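Note that a Backup always references an existing snapshot of the volume (Longhorn uploads that snapshot to the backup target), so snapshotName must point at a snapshot that already exists. You can watch progress from the command line:
# List backups and check their state
kubectl -n longhorn-system get backups.longhorn.io
kubectl -n longhorn-system describe backups.longhorn.io my-data-backup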
Recurring Backups
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: daily-backup
  namespace: longhorn-system
spec:
  cron: "0 2 * * *"  # Every day at 2:00 AM
  task: backup
  retain: 7
  concurrency: 1
  groups:
  - default
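The groups field controls which volumes the job applies to: the default group covers volumes that have no recurring job of their own. To attach a job to a specific volume you label the Longhorn volume; the label key below follows Longhorn's recurring-job convention as I recall it, so double-check it against the docs for your release:
# Attach this job to a single volume (the volume name is usually the PV name, pvc-..., placeholder here)
kubectl -n longhorn-system label volumes.longhorn.io pvc-xxxxx \
  recurring-job.longhorn.io/daily-backup=enabled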
Restore from Backup
1. Via UI: Volumes > Create Volume > From Backup
2. Via a StorageClass whose fromBackup parameter points at the backup URL (shown in the Longhorn UI under Backup), plus a PVC that uses it:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-restore
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "3"
  # Backup URL copied from the Longhorn UI, e.g.:
  fromBackup: "s3://bucket-name@region/path?backup=backup-xxxxx&volume=pvc-xxxxx"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restored-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn-restore
  resources:
    requests:
      storage: 10Gi
Snapshots
Create Snapshot
apiVersion: longhorn.io/v1beta2
kind: Snapshot
metadata:
  name: my-data-snap-1
  namespace: longhorn-system
spec:
  volume: my-data  # Longhorn volume name (see note below)
  labels:
    app: my-app
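The spec.volume field must name the Longhorn volume, which for dynamically provisioned PVCs is the PV name (pvc-<uid>) rather than the PVC name; you can look it up like this:
# Find the Longhorn volume name behind a PVC
kubectl get pvc my-data -o jsonpath='{.spec.volumeName}'
# pvc-...  (use this value in spec.volume)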
Recurring Snapshots
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: hourly-snapshot
  namespace: longhorn-system
spec:
  cron: "0 * * * *"  # Every hour
  task: snapshot
  retain: 24  # Keep last 24
  concurrency: 2
  groups:
  - default
Monitoring
Prometheus Metrics
Longhorn exposes Prometheus metrics:
# ServiceMonitor for Prometheus Operator
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: longhorn
  namespace: longhorn-system
spec:
  selector:
    matchLabels:
      app: longhorn-manager
  endpoints:
  - port: manager
Main Metrics
| Metric | Description |
|---|---|
| longhorn_volume_actual_size_bytes | Actual volume size | 
| longhorn_volume_capacity_bytes | Volume capacity |
| longhorn_volume_state | Volume state |
| longhorn_node_storage_capacity_bytes | Node storage capacity |
| longhorn_node_storage_usage_bytes | Node storage usage |
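As an example of putting these metrics to work, here is a minimal PrometheusRule (it assumes the Prometheus Operator from the ServiceMonitor above) that fires when a node's Longhorn storage is nearly full; treat it as a sketch and tune thresholds to your environment:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: longhorn-alerts
  namespace: longhorn-system
spec:
  groups:
  - name: longhorn
    rules:
    - alert: LonghornNodeStorageAlmostFull
      expr: longhorn_node_storage_usage_bytes / longhorn_node_storage_capacity_bytes > 0.8
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: "A Longhorn node's storage is over 80% used"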
Grafana Dashboard
Import dashboard ID: 13032 (Longhorn Dashboard)
Production Best Practices
Replica Count
# Minimum 3 replicas for HA
parameters:
  numberOfReplicas: "3"
Node Scheduling
To keep replicas spread across nodes and zones, leave the anti-affinity settings strict; the "soft" variants, when enabled, allow replicas to share a node or zone if nothing else is available:
# Settings > Replica Node Level Soft Anti-Affinity: false (default; two replicas of a volume never share a node)
# Settings > Replica Zone Level Soft Anti-Affinity: false (strict zone spreading; requires zone-labeled nodes)
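These two settings can also be set at install time through the Helm chart's defaultSettings; the key names below are my best recollection of the chart values, so verify them against values.yaml for your chart version:
# values.yaml sketch
defaultSettings:
  replicaSoftAntiAffinity: "false"        # keep replicas on separate nodes
  replicaZoneSoftAntiAffinity: "false"    # keep replicas in separate zones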
Disk Scheduling
# Add tags to disks
kubectl -n longhorn-system label nodes node1 storage=ssd
parameters:
diskSelector: "ssd"
Backup Policy
# Daily backup with 7-day retention
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: daily-backup
  namespace: longhorn-system
spec:
  cron: "0 3 * * *"
  task: backup
  retain: 7
Troubleshooting
Volume Degraded
# Check volume status
kubectl -n longhorn-system get volumes.longhorn.io
# Details on a specific volume
kubectl -n longhorn-system describe volumes.longhorn.io my-volume
Common causes:
- Node down
- Disk full
- Corrupted replica
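To see which replica is unhealthy and where it lives (e.g. on the down node or full disk from the causes above), list the replica objects; each Longhorn volume has one per replica:
kubectl -n longhorn-system get replicas.longhorn.io -o wide
# Check each replica's state and the node/disk it is scheduled on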
Slow Replica Rebuild
# Increase concurrent rebuild
# Settings > Concurrent Replica Rebuild Per Node Limit: 5
Insufficient Space
# Check space per node
kubectl -n longhorn-system get nodes.longhorn.io -o wide
Solutions:
- Add disks
- Delete old snapshots
- Temporarily reduce replica count
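Old snapshots are a frequent source of hidden space usage, since each snapshot pins the blocks it references. A sketch for finding and removing snapshots created via the Snapshot CRD from the command line (snapshots can also be deleted per volume in the UI):
# List snapshot objects and delete ones you no longer need
kubectl -n longhorn-system get snapshots.longhorn.io
kubectl -n longhorn-system delete snapshots.longhorn.io my-data-snap-1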
Comparison with Alternatives
| Feature | Longhorn | Rook-Ceph | OpenEBS | Portworx |
|---|---|---|---|---|
| Complexity | Low | High | Medium | Medium |
| Performance | Good | Excellent | Good | Excellent |
| Integrated UI | Yes | No | Yes | Yes |
| S3 Backup | Yes | Yes | Yes | Yes |
| License | Apache 2.0 | Apache 2.0 | Apache 2.0 | Commercial |
| CNCF | Incubating | Graduated | Sandbox | No |
When to choose Longhorn:
- Small to medium clusters
- Teams with limited Kubernetes experience
- Need for quick setup
- Limited budget
Conclusions
Longhorn is the ideal solution for those looking for distributed storage on Kubernetes without the complexity of Ceph or the costs of enterprise solutions.
Implementation checklist:
- Node prerequisites (open-iscsi)
- Installation via Helm
- StorageClass configuration
- Backup target setup (S3/NFS)
- Recurring backup configured
- Active monitoring
- Restore testing