Longhorn: Distributed Storage for Kubernetes - Complete Guide

Storage is one of the most complex problems in Kubernetes. Containers are ephemeral, but data must persist. Longhorn, a CNCF incubating project, addresses this by offering distributed, resilient, and easy-to-manage block storage. In this guide, we'll see how to install and configure Longhorn for your Kubernetes cluster.

The Storage Problem in Kubernetes

Ephemeral Storage

By default, Pod storage is ephemeral:

apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: app
      image: my-app:latest
      # Everything in /data is lost on restart!

Traditional Solutions

Solution        Pros                  Cons
hostPath        Simple                No portability, no HA
NFS             Shared                Single point of failure
Cloud Provider  Managed               Vendor lock-in, costs
Ceph            Powerful              High complexity
Longhorn        Simple + Distributed  Requires resources

What is Longhorn

Longhorn is a distributed block storage system for Kubernetes that:

  • Uses local node disks
  • Replicates data across multiple nodes
  • Manages snapshots and backups
  • Provides an integrated web UI
  • Supports disaster recovery (DR) and migration

Architecture

Figure: Longhorn architecture. A Kubernetes cluster running Longhorn Manager, Engine, and replicas on local node disks.

Main Components

Component         Function
Longhorn Manager  Orchestration, API, UI
Longhorn Engine   iSCSI target, manages replicas
Replica           Data copy on local disk
CSI Driver        Kubernetes integration

Requirements

Hardware

Resource      Minimum    Recommended
Nodes         3          3+
CPU per node  2 cores    4+ cores
RAM per node  4 GB       8+ GB
Disk          SSD 50 GB  SSD/NVMe 200+ GB

Software

Kubernetes: 1.25+
OS: Ubuntu 20.04+, RHEL 8+, SLES 15+
Filesystem: ext4, XFS

Node Prerequisites

# Each node must have open-iscsi
# Ubuntu/Debian
sudo apt install open-iscsi
sudo systemctl enable iscsid
sudo systemctl start iscsid

# RHEL/CentOS
sudo yum install iscsi-initiator-utils
sudo systemctl enable iscsid
sudo systemctl start iscsid
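
Before installing, it can help to confirm the prerequisites on every node. The following is a minimal sketch, not an official Longhorn check: the function name missing_tools is hypothetical, and it only verifies that the named commands are on the PATH (for example iscsiadm, which the packages above provide).

```shell
#!/bin/sh
# Print each command from the argument list that is NOT found on PATH.
# Returns 0 if everything is present, 1 otherwise.
missing_tools() {
  status=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      status=1
    fi
  done
  return "$status"
}

# Example (run on each node):
#   missing_tools iscsiadm mount || echo "install the packages listed above"
```

This does not check whether the iscsid service is running; systemctl is-active iscsid covers that on systemd hosts.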

Installation

Method 1: Helm (Recommended)

# Add repo
helm repo add longhorn https://charts.longhorn.io
helm repo update

# Install
helm install longhorn longhorn/longhorn \
  --namespace longhorn-system \
  --create-namespace \
  --set defaultSettings.defaultDataPath="/var/lib/longhorn" \
  --set defaultSettings.defaultReplicaCount=3

Method 2: Kubectl

kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.6.0/deploy/longhorn.yaml

Verify Installation

# Check pods
kubectl -n longhorn-system get pods

# Expected output
NAME                                        READY   STATUS    RESTARTS
longhorn-manager-xxxxx                      1/1     Running   0
longhorn-driver-deployer-xxxxx              1/1     Running   0
longhorn-ui-xxxxx                           1/1     Running   0
engine-image-ei-xxxxx                       1/1     Running   0
instance-manager-xxxxx                      1/1     Running   0
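
On a larger cluster the pod list gets long, so a small filter can make problems stand out. This is a sketch under one assumption: the input is the default table output of kubectl get pods, with a header row and STATUS in the third column. The function name not_ready_pods is hypothetical.

```shell
#!/bin/sh
# Read `kubectl get pods` table output on stdin and print the names of pods
# whose STATUS column is neither Running nor Completed.
not_ready_pods() {
  awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1 }'
}

# Example:
#   kubectl -n longhorn-system get pods | not_ready_pods
```

An empty result means every pod reports Running or Completed. It does not check the READY column.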

UI Access

# Port forward
kubectl -n longhorn-system port-forward svc/longhorn-frontend 8080:80

# Access http://localhost:8080

StorageClass Configuration

Default StorageClass

Longhorn automatically creates a StorageClass:

kubectl get storageclass
# NAME                 PROVISIONER          RECLAIMPOLICY
# longhorn (default)   driver.longhorn.io   Delete

Custom StorageClass

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-ssd
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Retain
volumeBindingMode: Immediate
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "2880"
  fromBackup: ""
  fsType: "ext4"
  dataLocality: "best-effort"

Important Parameters

Parameter         Description                                          Default
numberOfReplicas  Number of replicas                                   3
dataLocality      Data locality (disabled, best-effort, strict-local)  disabled
diskSelector      Select specific disks                                -
nodeSelector      Select specific nodes                                -

Creating and Using Volumes

PersistentVolumeClaim

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi

Using the Volume in a Pod

apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
    - name: app
      image: my-app:latest
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: my-data

StatefulSet with Longhorn

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:15
          env:
            - name: POSTGRES_PASSWORD
              value: "password"  # Demo only: load this from a Secret in production
            - name: PGDATA
              value: /var/lib/postgresql/data/pgdata
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: longhorn
        resources:
          requests:
            storage: 20Gi

Backup and Disaster Recovery

Configure Backup Target

Longhorn supports backup to S3 or NFS.

S3 Backup:

# Secret for S3 credentials
apiVersion: v1
kind: Secret
metadata:
  name: s3-secret
  namespace: longhorn-system
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: "your-access-key"
  AWS_SECRET_ACCESS_KEY: "your-secret-key"
# Then configure the backup target via UI or CLI
# Settings > Backup Target: s3://bucket-name@region/path/
# Settings > Backup Target Credential Secret: s3-secret

NFS Backup:

nfs://server-ip:/path/to/backup
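
Target URLs are easy to get subtly wrong, in particular the trailing slash on the S3 form. The helper below is an illustrative sketch that assembles an s3://<bucket>@<region>/<path>/ string from its parts. The function name s3_backup_target and the example values are hypothetical, and it does not validate that the bucket or region exist.

```shell
#!/bin/sh
# Build an S3 backup target URL in the s3://<bucket>@<region>/<path>/ form.
# Args: bucket, region, optional path inside the bucket.
s3_backup_target() {
  bucket=$1; region=$2; path=${3:-}
  path=${path#/}   # drop one leading slash, if any
  path=${path%/}   # drop one trailing slash, if any
  if [ -n "$path" ]; then
    echo "s3://${bucket}@${region}/${path}/"
  else
    echo "s3://${bucket}@${region}/"
  fi
}

# Example:
#   s3_backup_target my-bucket us-east-1 backups
```

The printed value is what goes into the Backup Target setting.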

Create Manual Backup

apiVersion: longhorn.io/v1beta2
kind: Backup
metadata:
  name: my-data-backup
  namespace: longhorn-system
spec:
  snapshotName: my-data-snapshot
  labels:
    app: my-app
    type: manual

Recurring Backups

apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: daily-backup
  namespace: longhorn-system
spec:
  cron: "0 2 * * *"  # Every day at 2:00 AM
  task: backup
  retain: 7
  concurrency: 1
  groups:
    - default
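
A job in the default group applies to volumes that carry no recurring-job labels of their own. To attach a job to one specific volume instead, Longhorn reads a label of the form recurring-job.longhorn.io/<job-name>=enabled on the volume object. The sketch below only prints the kubectl command so it can be reviewed first. The function name attach_job_cmd and the volume name in the example are hypothetical.

```shell
#!/bin/sh
# Print the kubectl command that attaches a RecurringJob to one Longhorn
# volume by label. Nothing is executed; review the output, then run it.
# Args: Longhorn volume name, RecurringJob name.
attach_job_cmd() {
  volume=$1; job=$2
  echo "kubectl -n longhorn-system label volumes.longhorn.io/${volume} recurring-job.longhorn.io/${job}=enabled"
}

# Example:
#   attach_job_cmd pvc-123 daily-backup
```

Printing instead of executing keeps the helper safe to try on a machine without cluster access.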

Restore from Backup

1. Via UI: Volumes > Create Volume > From Backup

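2. Via PVC, using a StorageClass whose fromBackup parameter points at the backup. Copy the backup URL from the Longhorn UI (Backup page); the bucket, region, path, volume, and backup names below are placeholders:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-restore
provisioner: driver.longhorn.io
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "2880"
  fromBackup: "s3://bucket-name@region/path?volume=my-volume&backup=my-data-backup"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: restored-data
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: longhorn-restore
  resources:
    requests:
      storage: 10Gi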

Snapshots

Create Snapshot

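apiVersion: longhorn.io/v1beta2
kind: Snapshot
metadata:
  name: my-data-snap-1
  namespace: longhorn-system
spec:
  # Longhorn volume name (for a dynamically provisioned PVC, the bound PV name), not the PVC name
  volume: my-data
  createSnapshot: true  # Take a new snapshot instead of describing an existing one
  labels:
    app: my-app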

Recurring Snapshots

apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: hourly-snapshot
  namespace: longhorn-system
spec:
  cron: "0 * * * *"  # Every hour
  task: snapshot
  retain: 24  # Keep last 24
  concurrency: 2
  groups:
    - default

Monitoring

Prometheus Metrics

Longhorn exposes Prometheus metrics:

# ServiceMonitor for Prometheus Operator
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: longhorn
  namespace: longhorn-system
spec:
  selector:
    matchLabels:
      app: longhorn-manager
  endpoints:
    - port: manager

Main Metrics

Metric                                Description
longhorn_volume_actual_size_bytes     Actual volume size
longhorn_volume_capacity_bytes        Volume capacity
longhorn_volume_state                 Volume state
longhorn_node_storage_capacity_bytes  Node storage capacity
longhorn_node_storage_usage_bytes     Node storage usage
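
The two node storage metrics are most useful as a ratio. As a rough sketch, the helper below turns a usage and a capacity value (both in bytes) into a percentage. The function name usage_percent is hypothetical, and the PromQL line in the comment assumes both series carry matching labels.

```shell
#!/bin/sh
# Percentage of node storage in use, from two byte counts.
# Args: used bytes, capacity bytes. Prints "n/a" if capacity is not positive.
usage_percent() {
  LC_ALL=C awk -v used="$1" -v cap="$2" 'BEGIN {
    if (cap <= 0) { print "n/a"; exit }
    printf "%.1f\n", used / cap * 100
  }'
}

# Equivalent PromQL, assuming both series carry the same labels:
#   longhorn_node_storage_usage_bytes / longhorn_node_storage_capacity_bytes * 100
```

The same expression is a reasonable starting point for a disk-pressure alert threshold.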

Grafana Dashboard

Import dashboard ID: 13032 (Longhorn Dashboard)

Production Best Practices

Replica Count

# Minimum 3 replicas for HA
parameters:
  numberOfReplicas: "3"
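
Replication has a direct capacity cost: every volume consumes its size once per replica. The sketch below estimates usable capacity as raw capacity divided by the replica count. It is a simplification that ignores reserved space, snapshots, and over-provisioning settings, and the function name usable_gib is hypothetical.

```shell
#!/bin/sh
# Rough usable capacity in GiB: total raw disk divided by the replica count.
# Args: node count, disk size per node (GiB), replicas per volume.
usable_gib() {
  nodes=$1; disk_gib=$2; replicas=$3
  echo $(( nodes * disk_gib / replicas ))
}

# Example: 3 nodes with 200 GiB each and 3 replicas per volume
#   usable_gib 3 200 3   # prints 200
```

With three replicas on three nodes, usable capacity is roughly that of a single node's disk.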

Node Scheduling

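Ensure replicas are distributed across nodes and zones. With a soft anti-affinity setting enabled, Longhorn may place two replicas of the same volume on the same node or zone when no other target is available, so disable it where strict separation is required:

# Settings > Replica Node Level Soft Anti-Affinity: false
# Settings > Replica Zone Level Soft Anti-Affinity: false (only if nodes span at least as many zones as replicas; otherwise keep true)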

Disk Scheduling

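# Tag disks in Longhorn (a Kubernetes node label is not a disk tag):
#   UI: Node > Edit node and disks > add the disk tag "ssd"
#   or set spec.disks.<disk-name>.tags to ["ssd"] on the Longhorn node object
kubectl -n longhorn-system edit nodes.longhorn.io node1

# Then select the tagged disks in the StorageClass
parameters:
  diskSelector: "ssd"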

Backup Policy

# Daily backup with 7-day retention
apiVersion: longhorn.io/v1beta2
kind: RecurringJob
metadata:
  name: daily-backup
  namespace: longhorn-system
spec:
  cron: "0 3 * * *"  # Every day at 3:00 AM
  task: backup
  retain: 7
  concurrency: 1
  groups:
    - default

Troubleshooting

Volume Degraded

# Check volume status
kubectl -n longhorn-system get volumes.longhorn.io

# Details
kubectl -n longhorn-system describe volumes.longhorn.io my-volume

Common causes:

  • Node down
  • Disk full
  • Corrupted replica

Slow Replica Rebuild

# Increase concurrent rebuild
# Settings > Concurrent Replica Rebuild Per Node Limit: 5

Insufficient Space

# Check space per node
kubectl -n longhorn-system get nodes.longhorn.io -o wide

Solutions:

  • Add disks
  • Delete old snapshots
  • Temporarily reduce replica count

Comparison with Alternatives

Feature        Longhorn    Rook-Ceph   OpenEBS     Portworx
Complexity     Low         High        Medium      Medium
Performance    Good        Excellent   Good        Excellent
Integrated UI  Yes         No          Yes         Yes
S3 Backup      Yes         Yes         Yes         Yes
License        Apache 2.0  Apache 2.0  Apache 2.0  Commercial
CNCF           Incubating  Graduated   Sandbox     No

When to choose Longhorn:

  • Small to medium clusters
  • Teams with limited Kubernetes experience
  • Need for quick setup
  • Limited budget

Conclusions

Longhorn is a strong fit for teams that want distributed storage on Kubernetes without the operational complexity of Ceph or the licensing costs of commercial products.

Implementation checklist:

  • Node prerequisites (open-iscsi)
  • Installation via Helm
  • StorageClass configuration
  • Backup target setup (S3/NFS)
  • Recurring backup configured
  • Active monitoring
  • Restore testing

Resources