Storage in Kubernetes

Maintaining application state

Applications use disks for various reasons:

  • configuration
  • data in the database
  • temporary buffers (cache)

Maintaining application state

Application state must be separated from the process

  • using an application protocol
  • using volume mapping or virtual block devices
  • using local disks or local file system
  • using a RAM disk
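
As an illustration of the RAM disk option, a minimal sketch of a Pod that mounts a memory-backed emptyDir volume. The Pod name, image, mount path, and size limit are assumptions chosen for the example, not values from these slides.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cache-demo            # hypothetical name
spec:
  containers:
    - name: app
      image: nginx:1.27       # placeholder image
      volumeMounts:
        - name: scratch
          mountPath: /cache   # assumed path the application uses for its cache
  volumes:
    - name: scratch
      emptyDir:
        medium: Memory        # backed by tmpfs (RAM) instead of the node's disk
        sizeLimit: 64Mi       # assumed limit; contents are lost when the Pod is removed
```

Data in an emptyDir lives only as long as the Pod, so this suits caches and buffers, not state that must survive.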

Storage management at the application level

  • object storage (minio)
  • relational database (postgres, mariadb)

Kubernetes "doesn't know" about disk space

State management at the cluster level

  • local directory or disk
  • distributed fs (ceph, gluster)
  • networked fs (nfs, smb)

State management at the Kubernetes level

https://docs.microsoft.com/en-us/azure/aks/concepts-storage

Kubernetes "wraps" storage in its own objects:

  • Volumes
  • PersistentVolumes
  • StorageClasses
  • PersistentVolumeClaims
  • StatefulSet

PersistentVolume

  • declares which volumes are available
  • "wraps" a specific folder or block device

Access to the folder can be:

  • local
  • NFS, SMB
  • iSCSI
  • another network protocol
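
A minimal sketch of a PersistentVolume that wraps an NFS export. The object name, server address, export path, and capacity are hypothetical and would need to match a real NFS server.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-nfs-demo                       # hypothetical name
spec:
  capacity:
    storage: 10Gi                         # assumed size
  accessModes:
    - ReadWriteMany                       # NFS can be mounted read-write by many nodes
  persistentVolumeReclaimPolicy: Retain   # keep the data after the claim is deleted
  nfs:
    server: nfs.example.internal          # hypothetical NFS server
    path: /exports/demo                   # hypothetical exported folder
```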

https://kubernetes.io/docs/concepts/storage/persistent-volumes/#raw-block-volume-support

Types of PersistentVolume

  • csi - Container Storage Interface (CSI)
  • fc - Fibre Channel (FC) storage
  • hostPath - HostPath volume (for single node testing only; WILL NOT WORK in a multi-node cluster; consider using local volume instead)
  • iscsi - iSCSI (SCSI over IP) storage
  • local - local storage devices mounted on nodes.
  • nfs - Network File System (NFS) storage

Creating persistent volumes

Static provisioning

  • A PersistentVolume is created in advance by the administrator (kubectl apply).

Dynamic provisioning

  • The volume is created on demand when the application requests it through a PersistentVolumeClaim object.

Affinity: a volume can be bound to a specific node (required for local volumes).
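
Node affinity on a statically provisioned volume could look like the sketch below: a local PersistentVolume pinned to one node. The volume name, storage class name, disk path, and node name are all assumptions for illustration.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-local-demo                     # hypothetical name
spec:
  capacity:
    storage: 100Gi                        # assumed size
  volumeMode: Filesystem
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage         # hypothetical class name
  local:
    path: /mnt/disks/ssd1                 # assumed mount point on the node
  nodeAffinity:                           # Pods using this volume are scheduled onto this node
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node-1                  # hypothetical node name
```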

Dynamic provisioning

We need:

  • StorageClass: announces that storage is available on request
  • Provisioner: carries out the storage allocation request
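
A minimal sketch of a StorageClass for dynamic provisioning. The class name and the provisioner string are hypothetical; a real cluster would use the name of an installed CSI driver or other provisioner.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast                              # hypothetical class name
provisioner: csi.example.com              # hypothetical provisioner, replace with a real driver
reclaimPolicy: Delete                     # delete the backing volume together with the claim
volumeBindingMode: WaitForFirstConsumer   # delay provisioning until a Pod uses the claim
allowVolumeExpansion: true
```

A claim that names this class in storageClassName asks the provisioner to create a matching PersistentVolume on demand.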

PersistentVolumeClaim

We create this object as part of the application, together with its Deployment or StatefulSet

  • declares a request for some PersistentVolume
  • defines the link between the application and the storage
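
A minimal sketch of a claim and a Pod that mounts it. The claim name, class name, requested size, image, and mount path are assumptions for the example.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-claim              # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast        # hypothetical class; omit to use the cluster default
  resources:
    requests:
      storage: 5Gi              # assumed size
---
apiVersion: v1
kind: Pod
metadata:
  name: claim-demo              # hypothetical name
spec:
  containers:
    - name: app
      image: nginx:1.27         # placeholder image
      volumeMounts:
        - name: data
          mountPath: /data      # assumed path inside the container
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: data-claim   # links the Pod to the claim above
```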

Access to a local or distributed file system

Creating a storage binding

+------------+ Binding +--------------+
| Persistent |<--------| Persistent   |
| Volume     |         | Volume Claim |
+------------+         +--------------+
  Hardware               Container

StatefulSet

(similar to Deployment)

We declare the Pods that the application needs

We declare the application's claims on volumes

StatefulSet depends on PersistentVolume.
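
A minimal sketch of a StatefulSet that declares both its Pods and their volume claims. All names, the image tag, the Secret, the storage class, and the sizes are assumptions; the headless Service named in serviceName is assumed to exist separately.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: db                        # hypothetical name
spec:
  serviceName: db                 # assumed headless Service, defined elsewhere
  replicas: 2
  selector:
    matchLabels:
      app: db
  template:                       # the Pod template
    metadata:
      labels:
        app: db
    spec:
      containers:
        - name: postgres
          image: postgres:16      # assumed image tag
          env:
            - name: POSTGRES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: db-secret # hypothetical Secret
                  key: password
            - name: PGDATA        # use a subdirectory so the mount root can hold other files
              value: /var/lib/postgresql/data/pgdata
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:           # one PersistentVolumeClaim is created per replica
    - metadata:
        name: data
      spec:
        accessModes:
          - ReadWriteOnce
        storageClassName: fast    # hypothetical class
        resources:
          requests:
            storage: 5Gi          # assumed size
```

With these names the claims would be called data-db-0 and data-db-1, and each Pod keeps its own claim across restarts.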

StatefulSet

  • manages Pods directly and gives each a stable identity (unlike Deployment, it does not use a ReplicaSet)
  • also takes care of PersistentVolumes through PersistentVolumeClaims

         Storage
   +------------------+
   | PersistentVolume | LoadBalancer
   +------------------+
      ^ P.V. Claim            ^
      |                       |
   +--------------+ Port  +-----------+
   | POD Template |------>| Service   |
   | Replicas     |       +-----------+
   | StatefulSet  |
   +--------------+
         node

Storage management is complicated

The hardest task is "freeing" a 2 TB disk full of assorted data

Reload?