You will learn

  • Create a persistent volume.
  • Create an application that keeps its state in the volume.
  • Use the StatefulSet, PersistentVolumeClaim, and PersistentVolume objects.

Persistent volumes and data in Kubernetes

In the world of Kubernetes, we have to reckon with the fact that a container can start on any node and can be terminated at any time. This is not a problem if the container only processes requests and sends responses, e.g. performs some calculation. Complications arise when the container stores important data.

If the output of the application depends only on its input, we say that the application is stateless - nothing but the input affects the output. An example is an image compression service: it produces a compressed image from the uncompressed image on its input and needs no additional data. If a container has no state, it basically does not matter on which node it runs.

If the container stores and reads data and that data affects the processing result, we say that it has state. State is the set of data that is important for the result of processing. An example is a database: it modifies or reads a set of files in a directory according to commands. Clearly, problems will arise if the database stops working on one node and moves to another while the files that represent its state are not moved with it.

Mostly this problem is solved by keeping the important state files in a dedicated location, such as a network file system accessible from all nodes. If we can properly separate the state from the rest of the container, the pod can easily be restarted on another node, because the important data remains in one place.

Persistent volumes

Persistent volumes (PersistentVolumes) are a way to separate the state of an application from a container.

PersistentVolume is a type of object that expresses one place where data can be stored - an entire disk, a directory, a network drive, or a shared directory on a network file system.

We know several types of persistent volumes depending on what source they represent:

  • hostPath: a directory on the current node (usable only for a single-node cluster)
  • localPersistentVolume: a directory or disk on a specific node
  • iSCSI: a disk on a network storage server using the iSCSI protocol
  • NFS: a UNIX shared network directory
  • GlusterFS: a distributed file system
  • RBD: a block device of the Ceph distributed file system
  • various other options for cloud providers (Azure, Amazon, Google, ...)

Official documentation of persistent volumes

We choose the type of persistent volume according to the disk capacity we have available. A persistent volume object can be created manually, or automatically on demand.

Persistent volume in Docker Engine

If we are installing Kubernetes using the Docker Engine, we will implement the persistent volume using hostPath.

See tutorials.

Create a desktoppv1.yaml configuration file with a persistent volume.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: desktoppv1
  labels:
    type: local
spec:
  storageClassName: local
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  hostPath:
    path: "/tmp/desktoppv1"

Persistent volumes do not belong to any namespace; they are cluster-wide objects.

Here's how to create a persistent volume:

kubectl apply -f desktoppv1.yaml
kubectl get pv

The PersistentVolume object expresses that space is available in a specific directory on the virtual machine where Kubernetes is running; it "wraps" that space as a Kubernetes object. If we want to use this space in a container, we must create a mapping.

A StorageClass object is used to configure automatic allocation. If the system is configured for this, a part of the available disk capacity is automatically reserved as required.

If not, then the system waits for the administrator to create the space (directory or disk) manually and create an object of type PersistentVolume.
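As an illustration, a manually administered storage class might be declared as follows. This is only a sketch: the hostPath example in this exercise works even without an explicit StorageClass object, because a claim with storageClassName "local" binds to a volume with the same class name by name matching alone.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  # Must match the storageClassName used in the
  # PersistentVolume and PersistentVolumeClaim objects
  name: local
# No automatic provisioning - the administrator creates
# PersistentVolume objects manually
provisioner: kubernetes.io/no-provisioner
# Delay binding until a pod actually uses the claim
volumeBindingMode: WaitForFirstConsumer
```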

StatefulSet

The mapping between the persistent volume and Pod is provided by an object of type PersistentVolumeClaim. This object expresses a specific application request for storage space. If there is a PersistentVolume that matches the request, a mapping will be created.

We can create this object separately, or use a special object of type StatefulSet. It contains rules for creating and deleting pods on any node so that access to their state is maintained. If the application data is located on a local disk, Kubernetes ensures that the pod runs only on the node where the volume is located. The files in the persistent volume are thus accessible transparently: as a rule, the container does not need to know where and how the files it works with are stored.
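If we wanted to create the request separately, a minimal PersistentVolumeClaim could look like the following sketch (the name mypvc is only illustrative). It would bind to the desktoppv1 volume created earlier, because the storage class, access mode, and size all match.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mypvc
spec:
  # Must match the storageClassName of the volume
  storageClassName: local
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      # At most the capacity of the volume
      storage: 1Gi
```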

Let's try an example of StatefulSet and demonstrate deploying a relational database.

The StatefulSet object will be similar to the Deployment object, but it will also contain a template for creating the request to a persistent volume of type PersistentVolumeClaim:

Create a file postgres_ss.yaml and write in it:

# Used API
apiVersion: apps/v1
# Object type
kind: StatefulSet
# Object name
metadata:
  name: postgres
# Object specification
spec:
  # link to the pod according to the pod label
  selector:
    matchLabels:
      app: postgres
  # service name
  serviceName: postgres
  # Number of instances of pod
  replicas: 1
  # Pod template
  template:
    metadata:
      # Pod label
      labels:
        app: postgres
    spec:
      # Pod containers
      containers:
        # container name
      - name: postgres
        # Image name
        image: postgres:10.5
        # open container port
        ports:
          - name: postgres
            containerPort: 5432
            protocol: TCP
        # Container configuration environment variables
        env:
            # Database user name
          - name: POSTGRES_USER
            value: postgres
            # Name of the database
          - name: POSTGRES_DB
            value: postgres
            # Password to connect to the database
          - name: POSTGRES_PASSWORD
            value: verysecret
        # Persistent volume requirements
        volumeMounts:
          - mountPath: /var/lib/postgresql/data
            # Name of the volume request
            name: postgrespvc
  volumeClaimTemplates:
  - metadata:
      # the name of the persistent volume request
      # Must be the same as the volume name
      # in volumeMounts in Pod
      name: postgrespvc
    spec:
      accessModes: ["ReadWriteOnce"]
      # Automatic assignment of a persistent volume;
      # the storage class determines which volume is allocated
      storageClassName: "local"
      resources:
        requests:
          # Persistent volume size requirements
          storage: 1Gi

In the volumeClaimTemplates section, we write templates for persistent volume requests of type PersistentVolumeClaim (there can be more than one). In this case, we have declared an interest in storage with a size of at least 1 GB (storage: 1Gi) that can be mounted for reading and writing by a single node at a time (accessModes: ["ReadWriteOnce"]). We have requested allocation of the persistent volume from the local storage class (storageClassName: local).

Create an object of type StatefulSet:

kubectl apply -f postgres_ss.yaml -n cv7

When you create an object of type StatefulSet, an object of type PersistentVolumeClaim is automatically created, which represents the request to create a persistent volume. Let's see what happens in our cluster:

# Most important objects, but PersistentVolumeClaim not visible
kubectl get all -n cv7
kubectl describe statefulset/postgres -n cv7
# The PersistentVolumeClaim must be queried separately
kubectl get pvc -n cv7
# Find out the name of the persistent volume request
kubectl describe statefulsets/postgres -n cv7
kubectl describe pvc/postgrespvc-postgres-0 -n cv7
# Let's see the state of the persistent volume
kubectl describe pv/desktoppv1
# Persistent volumes do not have a namespace

The StatefulSet object, like Deployment, creates ReplicaSet and Pod objects. In addition, it creates a request to create a persistent volume of type PersistentVolumeClaim according to the specified template. When there is a suitable object of type PersistentVolume (persistent volume) for the object PersistentVolumeClaim, it is possible to create a mapping and start a new pod managed by the StatefulSet object.

The database should run. If not, use the get, describe, or logs commands to find the cause.
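For example, the cause of a failing pod can usually be found like this (the pod name postgres-0 follows the StatefulSet naming convention name-ordinal):

```shell
# List pods and check their STATUS column
kubectl get pods -n cv7
# Detailed events for the first pod of the StatefulSet
# (e.g. failed scheduling or volume binding)
kubectl describe pod/postgres-0 -n cv7
# Container log output (e.g. PostgreSQL startup messages)
kubectl logs postgres-0 -n cv7
```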

Exposing the Service

If the persistent volume is OK, we can test the functionality of the new StatefulSet object.

We express the presence of a new service in the cluster using a Service object. It determines under which name and port the service will be available.

Create a service for the postgresql database so that other objects can use it.

Save the service configuration e.g. to the file postgres-service.yaml.

apiVersion: v1
kind: Service
metadata:
  name: postgresservice
spec:
  selector:
    app: postgres
  type: ClusterIP
  ports:
    - protocol: TCP
      # Service port
      port: 5432
      # Container port
      targetPort: 5432

This service called postgresservice will only be available within the cluster on port 5432.
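We can verify the service from inside the cluster, e.g. by starting a temporary client pod. This is only a sketch; the pod name psql-client is arbitrary.

```shell
# Run a throwaway pod with the psql client and connect
# to the database through the service DNS name
kubectl run psql-client --rm -it --image=postgres:10.5 -n cv7 \
  --env="PGPASSWORD=verysecret" -- psql -h postgresservice -U postgres
```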

Graphical interface to the database

Let's create an interface with which we can connect to the database.

The "pgadmin" web interface will run as a separate Deployment object with its own service. When we connect to it using a web browser, we can manage the database service, which is reachable inside the cluster under the DNS name postgresservice.

The "pgadmin" interface will be accessible from the browser on port 30881.

In this example, we place both the service and the deployment in one file; the configurations are separated by ---.

File pgadmin-deploymentservice.yaml:

apiVersion: v1
kind: Service
metadata:
  name: pgadmin
spec:
  selector:
    app: pgadmin
  # The service type changes from ClusterIP to NodePort
  type: NodePort
  ports:
    - protocol: TCP
      port: 8800
      targetPort: 80
      # Port visible on each node
      nodePort: 30881
---
# See the API version for documentation
apiVersion: apps/v1
# Object type
kind: Deployment
# About the object
metadata:
  # Object name
  name: pgadmin-deployment
# object specification
spec:
  # The number of pods to create
  replicas: 1
  # The selector creates a Deployment and Pod link
  # Selects those PODs that have the tag pgadmin
  selector:
    matchLabels:
      app: pgadmin
  # POD template
  template:
    metadata:
      # POD label - to connect Deployment and Pod
      labels:
        app: pgadmin
    spec:
      # POD containers
      containers:
      # Only one pgadmin container
      - name: pgadmin
        # Image name and version
        image: dpage/pgadmin4
        ports:
        # POD has port 80 open
        - containerPort: 80
        env:
        - name: PGADMIN_DEFAULT_EMAIL
          value: admin@admin.sk
        - name: PGADMIN_DEFAULT_PASSWORD
          value: verysecret

We apply the configuration, and after a while the cluster should be running a web application through which we can connect to the database and insert some data.
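A quick way to check that the interface is up, assuming the cluster node is reachable as localhost (as with a local Docker Engine installation):

```shell
# Check that the deployment and service are running
kubectl get deployment,service -n cv7
# pgadmin answers on the NodePort; it typically responds
# with a redirect to its login page
curl -i http://localhost:30881/
```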

We use environment variables to specify the login e-mail and password.


Stateful applications, virtual disks in Kubernetes