Virtual disk storage

Data in the cloud

  • How to rent and use disk space
  • How to manage disk space in a private cloud

Data management policy:

Backup and archiving strategy:

  • What if the data is no longer needed (e.g. the project has ended)?
  • What happens in case of a crash or malfunction? What kind of resilience do we need?
  • Which data should be archived, and for how long?

How to back up and archive?

  • tape devices
  • DVD
  • HDD, flash, floppy disks, magneto-optical disks
  • Cloud

The amount of disk space required varies; usually it grows:

  • Disk space must be either "rented" or "bought and installed".

Total price for storage

Fixed costs:

  • Server room, disks, devices

Variable costs:

  • Management, maintenance, electrical energy
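
The split into fixed and variable costs can be turned into a small calculation. The following is a minimal sketch in Python; the function name and all amounts are made-up illustration values, not real prices.

```python
def total_storage_cost(fixed_costs, monthly_variable_costs, months):
    """Total cost of owning storage over a period.

    fixed_costs: one-off purchases (server room, disks, devices)
    monthly_variable_costs: recurring items (management, maintenance, energy)
    """
    return sum(fixed_costs.values()) + months * sum(monthly_variable_costs.values())


# Hypothetical figures in an arbitrary currency.
fixed = {"server_room": 20000, "disks": 6000, "devices": 4000}
variable = {"management": 500, "maintenance": 200, "energy": 300}

cost_3_years = total_storage_cost(fixed, variable, months=36)
print(cost_3_years)  # 30000 + 36 * 1000 = 66000
```

The fixed part is paid once regardless of how long the storage is used, while the variable part grows with every month of operation.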

Storage virtualization in the server room

  • Disk space can be managed as a whole for the entire cluster.
  • Failed devices must be replaced regularly, because disks are unreliable.
  • Redundancy: several copies of the same data.
  • Backups protect against human error or a security incident.
  • Security and access rights must also be handled.
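
Redundancy has a price in capacity. As a rough sketch, assuming simple replication where every piece of data is stored as several full copies on distinct disks (the function names and disk sizes are illustrative assumptions):

```python
def usable_capacity_tb(disk_count, disk_size_tb, copies):
    """Usable capacity when every piece of data is stored `copies` times."""
    if copies < 1:
        raise ValueError("at least one copy is required")
    return disk_count * disk_size_tb / copies


def tolerated_disk_failures(copies):
    """Of the disks holding the same data, all but one copy may be lost."""
    return copies - 1


# Hypothetical cluster: 12 disks of 4 TB each, 3 copies of everything.
raw_tb = 12 * 4
usable_tb = usable_capacity_tb(12, 4, copies=3)
print(raw_tb, usable_tb, tolerated_disk_failures(3))  # 48 16.0 2
```

Note that replication only covers device failure. A file deleted by mistake is deleted in every copy, which is why backup is listed as a separate point.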

Storage scaling in the server room

Vertical: If I have space in the server, I will buy a disk.

Horizontal: If I have space in the server room and the budget, I will buy a server.

  • If I don't have a budget - I have to delete something.
  • If I don't have space in the server room - I will build a server room.
  • It is "not possible" to reduce the required space.
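
The rules above can be read as a small decision procedure. A minimal sketch follows; the order in which the conditions are checked (budget first, then the cheapest expansion) is an assumption, and the function name is hypothetical.

```python
def next_step(free_disk_slots, free_rack_space, has_budget):
    """Decide what to do when the storage is full."""
    if not has_budget:
        # Nothing can be bought, so space has to be freed.
        return "delete something"
    if free_disk_slots:
        return "buy a disk"
    if free_rack_space:
        return "buy a server"
    return "build a server room"


print(next_step(free_disk_slots=True, free_rack_space=True, has_budget=True))
print(next_step(free_disk_slots=False, free_rack_space=False, has_budget=True))
```

There is no branch that shrinks the installed capacity, which mirrors the last point: once bought, the space stays.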

Connecting disk space

  • block devices.
  • file systems.
  • application level: object storage or a database accessed through an application protocol.

Virtualization at the level of block devices

A new disk will "appear" in the virtual machine

+-----+ +----------+ +---------------+
| App |-| Guest OS |-| App Container |
+-----+ +----------+ +---------------+
                             |
                         App protocol
                             |
                  +----------------------+
                  | Storage Area Network |
                  +----------------------+

Virtualization at the level of block devices

  • iSCSI protocol
  • Ceph, Gluster
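
To illustrate what "block level" means for the consumer: the storage is addressed by block number, with no files or directories. The sketch below uses a temporary file as a stand-in for the attached disk, so it does not talk to iSCSI or Ceph at all. The block size and helper names are assumptions, and `os.pread`/`os.pwrite` are available on Unix-like systems only.

```python
import os
import tempfile

BLOCK_SIZE = 4096  # assumed block size; real devices report their own


def write_block(fd, block_no, data):
    """Write exactly one block at the given block number."""
    if len(data) != BLOCK_SIZE:
        raise ValueError("a block device is written in whole blocks")
    os.pwrite(fd, data, block_no * BLOCK_SIZE)


def read_block(fd, block_no):
    """Read one block by its number."""
    return os.pread(fd, BLOCK_SIZE, block_no * BLOCK_SIZE)


# A temporary file stands in for the new disk (a hypothetical stand-in for
# a device such as /dev/sdb inside the virtual machine).
with tempfile.NamedTemporaryFile() as disk:
    disk.truncate(16 * BLOCK_SIZE)  # a 16-block "disk"
    fd = disk.fileno()
    write_block(fd, 3, b"x" * BLOCK_SIZE)
    block3 = read_block(fd, 3)
    block0 = read_block(fd, 0)  # never written, so it reads back as zeros
```

In practice the guest operating system puts a file system on top of such a device; applications rarely address blocks directly.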

Virtualization at the file system level

A new directory can be mapped into the container

  • network file systems: NFS, SMB
  • distributed file systems: Gluster, Ceph, HDFS
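
Once a directory is mapped, the application uses the ordinary file API and does not need to know that the data lives on a network share. A minimal sketch, in which a temporary directory stands in for the mapped directory; the helper name, file names and content are hypothetical:

```python
import tempfile
from pathlib import Path


def save_report(mount_point, name, text):
    """Write a file under the mapped directory using the ordinary file API."""
    target = Path(mount_point) / "reports" / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(text, encoding="utf-8")
    return target


# The temporary directory is a stand-in for an NFS or SMB share mounted at,
# say, /mnt/shared (a hypothetical path).
with tempfile.TemporaryDirectory() as mount_point:
    path = save_report(mount_point, "q1.txt", "storage usage: 42 GB\n")
    content = path.read_text(encoding="utf-8")
    relative = path.relative_to(mount_point).as_posix()

print(relative)  # reports/q1.txt
```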

Virtualization at the file system level

+-----+ +----------+ +---------------+
| App |-| Guest OS |-| App Container |
+-----+ +----------+ +---------------+
             |
         App protocol
             |
  +--------------------------+
  | Network Attached Storage |
  +--------------------------+

Disk space at the application level

  • object storage: S3 (e.g. MinIO), Swift
  • NoSQL databases
  • Relational databases
  • Caches and brokers: Redis

The application communicates with the storage using the application protocol. No mapping required, only application configuration.
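
As an illustration of "only application configuration", the sketch below builds (but does not send) an S3-style HTTP request for storing one object. It assumes path-style addressing of the form endpoint/bucket/key and leaves out authentication entirely; the environment variable `STORAGE_ENDPOINT`, the default URL, the bucket and the key are hypothetical.

```python
import os
import urllib.request
from urllib.parse import quote


def build_put_request(endpoint, bucket, key, body):
    """Build an S3-style PUT request for one whole object.

    Path-style addressing is assumed: <endpoint>/<bucket>/<key>.
    Authentication headers are deliberately omitted from this sketch.
    """
    url = f"{endpoint.rstrip('/')}/{quote(bucket)}/{quote(key)}"
    return urllib.request.Request(url, data=body, method="PUT")


# The only "mapping" is configuration: the endpoint is read from the
# environment, with a local default for development.
endpoint = os.environ.get("STORAGE_ENDPOINT", "http://localhost:9000")
request = build_put_request(endpoint, "backups", "2024/db.dump", b"dump bytes")
print(request.get_method(), request.full_url)
```

A real application would normally use a client library that also signs the request; the point here is only that the object is addressed by a URL, not by a mounted path.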

Disk space at the application level

+-----+ +----------+ +---------------+
| App |-| Guest OS |-| App Container |
+-----+ +----------+ +---------------+
     |
 App protocol
     |
  +--------------+
  | Data Storage |
  +--------------+

Database

SQL

  • SQL Server
  • Azure SQL
  • PostgreSQL, MySQL/MariaDB

NoSQL

  • Cosmos DB (MongoDB and Cassandra compatible)
  • Redis, Cassandra
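
The difference between the two families can be sketched with the standard library alone. Here an in-memory SQLite database stands in for a relational server and a plain dictionary stands in for a key-value store such as Redis; the table, keys and values are hypothetical.

```python
import sqlite3

# Relational model: a fixed schema, queried with SQL.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE backups (id INTEGER PRIMARY KEY, project TEXT, size_gb REAL)"
)
db.executemany(
    "INSERT INTO backups (project, size_gb) VALUES (?, ?)",
    [("alpha", 12.5), ("alpha", 7.5), ("beta", 3.0)],
)
totals = db.execute(
    "SELECT project, SUM(size_gb) FROM backups GROUP BY project ORDER BY project"
).fetchall()
db.close()

# Key-value model: values are looked up by key, with no schema and no joins.
cache = {}
cache["backup:alpha:latest"] = "2024-05-01"
latest = cache.get("backup:alpha:latest")

print(totals)  # [('alpha', 20.0), ('beta', 3.0)]
print(latest)  # 2024-05-01
```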

Data in the cloud

File space can be rented:

  • Fixed costs are eliminated.
  • Variable costs are streamlined.
  • Flexibility increases: I only pay for what I need.
  • It is possible to scale down and get rid of unneeded space.

But sometimes you still need to have your own hardware.
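
The effect of paying only for what is needed can be shown with a toy comparison. This is a sketch under strong assumptions: rented space is billed per terabyte per month, own hardware has to be sized for the peak and cannot be scaled down, and every price and demand figure is invented.

```python
def rented_cost(demand_tb_per_month, price_per_tb_month):
    """Pay only for the space used in each month."""
    return sum(tb * price_per_tb_month for tb in demand_tb_per_month)


def owned_cost(demand_tb_per_month, purchase_price_per_tb, running_cost_per_tb_month):
    """Own hardware is bought for the peak and runs for the whole period."""
    peak = max(demand_tb_per_month)
    months = len(demand_tb_per_month)
    return peak * purchase_price_per_tb + peak * running_cost_per_tb_month * months


# Hypothetical demand with one short peak in the third month.
demand = [10, 10, 40, 10, 10, 10]
rented = rented_cost(demand, price_per_tb_month=20)
owned = owned_cost(demand, purchase_price_per_tb=30, running_cost_per_tb_month=5)
print(rented, owned)  # 1800 2400
```

Which option comes out cheaper depends entirely on the numbers; with flat, predictable demand over a long period the result can go the other way.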

Storage on Azure

  • Storage at a lower level (IaaS)
  • Azure Managed Disks - a block device attached to the virtual machine in ReadWriteOnce mode. You pay for the entire allocated capacity. The desired redundancy and speed can be chosen.

Azure Storage Account

Access via application protocol:

  • Azure Blob Storage - (public) object storage, accessible through the Azure Blob REST API
  • Azure Files - [SMB or NFS folder](https://learn.microsoft.com/en-us/azure/app-service/configure-connect-to-azure-storage?tabs=access-key%2Ccli&pivots=container-linux)
  • Queues
  • Tables for structured data

Database in the cloud

Platform as a Service (PaaS)

  • Azure SQL Database: MS SQL Server
  • Azure Cosmos DB: NoSQL database
  • Azure Database for PostgreSQL
  • Azure Database for MySQL

Special repositories:

  • Data Lake: for big data analytics
  • Azure AI Search: for text data processing
  • Azure AI Vision: for image processing

Storage on AWS

  • Amazon EBS - block device
  • Amazon Elastic File System - network file system
  • Amazon S3 - object storage

Comparison of Azure and Amazon AWS

Storage on Google

  • Google Cloud Storage - object storage
  • Google Filestore - network file system
  • Google Persistent Disk - block device