Fundamentals of Cloud Storage

Virtual Disks, Data Management, and Cloud Architecture

Why Cloud Storage?

The Shift from CapEx to OpEx

  • Traditional (On-Prem):
    • Capital Expenditure (CapEx): Buy hardware upfront.
    • Rigid: Scaling requires buying new servers/disks.
    • Maintenance: You manage repairs, replacements, and power.
  • Cloud (IaaS/PaaS):
    • Operational Expenditure (OpEx): Pay as you go.
    • Elastic: Scale up or down on demand, in minutes rather than procurement cycles.
    • Managed: Provider handles hardware failures and replacements.
  • Key Benefit: Elasticity. You only pay for what you consume right now.
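To make the elasticity point concrete, here is a minimal Python sketch that compares buying capacity for the peak month with paying only for what is stored each month. The usage figures and the $/GB-month price are hypothetical. Using the same unit price for both models is a deliberate simplification: real on-prem cost also includes power, space, and staff.

```python
# Hypothetical monthly storage demand in GB over six months (illustrative only).
monthly_usage_gb = [200, 250, 400, 1000, 300, 250]
PRICE_PER_GB_MONTH = 0.02  # hypothetical price, applied to both models


def sized_for_peak_cost(usage_gb: list[int], price: float) -> float:
    """On-prem style: capacity is bought for the peak month and paid for every month."""
    return max(usage_gb) * price * len(usage_gb)


def pay_as_you_go_cost(usage_gb: list[int], price: float) -> float:
    """Cloud style: each month is billed only for the GB actually stored."""
    return sum(gb * price for gb in usage_gb)


if __name__ == "__main__":
    print(f"Sized for peak: ${sized_for_peak_cost(monthly_usage_gb, PRICE_PER_GB_MONTH):.2f}")
    print(f"Pay as you go:  ${pay_as_you_go_cost(monthly_usage_gb, PRICE_PER_GB_MONTH):.2f}")
```

With these made-up numbers, sizing for the single 1,000 GB month costs $120.00 over six months, while paying for actual usage costs $48.00.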

Data Lifecycle & Storage Tiers

Not All Data is Equal

  • Hot Tier: Frequently accessed data. High performance, higher cost.
  • Cool/Cold Tier: Infrequently accessed data. Lower performance, lower cost.
  • Archive Tier: Rarely accessed (compliance, long-term backup). Lowest cost, retrieval takes minutes/hours.

From Creation to Destruction

  • Create/Ingest: Data is generated (logs, uploads, transactions).
  • Active Use: Frequently accessed, requires high performance.
  • Inactive: Rarely accessed, but must remain available.
  • Archive: Long-term retention, compliance, rarely accessed.
  • Destroy: Secure deletion when no longer needed.

Lifecycle Policies

  • Automate movement between tiers based on age or access patterns.
  • Retention and deletion rules are defined in software (as policies), not carried out by hand.

Example: "Move logs older than 30 days to Cool, delete after 1 year."
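The example rule can be sketched as a small evaluator. This is a hypothetical model of how a lifecycle rule is applied to one object, not any provider's policy format; real services (for example S3 lifecycle configuration or Azure Blob lifecycle management) express rules as JSON/XML and the service applies them for you.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LifecycleRule:
    """Hypothetical rule: move to a cooler tier after N days, delete after M days."""
    prefix: str
    cool_after_days: int
    delete_after_days: int


def evaluate(rule: LifecycleRule, key: str, age_days: int) -> Optional[str]:
    """Return the action the rule would take for one object, or None if it does not apply."""
    if not key.startswith(rule.prefix):
        return None  # the rule only targets objects under its prefix
    if age_days >= rule.delete_after_days:
        return "delete"
    if age_days >= rule.cool_after_days:
        return "move_to_cool"
    return "keep_hot"


# "Move logs older than 30 days to Cool, delete after 1 year."
logs_rule = LifecycleRule(prefix="logs/", cool_after_days=30, delete_after_days=365)
```

For instance, `evaluate(logs_rule, "logs/app.log", 45)` gives `"move_to_cool"`, and an object under `images/` is left alone because the rule's prefix does not match.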

Mapping the Data Lifecycle to Cloud Tiers:

  • Hot Tier: Active Use (High Cost, Low Latency).
  • Cool/Cold Tier: Inactive (Lower Cost, Slight Latency).
  • Archive Tier: Archive (Lowest Cost, High Retrieval Time).

Redundancy - How Do We Keep Data Safe?

  • Replication:
    • Locally Redundant (LRS): 3 copies in one data center.
    • Geo-Redundant (GRS): LRS in the primary region plus 3 more copies in a secondary region (typically hundreds of km away).
  • Snapshots vs. Backups:
    • Snapshot: Point-in-time copy of a disk (Block). Very fast.
    • Backup: Copy of data (File/Object). Can be restored to a new location.
  • Immutability:
    • WORM (Write Once, Read Many): data cannot be modified or deleted until a retention period expires, which protects backups against ransomware.
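A toy in-memory store illustrates the WORM idea. Managed services offer this as a feature (for example S3 Object Lock or Azure immutable storage for Blob Storage); the class below is a hypothetical sketch of the behavior, not their API. The clock is injectable so the retention period can be tested without waiting.

```python
import time


class RetentionError(Exception):
    """Raised when a write or delete would violate the WORM guarantee."""


class WormStore:
    """Toy WORM store: each key is written once and cannot be overwritten
    or deleted until its retention period has expired."""

    def __init__(self, clock=time.time):
        self._clock = clock      # injectable clock for testing
        self._objects = {}       # key -> (data, retain_until_timestamp)

    def put(self, key: str, data: bytes, retention_seconds: float) -> None:
        if key in self._objects:
            raise RetentionError(f"{key!r} already exists (write once)")
        self._objects[key] = (data, self._clock() + retention_seconds)

    def get(self, key: str) -> bytes:
        return self._objects[key][0]  # reads are always allowed (read many)

    def delete(self, key: str) -> None:
        _, retain_until = self._objects[key]
        if self._clock() < retain_until:
            raise RetentionError(f"{key!r} is locked until {retain_until}")
        del self._objects[key]
```

Even a caller with full access (such as ransomware running with stolen credentials) gets a `RetentionError` when it tries to overwrite or delete a locked backup.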

Backup

A distinct copy of data created to restore information after data loss (deletion, corruption, ransomware).

  • Snapshots are point-in-time pointers (fast, efficient, but dependent on source disk).
  • Backups are independent copies.
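The dependency of a snapshot on its source can be shown with a copy-on-write model, one common way snapshots are implemented. This is a simplified sketch: some managed cloud snapshots are stored separately from the source disk, so treat the dependency shown here as a property of this model. The class and function names are hypothetical.

```python
class Disk:
    """Toy block device: a list of blocks plus the snapshots taken of it."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []

    def write(self, index, data):
        # Copy-on-write: before overwriting a block, preserve the old content
        # in every snapshot that has not saved its own copy of that block yet.
        for snap in self.snapshots:
            if index not in snap.saved:
                snap.saved[index] = self.blocks[index]
        self.blocks[index] = data


class Snapshot:
    """Point-in-time view that stores only blocks changed after it was taken;
    every other block is read from the source disk."""

    def __init__(self, disk):
        self.source = disk
        self.saved = {}
        disk.snapshots.append(self)

    def read(self, index):
        if index in self.saved:
            return self.saved[index]
        return self.source.blocks[index]  # depends on the source disk


def backup(disk):
    """A backup is a full, independent copy of the current blocks."""
    return list(disk.blocks)
```

Taking the snapshot is cheap because nothing is copied up front. If the source disk is later corrupted, the snapshot returns corrupted data for every block it never saved, while the backup is unaffected.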

The 3-2-1 Rule of Backup

  • 3 copies of your data (1 production + 2 backups).
  • 2 different media types (e.g., Primary Disk + Object Storage).
  • 1 copy offsite (e.g., Different Region or Cloud Provider).
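The rule is simple enough to check mechanically. A minimal sketch, assuming each copy is described by a media type and a location label (both hypothetical fields chosen for illustration):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    media: str     # e.g. "managed-disk", "object-storage", "tape"
    location: str  # e.g. a region or provider name


def satisfies_3_2_1(copies: list[Copy], primary_location: str) -> bool:
    """Check a set of copies (production included) against the 3-2-1 rule."""
    three_copies = len(copies) >= 3
    two_media = len({c.media for c in copies}) >= 2
    one_offsite = any(c.location != primary_location for c in copies)
    return three_copies and two_media and one_offsite


plan = [
    Copy("managed-disk", "westeurope"),    # production
    Copy("object-storage", "westeurope"),  # backup 1, different media
    Copy("object-storage", "northeurope"), # backup 2, offsite
]
```

Here `satisfies_3_2_1(plan, "westeurope")` is `True`; keeping all three copies in the same region would fail the offsite check.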

An offsite copy in the cloud protects against:

  • Regional outages (e.g., a failure of the Azure East US region).
  • Account compromise (e.g., ransomware encrypting or deleting data in the primary region).

Cost Management (FinOps)

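  • Cost Optimization:
    • Use lifecycle policies to move old data to cooler tiers or Archive.
    • Delete unused disks (orphaned volumes).
    • Choose the right redundancy level (LRS vs. GRS).
  • Storage Costs:
    • Price per GB per month, which depends on the tier and redundancy level.
  • Operations Costs:
    • API Requests: Billed per request (e.g., S3 Standard charges about $0.005 per 1,000 PUT requests in us-east-1; prices vary by region and change over time).
    • Data Egress: Moving data out of the cloud (to the internet or another region) is billed per GB, while ingress is usually free.
    • IOPS/Throughput: Premium disks cost more for higher performance.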
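A rough monthly estimate simply adds the three cost components. In this sketch all prices are placeholders passed in by the caller, not quotes from any provider; check the provider's pricing page for real figures.

```python
def monthly_cost(stored_gb: float, put_requests: int, egress_gb: float,
                 price_per_gb_month: float, price_per_1000_puts: float,
                 price_per_egress_gb: float) -> dict:
    """Estimate one month's bill as storage + request operations + egress."""
    storage = stored_gb * price_per_gb_month
    operations = (put_requests / 1000) * price_per_1000_puts
    egress = egress_gb * price_per_egress_gb
    return {"storage": storage, "operations": operations,
            "egress": egress, "total": storage + operations + egress}


# Hypothetical workload and placeholder prices.
estimate = monthly_cost(stored_gb=1000, put_requests=2_000_000, egress_gb=500,
                        price_per_gb_month=0.02, price_per_1000_puts=0.005,
                        price_per_egress_gb=0.09)
```

With these placeholder numbers, storing 1,000 GB costs about $20, two million PUTs about $10, and serving 500 GB out of the cloud about $45, so egress, not capacity, dominates the bill.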

The Three Main Storage Types

Type   | Access Method                   | Best For                              | Analogy
Block  | Raw Disk (iSCSI, Fibre Channel) | Databases, OS Boot                    | A Hard Drive inside your PC
File   | Folder Structure (NFS, SMB)     | Shared Home Directories, Lift & Shift | A Network Folder in an office
Object | HTTP/REST API                   | Images, Backups, Static Web           | A Warehouse with barcodes

Block Storage - High Performance, Low Latency

  • Characteristics:
    • Appears as a raw device (e.g., /dev/sdb) to the OS.
    • Requires formatting (NTFS, ext4) by the Guest OS.
    • Usually attached to one VM at a time (ReadWriteOnce).
  • Use Cases:
    • Operating System Disks.
    • Transactional Databases (SQL Server, MySQL).
    • High-performance computing.
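To the OS, a block device is just an array of fixed-size blocks addressed by number (logical block address, LBA); a file system such as ext4 or NTFS is built on top later. This sketch uses a temporary file as a stand-in for a raw device, and the 4096-byte block size is an assumption for illustration.

```python
import tempfile

BLOCK_SIZE = 4096  # bytes; an assumed, commonly used block size
DISK_BLOCKS = 8    # size of the toy "disk" in blocks


def write_block(dev, lba: int, data: bytes) -> None:
    """Write exactly one block at logical block address `lba`."""
    if len(data) != BLOCK_SIZE:
        raise ValueError("block devices are read and written in whole blocks")
    dev.seek(lba * BLOCK_SIZE)
    dev.write(data)


def read_block(dev, lba: int) -> bytes:
    """Read exactly one block at logical block address `lba`."""
    dev.seek(lba * BLOCK_SIZE)
    return dev.read(BLOCK_SIZE)


# A temporary file stands in for a raw device such as /dev/sdb. Opening a real
# device needs root privileges, and writing to it destroys any file system on it.
with tempfile.TemporaryFile() as dev:             # binary read/write ("w+b") by default
    dev.write(bytes(BLOCK_SIZE * DISK_BLOCKS))    # a zero-filled, unformatted "disk"
    write_block(dev, 3, b"\xab" * BLOCK_SIZE)
    block_3 = read_block(dev, 3)
    block_2 = read_block(dev, 2)
```

There are no file names or folders at this level, only block numbers, which is why the guest OS must format the disk before applications can use it.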

File Storage - Shared Access & Standard Protocols

  • Characteristics:
    • Shared network drive (File System level).
    • Supports NFS (Linux) and SMB (Windows) protocols.
    • Multiple VMs can read/write simultaneously.
  • Use Cases:
    • Content Management Systems (CMS).
    • Shared Home Directories for users.
    • Legacy applications expecting a file path.
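From an application's point of view, a mounted file share is just a path, so existing code that opens files keeps working. In this sketch a temporary directory stands in for a hypothetical mount point such as /mnt/share, and two sequential calls stand in for two VMs; real concurrent writers on NFS/SMB would also need file locking.

```python
import tempfile
from pathlib import Path


def append_line(share_root: Path, relative_path: str, line: str) -> None:
    """What any client does: open an ordinary path on the mounted share and append."""
    path = share_root / relative_path
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write(line + "\n")


# Temporary directory as a stand-in for a shared mount (hypothetical /mnt/share).
with tempfile.TemporaryDirectory() as mount:
    root = Path(mount)
    append_line(root, "home/alice/notes.txt", "written by vm-1")
    append_line(root, "home/alice/notes.txt", "written by vm-2")
    contents = (root / "home/alice/notes.txt").read_text(encoding="utf-8")
```

Both writes land in the same file because both "clients" see the same directory tree, which is exactly what a legacy application expecting a file path relies on.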

Object Storage - HTTP Based

  • Characteristics:
    • Data stored as "Objects" (File + Metadata + Unique ID).
    • Accessed via REST API (HTTP GET/PUT).
    • Flat structure (Buckets/Containers), no folders in the traditional sense.
    • Very high durability (e.g., Amazon S3 is designed for 99.999999999%, "eleven 9s").
  • Use Cases:
    • Backup & Archiving.
    • Static Website Hosting.
    • Images, Video, Media Libraries, Big Data Analytics input.

Provider Mapping - Azure vs. AWS vs. GCP

Service        | Azure              | AWS              | GCP
Block Storage  | Managed Disks      | Amazon EBS       | Persistent Disk
File Storage   | Azure Files        | Amazon EFS / FSx | Cloud Filestore
Object Storage | Azure Blob Storage | Amazon S3        | Google Cloud Storage
NoSQL          | Cosmos DB          | DynamoDB         | Firestore / Bigtable

Databases in the Cloud - IaaS vs. PaaS Storage

  • Database on VM (IaaS):
    • You install a database engine (e.g., SQL Server, MySQL) on a VM.
    • You manage OS patches, backups, and storage scaling.
    • Storage: Uses Block Storage (Managed Disks/EBS).
  • Database as a Service (PaaS):
    • Azure SQL Database, AWS RDS, Azure Cosmos DB.
    • Provider manages backups, patching, and scaling.
    • Benefit: Less maintenance, faster deployment.

Data Analytics and Reporting

Data Warehouse

  • Structured storage for processed, organized data
  • Designed for business intelligence (BI) and reporting
  • Uses schema-on-write (data structured before storing)
  • Optimized for SQL queries and analytics

Examples: Amazon Redshift, Google BigQuery, Azure Synapse Analytics, Snowflake

Data Lake

  • Stores raw data in any format (structured, semi-structured, unstructured)
  • Designed for big data analytics, ML, and data science
  • Uses schema-on-read (structure applied when data is analyzed)
  • Scales easily for large volumes of data

Examples: Amazon S3 (with AWS Lake Formation), Azure Data Lake Storage, Google Cloud Storage (processed with Dataproc)
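The difference between schema-on-write (warehouse) and schema-on-read (lake) is about *when* structure is enforced. In this sketch, plain Python lists stand in for a warehouse table and a lake, and the two-column schema is hypothetical.

```python
import json

SCHEMA = {"order_id": int, "amount": float}  # hypothetical warehouse table schema


def write_to_warehouse(table: list, record: dict) -> None:
    """Schema-on-write: validate and convert before storing; bad rows are rejected."""
    row = {}
    for column, column_type in SCHEMA.items():
        if column not in record:
            raise ValueError(f"missing column {column!r}")
        row[column] = column_type(record[column])
    table.append(row)


def write_to_lake(lake: list, raw_line: str) -> None:
    """Schema-on-read: store the raw data as-is, with no validation at write time."""
    lake.append(raw_line)


def read_amounts_from_lake(lake: list) -> list[float]:
    """Structure is applied only when the data is read for this analysis."""
    amounts = []
    for line in lake:
        try:
            record = json.loads(line)
            amounts.append(float(record["amount"]))
        except (ValueError, KeyError, TypeError):
            continue  # this query skips records it cannot interpret
    return amounts
```

The warehouse refuses malformed input up front, so every query can trust the table. The lake accepts anything (including non-JSON lines), and each reader decides how to interpret or skip what it finds.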

Specialized Repositories - Beyond Standard Storage

  • Data Lakes:
    • Store structured and unstructured data at any scale.
    • Azure: Data Lake Storage Gen2.
    • AWS: S3 + Lake Formation.
  • Search & AI:
    • Azure AI Search: Text indexing and vector search.
    • Azure AI Vision: Image analysis (e.g., tagging, OCR) on images kept in storage.
  • Caching:
    • Redis: In-memory key-value store for high-speed access; typically used as a cache in front of durable storage, not as the system of record.
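A common way to use a cache like Redis is the cache-aside pattern: read from the cache, fall back to the durable store on a miss, then fill the cache with an expiry. This sketch uses a small in-memory class as a stand-in for Redis (in Redis itself the expiry would typically be set with the `EX` option of `SET`); `TTLCache` and `get_product` are hypothetical names.

```python
import time


class TTLCache:
    """Tiny in-memory cache with per-entry expiry, standing in for Redis."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._data = {}  # key -> (value, expires_at)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None or self.clock() >= entry[1]:
            self._data.pop(key, None)  # drop expired entries lazily
            return None
        return entry[0]

    def set(self, key, value):
        self._data[key] = (value, self.clock() + self.ttl)


def get_product(product_id, cache: TTLCache, database):
    """Cache-aside: try the cache, fall back to the durable store, then fill the cache."""
    cached = cache.get(product_id)
    if cached is not None:
        return cached
    value = database[product_id]  # the database remains the source of truth
    cache.set(product_id, value)
    return value
```

If the cache is flushed or restarted, nothing is lost: the next read simply goes back to the database, which is why the cache does not need to be the durable copy.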

Summary

  • There are three types of cloud storage: block, file, and object.
  • Storage tiers trade cost for access speed: hot, cool/cold, and archive.
  • Plan your redundancy and data lifecycle.