Where is VM data stored?

Virtual machines (VMs) provide isolated, virtualized computing environments that operate based on configured resources. Understanding where VM data is stored is key for performance, security, and data protection. This article explores the common locations for VM data storage and best practices for optimizing and securing your VM infrastructure.

Where is VM data stored?

At a high level, VMs separate compute resources from storage resources, allowing each to scale independently. There are two primary locations for VM data storage:

Local/Direct Attached Storage

Many hypervisors allow VMs to use local storage directly attached to the host system. This provides low latency access but lacks redundancy if the host fails. Local storage options include:

  • Hard drives connected via SATA, SAS, or NVMe
  • SSD drives for faster performance
  • Hardware RAID for redundancy
  • Partitioning of local disks

Shared/Network Storage

For better scalability and redundancy, most production VMs use shared storage on a storage area network (SAN) or network attached storage (NAS). Benefits include:

  • Centralized storage independent of hosts
  • Flexible allocation of storage capacity
  • High availability with redundant components
  • Advanced features like snapshots, replication, deduplication and more

Common shared storage platforms include Fibre Channel SANs, iSCSI SANs, SMB file shares, NFS shares, and virtual SANs such as VMware vSAN.

Other Locations

Some other potential data storage locations for VMs include:

  • Public cloud block storage services like AWS EBS
  • Object storage services like AWS S3 or Azure Blob storage
  • External SAN replicated to cloud or disaster recovery sites
  • Virtual machine backups/archives to secondary disk or tape storage

Best Practices for VM Data Storage

Follow these best practices when planning VM data storage:

Performance

  • Use SSD storage – Flash-based SSDs provide much faster data access times than traditional spinning disks. Use SSDs for the most performance-sensitive workloads.
  • Allocate sufficient IOPS – Storage performance is measured in input/output operations per second (IOPS). Allocate enough IOPS to meet VM workload demands.
  • Use storage tiers – Tier storage across SSD, high-speed disk and archival disk to optimize cost and performance.
  • Align partitions – Align partitions properly on shared storage for optimal performance.
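
As a rough illustration of the IOPS sizing point above, the sketch below sums per-VM peak IOPS estimates, adds headroom, and converts the result to throughput at a fixed I/O size. The VM names, the IOPS figures, and the 30% headroom are made-up assumptions for illustration, not measurements or recommendations.

```python
def required_iops(vm_peak_iops, headroom=0.30):
    """Sum per-VM peak IOPS and add a safety margin (headroom is an assumed value)."""
    return sum(vm_peak_iops.values()) * (1 + headroom)


def throughput_mib_s(iops, block_size_kib):
    """Throughput implied by an IOPS figure at a fixed I/O size."""
    return iops * block_size_kib / 1024


# Hypothetical workload estimates, for illustration only.
vms = {"db01": 4000, "web01": 500, "web02": 500}
total = required_iops(vms)
print(f"Provision about {total:.0f} IOPS")
print(f"At 8 KiB per I/O: {throughput_mib_s(total, 8):.1f} MiB/s")
```

Real sizing should start from measured workload data, since read/write mix and I/O size change what a given IOPS figure means.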

Availability

  • Redundancy – Use RAID, redundant components, erasure coding and replication to provide high availability storage for VMs.
  • Multipath – For SAN storage, configure multipathing software so VMs have continuous access if one connection path fails.
  • Snapshots – Take regular crash-consistent snapshots to provide point-in-time recovery options.
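
Redundancy costs capacity, and the trade-off is easy to estimate. The sketch below computes usable capacity for a few common RAID levels and the storage efficiency of a k+m erasure coding layout. It ignores hot spares, formatting overhead, and vendor-specific layouts, so treat the numbers as approximations.

```python
def raid_usable_tb(level, disks, disk_tb):
    """Approximate usable capacity for one RAID group of equal-sized disks."""
    minimum = {"raid10": 4, "raid5": 3, "raid6": 4}
    if level not in minimum:
        raise ValueError(f"unsupported level: {level}")
    if disks < minimum[level]:
        raise ValueError(f"{level} needs at least {minimum[level]} disks")
    if level == "raid10":
        if disks % 2:
            raise ValueError("raid10 needs an even number of disks")
        return disks * disk_tb / 2  # every block is mirrored once
    parity_disks = 1 if level == "raid5" else 2
    return (disks - parity_disks) * disk_tb


def erasure_coding_efficiency(data_fragments, parity_fragments):
    """Fraction of raw capacity that holds user data in a k+m layout."""
    return data_fragments / (data_fragments + parity_fragments)


for level in ("raid10", "raid5", "raid6"):
    print(level, raid_usable_tb(level, disks=8, disk_tb=4), "TB usable")
print("4+2 erasure coding efficiency:", round(erasure_coding_efficiency(4, 2), 2))
```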

Manageability

  • Thin provision – Allocate flexible virtual capacity to VMs, while drawing storage on-demand from a pool.
  • Storage spaces – Abstract physical storage into pools, volumes and shares that can be allocated as needed.
  • Document – Maintain documentation on VM storage layouts, assignments and capacity.

Security

  • Permissions – Properly configure access control lists, permissions, and encryption to restrict unauthorized access.
  • Isolate – Keep VM storage separate from hypervisor management to limit data access if the hypervisor is compromised.
  • Immutability – Where needed, implement immutable storage with WORM (write once, read many) to prevent accidental/malicious deletion or modification of VM data.
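
To make the WORM idea concrete, here is a small in-memory sketch of write-once semantics: an object can be written once, never overwritten, and deleted only after a retention period has passed. The class name `WormStore` and its methods are hypothetical. Real immutability is enforced by the storage platform itself, not by application code like this.

```python
import time


class WormStore:
    """Conceptual write-once store with a retention period (illustration only)."""

    def __init__(self, retention_seconds, clock=time.time):
        self._retention = retention_seconds
        self._clock = clock          # injectable so the behavior can be tested
        self._objects = {}           # name -> (data, written_at)

    def put(self, name, data):
        if name in self._objects:
            raise PermissionError(f"{name} is immutable and cannot be overwritten")
        self._objects[name] = (bytes(data), self._clock())

    def get(self, name):
        return self._objects[name][0]

    def delete(self, name):
        _, written_at = self._objects[name]
        if self._clock() - written_at < self._retention:
            raise PermissionError(f"{name} is still under retention")
        del self._objects[name]


store = WormStore(retention_seconds=3600)
store.put("vm-backup-001", b"backup bytes")
try:
    store.put("vm-backup-001", b"tampered")
except PermissionError as err:
    print("Blocked:", err)
```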

Optimizing VM Storage Performance

There are many methods for optimizing the performance of VM storage:

Storage Caching

Enable read/write caching on disk arrays and controllers. This provides faster access to frequently accessed data. Ensure the write cache is protected by battery or flash backup so pending writes can be flushed to persistent storage in the event of power loss.
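
The benefit of a read cache depends heavily on its hit ratio. The sketch below uses the standard weighted-average formula for effective latency. The 0.2 ms cache latency and 8 ms backend latency are assumed figures chosen for illustration.

```python
def effective_latency_ms(hit_ratio, cache_ms, backend_ms):
    """Average read latency given the fraction of reads served from cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ms + (1 - hit_ratio) * backend_ms


# Assumed latencies: 0.2 ms for a cache hit, 8 ms for a backend disk read.
for ratio in (0.50, 0.90, 0.99):
    latency = effective_latency_ms(ratio, cache_ms=0.2, backend_ms=8.0)
    print(f"hit ratio {ratio:.0%}: about {latency:.2f} ms average read latency")
```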

Storage Tiering

Set policies to automatically move data between tiers (SSD, SAS, SATA, cloud) based on access patterns over time. This provides optimal cost and performance.
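
A tiering policy of this kind can be sketched as a simple rule over access statistics. The thresholds, tier names, and the `VolumeStats` fields below are assumptions for illustration. Storage platforms that support automated tiering implement their own policies, usually at a finer granularity than whole volumes.

```python
from dataclasses import dataclass


@dataclass
class VolumeStats:
    name: str
    reads_per_day: float
    days_since_last_access: int


def choose_tier(stats, hot_reads_per_day=1000, cold_after_days=90):
    """Pick a tier from access patterns (thresholds are assumed values)."""
    if stats.days_since_last_access >= cold_after_days:
        return "cloud"   # untouched for a long time: cheapest tier
    if stats.reads_per_day >= hot_reads_per_day:
        return "ssd"     # frequently read: fastest tier
    return "sata"        # everything else: capacity tier


# Hypothetical volumes, for illustration only.
volumes = [
    VolumeStats("db-data", reads_per_day=50000, days_since_last_access=0),
    VolumeStats("file-share", reads_per_day=120, days_since_last_access=2),
    VolumeStats("old-archive", reads_per_day=0, days_since_last_access=400),
]
for volume in volumes:
    print(volume.name, "->", choose_tier(volume))
```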

Thin Provisioning

Allow over-allocation of storage capacity, while drawing actual capacity from a shared pool on demand. This prevents pre-allocation waste.
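
The main risk of thin provisioning is that the pool runs out of physical space before any single volume looks full. The toy model below shows the idea. The class `ThinPool` and its sizes are hypothetical and only model the bookkeeping, not a real storage system.

```python
class ThinPool:
    """Toy model of a thin-provisioned pool (sizes in GB)."""

    def __init__(self, physical_gb):
        self.physical_gb = physical_gb
        self.provisioned = {}   # volume name -> virtual size promised
        self.used = {}          # volume name -> physical space consumed

    def create_volume(self, name, virtual_gb):
        # Creating a volume consumes no physical space up front.
        self.provisioned[name] = virtual_gb
        self.used[name] = 0

    def write(self, name, gb):
        if self.used[name] + gb > self.provisioned[name]:
            raise ValueError(f"{name} would exceed its virtual size")
        if self.total_used() + gb > self.physical_gb:
            raise RuntimeError("pool exhausted: add capacity or free space")
        self.used[name] += gb

    def total_used(self):
        return sum(self.used.values())

    def overcommit_ratio(self):
        return sum(self.provisioned.values()) / self.physical_gb


pool = ThinPool(physical_gb=1000)
pool.create_volume("vm-a", 800)
pool.create_volume("vm-b", 800)
pool.write("vm-a", 300)
print("Overcommit ratio:", pool.overcommit_ratio())
print("Physical space used:", pool.total_used(), "GB")
```

Monitoring the gap between used and physical capacity matters more than the ratio itself, because writes fail once the pool is exhausted.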

I/O Scheduling

Use deadline-based disk scheduling (for example, the deadline or mq-deadline scheduler on Linux hosts) to optimize reads and writes and prevent resource starvation. Set appropriate queue depths on controllers.
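
One way to reason about queue depth is Little's law, which relates the number of outstanding I/Os to throughput and latency. The sketch below applies it in both directions. It is a back-of-the-envelope estimate under steady-state assumptions, and the example figures are invented, not tuning recommendations.

```python
def required_queue_depth(target_iops, avg_latency_ms):
    """Outstanding I/Os needed to sustain target_iops at a given average latency."""
    return target_iops * avg_latency_ms / 1000


def max_iops(queue_depth, avg_latency_ms):
    """Upper bound on IOPS for a given queue depth and average latency."""
    return queue_depth * 1000 / avg_latency_ms


# Assumed example: 20,000 IOPS target at 0.5 ms average latency.
print("Queue depth needed:", required_queue_depth(20000, 0.5))
# Assumed example: queue depth of 32 at 1 ms average latency.
print("IOPS ceiling:", max_iops(32, 1.0))
```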

VM Placement

Place high I/O workloads on hosts with local SSD or high-speed shared storage connectivity. Use lower cost storage for archival VMs.

Resource Monitoring

Continuously monitor and analyze storage performance metrics under different workloads to identify and correct bottlenecks (IOPS, latency, queue depth).
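
As a minimal sketch of this kind of analysis, the function below summarizes a batch of latency samples and flags a possible bottleneck when the 95th percentile exceeds a limit. The 20 ms limit and the sample values are assumptions. In practice the samples would come from the hypervisor's or storage array's own monitoring tools.

```python
import statistics


def latency_report(samples_ms, p95_limit_ms=20.0):
    """Summarize latency samples; needs at least two samples."""
    cuts = statistics.quantiles(samples_ms, n=100)  # 99 percentile cut points
    p95 = cuts[94]                                  # 95th percentile
    return {
        "mean_ms": statistics.fmean(samples_ms),
        "p95_ms": p95,
        "bottleneck": p95 > p95_limit_ms,
    }


# Made-up samples: mostly fast I/O with a slow tail.
samples = [2.0] * 90 + [45.0] * 10
print(latency_report(samples))
```

Percentiles are used here because averages can hide a slow tail that users still notice.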

Securing and Protecting VM Data

Proper processes must safeguard VM data against compromise or loss:

  • Classify data by value, sensitivity, and required availability to determine the appropriate protection levels.
  • Implement least-privilege permissions restricting unauthorized access.
  • Encrypt data-at-rest and data-in-motion over networks. Maintain strict key management.
  • Document failover and disaster recovery procedures to restore service and data access after disruptions.
  • Take regular backups with immutable snapshots to guard against ransomware. Test restoration to validate recoverability.
  • For highly sensitive data, implement auditable logging, version histories and watermarking to track improper alterations or access.
  • Assess and harden environments against known attack vectors like hypervisor compromises, network intrusions, improper storage access, unauthorized insiders, malware and human errors.
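
Restore testing can include a simple integrity check. The sketch below compares SHA-256 checksums of an original file and its restored copy. The function names and any paths passed to them are hypothetical, and a matching checksum only shows the bytes are identical, not that the VM boots or the application works.

```python
import hashlib


def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large disk images do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches(original_path, restored_path):
    """Return True when the restored file is byte-identical to the original."""
    return sha256_of(original_path) == sha256_of(restored_path)
```

A call such as restore_matches("disk.img", "restored-disk.img") would then report whether the two files match.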

Conclusion and Key Takeaways

There are two primary locations for VM data storage – local and shared network – with trade-offs in performance, availability, scalability and functionality. For production workloads, shared and redundant storage generally provides the right mix of these traits.

No matter the storage type, properly secure access, implement disaster recovery processes, and validate restores through regular backup tests and mock drills. Monitor storage resource usage and performance characteristics, and optimize configurations to provision adequate IOPS and capacity at the most economical price point.

As virtualization usage grows with cloud computing, software defined storage and hyperconverged platforms provide increasingly advanced data services while simplifying deployment and management.

Frequently Asked Questions

  1. Where are VM files stored locally?
    Locally, VM files like VMDK/VHD files storing disk data are often stored on dedicated VM host local storage or datastores. These provide low latency access but lack redundancy.

  2. What is a VM datastore?
    A VM datastore is a logical container, usually backed by shared storage, where VM files get stored, managed, and accessed. Virtual disks kept on a datastore appear to guest VMs as local disks.

  3. What is vSAN?
    VMware’s vSAN provides a distributed, scale-out virtual SAN by pooling local SSD and disk capacity across a cluster of hosts. This shared capacity is then allocated to VMs as virtual datastores.

  4. What is VM thin provisioning?
    Thin provisioning allows over-allocation of virtual disk capacity to VMs, while drawing actual storage on-demand from a pool. This prevents wasted pre-allocated storage.

  5. What are VM snapshots?
    VM snapshots capture the state of a VM’s disk at a point in time, stored separately for quick restoration. Snapshots only capture changed data to conserve space.

  6. How can VM storage performance be optimized?
    Key ways to optimize VM storage performance include SSD utilization, sufficient IOPS allocation, storage tiering across drive types, partition alignment, caching, thin provisioning, I/O scheduling priorities and resource monitoring.

  7. How can VM data be secured?
    Securing VM data involves restricting unauthorized access via permissions, encryption and network isolation. It also requires guarding against data loss through backup/replication, snapshots, auditing, version histories and ransomware-specific protections.

  8. Where does VM data get stored in the public cloud?
    In AWS, VM disk data is typically stored on EBS (Elastic Block Store) volumes, which attach to instances over the cloud network rather than locally. Object data is stored in S3 (Simple Storage Service) buckets, which are accessed through an API rather than attached as disks.

  9. Can VMs use NAS or SMB file shares?
    Yes, VM farms can use dedicated NAS (network attached storage) devices and file shares for VM storage, accessed over the NFS or SMB/CIFS protocols. These consolidate file storage and access.

  10. What are SAN LUNs?
    A SAN (storage area network) provides consolidated block storage accessed over the network. LUNs (logical unit numbers) provision isolated storage volumes served across the SAN to compute hosts like VM servers.

  11. What is a VM cluster?
    A VM cluster interconnects multiple hosts to pool and share resources. Shared storage connects over the cluster so if one host fails, VMs can quickly migrate to another host and access their storage.

  12. How is a VM connected to networks?
    VMs connect to networks via virtual network interfaces and switches configured by the hypervisor and passed through to guest VMs. This enables isolated, virtualized networks.

  13. What are HCI and hyperconverged platforms?
    Hyperconverged infrastructure (HCI) combines storage, compute, networking and virtualization in an integrated appliance managed under a unified interface to streamline deployment of VM clusters.

  14. What storage protocols do VMs use?
    Common protocols include iSCSI, FCoE or Fibre Channel for block storage; NFS or CIFS/SMB for file storage, and S3 or Azure Blob for cloud object storage.

  15. What are VM migration and replication?
    For high availability across a cluster, VMs can migrate across hosts or replicate to secondary nodes. Shared storage enables quick migration because the VM's disks stay in place and only the running workload moves between hosts.

  16. How can VM sprawl be controlled?
    Carefully manage VM provisioning & decommissioning, set policies around VM allocation approval and configure resource limits per VM/owner. Chargebacks also incentivize proper allocation.

  17. What is software-defined storage (SDS)?
    SDS abstracts storage hardware into pooled, scalable resources that can provision VM storage on-demand. This simplifies management while better utilizing existing capacity.

  18. What are the most common VM disk formats?
    Most common VM disk formats include VMDK (VMware) and VHD/VHDX (Hyper-V). QCOW2 (KVM) and RAW formats are also found for increased portability between hypervisor platforms.

  19. How can data be protected across multiple data centers?
    Synchronous or asynchronous storage replication mirrors data between sites in active-active or active-passive modes. For the greatest protection, back up VMs to isolated media such as tape held offline.

Conclusion

Storing and safeguarding the data within VMs should be a top priority to deliver performance, protection and compliance. Evaluate the options to meet service level agreements (SLAs) at the optimal price point while mitigating risks like outages, restoration failures, unauthorized access or unplanned sprawl.
