A virtual machine (VM) is a software-based emulation of a physical computer that runs its own operating system and applications as if it were a separate computer system. VMs provide functionality, software execution, and isolation comparable to a dedicated physical computer, without requiring dedicated hardware of their own.
How data is handled on virtual machines
When a VM is set up, a storage location is configured for it to save data, typically in the form of virtual hard disks (VHDs). The VM accesses these VHDs in the same way as physical hard drives. There are a few key ways data can be saved with a VM:
- Non-persistent VMs: Data saved on the VM is not preserved when the VM is powered off. The VHDs revert to their original base image, discarding all changes made during the session. These VMs are good for temporary tasks, isolating untrusted apps, and testing/development.
- Persistent VMs: Changes made within the VM are saved to the configured VHDs, so data is preserved between sessions. This lets the VM function much like a physical computer, making persistent VMs a good option for running production environments virtually. If the VHDs are dynamically expanding, they grow as more data is saved.
- VM snapshots: These save the state of a VM at a specific point in time, preserving the OS, apps, and data inside it. Snapshots enable rolling back to undo changes or recover from issues by restoring to a previous working snapshot. Multiple snapshots can be created over time to track a VM’s state.
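To make these three behaviors concrete, here is a toy Python model. The class `ToyVirtualDisk` is hypothetical and is not any hypervisor's API: it simply stores disk blocks in a dictionary and shows how a non-persistent disk discards a session's writes at power-off, how a persistent disk keeps them, and how reverting to a snapshot undoes later changes.

```python
class ToyVirtualDisk:
    """Hypothetical in-memory model of a VM disk (block number -> data).

    Not a real hypervisor API; it only illustrates persistent vs.
    non-persistent behavior and snapshot rollback.
    """

    def __init__(self, base_image: dict, persistent: bool = True):
        self.base = dict(base_image)    # state the disk starts from at power-on
        self.blocks = dict(base_image)  # live state seen by the guest OS
        self.persistent = persistent
        self.snapshots = {}             # snapshot name -> saved copy of blocks

    def write(self, block: int, data: str) -> None:
        self.blocks[block] = data

    def power_off(self) -> None:
        if self.persistent:
            self.base = dict(self.blocks)  # keep this session's changes
        else:
            self.blocks = dict(self.base)  # discard this session's changes

    def take_snapshot(self, name: str) -> None:
        self.snapshots[name] = dict(self.blocks)

    def revert_to(self, name: str) -> None:
        self.blocks = dict(self.snapshots[name])


image = {0: "bootloader", 1: "os-v1"}

scratch = ToyVirtualDisk(image, persistent=False)
scratch.write(2, "temp file")
scratch.power_off()
print(2 in scratch.blocks)   # False: the change was discarded

server = ToyVirtualDisk(image)
server.write(2, "report")
server.take_snapshot("before-upgrade")
server.write(1, "os-v2")
server.revert_to("before-upgrade")
server.power_off()
print(server.blocks)         # {0: 'bootloader', 1: 'os-v1', 2: 'report'}
```

Real hypervisors track changes at the sector or block level in disk files rather than in a dictionary, but the visible outcome for the guest is the same.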
In summary, while non-persistent VMs do not retain data, persistent VMs and snapshots let users save data in virtual environments for long-term use. The separation between a VM and its host also isolates each VM's data from other VMs running on the same infrastructure.
Where and how is data stored?
Since a VM emulates physical hardware, data is stored in much the same ways it would be on a physical computer. Virtual storage is provisioned from the underlying host system where the VM's hypervisor runs. Common virtual data storage locations include:
- Virtual hard disks (VHDs): Virtualized equivalents of physical HDDs allocated to each VM. Common VHD types include:
  - Dynamically expanding: the disk file grows as data is written, up to its configured maximum.
  - Fixed size: the full capacity is allocated when the disk is created.
  - Differencing: changes are written to a separate child file while the parent disk stays read-only, so several VMs can share one base image.
- Passthrough disks: A physical disk is assigned directly to a VM, bypassing the VHD file layer.
- Virtual CD/DVD drives: These mirror their physical counterparts and can connect ISO image files or physical optical media to VMs.
- VM snapshots: The captured state of a VM and its disks at a moment in time. Snapshots are usually implemented as differencing files layered on top of the VM's existing VHDs, plus optional memory-state files, rather than as full copies of the disks.
- Virtual SAN: Software that pools the internal disks and SSDs of multiple servers into shared storage for VMs, instead of relying on disks local to a single host. It enables shared access with features like thin provisioning, deduplication, snapshots, cloning, encryption, and caching.
- Shared storage: VMs can use storage area network (SAN) and network-attached storage (NAS) devices by placing their VHDs on datastores backed by SAN logical unit numbers (LUNs) or NAS file shares. Because every host can reach the same storage, VMs are easier to migrate between host servers.
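A differencing disk behaves like a copy-on-write overlay: reads fall through to the shared parent unless the VM has rewritten that block, and writes always land in the VM's own child file. The sketch below models that idea with a hypothetical `DifferencingDisk` class (not a real format implementation; real formats track changed sectors with on-disk bitmaps and block tables).

```python
class DifferencingDisk:
    """Hypothetical model of a differencing (child) disk.

    Reads fall through to a read-only parent unless the block has been
    rewritten; writes always land in the child, so the parent is never modified.
    """

    def __init__(self, parent_blocks: dict):
        self.parent = parent_blocks  # shared base image, treated as read-only
        self.child = {}              # only the blocks this VM has changed

    def read(self, block: int):
        if block in self.child:
            return self.child[block]
        return self.parent.get(block)  # None if the block was never written

    def write(self, block: int, data: str) -> None:
        self.child[block] = data


golden_image = {0: "boot", 1: "os"}
vm_a = DifferencingDisk(golden_image)
vm_b = DifferencingDisk(golden_image)
vm_a.write(1, "os-patched")
print(vm_a.read(1), vm_b.read(1))  # os-patched os
print(len(vm_a.child))             # 1
```

Both VMs share one copy of the base image, and each child only grows by the blocks its own VM changes, which is why differencing disks save space when many VMs start from the same template.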
In summary, VMs use storage schemes very similar to those of physical computers: virtual equivalents of locally attached disks backed by VHD files, optical media, and networked storage resources made available through the hypervisor platform and host. This provides flexible options for storing data within the isolated VM environment.
Typical virtual machine data storage capacity
The total storage capacity available for saving data on VMs can vary substantially based on several factors:
- Hypervisor and infrastructure limits: The hypervisor managing the VMs, and the host infrastructure supporting it, cap the maximum capacity per virtual disk and per VM. Per-disk limits range from about 2 TB for older virtual disk formats to tens of TBs for newer ones.
- Allocation model: Fixed-size VHDs reserve their full capacity on the host up front, while dynamically expanding VHDs consume host space only as data is written, up to their configured maximum.
- Role and usage: How a VM is being used can impact typical capacity. A development VM may only need 64GB while production application servers require more. An enterprise database VM could utilize multi-TB capacity.
- Number of VMs: The total shared capacity is divided across all provisioned VMs. So even when per-VM limits are ample, the total capacity available in the environment also governs what each VM can use.
- Resource contention: Workloads place different demands on physical resources. Co-locating high-I/O, memory-intensive, or CPU-intensive workloads on the same host limits how much each VM can practically use, compared to hosting less demanding applications.
In most VM environments, anywhere from 50 GB to a few TBs per VM is a fairly common storage configuration. Enterprise and intensive workloads can reach 10-50 TB or more per VM on some virtualized platforms. So VMs can emulate both the modest disks traditionally seen in physical servers and the very large direct-attached and shared storage pools of high-end servers.
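The allocation model and the number of VMs interact: fixed disks consume their full size on the host immediately, while dynamically expanding disks consume only what has been written. This minimal Python sketch computes the remaining space in a shared pool both as things stand today and in the worst case where every dynamic disk fills up. The VM names, sizes, and the 10,000 GB pool are hypothetical, and the model ignores snapshot files and format metadata overhead.

```python
from dataclasses import dataclass


@dataclass
class VmDisk:
    name: str
    provisioned_gb: int  # size the guest OS sees
    used_gb: int         # data actually written so far
    fixed: bool          # True = fixed size, False = dynamically expanding

    def host_space_gb(self) -> int:
        # A fixed disk reserves its full size up front; a dynamically
        # expanding disk only consumes what has been written.
        return self.provisioned_gb if self.fixed else self.used_gb


def remaining_capacity_gb(pool_gb: int, disks: list[VmDisk]) -> int:
    """Free space in the shared pool right now."""
    return pool_gb - sum(d.host_space_gb() for d in disks)


def worst_case_remaining_gb(pool_gb: int, disks: list[VmDisk]) -> int:
    """Free space if every dynamically expanding disk grows to its maximum."""
    return pool_gb - sum(d.provisioned_gb for d in disks)


disks = [
    VmDisk("dev-vm", provisioned_gb=64, used_gb=20, fixed=False),
    VmDisk("app-server", provisioned_gb=500, used_gb=180, fixed=True),
    VmDisk("database", provisioned_gb=4000, used_gb=2500, fixed=False),
]
print(remaining_capacity_gb(10_000, disks))    # 6980
print(worst_case_remaining_gb(10_000, disks))  # 5436
```

A negative worst-case figure would mean the pool is overcommitted: fine as long as growth is monitored, risky if it is not.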
How does virtual machine storage performance compare to physical servers?
Storage performance is a critical metric because disk I/O affects application experience on both virtual and physical servers. VMs can deliver performance on par with their physical counterparts, but they bring additional factors that govern speed:
- Overhead: Extra work such as emulating virtual hardware and managing VHDs and snapshots incurs some performance overhead. With modern hypervisors, this typically ranges from negligible to around 5-15% compared to an equivalent physical server.
- Hypervisor caching/queueing: Intelligence built into the hypervisor platform improves the efficiency of I/O activity between guest VMs and underlying storage by techniques like asynchronous I/O with caching, queue optimization, and request distribution. This coordination by the hypervisor enhances aggregate performance across VMs sharing the same storage subsystem compared to uncoordinated physical servers contending on that storage.
- Hyperconverged infrastructure (HCI): Integrates storage within the host servers and uses techniques such as VM-centric policies, flash caching, erasure-coded data protection, and QoS enforced via the hypervisor. At large scale, this tight coupling can outperform both traditional physical servers and virtual environments that rely on separate external storage arrays.
- Resource contention: Because multiple VMs share the same physical host, conflicting demands on CPU, memory, network, and storage can degrade performance if they are not balanced correctly. This risk can be mitigated by sizing VM allocations to the host's physical capacity during deployment.
So while VM storage adds some modest overhead, hypervisor optimizations combined with strategic placement of VMs across shared resources can largely offset it. For most workloads, storage speed equivalent to or better than that of physical x86 servers is achievable with the right VM platform and resource allocation.
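One way a platform can balance contention is proportional-share allocation: each VM gets a slice of the storage's IOPS in proportion to an assigned weight, no VM gets more than it asks for, and unused capacity is redistributed to VMs that still want more (a "water-filling" scheme). The function below is a hypothetical sketch of that general technique, loosely in the spirit of share-based I/O controls some hypervisors offer; it is not any vendor's actual algorithm, and the IOPS figures and share values are made up for illustration.

```python
def allocate_iops(capacity: float,
                  demands: dict[str, float],
                  shares: dict[str, int]) -> dict[str, float]:
    """Split `capacity` IOPS among VMs in proportion to their shares,
    capping each VM at its demand and redistributing any leftover."""
    alloc = {vm: 0.0 for vm in demands}
    active = {vm for vm, d in demands.items() if d > 0}
    remaining = capacity
    while active and remaining > 1e-9:
        total_shares = sum(shares[vm] for vm in active)
        satisfied = set()
        handed_out = 0.0
        for vm in active:
            offer = remaining * shares[vm] / total_shares  # fair slice this round
            need = demands[vm] - alloc[vm]
            give = min(offer, need)
            alloc[vm] += give
            handed_out += give
            if give >= need - 1e-9:
                satisfied.add(vm)  # demand met; drop out of later rounds
        remaining -= handed_out
        active -= satisfied
        if not satisfied:
            break  # every VM took its full offer, so capacity is used up
    return alloc


result = allocate_iops(
    20_000,
    demands={"web": 2_000, "db": 15_000, "batch": 15_000},
    shares={"web": 1_000, "db": 2_000, "batch": 1_000},
)
print(result)  # {'web': 2000.0, 'db': 12000.0, 'batch': 6000.0}
```

Here the lightly loaded web VM is fully served, and the capacity it does not need is split 2:1 between the database and batch VMs according to their shares.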
Key takeaways
- Non-persistent VMs discard all changes on shutdown, while persistent VMs and snapshots retain data in configured VHDs for reuse across sessions.
- Virtual storage uses virtual hard disks, optical media, VM snapshots, virtual SANs, and shared storage to emulate the data storage mechanics of physical computers.
- Per-VM limits range from hundreds of GBs to tens of TBs, with total capacity governed by hypervisor and infrastructure capabilities along with application demands.
- Hypervisors mitigate most I/O overhead through optimization techniques, so well-architected virtual environments can match and even outperform traditional physical servers.
In summary, virtual machines can save data across shutdowns via persistent VHDs and snapshots, in contrast to non-persistent VMs that discard all changes. This makes VMs reusable, with their application state, configurations, and content preserved. Hypervisor platforms can provide multi-TB storage per VM through a variety of virtual data stores, and optimization offsets most of the modest I/O overhead, so correctly architected virtual environments can match or exceed the storage performance of traditional physical servers. Virtual infrastructure therefore enables server consolidation, data protection, workload portability, and availability with favorable TCO, without sacrificing security or storage service levels compared to physical servers when it is properly configured.
Frequently Asked Questions
Q: Is a virtual machine temporary storage?
A: Virtual machines can utilize temporary/non-persistent storage that clears on shutdown. But they more commonly leverage persistent virtual hard disks to save data securely across sessions, matching physical computer storage longevity.
Q: Can you lose data on a VM?
A: Yes. Data can be lost by mistakenly deleting or corrupting files within the VM, just as on a physical computer. Power failures, host server crashes, underlying storage failures, or mistakes in managing VM snapshots can also cause VM data loss if adequate protections are not in place.
Q: Where are files stored on a VM?
A: Files within a VM are stored in virtual hard disks (VHDs) that emulate physical disk storage. The VHD files themselves reside on storage accessible to the hypervisor host running that VM: typically centralized SAN or NAS systems, distributed storage built from disks in the hypervisor hosts, or simply the host's local drives (common with desktop hypervisors such as VirtualBox or VMware Workstation).
Q: Why is my VM storage full?
A: Common reasons include insufficient initial storage allocation, dynamically expanding disks growing until the guest volume (or the host storage beneath them) fills up, log file accumulation, page file growth, new software installations, copying data into the VM without planning for expansion, snapshots or backup checkpoints left in place and growing, and the lack of retention policies to identify stale, unnecessary data for cleanup.
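When a guest's disk fills up, the first step is usually finding what is consuming space. The sketch below uses only Python's standard library (`shutil.disk_usage`, `os.walk`) to report how full a filesystem is and list the largest files under a directory. The function names are illustrative, and it runs inside the guest, so it will not reveal snapshot files growing on the host.

```python
import os
import shutil


def usage_percent(path: str) -> float:
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return 100.0 * usage.used / usage.total


def largest_files(root: str, top_n: int = 5) -> list[tuple[int, str]]:
    """Walk `root` and return the `top_n` largest files as (size_bytes, path)."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(full), full))
            except OSError:
                pass  # file vanished or is unreadable; skip it
    return sorted(sizes, reverse=True)[:top_n]


def report(root: str) -> None:
    print(f"{usage_percent(root):.1f}% of the filesystem is used")
    for size, path in largest_files(root):
        print(f"{size / 1_048_576:10.1f} MiB  {path}")
```

For example, `report("/var/log")` on a Linux guest shows whether accumulated logs are the culprit.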
Q: How do I add more storage to a VM?
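A: Typical methods include increasing the capacity of an existing VHD, adding more VHDs, converting to a virtual disk format that supports larger sizes (for example, VHD to VHDX), using thin provisioning to allocate more capacity than is physically reserved, attaching new virtual or physical disks or LUNs directly if supported, or deploying a new VM with more initial storage and migrating the data over. After enlarging a disk, the partition and filesystem inside the guest usually also need to be extended before the extra space becomes usable.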
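As one concrete example of growing an existing virtual disk, QEMU/KVM hosts use the `qemu-img` tool. The sketch below assumes a QEMU/KVM host, a powered-off VM, and a disk image format `qemu-img` can resize (such as qcow2 or raw); other hypervisors have their own tools. The helper names are hypothetical, and it defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess


def resize_commands(image_path: str, add_gib: int) -> list[list[str]]:
    """Build host-side commands to grow a QEMU/KVM disk image by `add_gib` GiB.

    Assumes the VM is powered off. Growing the virtual disk does NOT grow the
    partition or filesystem inside the guest; that is a separate step.
    """
    if add_gib <= 0:
        raise ValueError("add_gib must be positive; shrinking needs extra care")
    return [
        ["qemu-img", "info", image_path],                    # check format and size first
        ["qemu-img", "resize", image_path, f"+{add_gib}G"],  # grow the virtual disk
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop on the first failing command
```

Calling `run(resize_commands("app01.qcow2", 20))` prints the two commands; pass `dry_run=False` only after taking a backup. Inside a Linux guest, tools such as `growpart` followed by `resize2fs` (ext4) or `xfs_growfs` (XFS) typically finish the job.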
Q: Is VM storage encrypted?
A: Encryption of VM storage depends on several layers: the guest OS can encrypt at the file or disk level natively, just as on a physical computer (for example, with BitLocker or LUKS); the hypervisor platform may be able to encrypt VHD contents selectively; and the underlying physical storage platform may encrypt data at rest as it is written to disk.
Q: How do I back up VM data?
A: Common ways to back up VM data include running backup agents inside VMs to capture application data and files into a backup catalog, taking VM snapshots periodically for rollback and restore, using hypervisor backup APIs such as VMware's vStorage APIs for Data Protection (VADP) to offload backups to a media server, backing up each VM's VHD files directly, and replicating VMs in full to a disaster recovery site.
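Backing up VHD files directly can be as simple as copying them and verifying the copy. The sketch below is a minimal illustration, assuming the VM is powered off (copying a disk file while it is being written can produce an inconsistent backup); the function names and paths are hypothetical, and production backup tools add scheduling, retention, and incremental copies on top of this.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large disk images don't fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def backup_disk_file(disk: Path, backup_dir: Path) -> Path:
    """Copy a virtual disk file into `backup_dir` and verify the copy.

    Assumes the VM is powered off or the disk is otherwise quiesced.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    target = backup_dir / disk.name
    shutil.copy2(disk, target)  # copy2 also preserves file timestamps
    if sha256_of(disk) != sha256_of(target):
        raise OSError(f"checksum mismatch for {target}")
    return target
```

For example, `backup_disk_file(Path("app01.vhdx"), Path("/mnt/backups/app01"))` copies one disk and raises an error if the copy does not match byte for byte.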
Q: Is VM storage shared with the host computer?
A: Not directly. From the guest's point of view, its storage is independent of the host computer's own storage, because the VM runs in a hardware-abstracted environment and cannot browse host files unless a feature such as a shared folder is configured. Underneath, however, the VM's virtual disk files reside on physical storage that the host manages.
Q: Can I use physical hard drives for virtual machine storage?
A: Yes. Raw physical disks can be assigned directly to VMs through disk passthrough, or a whole storage controller can be handed to a VM via PCIe passthrough so that its attached disks appear inside the VM. This provides near-native performance without the VHD translation layer.
Q: How do VMs access a physical disk?
A: The main ways are the hypervisor presenting a SAN LUN to a VM as a raw device (for example, VMware's raw device mapping), PCI passthrough of an entire physical disk controller to give a VM raw access to the attached disks, and passthrough of individual physical disks to assign whole disks to VMs directly.
Q: Can a VM only access a portion of a physical disk?
A: Potentially. A partition on a physical disk can be mapped into a VM as a full virtual disk, though this limits portability. More commonly, a LUN mapped from a SAN, or a VHD file residing on shared storage, provides a portion of the underlying capacity without dedicating entire disks.
Q: How large can a VM disk size be?
A: Maximum virtual disk size depends on the hypervisor, the disk format, and the storage backend: roughly 2 TB for the legacy VHD format, 62 TB for VMware VMDK, and 64 TB for Hyper-V VHDX. Some hypervisors can also pass through much larger physical disks, and certain clustered file system backends support even larger volumes.
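When a VM needs more capacity than one virtual disk allows, it is typically given several disks that the guest combines (for example, with LVM or Storage Spaces). This sketch uses the per-disk limits commonly documented for recent versions (2040 GiB for VHD, 64 TiB for VHDX, 62 TiB for VMDK on vSphere 5.5 and later) to compute how many disks a request needs; treat the table as an assumption and check your own platform's configuration maximums.

```python
GIB_PER_TIB = 1024

# Commonly documented per-virtual-disk maximums; verify for your hypervisor version.
MAX_DISK_GIB = {
    "vhd": 2040,               # legacy Hyper-V / Virtual PC format
    "vhdx": 64 * GIB_PER_TIB,  # Hyper-V
    "vmdk": 62 * GIB_PER_TIB,  # VMware vSphere 5.5 and later
}


def disks_needed(fmt: str, requested_gib: int) -> int:
    """Number of virtual disks of format `fmt` needed to present
    `requested_gib` to a guest, which would then span them."""
    limit = MAX_DISK_GIB[fmt.lower()]
    return -(-requested_gib // limit)  # ceiling division on integers


print(disks_needed("vhd", 5000))           # 3
print(disks_needed("VHDX", 100 * 1024))    # 2
print(disks_needed("vmdk", 1000))          # 1
```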
Q: Can I attach an external hard drive to a VM?
A: External USB hard drives generally aren't accessible from inside a VM by default, because the guest has no direct access to the host's USB hardware. If the hypervisor supports USB passthrough, an external drive can be assigned to a VM. Mapping network-attached storage as a virtual disk is a more common way to expand a VM's storage externally.
Q: Do virtual machines write to disk?
A: Yes. The virtual hard disks (VHD/VMDK) assigned to VMs are written to frequently as applications run, operating systems write to swap or page files, logs and temporary data are stored, and users save files, just as physical computers read and write their directly attached disks. All of these writes ultimately land on the underlying physical storage.
Q: Should VM storage match production?
A: Matching the VM's storage performance, capacity, and resilience to the production environment it is meant to represent is an important guideline for accurate testing and workload validation. But fully replicating production requires extra resources, so finding the right balance is key.
Q: Can I migrate VM storage?
A: Yes. Live migration features such as VMware vMotion and Citrix XenMotion move running VMs between hosts. When both hosts use shared storage, the virtual disks stay in place and only the VM's running state moves; companion features such as VMware Storage vMotion and Citrix Storage XenMotion can also relocate the virtual disk files themselves. This lets VMs move transparently between compatible infrastructure.
Q: What’s the difference between thick and thin provisioning VM storage?
A: Thick provisioning allocates a VHD's full capacity up front, while thin provisioning allocates storage on demand as data is written, avoiding unused reserved space and allowing more capacity to be provisioned than physically exists. Thin provisioning improves utilization efficiency, but thick provides more predictable performance, whereas thin risks slowdowns or out-of-space failures if the underlying capacity is overcommitted.
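The overcommitment risk of thin provisioning can be tracked with two ratios: provisioned capacity versus physical capacity (the overcommit ratio) and written data versus physical capacity (utilization). This minimal sketch computes both for a hypothetical thin pool; the 80% alert threshold is an assumed example value, not a vendor default.

```python
def thin_pool_status(physical_gib: float,
                     provisioned_gib: list[float],
                     used_gib: list[float],
                     alert_at: float = 0.80) -> dict:
    """Summarize a thin-provisioned pool (alert_at is an assumed threshold)."""
    provisioned = sum(provisioned_gib)
    used = sum(used_gib)
    utilization = used / physical_gib
    return {
        # Above 1.0 means more capacity was promised to VMs than physically exists.
        "overcommit_ratio": provisioned / physical_gib,
        # Fraction of real capacity already consumed by written data.
        "utilization": utilization,
        # Time to add capacity or reclaim space before writes start failing.
        "alert": utilization >= alert_at,
    }


status = thin_pool_status(10_000, provisioned_gib=[4_000, 4_000, 6_000],
                          used_gib=[1_500, 2_500, 4_500])
print(status)  # overcommit_ratio 1.4, utilization 0.85, alert True
```

With thick provisioning the overcommit ratio can never exceed 1.0, which is exactly why it trades efficiency for predictability.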