Since the inception of Windows NT, Microsoft has been pursuing the goal of extending its reach from personal computing to enterprise markets. One of the important elements of this strategy was a drive toward high availability, leading to the development of server clustering technology. In recent years, Windows' capabilities have been extended to incorporate a virtualization platform, gaining extra momentum following the release of Windows Server 2008 and its Hyper-V component.
That momentum, however, was somewhat hampered by unfavorable comparisons with products from competing vendors.
The most commonly noted shortcoming was the inability to fail over virtual guests without incurring downtime, as is achievable with VMware's VMotion. This gap was subsequently eliminated with the introduction of Live Migration and Cluster Shared Volumes (CSV) in Windows Server 2008 R2.
Live Migration is a new feature of clustered Hyper-V implementations based on Windows Server 2008 R2. It makes it possible to move guest operating systems between cluster nodes without noticeable downtime, paralleling the functionality of VMware's VMotion. Effectively, virtual machines (VMs) remain accessible to external clients and applications throughout the entire migration process, although their hosts change. This constitutes a significant improvement over Quick Migration, available in clustered Hyper-V implementations based on Windows Server 2008, where a similar process resulted in temporary downtime.
The "live" aspect of the migration is accomplished through a procedure that iteratively copies the memory pages used by the VM being migrated (referred to as its working set) over a dedicated Live Migration network from the source to the target Hyper-V host. The copy is repeated several times for any pages that changed during the preceding iteration, to minimize working set differences between the two memory-resident VM instances.
The final iteration includes the state of registers and virtualized devices, followed by handles to the VM's storage (such as VHD files or pass-through disks). Once the transfer of all resources is completed, the VM is momentarily paused, and the remaining pages are copied to the target. At that point, the new VM instance is brought online on the target host, and references to it are removed from the source. Finally, RARP packets are sent by the migrated VM to ensure switches are informed about the new ports associated with its MAC address. The momentary downtime is not noticeable as long as the final steps of this sequence do not exceed the span of a TCP session timeout. Their duration depends primarily on the available bandwidth of the Live Migration network and on how active the migrated VM is.
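The iterative copy described above can be illustrated with a small simulation. This is only a sketch of the general pre-copy idea, not Hyper-V's actual algorithm: the function name, the fixed percentage of pages dirtied per round, the pause threshold and the round limit are all assumptions made for illustration.

```python
def simulate_precopy(working_set_pages, dirty_percent, pause_threshold=50, max_rounds=10):
    """Model iterative pre-copy of a VM's working set.

    Returns (pages copied in each live round, pages left for the paused final copy).
    Assumption: while a round is copying N pages, the still-running VM dirties
    dirty_percent % of N pages, which must be re-sent in the next round.
    """
    rounds = []
    pending = working_set_pages  # the first round copies the whole working set
    while pending > pause_threshold and len(rounds) < max_rounds:
        rounds.append(pending)
        # Pages changed during this round become the next round's workload.
        pending = pending * dirty_percent // 100
    # Whatever is still pending is copied while the VM is briefly paused.
    return rounds, pending


rounds, final_copy = simulate_precopy(10000, 20)
print(rounds, final_copy)
```

In this model, a busier VM (a higher dirty percentage) or a slower network (fewer pages copied before the next round starts) leaves more pages for the paused final copy, which is why both factors influence the length of the momentary downtime.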
It is important to point out that neither Live Migration nor VMotion in any way remediates outages caused by a failure of the host where VMs reside. In such cases, guest operating systems remain inaccessible until their automatic restart is completed on another cluster node. However, this is expected and should not diminish the appreciation of the benefits delivered by both of these technologies. Most importantly, Live Migration practically eliminates the need for maintenance windows on Hyper-V hosts. It also facilitates the concept of dynamic data centers, where virtual resources are relocated between hosts to optimize their use. It is possible to automate this process by leveraging VM provisioning and intelligent placement provided by Microsoft System Center Virtual Machine Manager 2008 R2.
Cluster Shared Volumes
Cluster Shared Volumes (CSV) was designed to allow shared access to the same LUN (an acronym derived from the term Logical Unit Number, which, in Windows parlance, corresponds to a disk mounted on the local host without regard for its physical structure) by multiple cluster nodes. This represents a drastic departure from the traditional "shared-nothing" Microsoft clustering model, where only a single host was permitted to carry out block I/O operations against a given disk.
Interestingly, this achievement was made possible without reliance on a clustered file system (implemented by VMware and available on the Windows platform with assistance from third-party products, such as Sanbolic's Melio FS). Instead, CSV works with any shared NTFS-formatted volume, as long as the underlying hardware and software components comply with the Failover Clustering requirements of Windows Server 2008 R2.
To prevent disk corruption resulting from having multiple nodes accessing the same LUN, one of them (referred to as Coordinator and implemented as the CSVFilter.sys file system mini-filter driver) arbitrates I/O requests targeting individual VMs. It provides addresses of disk areas to which owners of these VMs are permitted to write directly. At the same time, the Coordinator node is solely responsible for locking semantics and carrying out changes affecting file system metadata (such as creating and deleting individual files or modifying their attributes).
Since CSVs generally contain a small number of files, such activities are relatively rare and constitute a small portion of overall disk activity. In some cases, however, you might want to initiate certain I/O-intensive operations not suitable for direct access (e.g., Server Message Block-based file copies, chkdsk, defragmentation or host-based backups) from the Coordinator node to maximize their speed. To determine which host functions as the Coordinator node, identify the owner of the Physical Disk clustered resource corresponding to the LUN where the CSV is located. Incidentally, this architectural design gives you an additional level of resiliency, maintaining the availability of CSV-hosted VMs even if the connectivity to the underlying storage from their host is lost. At that point, direct I/O traffic is automatically rerouted via the Coordinator node. In such cases, however, performance is likely to suffer due to the overhead of SMB communication between the nodes.
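The division of labor between the Coordinator and the other nodes can be summarized as a simple decision function. This is a conceptual model only: the `Node` class, the operation names and the returned labels are hypothetical and do not correspond to any Windows API.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    is_coordinator: bool = False
    has_storage_path: bool = True  # can this node reach the LUN directly?


# Hypothetical labels for operations that change file system metadata.
METADATA_OPS = {"create", "delete", "set_attributes"}


def route_io(node, operation):
    """Describe how a request issued on `node` reaches the CSV LUN in this model."""
    if node.is_coordinator:
        return "direct (coordinator)"
    if operation in METADATA_OPS:
        # Only the Coordinator changes file system metadata.
        return "forwarded to coordinator (metadata)"
    if not node.has_storage_path:
        # Storage connectivity lost: traffic is rerouted through the Coordinator.
        return "redirected via coordinator (SMB)"
    return "direct"


owner = Node("node1", is_coordinator=True)
healthy = Node("node2")
degraded = Node("node3", has_storage_path=False)
print(route_io(healthy, "write"), "|", route_io(degraded, "write"))
```

The model makes the trade-off visible: ordinary reads and writes bypass the Coordinator entirely, while metadata changes and storage-path failures are the cases that pay the cost of inter-node communication.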
Despite a rather common misconception, CSVs are not required for Live Migration to function. It is possible to use this feature with VMs hosted on any Physical Disk resource in a Windows Server 2008 R2-based Hyper-V cluster. However, it is highly recommended to combine the benefits provided by each of these technologies. This way, you are able not only to perform independent failover of VMs stored on the same LUN but also to minimize the pause during the final stage of the migration. This is critical from a high-availability standpoint, since the use of CSV eliminates the delay associated with changing disk ownership that takes place in a traditional failover scenario.
Implementing Live Migration
To implement Live Migration, you must satisfy the following requirements:
- Install a multi-node failover cluster consisting of between 2 and 16 nodes running either Windows Server 2008 R2 Enterprise, Windows Server 2008 R2 Datacenter or Microsoft Hyper-V Server 2008 R2. Although OS editions do not have to match, it is not possible to mix full and core instances. Microsoft Hyper-V Server 2008 R2 is considered to be the latter.
- Configure iSCSI or Fibre Channel storage shared by all nodes.
- Set aside a separate network subnet, shared by all nodes and dedicated to Live Migration, with a bandwidth of 1Gbps or higher, preferably via adapters (both physical and VM-based) configured with support for Jumbo Frames, TCP Chimney and Virtual Machine Queue. These capabilities have been introduced in Windows Server 2008 R2-based Hyper-V. Effectively, each cluster node should have at least five network adapters to accommodate Live Migration as well as private, public, Hyper-V management and redirected CSV I/O traffic. This number increases if you intend to use iSCSI-based storage and want to provide some level of redundancy. In addition, the Client for Microsoft Networks and File and Printer Sharing for Microsoft Networks components should be enabled on adapters connected to the CSV network. To designate a network for CSV traffic, assign it the lowest value of the Metric parameter using the Get-ClusterNetwork PowerShell cmdlet. Conversely, disable these components on adapters intended for Live Migration. The network used for Live Migration is configurable via the Network for Live Migration tab of the VM's Properties dialog box in the Failover Cluster Manager interface.
- All nodes must either have the matching processor type or use Processor Compatibility Mode (available starting with Windows Server 2008 R2) to disable processor features not supported cluster-wide. Despite the additional flexibility this feature provides, the basic requirement for consistent processor architecture still must be satisfied (i.e., you cannot mix servers running AMD and Intel processors). Keep in mind that Processor Compatibility Mode might cause some applications to work in a substandard manner or even fail.
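Two of the requirements above, the node count and the consistent processor architecture, lend themselves to a quick sanity check. The sketch below is a planning aid under stated assumptions: the inventory format and the feature names are hypothetical, and the intersection of feature sets is only a conceptual stand-in for what Processor Compatibility Mode hides, not a description of how Hyper-V computes it.

```python
def check_cluster(nodes):
    """Check a hypothetical node inventory against two Live Migration requirements.

    nodes: list of dicts with 'vendor' (str) and 'features' (iterable of str).
    Returns (list of problems, set of features common to all nodes).
    """
    problems = []
    if not 2 <= len(nodes) <= 16:
        problems.append("cluster must have between 2 and 16 nodes")
    vendors = {n["vendor"] for n in nodes}
    if len(vendors) > 1:
        # Compatibility mode cannot bridge different processor vendors.
        problems.append("mixed processor vendors: " + ", ".join(sorted(vendors)))
    # Features supported cluster-wide; anything outside this set would need
    # to be hidden from guests for migration to work between all nodes.
    common = set.intersection(*(set(n["features"]) for n in nodes)) if nodes else set()
    return problems, common


inventory = [
    {"vendor": "Intel", "features": ["sse3", "ssse3", "sse4.1"]},
    {"vendor": "Intel", "features": ["sse3", "ssse3"]},
]
print(check_cluster(inventory))
```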
CSV takes the form of the %SystemDrive%\ClusterStorage folder appearing on all cluster nodes. Each new LUN added to it is represented by a subfolder, named by default Volumex (where x is a positive, sequentially assigned integer). When creating VMs using CSV functionality, all of their components, including configuration files and VHDs corresponding to dynamically expanding, fixed-size or differencing volumes (CSV does not support pass-through disks), must reside within one of these volumes. When configuring new VMs, this happens automatically as long as you point to the appropriate Volumex subfolder of the %SystemDrive%\ClusterStorage folder on the Specify Name and Location and Connect Virtual Hard Disk pages of the New Virtual Machine Wizard. Subsequently, to provide high availability for the newly created VMs, they must be added as virtual machines using the High Availability Wizard, accessible via the Configure a Service or Application link in the Failover Cluster Manager console. Alternatively, you can combine both of these steps by using the Virtual Machines... New Virtual Machine links in the context-sensitive menu of the Services and applications node.
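Because every VM component must live under one of the Volumex subfolders, it can be handy to verify file locations before making a VM highly available. The following is a minimal sketch that only recognizes the default folder naming; the function name is hypothetical, the system drive is assumed to be C:, and renamed volume folders would not be detected.

```python
import re
from pathlib import PureWindowsPath


def is_on_default_csv(path, system_drive="C:"):
    """Return True if `path` lies under <system_drive>\\ClusterStorage\\Volume<x>.

    Assumes default subfolder names, where x is a positive integer.
    """
    p = PureWindowsPath(path)
    parts = p.parts  # e.g. ('C:\\', 'ClusterStorage', 'Volume1', 'vm1', 'disk.vhd')
    if p.drive.lower() != system_drive.lower() or len(parts) < 3:
        return False
    return (
        parts[1].lower() == "clusterstorage"
        and re.fullmatch(r"volume[1-9]\d*", parts[2], re.IGNORECASE) is not None
    )


print(is_on_default_csv(r"C:\ClusterStorage\Volume1\vm1\disk.vhd"))
print(is_on_default_csv(r"D:\VMs\vm1\disk.vhd"))
```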
Once implemented, Live Migration can be initiated manually via the management interface of Failover Cluster Manager and Microsoft System Center Virtual Machine Manager 2008 R2, or through PowerShell cmdlets (on which the tasks carried out by SCVMM 2008 R2 are based) and corresponding Windows Management Instrumentation scripts. It is also possible to automate its execution by leveraging the PRO Tip functionality of SCVMM 2008 R2, or to trigger it whenever a cluster node is placed in maintenance mode.
Just remember that any cluster node supports only a single Live Migration session (incoming or outgoing) at a time. Since each migration occupies two nodes, the total number of simultaneous migrations is limited to half the number of cluster nodes. It is also worth noting that CSV technology introduces operational challenges, backup in particular, which should be carefully considered before you decide to implement it.
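The one-session-per-node limit has a direct effect on how a batch of migrations can be scheduled. As a rough illustration, the sketch below computes the upper bound and greedily groups requested migrations into waves in which no node appears twice. The function names and the greedy strategy are assumptions for illustration, not the way Failover Clustering or SCVMM queues migrations.

```python
def max_concurrent_migrations(node_count):
    """Each migration occupies one source and one target node."""
    return node_count // 2


def schedule_waves(migrations):
    """Greedily group (source, target) pairs so each node is used once per wave."""
    waves = []
    for src, dst in migrations:
        for wave in waves:
            busy = {node for pair in wave for node in pair}
            if src not in busy and dst not in busy:
                wave.append((src, dst))
                break
        else:
            # Every existing wave already uses one of the two nodes.
            waves.append([(src, dst)])
    return waves


requests = [("A", "B"), ("C", "D"), ("A", "C"), ("B", "D")]
print(max_concurrent_migrations(4))
print(schedule_waves(requests))
```

With four nodes, at most two migrations run at once, so the four requests above need two waves.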