Thus far, our series about Windows 2003 Server-based high availability solutions has provided a brief overview of server clustering and described its three basic types, categorized according to characteristics of the Quorum resource (i.e., Single Shared, Single Local, and Majority Node Set). Before we delve deeper into design and implementation issues, we will examine one of the most important clustering components: disk storage. The next few articles will review cluster-specific requirements in this area and explore a number of technologies that satisfy them.
The importance of disk storage stems from its role in the server clustering architecture. As you might recall from earlier discussions, the Quorum resource must be implemented as an NTFS volume, hosting the Quorum log and Checkpoint files (for more details, refer to our earlier article). Just as relevant is the ability to implement the Physical Disk resource (separate from the Quorum resource), which is required in the overwhelming majority of typical clustered applications.
To comply with server clustering principles, storage must have certain characteristics. More specifically, the volumes it hosts must be accessible to all cluster nodes, a critical requirement as far as the Single Shared cluster category is concerned. This applies to many deployments, but not to the Single Local or Majority Node Set types.
Cluster Service Communication
The storage must also be able to communicate with the Cluster Service, an instance of which runs on every node, via the SCSI protocol. This does not limit hardware choices to SCSI disks, channels, and controllers; however, disks and controllers must be capable of properly processing (and sharing a channel for properly transmitting) such SCSI commands as Reserve (used by individual cluster nodes to obtain and maintain exclusive ownership of a device), Release (which relinquishes the reservation of a device, allowing another cluster node to take ownership of it), and Reset (which forcibly removes an existing reservation of an individual device or of all devices on the bus).
These commands serve a very important purpose: they prevent a situation where two hosts would be permitted to write simultaneously to the same disk device. This would be likely to happen otherwise, considering that both hosts share a physical connection to it. When the first cluster node is brought online, its Cluster Service (with the help of the Cluster Disk Driver, Clusdisk.sys) scans the shared storage bus for devices attached to it and attempts to bring them online. It issues the Reserve command to claim ownership. The same command is re-sent at regular intervals (every three seconds). This ensures ownership is maintained and the owning node has exclusive access to all volumes on the target disk.
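To make the semantics of the three commands more concrete, here is a minimal Python sketch of a disk that honors them. The class and method names (SharedDisk, reserve, release, reset, write) are illustrative assumptions only; they do not correspond to any actual Cluster Service or Clusdisk.sys interface, and real reservations are handled at the SCSI command level rather than in application code.

```python
class SharedDisk:
    """Toy model of a disk that honors SCSI-style Reserve/Release/Reset."""

    def __init__(self):
        self.owner = None  # name of the node holding the reservation, if any

    def reserve(self, node):
        # Reserve succeeds if the disk is free or already held by this node,
        # which is what lets an owner renew its reservation periodically.
        if self.owner in (None, node):
            self.owner = node
            return True
        return False

    def release(self, node):
        # Only the current owner can relinquish its own reservation.
        if self.owner == node:
            self.owner = None
            return True
        return False

    def reset(self):
        # A reset forcibly clears the reservation, whoever holds it.
        self.owner = None

    def write(self, node):
        # Writes are accepted only from the node holding the reservation.
        return self.owner == node


disk = SharedDisk()
disk.reserve("node1")          # first node online claims the disk
print(disk.reserve("node2"))   # second node is refused
print(disk.write("node2"))     # and cannot write
```

The point of the sketch is simply that, as long as the owner keeps its reservation, a second host sharing the same physical connection is refused both ownership and write access.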
Reserve also plays a critical role in cases when network communication between nodes fails. As mentioned in our previous article, such a situation is handled by establishing first which node is the owner of the Quorum (potentially triggering a new election, if the previous owner is no longer operational) and transferring all cluster resources to it. A successful outcome of this process relies on two settings that control SCSI commands issued by cluster nodes. The first one forces the Quorum owner to renew its reservation every three seconds. The second one causes non-Quorum owners, once they detect an inter-node communication failure, to initiate a bus-wide Reset, followed by a seven-second waiting period.
If the Quorum remains available after the wait period is over (which indicates the previous Quorum owner failed), the challenging node takes over ownership of the Quorum (by sending the Reserve command) as well as all remaining resources. Another purpose of the Reset command is to periodically terminate reservations in order to detect situations in which a node becomes unresponsive (without failing completely). Provided that this is not the case, reservations are subsequently re-established.
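The interplay between the three-second renewal and the seven-second wait can be illustrated with a short Python simulation. This is a simplified sketch under stated assumptions: time advances in whole seconds, a healthy owner renews at every multiple of three seconds, and the names challenge, RENEW_INTERVAL, and CHALLENGE_WAIT are hypothetical, not actual Cluster Service parameters.

```python
RENEW_INTERVAL = 3   # seconds between the owner's Reserve renewals (from the text)
CHALLENGE_WAIT = 7   # seconds a challenger waits after a bus Reset (from the text)


def challenge(owner_alive, reset_time=0):
    """Return which node holds the Quorum once the challenge completes."""
    # The bus-wide Reset issued at reset_time cleared the reservation.
    reservation = None
    # Step through the wait period one second at a time.
    for t in range(reset_time + 1, reset_time + CHALLENGE_WAIT + 1):
        # A healthy owner re-issues Reserve on its regular schedule.
        if owner_alive and t % RENEW_INTERVAL == 0:
            reservation = "owner"
    # After the wait, the challenger's Reserve succeeds only if the disk is free.
    if reservation is None:
        reservation = "challenger"
    return reservation


print(challenge(owner_alive=True))    # owner
print(challenge(owner_alive=False))   # challenger
```

Because the wait period is longer than two renewal intervals, a functioning owner always has a chance to re-establish its reservation before the challenger tries to claim the disk, whereas a failed owner leaves the Quorum unreserved.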
Now that we have established the functional requirements of storage in Single Shared Quorum clusters, let's review technologies that satisfy the criteria outlined above. Regardless of your choice, the actual hardware selected must be Microsoft-certified (which can be verified by referencing the Microsoft Windows Server Catalog). In general, storage clustering solutions belong to one of four categories:
- Shared SCSI
- Fibre Channel Storage Area Networks (SANs)
- NAS (Network Attached Storage)
- iSCSI
SCSI (an acronym derived from the term Small Computer System Interface) is the best known and the most popular storage technology for multidisk configurations. The term SCSI also refers to the communication protocol, providing reliable block-level data transport between a host (known as the initiator) and storage (known as the target), which is independent of the way data is stored. Its architecture consists of a parallel I/O bus shared between multiple (frequently daisy-chained) devices (including controllers), and enclosed on both ends with terminators, which prevent electrical signals from bouncing back (terminators are frequently built directly into SCSI devices).
A SCSI controller is typically installed in a host system as the host adapter, but it can also reside in an external storage subsystem. Each device on the bus is assigned a unique identifier, referred to as the SCSI ID, that is numbered from 0 to 7 or from 0 to 15, for narrow and wide SCSI bus types, respectively. In addition to providing addressing capabilities, the SCSI ID determines priority level (with ID 7 being the highest and typically assigned to the controller, ensuring proper bus arbitration).
The limited range of SCSI IDs (which restricts the number of devices on the bus to 15) is extended through the assignment of Logical Unit Numbers (LUNs), associated with each individual storage entity that is able to process individual SCSI commands. Typically, they represent individual disks within a storage subsystem, connected to the main SCSI bus via an external SCSI controller. In addition to the LUN and SCSI ID, the full address of such a Logical Unit also contains a bus identifier, which commonly corresponds to a specific SCSI interface card. A server can have several such cards installed. The total number of available LUNs ranges from 8 to 254, depending on the hardware support for Large LUNs. For more information on this subject, refer to Microsoft Knowledge Base article 310072.
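The relationship between SCSI ID and priority can be expressed as a small Python helper. The function name arbitration_order is hypothetical, and the ordering of IDs 8 through 15 on a wide bus (below IDs 0 through 7) reflects the general parallel SCSI convention rather than anything stated in this article.

```python
def arbitration_order(wide=False):
    """List SCSI IDs from highest to lowest arbitration priority."""
    # On a narrow bus, priority simply descends from 7 to 0.
    order = list(range(7, -1, -1))
    if wide:
        # On a wide bus, IDs 15 down to 8 rank below all of 7..0.
        order += list(range(15, 7, -1))
    return order


print(arbitration_order())           # [7, 6, 5, 4, 3, 2, 1, 0]
print(arbitration_order(wide=True))  # [7, 6, ..., 0, 15, 14, ..., 8]
```

This shows why ID 7 remains the usual choice for a controller on both bus types: it wins arbitration regardless of how many other devices are attached.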
Implementing SCSI technology for the purpose of shared clustered storage adds an extra layer of complexity to its configuration. Since the bus must be accessible by both clustered nodes, install a SCSI controller card in each (and disable the BIOS on each card). Furthermore, since these controllers will be connected to the same bus, they cannot have identical SCSI IDs. Typically, this dilemma is resolved by setting one to 7 and the other to 6, which grants the latter the next highest priority level. To ensure the failure of a single component (such as a device, controller, or host) does not affect the entire cluster, use external terminators (rather than those built into devices). Keep in mind that the number of nodes in a SCSI storage-based clustered implementation cannot exceed two.
As part of your design, you should ensure a sufficient level of storage redundancy by implementing RAID, which prevents individual disk failures from affecting overall data accessibility. Although Windows 2000 and 2003 Server products support software-based fault-tolerant RAID configurations (RAID 1 and 5, known also as mirroring and striping with parity, respectively), this requires setting up target disks as dynamic, and dynamic disks are not permitted as shared clustered storage (at least not without installing third-party products, e.g., the Symantec Storage Foundation for Windows add-in). This restriction does not apply to local cluster node drives.
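The constraints just described (no more than two nodes, no duplicate IDs, IDs within the range allowed by the bus type) lend themselves to a simple sanity check. The following Python sketch is purely illustrative; validate_shared_bus and its parameters are hypothetical names, not part of any Microsoft tool.

```python
def validate_shared_bus(controller_ids, device_ids, wide=False):
    """Return a list of problems found in a proposed shared SCSI bus layout."""
    problems = []
    max_id = 15 if wide else 7

    # A shared SCSI bus supports at most two cluster nodes (one controller each).
    if len(controller_ids) > 2:
        problems.append("more than two cluster nodes on a shared SCSI bus")

    # Every controller and device on the bus needs its own SCSI ID.
    all_ids = list(controller_ids) + list(device_ids)
    if len(set(all_ids)) != len(all_ids):
        problems.append("duplicate SCSI ID on the bus")

    # IDs must fall within the range allowed by the bus type.
    if any(i < 0 or i > max_id for i in all_ids):
        problems.append("SCSI ID outside the valid range")

    return problems


print(validate_shared_bus([7, 6], [0, 1]))   # []
print(validate_shared_bus([7, 7], [0, 1]))   # ['duplicate SCSI ID on the bus']
```

An empty list means the layout passes these three checks; it says nothing about termination or cabling, which still have to be verified physically.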
Although this means that you must resort to more expensive external storage arrays, which implement hardware-based RAID, you benefit not only from significantly better performance but also from improved functionality, including such features as redundant hot-swappable fans and power supplies, extra disk cache memory, and more complex and resilient RAID configurations (such as RAID 10 or 50, which combine disk mirroring with striping or striping with parity, protecting from loss of data access even in cases of multiple disk failures).
Unfortunately, SCSI technology, despite its relatively low cost, widespread popularity, and significant transfer speeds (up to 320 MBps with the SCSI-3 Ultra320 standard), is subject to several limitations. They result mainly from its parallel nature, which introduces a skew phenomenon (where individual signals sent in parallel arrive at a target at slightly different times). Skew restricts the maximum length of the SCSI bus, which in most implementations remains within a 25-meter range. This requires physical proximity of clustered components, which makes shared SCSI unsuitable for disaster recovery scenarios.
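When comparing RAID levels, it helps to see how much usable space each one leaves. The Python sketch below applies the standard capacity formulas for the levels mentioned in this article, assuming equal-size disks. The function name usable_capacity and the groups parameter (the number of RAID 5 sets striped together in RAID 50) are illustrative assumptions; actual arrays may impose different minimums.

```python
def usable_capacity(level, disks, disk_size_gb, groups=2):
    """Usable capacity in GB for a few common RAID levels (equal-size disks)."""
    if level == 1:
        # Two-disk mirror: one disk's worth of space.
        if disks != 2:
            raise ValueError("this sketch models RAID 1 as a two-disk mirror")
        return disk_size_gb
    if level == 5:
        # Striping with parity: one disk's worth of space goes to parity.
        if disks < 3:
            raise ValueError("RAID 5 needs at least three disks")
        return (disks - 1) * disk_size_gb
    if level == 10:
        # Striped mirrors: half of the disks hold mirror copies.
        if disks < 4 or disks % 2:
            raise ValueError("RAID 10 needs an even number of disks, four or more")
        return (disks // 2) * disk_size_gb
    if level == 50:
        # Striped RAID 5 sets: each set gives up one disk's worth to parity.
        if groups < 2 or disks % groups or disks // groups < 3:
            raise ValueError("RAID 50 needs two or more equal RAID 5 sets")
        return (disks - groups) * disk_size_gb
    raise ValueError("unsupported RAID level")


print(usable_capacity(5, 4, 100))    # 300
print(usable_capacity(10, 4, 100))   # 200
print(usable_capacity(50, 6, 100))   # 400
```

The trade-off is visible in the numbers: the more resilient configurations give up more raw capacity in exchange for tolerating additional disk failures.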
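The 320 MBps figure follows directly from the width of the parallel bus and its transfer rate. As a rough illustration, the Python snippet below multiplies the two; the 16-bit width and 160 million transfers per second used for Ultra320 are commonly cited values that are assumed here, not taken from this article, and the function name is hypothetical.

```python
def bus_throughput_mbps(bus_width_bits, megatransfers_per_second):
    """Peak throughput in MB/s: bytes moved per transfer times transfers per second."""
    return (bus_width_bits // 8) * megatransfers_per_second


# Assumed Ultra320 parameters: 16-bit wide bus, 160 megatransfers per second.
print(bus_throughput_mbps(16, 160))   # 320
```

The same arithmetic hints at the source of the skew problem: every additional parallel line or increase in transfer rate leaves less tolerance for signals arriving at slightly different times.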
A recently introduced serial version of SCSI (Serial Attached SCSI or SAS) addresses the drawbacks of its parallel counterpart, but it is unlikely to become a meaningful competitor to Fibre Channel or iSCSI. The SCSI bus is also vulnerable to contention issues, where a device with higher priority dominates communication. Finally, storage is closely tied to the hosts, which increases the complexity of consolidation and expansion efforts.
Although shared SCSI technology is a viable option for lower-end server clustering implementations on the Windows 2003 Server platform, other types of storage solutions offer considerable advantages in terms of performance, scalability, and stability. The next article of this series will examine them in greater detail.