Hardware Today: Dude, You've Got a Dell Cluster

Monday Apr 18th 2005 by Drew Robb

The University of Tennessee at Chattanooga has harnessed the power of Dell PowerEdge servers for its supercomputing cluster.

The University of Tennessee at Chattanooga (UTC) recently completed the second phase of its three-part cluster strategy, with the installation of 398 Dell PowerEdge 1850 servers running Linux. Housed at UTC's Computational Simulation and Design Center (SimCenter), the Dell cluster is used primarily for computational field simulation, including solid and fluid mechanics, electromagnetics, energy and mass transport, chemical reactions, and materials science.

"The Dell cluster enables us to do calculations faster and more accurately than we could before, and it offers the price for performance we need," says Wally Edmondson, systems administrator at UTC's SimCenter.

The UTC SimCenter's mission is to serve U.S. government and industry through integrated research and education in computational engineering. A multidisciplinary team of teaching and research faculty, research professionals, and students pursues this mission. They develop advanced computational simulation and design systems that enable and support designers in the analysis, design, and certification of air, land, sea, and space systems.

UTC receives contracts from governmental and private organizations to perform computational research on its systems. The SimCenter was founded two years ago. It initially deployed a cluster of 16 IBM xSeries 335 machines, each with a 2.8 GHz Xeon processor. Six of the xSeries systems remain clustered and are still used in UTC simulations. The remainder have been converted to file servers. SimCenter then added 32 3.2 GHz Xeon-based Intel white boxes in another cluster. But neither cluster could cope with the traffic or provide the desired computational performance.

The university subsequently engaged in a lengthy and sophisticated selection process for its next cluster. It looked at products from 14 vendors, all of which provided a server for UTC to score based on ease of installation, management capabilities, and price/performance. The university tested both Opteron- and Xeon-based models. Price/performance carried by far the most weight, at 75 percent of the score.
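A weighted evaluation of this kind can be sketched in a few lines of Python. Only the 75 percent weight for price/performance comes from the article. The even split of the remaining 25 percent, the 0-to-10 scale, and the vendor names and scores are all hypothetical, chosen purely for illustration.

```python
# Only the 0.75 weight is from the article; the 0.125/0.125 split is assumed.
WEIGHTS = {
    "price_performance": 0.75,
    "ease_of_install": 0.125,  # assumption
    "management": 0.125,       # assumption
}


def weighted_score(scores):
    """Combine per-category scores (hypothetical 0-10 scale) into one number."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical vendors and scores, not UTC's actual results.
candidates = {
    "vendor_a": {"price_performance": 9.0, "ease_of_install": 6.0, "management": 6.0},
    "vendor_b": {"price_performance": 7.0, "ease_of_install": 10.0, "management": 10.0},
}

# Rank vendors from highest to lowest combined score.
ranking = sorted(candidates, key=lambda v: weighted_score(candidates[v]), reverse=True)
for vendor in ranking:
    print(f"{vendor}: {weighted_score(candidates[vendor]):.2f}")
```

With a weighting this lopsided, a vendor that leads on price/performance can win even when it trails clearly in the other two categories, as the made-up numbers above show.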

"We tested each server on our configuration to see how long they took to solve various problems," says Edmondson. "The Dell server scored best, especially in price/performance."

The dual-processor PowerEdge 1850 is designed for high-performance computing clusters, SAN front-end systems, and Web and infrastructure applications, especially where data center real estate is at a premium. It uses dual Xeon 3.2 GHz processors with EM64T support, DDR-2 memory, 1 MB of L2 cache, and PCI Express. With its rack-dense 1U form factor, up to 42 servers fit in a single rack occupying less than seven square feet of data center floor space. It can run both 32-bit and 64-bit applications.

The first phase of the implementation, completed in September 2004, consisted of 137 servers. All but one were configured as a compute cluster; the remaining server is used for management and administration. According to Edmondson, there is no particular significance to the 137 number. It was simply as far as the budget stretched at the time. The SimCenter planned to add more systems as soon as possible.

When more funds became available, the cluster entered Phase 2 and expanded to 396 nodes. Why stop there? This time, it was less a matter of funds and more a matter of power/cooling infrastructure.

"As we weren't sure how many servers our AC [air conditioner] could take, we decided to stop below 400 nodes to check if everything had stayed within our thermal limits," says Edmondson. "After using them for months, our cooling levels seem fine and the AC system appears to be coping well."

Linux Cluster

UTC has always been a Linux shop. It gravitated toward open source software for cost reasons. With its small software budget, Linux has proven to be a perfect fit.

"Unix has large fees for support, and Windows just is not ready for clustering yet," says Edmondson. "Linux, on the other hand, is stable, easy to manage, and all the software we run is Linux. And we had the internal expertise to run Linux on our own."

On the surface, the choice of open source might appear something of a gamble. Edmondson, after all, is the only official IS staff member in the entire organization. Several of the research professors, he says, pitch in with programming assistance. But everything else is largely up to him. Fortunately, it has worked out well for the organization.

UTC currently uses Red Hat 8 for servers and SUSE 9.2 Professional for desktops. SimCenter will soon be switching to SUSE 9.2 on its servers as well.

"The main reason for the change is SUSE's support for 64-bit computing," says Edmondson. "We estimate that the move to 64 bit will give us a 20 percent to 30 percent speed up in performance."

All the software used in the clustering environment is Linux-based. The basic simulation applications, for example, are homegrown. But the university is using several third-party software tools. For visualization of results, it uses FieldView from Intelligent Light; to generate the grids used by analysis software, such as computational fluid dynamics codes, it uses Gridgen by Pointwise.

On the storage side, the organization uses a distributed global file system by Ibrix. Data Direct Networks provides the hardware. All of it was purchased through Dell. This provides a storage capacity of 20.8 TB, spread across 96 drives within six enclosures. Serial ATA disks are used, but Fibre Channel improves performance. An ADIC Scalar 1000 is present for tape backup purposes. To add power to its Gb Ethernet network, UTC has deployed a switch by Force 10 Networks.

Future Racks

After several months of operation, SimCenter users are more than happy with the performance of the new supercomputing platform. The old clusters, which are still in use within the center, could not run jobs of the size the new cluster now handles.

"Our old 32-node cluster could perform a maximum of a 19 million point simulation," says Edmondson. "Our new Dell cluster has a potential limit of a 235 million point simulation."

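A quick back-of-the-envelope calculation puts those two figures in perspective. The sketch below uses only numbers quoted in the article (32 nodes and 19 million points for the old cluster, the 396-node Phase 2 figure and 235 million points for the new one). Reading the result as a per-node ceiling is an inference, not something Edmondson states.

```python
# Figures quoted in the article.
OLD_NODES, OLD_MAX_POINTS = 32, 19_000_000
NEW_NODES, NEW_MAX_POINTS = 396, 235_000_000  # Phase 2 node count


def points_per_node(max_points, nodes):
    """Largest simulation size divided evenly across the compute nodes."""
    return max_points / nodes


old_ppn = points_per_node(OLD_MAX_POINTS, OLD_NODES)
new_ppn = points_per_node(NEW_MAX_POINTS, NEW_NODES)
growth = NEW_MAX_POINTS / OLD_MAX_POINTS  # increase in maximum problem size
node_ratio = NEW_NODES / OLD_NODES        # increase in node count

print(f"old cluster: {old_ppn:,.0f} points per node")
print(f"new cluster: {new_ppn:,.0f} points per node")
print(f"problem size grew {growth:.1f}x, node count grew {node_ratio:.1f}x")
```

Both clusters work out to a little under 600,000 grid points per node, and the maximum problem size grows by almost exactly the same factor as the node count. That is consistent with a limit that scales linearly with the number of nodes, though the article does not say what sets it.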
UTC is awaiting support from one software vendor for 64-bit before it implements a 64-bit cluster of almost 400 nodes. In the meantime, it has three racks running a 64-bit cluster in a test environment.

The SimCenter eventually plans to add more PowerEdge 1850s but has no concrete plans for how many or when. Edmondson expects to add two more racks, perhaps 72 additional servers. This is largely hypothetical right now, however, as it depends on whether the center needs the extra processors.

"We are not in it for bragging rights, as the cluster is only a tool," says Edmondson. "As the prices are always dropping, it will be cheaper in the longer run to wait a while. But if we really need a larger cluster, we'll buy it."
