Nearly 18 months ago, we looked at the College of Oceanic and Atmospheric Sciences (COAS) at Oregon State University (OSU), which has been a Sun shop since 1984. This week, we take another look at how the college uses a variety of Sun Fire servers for weather modeling, global climate simulations, and forest fire research. Its current data center features servers powered by Sun UltraSPARC III and IV+ processors, as well as brand-new T2000 servers operating in a compute-cluster architecture.
"Sun lets us marry a wide range of tasks to specific hardware," says Chuck Sears, director of research computing at COAS. "And we are doing it in a cost-effective and energy-efficient way."
Based in Corvallis, Ore., COAS is one of 11 graduate research colleges at Oregon State University and is one of the best oceanographic and atmospheric science institutions in the nation.
"We study the Earth system as a whole," says Sears. "That's why we need to create a technical ecosystem that allows us to deal with complex interactions."
Sears expresses frustration with the computer industry's trend of the past decade toward a one-size-fits-all paradigm. In the past, some machines were suited to transactional work, and others were better at computational tasks.
He believes the industry tried to combine both on one box and, in doing so, sacrificed both, largely because of the race to accelerate clock speeds. As a result, COAS has had trouble matching its tool sets to its various workloads.
"We were trying to tell a story by building a technical environment that mimics real-time events," says Sears. "That meant mapping our processes onto systems that lacked granularity."
The old solution to this problem was to set up dedicated silos, each engineered for a specific workflow. This led to a fragmented, heterogeneous landscape of servers from Sun, IBM, SGI, and Dell running Solaris, Linux, and Windows. About 500 users access the systems, and the network runs on a 10 Gb backbone. About two years ago, COAS introduced 20 fully loaded Sun Fire 440 UltraSPARC III servers. These four-processor units (1.06 GHz) had 8 GB of memory and took over the core of the research unit's operations.
To gain better control of workflows, COAS added several new server lines to the mix. To increase throughput, for example, the IT staff had a simple choice: Either add many more UltraSPARC III boxes and at least double the required floor space, or switch to the next-generation Sun box and reduce floor space.
"A rack of our UltraSPARC III-based systems has now been reduced to a single file cabinet, with no recoding or recompiling needed to upgrade to Sun UltraSPARC IV+ servers," says Sears. "We're getting faster performance with the UltraSPARC IV+ Sun Fire V890 server than with our previous UltraSPARC III system, which had twice as many processors."
So far, COAS has taken the early-adopter route with a single UltraSPARC IV+ Sun Fire V890 server, an 8-way system with 32 GB of memory. According to Sears, it took slightly more than three hours to move the existing code bases from the UltraSPARC III server to the UltraSPARC IV+ server and get them running. COAS can also take advantage of the new servers' improved throughput and greater threading capabilities without changing code. Sears estimates this spared the data center three to five person-years of code conversion on its core applications and simulations.
Next year, the research department plans to replace its UltraSPARC III resources with UltraSPARC IV+ servers and additional Sun Fire T2000 models. Each T2000 can replace as many as four Sun Fire 440s while cutting energy consumption in half and improving throughput two- to threefold.
The department has one T2000 in-house and two more on order. Each of these multicore servers has a single processor with eight cores, and each core runs four threads, for a total of 32 threads per processor. COAS tested the T2000 with a series of in-house decision-support applications, including analytic models that let researchers mine data to identify and respond to problems quickly and make decisions in near real time.
"Early indications, with a very small amount of performance enhancement to our code bases, are yielding three- to fourfold increases over some of our other systems," says Sears. "Processing time for the workflow has been reduced from about 22 minutes to five minutes. In some cases we are seeing 12- to 15-fold improvements over previous SPARC-based workflows."
The OSU department has also been running Sun servers with AMD processors. This began about a year ago with single-core Sun Fire V20z servers, 1U dual-processor machines built on 64-bit AMD Opteron processors. They run either Solaris or Linux, depending on the application. Some have 2 GB of memory and others have 4 GB. The machines handle tasks requiring intensive floating-point computation, and COAS now has 60 of them operating in its clusters.
More recently, the research group added a Sun Fire X4200 server, with four more on order. This dual-core Opteron model has 8 GB of memory.
"We are moving gradually to these servers as they are dual core, have more memory, and offer better computational capabilities," says Sears. "The 4200 is better than the V20z in terms of design, serviceability and ease of use."
Mix and Match
With several Sun server lines in-house, COAS can match each workflow to the most appropriate servers. AMD-based servers are favored for high-end computational clustering workloads, such as weather simulations, while the UltraSPARC IV+ and T2000 servers are preferred for transactional tasks.
"Now we can match our workflows onto the right machine," says Sears. "High throughput multithreaded work goes to [the] T series as it is excellent at this kind of task."
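This mix-and-match policy amounts to a lookup from workload class to preferred hardware. The Python sketch below illustrates that idea only. The pool labels, job names, and the `route` function are hypothetical, since the article does not describe how COAS actually dispatches work; each class is mapped to the single pool the article names as best suited to it.

```python
from dataclasses import dataclass
from enum import Enum


class Workload(Enum):
    FLOATING_POINT = "floating-point"   # e.g. weather and climate simulations
    MULTITHREADED = "multithreaded"     # high-throughput, many concurrent threads
    TRANSACTIONAL = "transactional"


# Hypothetical pool labels based on the hardware named in the article;
# the real COAS job-routing logic is not described there.
PREFERRED_POOL = {
    Workload.FLOATING_POINT: "opteron-cluster",    # Sun Fire V20z / 4200 (AMD)
    Workload.MULTITHREADED: "t2000",               # 8 cores x 4 threads = 32 threads
    Workload.TRANSACTIONAL: "ultrasparc-iv-plus",  # Sun Fire V890
}


@dataclass
class Job:
    name: str
    kind: Workload


def route(jobs: list[Job]) -> dict[str, list[str]]:
    """Group job names by the server pool preferred for their workload class."""
    plan: dict[str, list[str]] = {}
    for job in jobs:
        pool = PREFERRED_POOL[job.kind]
        plan.setdefault(pool, []).append(job.name)
    return plan


if __name__ == "__main__":
    # Hypothetical job names for illustration.
    jobs = [
        Job("regional-weather-forecast", Workload.FLOATING_POINT),
        Job("fire-risk-data-mining", Workload.MULTITHREADED),
        Job("observation-ingest", Workload.TRANSACTIONAL),
    ]
    print(route(jobs))
```

A static table keeps the policy explicit and easy to change as new server lines arrive; a production scheduler would also weigh current load on each pool.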
By using Web services to link the various clusters and servers into one virtual system, COAS has freed researchers from having to change locations to accomplish a specific kind of task or to request specific resources from the IT department. Nor must they wait for coding to be done before running new and more complex tasks. They simply run their applications and receive results much more rapidly than before.
"We realized that different workflows require different architectures," says Sears. "Instead of silos, we have built an ecosystem that flows data to the right place, and we no longer have to worry about our coders having problems."