Japanese Supercomputer Claims to Be High-performance, Low-cost, Greenest

Jun 18, 2010
Tetsuo Nozawa, Nikkei Electronics
Photo: The press meeting

At a press meeting, the Tokyo Institute of Technology announced the details of the "Tsubame 2.0," the university's next-generation supercomputer system, which will start operation in the fall of 2010.

The computation capacity of the system is 2.39 PFLOPS (petaflops, double-precision value), which would rank second in the "Top500," a ranking of supercomputers, as of June 2010.

"It will be the first petaflops computer in Japan," said Satoshi Matsuoka, professor at the Global Scientific Information and Computing Center (GSIC) of the university. "And it will be the first world-class supercomputer system for our university."

However, the system, which will be built by NEC Corp and Hewlett-Packard Co, has yet to be constructed.

The system has a "vector-scalar mixture architecture," Matsuoka said. But its graphics processing units (GPUs) account for 90% of the total computation capacity, making the system more like a vector computer.

Therefore, the performance of the system differs depending on the type of calculation. Specifically, the performance target in terms of the Linpack benchmark is 1-1.4 PFLOPS (double-precision value), which would rank third or fourth in the Top500 as of June 2010. On the other hand, for calculations suited to vector computers, such as weather prediction, the performance can exceed 150 TFLOPS (teraflops), which is much higher than the world record (50 TFLOPS).

Outperforming "Earth Simulator" with one rack

The backbone of the supercomputer system consists of 2,816 units of Intel Corp's "Xeon 5600" microprocessor (code name: Westmere-EP), which has six cores and operates at a frequency of 2.93GHz, and 4,224 units of Nvidia Corp's "Tesla M2050" GPU.

The double-precision arithmetic performance of the Tesla M2050 is much higher than that of the existing Tesla GPUs, which were developed mainly for single-precision arithmetic. One Tesla M2050 has a performance of 515 GFLOPS (gigaflops, double-precision value).
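As a rough check on the statement that the GPUs supply about 90% of the capacity, the GPU count, the per-GPU figure and the 2.39 PFLOPS system total quoted above can be combined. The sketch below is only illustrative: the function name is made up for this example, and it assumes that all three figures are double-precision values measured on the same basis.

```python
# Figures quoted in the article (double-precision values).
GPU_COUNT = 4224        # Tesla M2050 units
GPU_GFLOPS = 515.0      # performance per GPU
SYSTEM_PFLOPS = 2.39    # computation capacity of the whole system


def gpu_share_of_capacity(gpu_count, gpu_gflops, system_pflops):
    """Return the fraction of the system capacity contributed by the GPUs."""
    gpu_pflops = gpu_count * gpu_gflops / 1e6  # 1 PFLOPS = 1e6 GFLOPS
    return gpu_pflops / system_pflops


share = gpu_share_of_capacity(GPU_COUNT, GPU_GFLOPS, SYSTEM_PFLOPS)
print(f"GPU share of total capacity: {share:.1%}")
```

Under these assumptions the GPUs come to roughly 2.18 PFLOPS, or about 91% of the total, in line with the 90% figure.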

"The performance per node (two microprocessors and three GPUs) is 1.6 TFLOPS," Matsuoka said. "The performance per rack is 51.2 TFLOPS, which is higher than that of early earth simulators."

World's 1st SSD-based supercomputer?

The university made two major improvements to enhance the performance of the system. First, it improved the bandwidth. Specifically, the network bisection bandwidth (the minimum communication capacity across a cut that divides the system into two halves) is about 200 Tbps, which is about 33 times that of the Tsubame 1.0, a supercomputer system constructed by the university in 2006.

Moreover, the total memory bandwidth is 720 Tbps, which is about 42 times that of the Tsubame 1.0. The computation capacity, meanwhile, increased by about 30 times.

The other improvement was made to the memory hierarchy. The university built multilevel storage using not only DRAMs such as DDR3 but also SSDs (solid state drives) composed of flash memories.

While the total memory capacity of the backbone system's DRAMs is 80.6 Tbytes for microprocessors and 12.7 Tbytes for GPUs, the total memory capacity of the SSDs is 173.9 Tbytes. SSDs offer high data input/output performance.

"By using them to input and output local data (that are not shared by other nodes), the performance of the entire system can be enhanced," Matsuoka said.

Sharp decrease in power consumption

The new supercomputer system has one more noteworthy feature: low power consumption. While the power consumption of the Tsubame 1.0 including its cooling system is 0.85MW, that of the Tsubame 2.0, which has a computation capacity about 30 times higher, is only 1MW. So, the power consumption per unit of computation capacity was reduced to about 1/25.
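The 1/25 figure follows directly from the numbers in the paragraph above. A minimal sketch of the arithmetic, with a function name chosen only for illustration:

```python
def power_per_capacity_ratio(old_power_mw, new_power_mw, capacity_gain):
    """Ratio of new to old power consumption per unit of computation capacity."""
    return (new_power_mw / old_power_mw) / capacity_gain


# Tsubame 1.0: 0.85 MW; Tsubame 2.0: 1 MW with about 30 times the capacity.
ratio = power_per_capacity_ratio(0.85, 1.0, 30.0)
print(f"New/old power per capacity: {ratio:.4f} (about 1/{1 / ratio:.1f})")
```

The result is about 0.039, or 1/25.5, which rounds to the "about 1/25" quoted.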

The performance per watt (in terms of the Linpack benchmark) is expected to exceed 1,000 MFLOPS (megaflops) per watt, and the system will possibly be ranked first in the Green500, a ranking of supercomputers by energy efficiency, the university said.
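The 1,000 MFLOPS per watt expectation can be cross-checked against the Linpack target of 1-1.4 PFLOPS and the 1MW power figure. The sketch below is an estimate only: it assumes the 1MW figure (which includes cooling) is the power drawn during the Linpack run, which may not match how the Green500 measures power.

```python
def mflops_per_watt(linpack_pflops, power_mw):
    """Convert a Linpack result in PFLOPS and power in MW to MFLOPS per watt."""
    mflops = linpack_pflops * 1e9   # 1 PFLOPS = 1e9 MFLOPS
    watts = power_mw * 1e6          # 1 MW = 1e6 W
    return mflops / watts


low = mflops_per_watt(1.0, 1.0)
high = mflops_per_watt(1.4, 1.0)
print(f"{low:.0f} to {high:.0f} MFLOPS per watt")
```

Under that assumption, the Linpack target range corresponds to about 1,000 to 1,400 MFLOPS per watt.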

"We not only used a large number of GPUs but also drastically enhanced the efficiency of the part around the power supply from 80% to 94% and employed a sealed cooling system," Matsuoka said.

The drastic decrease in the power consumption per unit of computation capacity is also an advantage in terms of cost. The cost of the entire system plus basic maintenance for four years amounts to ¥3.2 billion (approx US$35 million), which is low for a supercomputer.

While the normal cost to introduce a supercomputer is about ¥10 million per TFLOPS, the cost to introduce the Tsubame 2.0 is about ¥3 million per TFLOPS.

The cost does not include electricity costs, which are about ¥100 million per year. Had the electricity costs increased in the same ratio as the computation capacity, they could have reached ¥2.5 billion per year.
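The cost figures can be roughly reproduced from numbers given earlier in the article. The sketch below rests on two assumptions that the article does not confirm: that the ¥3 million per TFLOPS is based on Linpack performance rather than the 2.39 PFLOPS capacity, and that the ¥2.5 billion electricity figure uses the factor of about 25 from the power comparison. The function names are hypothetical.

```python
def million_yen_per_tflops(cost_billion_yen, pflops):
    """Cost per TFLOPS in millions of yen."""
    cost_million_yen = cost_billion_yen * 1000.0  # 1 billion = 1,000 million
    tflops = pflops * 1000.0                      # 1 PFLOPS = 1,000 TFLOPS
    return cost_million_yen / tflops


def scaled_electricity_billion_yen(actual_million_yen, factor):
    """Yearly electricity cost in billions of yen if scaled up by `factor`."""
    return actual_million_yen * factor / 1000.0


SYSTEM_COST_BILLION_YEN = 3.2  # includes four years of basic maintenance

per_tflops_capacity = million_yen_per_tflops(SYSTEM_COST_BILLION_YEN, 2.39)
per_tflops_linpack_low = million_yen_per_tflops(SYSTEM_COST_BILLION_YEN, 1.0)
per_tflops_linpack_high = million_yen_per_tflops(SYSTEM_COST_BILLION_YEN, 1.4)
electricity = scaled_electricity_billion_yen(100.0, 25.0)

print(f"Per TFLOPS at 2.39 PFLOPS: {per_tflops_capacity:.2f} million yen")
print(f"Per TFLOPS at Linpack 1-1.4 PFLOPS: "
      f"{per_tflops_linpack_high:.2f} to {per_tflops_linpack_low:.2f} million yen")
print(f"Scaled electricity cost: {electricity:.1f} billion yen per year")
```

On the Linpack target the cost works out to roughly ¥2.3-3.2 million per TFLOPS, close to the quoted ¥3 million, while on the 2.39 PFLOPS capacity it would be about ¥1.3 million. Because the ¥3.2 billion includes maintenance, the introduction cost alone would be somewhat lower.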

Matsuoka said that the university plans to introduce the Tsubame 3.0, a successor to the Tsubame 2.0, in fiscal 2014 or fiscal 2015 and has already started to develop it.

"We are aiming at around 30 PFLOPS, but the power consumption of the Tsubame 3.0 will be equivalent to or less than that of the Tsubame 2.0," he said.