[ISSCC] Fujitsu, Riken Unveil Technologies Used for 'Kei' Supercomputer
Fujitsu Ltd and Riken (a Japan-based independent administrative institution) disclosed the system technologies used for the "Kei" supercomputer at ISSCC 2012, which runs from Feb 19 to 23, 2012, in the US (paper number: 10.8).
In the Kei, each node consists of a "SPARC64 VIIIfx" eight-core microprocessor connected to a 16-Gbyte main memory. Four nodes are mounted on each system board, and 24 system boards and six input/output boards are mounted in each rack. The rack measures 796 x 750 x 2,060mm and uses both water- and air-cooling systems.
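The per-rack totals implied by these figures can be worked out with a short sketch. The function names are illustrative, and the 80,000-node figure is only the lower bound quoted later in this article, not an exact node count.

```python
import math

# Figures taken from the article.
CORES_PER_NODE = 8        # SPARC64 VIIIfx is an eight-core processor
MEMORY_PER_NODE_GB = 16   # main memory attached to each processor
NODES_PER_BOARD = 4       # nodes per system board
BOARDS_PER_RACK = 24      # system boards per rack (I/O boards not counted)


def rack_totals():
    """Return node, core and memory totals for one rack."""
    nodes = NODES_PER_BOARD * BOARDS_PER_RACK
    return {
        "nodes": nodes,
        "cores": nodes * CORES_PER_NODE,
        "memory_gb": nodes * MEMORY_PER_NODE_GB,
    }


def racks_needed(total_nodes):
    """Minimum number of racks needed to hold total_nodes nodes."""
    nodes_per_rack = NODES_PER_BOARD * BOARDS_PER_RACK
    return math.ceil(total_nodes / nodes_per_rack)


if __name__ == "__main__":
    print(rack_totals())
    # 80,000 is an assumed lower bound ("80,000 or more nodes").
    print(racks_needed(80_000))
```

With these inputs, one rack holds 96 nodes, and at least 834 racks are needed for 80,000 nodes.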
Power supply voltage adjusted for each chip
For the Kei, which consists of more than 80,000 nodes, chip-to-chip process variation can be a problem. Therefore, the power supply voltage of each chip is adjusted in accordance with the leakage current of that chip.
Also, an operating temperature range of 30 to 85°C was realized through detailed power analysis. As a result, the power consumption per chip was reduced by 7W, which is equivalent to 1MW for the entire system and about one million US dollars in terms of annual electricity cost.
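As a rough check on these figures, the sketch below computes the direct per-chip sum and the electricity price implied by the 1MW and one-million-dollar numbers. It assumes continuous operation all year and uses 80,000 chips, which is only the lower bound given above. The function names are illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # assumes the system runs continuously


def system_saving_mw(watts_per_chip, chips):
    """Total saving in MW when every chip saves watts_per_chip."""
    return watts_per_chip * chips / 1e6


def implied_price_usd_per_kwh(saving_mw, annual_cost_usd):
    """Electricity price implied by a power saving and its annual cost."""
    kwh_per_year = saving_mw * 1000 * HOURS_PER_YEAR
    return annual_cost_usd / kwh_per_year


if __name__ == "__main__":
    # Direct sum over chips, using the assumed lower bound of 80,000 chips.
    print(system_saving_mw(7, 80_000))
    # Price implied by the article's system-level figures (1MW, US$1 million).
    print(implied_price_usd_per_kwh(1.0, 1_000_000))
```

The direct sum over 80,000 chips comes to about 0.56MW. The article does not break down how the 1MW system-level figure is reached. The 1MW and one-million-dollar figures together imply an electricity price of roughly 0.11 US dollars per kWh.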
For reliability, optical interconnection not employed
According to an estimate made in June 2011, it takes 33.3 hours to carry out a 10PFLOPS benchmark test. Therefore, the failure rate of each node needs to be 36.0FIT or less (1FIT is one failure per billion hours of operation). This is equivalent to 3,179 years in terms of MTBF (mean time between failures).
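The relation between FIT, MTBF and the chance of a failure during the benchmark run can be sketched as follows. This assumes a constant failure rate (an exponential failure model), independent nodes, 8,760 hours per year and 80,000 nodes. The node count is only the lower bound quoted above, and the function names are illustrative.

```python
import math

HOURS_PER_YEAR = 24 * 365
FIT_HOURS = 1e9  # 1 FIT = 1 failure per 10^9 device-hours


def fit_to_mtbf_years(fit):
    """Convert a failure rate in FIT to MTBF in years."""
    return FIT_HOURS / fit / HOURS_PER_YEAR


def job_failure_probability(fit_per_node, nodes, job_hours):
    """Probability that at least one node fails during a job.

    Assumes independent nodes with a constant failure rate.
    """
    system_rate_per_hour = fit_per_node / FIT_HOURS * nodes
    return 1.0 - math.exp(-system_rate_per_hour * job_hours)


if __name__ == "__main__":
    print(fit_to_mtbf_years(36.0))
    # 80,000 nodes is an assumed lower bound, not the exact system size.
    print(job_failure_probability(36.0, 80_000, 33.3))
```

Under these assumptions, 36.0FIT corresponds to an MTBF of roughly 3,170 years per node, close to the 3,179 years quoted (the small gap is presumably due to rounding). The chance of at least one node failure during a 33.3-hour run is about 9%.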
To improve reliability, a device isolation structure using STI (shallow trench isolation) was formed for the floating point registers, in addition to ECC and an instruction retry function implemented in hardware. This improvement was necessary because the SPARC64 VIIIfx has eight times as many floating point registers as its predecessor. The introduction of STI reduced the failure rate by 20FIT.
Optical interconnection was not employed for the Kei because priority was placed on reliability. The Kei uses 200,000 cables. If optical interconnection were employed for them, the failure rate would be expected to increase by 200FIT. Not using optical interconnection also reduces power consumption.