Engineering infrastructure for new supercomputer from «AMD Technologies»

07.03.2017 11:13:45

Every data center project is unique; a supercomputer project, which is essentially a special kind of data center, is unique cubed.

That is why the engineering infrastructure built by «AMD Technologies» for the new 10-petaflops supercomputer, housed in the second academic building of Moscow State University, was given its own presentation tour. According to Maxim Sohan, CEO of «AMD Technologies», creating this infrastructure required fundamentally new technical solutions and meant overcoming a variety of challenges, including severe restrictions on the power consumption of all systems and on the space they occupy. The experience the company gained will therefore be of interest to everyone in the data center construction market, both Russian and foreign.

The university's existing supercomputer, with a performance at the time of 510 teraflops (FLOPS: floating-point operations per second), took 12th place in the Top-500 list of the world's most powerful supercomputers. After two upgrades, its peak performance is now 1.7 petaflops. The development of Moscow State University's supercomputer infrastructure continues, however, and construction of a computer with a peak performance of 10 petaflops is now under way. According to Igor Glukhov, Director of Products and Technologies at «T-Platforms», it is based on the new A-Class computing systems with a performance of 515 teraflops, which contain 256 compute nodes (one node is an Intel Xeon E5-2600 v3 processor and an NVIDIA Tesla K40 accelerator). Such a rack consumes 130 kW, and it is cooled with «hot» water (inlet temperature +45 °C, outlet temperature +50 °C) supplied directly to the compute nodes.

To cool 10 petaflops

Clearly, the life-support system for a computer of this class and architecture cannot be a standard one. In this project, «AMD Technologies» carried out all the work on designing the engineering infrastructure and on supplying and installing the systems for power supply, heat removal, cooling, automatic gas fire suppression, and monitoring of all engineering systems.

While racks in traditional data centers rarely draw more than 15-20 kW, this supercomputer is a different order of magnitude: the machine room holds 48 compute racks drawing 121 kW each and 16 racks drawing 154 kW each. According to Viktor Gavrilov, technical director of «AMD Technologies», the design specification set the developers two tasks: removing heat directly from the water-cooled supercomputer racks, and building a cooling system for the rest of the equipment that supports the supercomputer (data storage, UPS, auxiliary servers, and so on). This required two separate cooling systems. The direct cooling system CXC-1 rejects the excess heat generated by the processors of the compute nodes into the atmosphere, using water as the heat carrier supplied directly to the compute enclosures. The compression-type cold supply system CXC-2 uses traditional chillers to remove the heat generated by air-cooled IT systems, engineering infrastructure units, and other equipment. The liquid cooling system CXC-1 must remove about 8 MW of heat, while the 1.8 MW generated by the control servers, data storage, cross-connect equipment, and UPS falls to the air cooling system CXC-2.
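The rack figures above can be checked with simple arithmetic. The following Python sketch treats the quoted rack figures as power in kW, sums them, and adds the 1.8 MW air-cooled load. It assumes, as a simplification not stated in the article, that all electrical power drawn by the racks ends up as heat in the liquid cooling loop; the function and constant names are illustrative only.

```python
# Rack counts and per-rack power (kW) as quoted in the article.
RACK_GROUPS = [(48, 121.0), (16, 154.0)]
AIR_COOLED_LOAD_KW = 1800.0  # CXC-2 load quoted in the article


def liquid_cooled_load_kw(rack_groups):
    """Sum the power of all rack groups (assumes all power becomes heat)."""
    return sum(count * power_kw for count, power_kw in rack_groups)


def total_load_kw(rack_groups, air_cooled_kw):
    """Combined heat load of the liquid-cooled racks and the air-cooled equipment."""
    return liquid_cooled_load_kw(rack_groups) + air_cooled_kw


if __name__ == "__main__":
    cxc1 = liquid_cooled_load_kw(RACK_GROUPS)
    total = total_load_kw(RACK_GROUPS, AIR_COOLED_LOAD_KW)
    print(f"CXC-1 (liquid) load: {cxc1 / 1000:.2f} MW")
    print(f"Total load: {total / 1000:.2f} MW")
```

Under these assumptions the rack load comes to roughly 8.3 MW and the total to roughly 10.1 MW, in line with the rounded figures of about 8 MW and 10 MW given in the text.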

Thus, the total cooling capacity of the system is 10 MW, which corresponds to a fairly serious data center. CXC-1 uses no refrigeration machines, only dry cooling towers. The system currently has eight Cabero dry cooling towers of 1 MW each, and two more will be installed later as the system is brought up to its design capacity. CXC-1 also includes five intermediate plate heat exchangers with a total capacity of 10 MW, along with sub-headers.

The basic layout of the cooling system is quite simple: dry cooling tower, circulation pump, plate heat exchanger, second circulation pump, and the cooling circuit in the racks. The primary circuit uses glycol at +40-46 °C as the coolant, and the secondary circuit in the supercomputer racks uses water at +44-48 °C. As V. Gavrilov stressed, the system developed by «AMD Technologies» can operate in Moscow in free-cooling mode all year round, without mechanical refrigeration, even in abnormal summer heat such as when the temperature reached +38.5 °C.

A stimulus for engineering thought

Despite the simple layout, building the cooling system involved many difficulties. First of all, very strict requirements had to be met for the purity and chemical composition of the water circulating in the secondary cooling circuit. For this reason the system uses reverse osmosis, and corrosion inhibitors are added to the water to protect the aluminum walls of the heat exchangers. These water quality requirements ruled out steel pipes, so the system uses large-diameter polyvinyl chloride (PVC) and CPVC pipes able to withstand a pressure of 5-6 atm at a water temperature of +45-50 °C. In addition, there were serious limits on the total power consumption of the engineering infrastructure. Because of this, for example, pumps with lower-power motors were installed, which in turn required a lower water velocity in the pipeline and therefore a larger pipe diameter. The choice of dry cooling towers was similar: power consumption, heat exchange area, airflow rate, performance, and price all had to be weighed.
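The link between water velocity and pipe diameter follows from the continuity equation: for a fixed volumetric flow Q, the inner diameter of a round pipe is D = sqrt(4Q / (pi * v)), so a lower velocity v calls for a larger pipe. The sketch below illustrates this under assumptions that are not from the article: an 8 MW heat load carried by a single main pipe, a 5 K temperature rise across the racks, approximate textbook properties for water, and two arbitrary example velocities.

```python
import math

WATER_DENSITY = 990.0          # kg/m^3, approximate value near 45 C (assumption)
WATER_SPECIFIC_HEAT = 4180.0   # J/(kg*K), approximate (assumption)


def volumetric_flow(heat_load_w, delta_t_k):
    """Flow (m^3/s) needed to carry heat_load_w with a delta_t_k temperature rise."""
    mass_flow = heat_load_w / (WATER_SPECIFIC_HEAT * delta_t_k)
    return mass_flow / WATER_DENSITY


def pipe_diameter(flow_m3_s, velocity_m_s):
    """Inner diameter (m) of a round pipe carrying flow_m3_s at velocity_m_s."""
    return math.sqrt(4.0 * flow_m3_s / (math.pi * velocity_m_s))


if __name__ == "__main__":
    flow = volumetric_flow(8.0e6, 5.0)  # hypothetical single main pipe
    for velocity in (2.5, 1.5):         # example velocities, not from the article
        diameter_mm = pipe_diameter(flow, velocity) * 1000
        print(f"v = {velocity} m/s -> D = {diameter_mm:.0f} mm")
```

Because the diameter scales with the inverse square root of the velocity, cutting the velocity to a quarter doubles the required diameter, which is the trade-off between pump power and pipe size described above.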

Moreover, when selecting equipment, special attention was paid to its operational characteristics, since «AMD Technologies» not only builds the engineering infrastructure for «T-Platforms» supercomputers but also maintains it.

When developing the 1.8 MW air cooling system CXC-2, the main problem was limited space. To provide the stipulated autonomous operation time in case of a power failure, three water tanks of 23 cubic meters each were to be installed. However, tanks of that size, together with their pipelines, could not be accommodated in the basement parking area allocated to the cooling system. The designers at «AMD Technologies» therefore decided to reduce the tank volume to 9 cubic meters and to use the tanks as cold accumulators. In normal operation such a tank holds water at +5 °C. When the chiller shuts down, the controller sends a signal to a three-way valve, which begins mixing this cold water into the +18 °C water flowing in the return loop of the chiller heat exchanger, lowering its temperature to the «input level» of +12 °C. To implement this scheme, two additional small 50 kW units were installed alongside the three main 900 kW chillers of CXC-2.
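The mixing step can be illustrated with a simple energy balance. Assuming both streams have the same density and specific heat (an idealization), the fraction x of cold tank water in the mixed flow satisfies x * T_cold + (1 - x) * T_return = T_target. The sketch below solves this for the temperatures quoted in the text; the function name is hypothetical and the real valve controller logic is not described in the article.

```python
def cold_fraction(t_cold, t_return, t_target):
    """Fraction of the mixed flow that must come from the cold tank.

    Solves x * t_cold + (1 - x) * t_return = t_target for x,
    assuming equal density and specific heat for both streams.
    """
    if not (t_cold <= t_target <= t_return):
        raise ValueError("target must lie between the cold and return temperatures")
    if t_return == t_cold:
        return 0.0
    return (t_return - t_target) / (t_return - t_cold)


if __name__ == "__main__":
    # Temperatures from the article: +5 C tank, +18 C return, +12 C target.
    share = cold_fraction(5.0, 18.0, 12.0)
    print(f"Cold-water share of the mixed flow: {share:.1%}")
```

With the quoted temperatures the cold share is 6/13, or about 46% of the mixed flow.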

Space restrictions also forced the designers to turn to 3D modeling. The existing building services, a large number of large-diameter pipelines, and the busbar runs of the power supply system all had to be fitted into the premises without clashes, because nothing could be changed during installation. Busbars are made to order in strict accordance with the drawings, and a pipe elbow is very expensive and takes 6 to 8 weeks to deliver, so the budget did not allow spares to be ordered and there was no time to reorder in case of a mistake. The only way out was 3D modeling, which is now a working tool at «AMD Technologies».

Overall, the developers admit that this project broke all their stereotypes about how equipment is selected and how a facility is operated, and that it taught them a great deal. Problems and limitations kept engineering thought constantly at work, and at no stage did the team simply repeat the solutions of past projects. A unique supercomputer received a unique engineering infrastructure. Incidentally, the infrastructure developed by «AMD Technologies» allows the supercomputer to be scaled up to 54 petaflops of computing capacity, and then first place in the Top-500 would be Russian.