Immersion-1: The World’s Largest FPGA Cluster

Built into the world’s first dedicated 2-phase immersion cooling facility, Immersion-1 runs on 3M™ Novec™ Engineered Fluids. It delivers supercomputer-class performance and is one of the most energy-efficient computing systems worldwide – even more so considering the hot and humid climate of Hong Kong.

Immersion-1 started as a much smaller project that used traditional air conditioning for cooling. It soon became apparent that air cooling would severely constrain both space and electricity usage. With 6048 FPGA chips combining 891 million logic cells, Immersion-1 has become a massive prototype and proof of concept for a whole new generation of computing.

Because the special application squeezes the maximum performance out of each FPGA, the chips generate much more heat than in traditional FPGA applications. Often, FPGAs have to be throttled or cannot run at maximum performance because of cooling issues. In the case of Immersion-1, the cluster could not run on passive cooling alone: the FPGA chip temperature would rise above its maximum specification within seconds.

Only by using immersion cooling was it possible to build and run Immersion-1 with its very demanding cooling requirements. Compared with traditional air cooling, more than 90% of the electricity used for cooling could be saved.

If ordinary servers had been used to achieve the same computing performance, more than 8,500 Dual Xeon 1U servers would have had to be placed in more than 200 racks of 42U height. The power rating would be about 6.4 megawatts, generating around 45,000 tons of CO2 every year. The CO2 saved is equivalent to the amount produced by 1.65% of all cars in Hong Kong.
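
The figures in this comparison can be cross-checked with some quick arithmetic. The sketch below is only an illustration and rests on assumptions that are not stated in the text: each server occupies exactly 1U in a fully populated rack, and the 6.4 MW load runs continuously all year. Rather than assuming a grid emission factor, it derives the factor implied by the quoted 45,000 tons.

```python
import math

# Figures quoted in the text
SERVERS = 8_500              # Dual Xeon 1U servers
RACK_UNITS = 42              # rack height in U
POWER_MW = 6.4               # quoted power rating
CO2_TONS_PER_YEAR = 45_000   # quoted yearly emissions

# Assumption (not from the text): the load runs at full power all year
HOURS_PER_YEAR = 24 * 365

# Assumption: racks are fully populated with 1U servers
racks = math.ceil(SERVERS / RACK_UNITS)

# Share of the quoted power rating per server, in watts
watts_per_server = POWER_MW * 1e6 / SERVERS

# Yearly energy use in MWh and the emission factor the quoted CO2 figure implies
energy_mwh = POWER_MW * HOURS_PER_YEAR
implied_tons_per_mwh = CO2_TONS_PER_YEAR / energy_mwh

print(f"racks needed:            {racks}")
print(f"watts per server:        {watts_per_server:.0f}")
print(f"energy per year (MWh):   {energy_mwh:.0f}")
print(f"implied t CO2 per MWh:   {implied_tons_per_mwh:.2f}")
```

Under these assumptions the rack count comes out just above 200, and the quoted CO2 figure corresponds to roughly 0.8 tons of CO2 per MWh of electricity.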

Immersion-1 Technical Details

In short, Immersion-1 is a massive immersion cooled FPGA cluster with 6048 Spartan®-6 LX150 FPGA chips from Xilinx. The FPGA chips sit on 1512 hot swappable boards and get power and communications from backplanes that are connected to the outside world.

While Immersion-1 is a fully functional unit that does indeed crunch numbers 24/7, it is also proof that 2-phase immersion cooling is not just a dream of the future: it is a viable and elegant solution right here and right now.

Most importantly, our 2-phase immersion cooling approach cuts out the engineering inefficiencies of legacy designs, lowers development and operational costs, and results in a drastically lower carbon footprint. It does all this while letting us pack the hardware at a never-before-seen density, keep the supporting infrastructure to a minimum, and use our existing premises instead of investing our life savings in a large and remote co-location data center facility. In fact, with all the space saved, we could place our tanks at an ergonomic height for easy access – with plenty of headroom left.

Immersion-1 Fact Sheet:

  • Immersion-1 is the name of the world’s largest FPGA cluster and the facility where we host it.
  • Immersion-1 is a massive prototype and proof of concept for a whole new generation of computation clusters.
  • Immersion-1 had to be approved by two governments before we could build it.
  • Immersion-1 was built in less than half a year on a budget lower than the cost of traditional CRAC cooling equipment.
  • Immersion-1 is energy efficient and saves more than 90% on electricity in comparison to traditional air cooling (even with every air-cooling trick in the book employed, i.e. cold/hot aisle containment etc.).

The Core

  • 6048 Xilinx Spartan®-6 LX150 FPGA cores with a total of 891 million logic cells
  • 1512 hot swappable boards, connected to high density backplanes (board to board 8.5mm)
  • 64 FPGA chips per “logic cube”, each cube only 160mm (6.3″) on the longest side
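
The core figures above fit together, as a short calculation shows. This is a rough sketch only: the logic cell count per Spartan-6 LX150 (147,443) is taken from the Xilinx datasheet rather than from this page, and the stack height assumes that all boards of one cube sit in a single row at the quoted 8.5mm pitch, which the text does not confirm.

```python
# Figures quoted in the fact sheet
FPGAS = 6048
BOARDS = 1512
FPGAS_PER_CUBE = 64
BOARD_PITCH_MM = 8.5
CUBE_SIDE_MM = 160

# Assumption (datasheet value, not stated on this page)
LOGIC_CELLS_PER_LX150 = 147_443

# FPGA chips on each hot swappable board
fpgas_per_board = FPGAS // BOARDS

# Boards needed to make up one 64-chip logic cube
boards_per_cube = FPGAS_PER_CUBE // fpgas_per_board

# Total logic cells across the whole cluster
total_logic_cells = FPGAS * LOGIC_CELLS_PER_LX150

# Assumption: boards of one cube are stacked in a single row
stack_height_mm = boards_per_cube * BOARD_PITCH_MM

print(f"FPGAs per board:   {fpgas_per_board}")
print(f"boards per cube:   {boards_per_cube}")
print(f"logic cells:       {total_logic_cells:,}")
print(f"board stack (mm):  {stack_height_mm} of {CUBE_SIDE_MM}")
```

With four chips per board, sixteen boards make up one cube, and a sixteen-board stack at 8.5mm pitch stays within the 160mm side length.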

Immersion Cooling

  • In-house developed passive 2-phase immersion cooling system
  • Powered by 3M™ Novec™ Engineered Fluids
  • Total heat dissipation typically 70kW
  • Running on only 3 variable speed outdoor fans (often our fans run on minimum)
  • Reduced FPGA junction temperature, lower error rates and higher frequencies

Power Supply

  • 96 high-reliability 850 Watt ATX power supplies, capable of delivering a maximum of 81.6 kilowatts to the cluster
  • PSU to board power distribution 12V
  • Measured efficiency >90%
  • Industrial strength remote controlled power switches
  • Real-time current measurement down to the PSU level
  • 312A 230V typical load, 528A 230V maximum load
  • 800A 400V TP/N Plug-In MCCB at facility level
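
The power figures can be sanity-checked against each other. The sketch below is a simplified estimate, not a measurement: it assumes the quoted currents are totals at 230V with a power factor of 1, and it takes the 70kW typical heat dissipation quoted in the Immersion Cooling list as the power shared across all 6048 FPGAs, which overstates the per-chip figure because PSU and board losses are included.

```python
# Figures quoted in the fact sheet
PSU_COUNT = 96
PSU_WATTS = 850
LINE_VOLTS = 230
TYPICAL_AMPS = 312
TYPICAL_HEAT_KW = 70
FPGAS = 6048

# Combined rated output of all power supplies, in kW
psu_capacity_kw = PSU_COUNT * PSU_WATTS / 1000

# Assumption: P = V * I with unity power factor, currents summed over phases
typical_input_kw = TYPICAL_AMPS * LINE_VOLTS / 1000

# Upper bound on average power per FPGA (includes PSU and board losses)
watts_per_fpga = TYPICAL_HEAT_KW * 1000 / FPGAS

print(f"PSU capacity (kW):    {psu_capacity_kw}")
print(f"typical input (kW):   {typical_input_kw}")
print(f"watts per FPGA (max): {watts_per_fpga:.1f}")
```

Under these assumptions the typical electrical input of about 72kW lines up with the roughly 70kW of heat the cooling system dissipates, and it stays below the combined PSU rating.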

Monitoring and Data Acquisition

  • Networked modular DAQ system
  • Measuring temperatures, pressure, liquid level and flow

Facility Monitoring and Control

  • Temperature & Humidity
  • Lights, Air Ventilation, Doors
  • Flow rates, Fan & Pump frequencies

Silent and Dust Free Operation

Most of the time, the only sound you hear is the hum of our high-current main power cables. Once in a while we work around the cluster, and that is when it gets really “noisy”: we have to turn on the air conditioning (Hong Kong has a hot and humid climate). Compare that with a modern data center facility, where noise and dust are usually a real problem (typically 70-80 dB; ear protection is recommended in air-cooled data centers).