The Oak Ridge Leadership Computing Facility (OLCF) and technology consulting company Providentia Worldwide LLC recently collaborated to develop an intelligence system that combines real-time updates from the IBM AC922 Summit supercomputer with local weather data and operational data from its adjacent cooling plant, with the goal of optimizing Summit's energy efficiency. The OLCF proposed the idea and provided facility data, and Providentia developed a scalable platform to integrate and analyze the data.
On each Summit node, IBM's baseboard management controller (OpenBMC) provides real-time readings from dozens of sensors on Summit's Power9 processors and NVIDIA GPUs. Across the entire supercomputer, these readings total more than 460,000 metrics per second describing power consumption, temperature, and performance. Although these data streams were not designed for controlling Summit's cooling system, Rogers recognized early on that they could inform Summit's cooling operations.
Providentia built a framework to pull from four main data sources: per-second sensor data (from the OpenBMC boards on each Summit node), jobs data at 15-second intervals, the cooling plant's programmable logic controller, and local weather data from the National Oceanic and Atmospheric Administration.
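Because the sources arrive at different rates, any platform of this kind has to put them on a common time base before they can be analyzed together. The Python sketch below illustrates one simple way to do that for two of the sources: it averages per-second power readings into 15-second windows and attaches the job record for the same window. It is only an illustration and not Providentia's implementation; the record layouts, the node name, the job IDs, and the power values are all hypothetical.

```python
from collections import defaultdict
from statistics import mean

JOB_INTERVAL_S = 15  # jobs data cadence stated in the article


def bucket_start(timestamp_s, interval_s=JOB_INTERVAL_S):
    """Return the start of the interval that contains timestamp_s."""
    return timestamp_s - (timestamp_s % interval_s)


def join_sensor_and_jobs(sensor_rows, job_rows):
    """Average per-second readings per 15-second window and attach job IDs.

    sensor_rows: iterable of (timestamp_s, node, power_w)  # hypothetical schema
    job_rows:    iterable of (timestamp_s, node, job_id)   # hypothetical schema
    """
    # Group the per-second power readings by (window start, node).
    power = defaultdict(list)
    for ts, node, power_w in sensor_rows:
        power[(bucket_start(ts), node)].append(power_w)

    # Index the job records by the same key.
    jobs = {(bucket_start(ts), node): job_id for ts, node, job_id in job_rows}

    return [
        {
            "bucket_start_s": bucket,
            "node": node,
            "mean_power_w": mean(values),
            "job_id": jobs.get((bucket, node)),  # None if no job record exists
        }
        for (bucket, node), values in sorted(power.items())
    ]


# Made-up data: 30 seconds of readings for one hypothetical node.
sensor_rows = [(t, "node-a", 1000 + 10 * t) for t in range(30)]
job_rows = [(0, "node-a", "job-1"), (15, "node-a", "job-2")]
joined = join_sensor_and_jobs(sensor_rows, job_rows)

for row in joined:
    print(row)
```

The same windowing idea would extend to the cooling plant and weather feeds, each keyed by the window start rather than by node. A production system handling hundreds of thousands of metrics per second would need a streaming pipeline rather than in-memory lists.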