Case Study · Edge Data Center Operations

Faster Incident Triage Across a Mixed-Generation Edge Facility

How the operator of a single-digit-megawatt edge compute site reduced incident triage time by 25 to 35 percent using Cerbrec's Intelligence Operations Platform.


25–35%

Faster incident triage across all monitored systems

800+

Raw alerts per shift reduced to a short, ranked, actionable list

100%

Diagnostics and actions logged, traceable, and audit-ready

The Challenge

The Alarms That Actually Mattered Were Buried in the Ones That Didn't.

Running a mixed-generation facility means managing infrastructure that was never designed to speak the same language. At this single-digit-megawatt edge site, older systems sit alongside newer ones, each generating its own continuous stream of signals over its own protocol. The result was an alarm environment that had become genuinely unmanageable for a lean on-site team.

Every shift brought more than 800 alerts across thermal, power, and network systems. The vast majority were noise. Over time, engineers learned to stop relying on the alerts altogether. They developed their own strategies for identifying problems, passed them down from one engineer to the next, and worked around the tools rather than through them.

The operator's main attempt at improving automated diagnosis was a legacy alarm classification system. On paper, it was supposed to help. In practice, it was too generic to be trusted. It could not account for the service life and history of individual equipment, the context of nearby assets, or the specific part swaps that had occurred since original installation. It had a comments field where engineers could add context, but the system had no way to learn from those inputs. When a diagnostic tool lacks that kind of contextual awareness, engineers stop trusting it.

What was needed was a platform that could reason across systems, connect a signal in one domain to its cause in another, and act on it with the full operational picture in view.

Recognize this problem? See how Cerbrec approaches data center operations

The Solution

One Intelligence Layer Across Every System in the Facility

Cerbrec's platform deploys a fleet of AI agents that brings an intelligence orchestration layer to the data center. What sets it apart is the ability to diagnose across different systems cohesively and automatically, rather than one system at a time. The models were developed in research collaboration with SRI International and learn automatically from historical telemetry data.

Because each model can adapt dynamically at the level of the individual sensor, the approach scales across a facility and enables hyper-personalized, contextual alarm classification without having to be rebuilt for each new signal.

Deployment did not require ripping out or replacing anything. Cerbrec's on-site application communicates directly with the facility's heterogeneous protocol collectors across BMS, power, cooling, and network systems of every generation, normalizing and cleaning each stream before ingestion. As a result, the platform was reading the whole facility coherently within weeks, rather than after a multi-quarter integration project.
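As a rough illustration of what normalizing heterogeneous streams can involve, the sketch below maps readings from two differently shaped collectors into one common record. The payload shapes, field names, and the `Reading` type are hypothetical and are not taken from Cerbrec's application.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Reading:
    """Common record shape (hypothetical) shared by every source system."""
    asset_id: str
    metric: str
    value: float
    unit: str
    timestamp: datetime


def from_legacy_bms(raw: dict) -> Reading:
    # Assumed legacy payload: Fahrenheit temperature and epoch seconds.
    celsius = (raw["temp_f"] - 32.0) * 5.0 / 9.0
    return Reading(
        asset_id=raw["point"],
        metric="temperature",
        value=round(celsius, 2),
        unit="degC",
        timestamp=datetime.fromtimestamp(raw["ts"], tz=timezone.utc),
    )


def from_modern_pdu(raw: dict) -> Reading:
    # Assumed newer payload: watts and an ISO 8601 timestamp with offset.
    return Reading(
        asset_id=raw["device"],
        metric="power",
        value=raw["watts"] / 1000.0,
        unit="kW",
        timestamp=datetime.fromisoformat(raw["time"]),
    )


legacy = from_legacy_bms({"point": "crah-03", "temp_f": 77.0, "ts": 1700000000})
modern = from_modern_pdu(
    {"device": "pdu-12", "watts": 21000, "time": "2023-11-14T22:13:20+00:00"}
)
```

Once every source lands in the same shape, with consistent units and timezone-aware timestamps, downstream analysis can compare a cooling reading against a power reading without knowing which generation of equipment produced either one.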

Cerbrec deployed its hierarchy of agents across the facility's operational systems, working directly against the data center's KPIs. A Director agent owns the topline KPIs, holding every signal against SLA commitments and performance thresholds. Domain manager agents handle power, cooling, and operations independently but within the same connected intelligence loop. Shared sub-agents run underneath, covering root cause analysis, anomaly detection, and predictive flagging on a continuous basis.

Each agent has a specialized domain and function, so the platform does more than detect and flag. It interprets alarms, investigates across systems, runs simulations, and takes action through both PLC modifications and maintenance work orders, so resolution does not wait for a human to translate a diagnosis into a task.

Director Level

KPI Segment Agent: Orchestration & Oversight

Operations: System-wide workflow orchestration and process lifecycle management.

Power: Energy consumption balancing and redundant UPS management.

Cooling: Thermal distribution and liquid cooling loop optimization.

Maintenance: Predictive hardware analysis and scheduled downtime sequencing.

Model Gov: Policy enforcement and ethical constraint monitoring for AI models.

Shared Sub-Agents

Root-cause analysis · Anomaly detection · Predictive maintenance

What This Looks Like in Practice

The clearest way to understand what the platform does is to follow one event through it.

A rising PUE, traced to its source

The Director agent noticed the site's PUE drifting upward, from its 1.45 baseline toward 1.48 over the course of an hour, with no corresponding change in the weather or the facility's overall load profile.
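PUE is total facility power divided by IT equipment power, so a drift like this one can be expressed in a few lines. The sketch below is illustrative only: the 1,000 kW IT load and the 0.02 tolerance are assumptions chosen for the arithmetic, since the case study gives only the PUE values.

```python
def pue(total_facility_kw: float, it_load_kw: float) -> float:
    """Power usage effectiveness: total facility power over IT power."""
    if it_load_kw <= 0:
        raise ValueError("IT load must be positive")
    return total_facility_kw / it_load_kw


def drifted(current: float, baseline: float, tolerance: float = 0.02) -> bool:
    """True when PUE sits above baseline by more than the assumed tolerance."""
    return current - baseline > tolerance


# Illustrative loads only: a hypothetical 1,000 kW IT load.
it_kw = 1000.0
baseline = pue(1450.0, it_kw)  # 1.45
current = pue(1480.0, it_kw)   # 1.48

# Extra non-IT power implied by the drift at this assumed load.
extra_overhead_kw = (current - baseline) * it_kw
```

At that assumed load, a 0.03 rise in PUE corresponds to roughly 30 kW of additional non-IT power, which is why a small-looking change in the ratio is worth tracing.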

The platform worked back through the telemetry and traced the drift to the cooling system's response around a single rack: a batch workload had ramped that rack from 8 kW to 21 kW in a few minutes, and the room-level cooling controls had reacted to the resulting hot spot by driving up output across the room.

Because fan power rises roughly with the cube of fan speed, that broad response was costing tens of kilowatts of cooling overhead to manage what was, in reality, thirteen kilowatts of additional heat contained in one rack. The cooling overreaction, and not the workload, was the source of the drift.

Contained heat, answered precisely

The cooling agent confirmed that the heat was contained to that one rack rather than a room-wide problem. The instinctive response would have been to push the nearest in-row cooler to work harder, but the agent ran the analysis first and determined that distributing the load across several nearby in-row coolers at moderate fan speeds would deliver the same or better cooling outcome at a fraction of the fan power.
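The reasoning follows from the fan affinity laws, under which airflow scales roughly linearly with fan speed while fan power scales roughly with its cube. The sketch below compares one cooler at full speed against three coolers at half speed. The 3 kW rated fan power, the cooler count, and the speeds are illustrative assumptions, and airflow is used as a simplified stand-in for cooling delivered.

```python
def fan_power_kw(rated_kw: float, speed_fraction: float) -> float:
    """Fan affinity law: power scales with the cube of fan speed."""
    if not 0.0 <= speed_fraction <= 1.0:
        raise ValueError("speed_fraction must be between 0 and 1")
    return rated_kw * speed_fraction ** 3


def airflow_units(speed_fraction: float) -> float:
    """Fan affinity law: airflow scales linearly with fan speed."""
    return speed_fraction


RATED_KW = 3.0  # hypothetical rated fan power per in-row cooler

# Option A: the nearest cooler pushed to full speed.
single_power = fan_power_kw(RATED_KW, 1.0)
single_airflow = airflow_units(1.0)

# Option B: three nearby coolers sharing the load at 50% speed each.
shared_power = 3 * fan_power_kw(RATED_KW, 0.5)
shared_airflow = 3 * airflow_units(0.5)
```

Under these assumptions the shared option moves 1.5 times the air of the single cooler while drawing 1.125 kW instead of 3 kW. Real coolers deviate from the ideal cube law, so the figures indicate the direction of the saving rather than its exact size.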

Because the rebalancing action fell within the operator's pre-approved governance envelope for cooling setpoints, it executed automatically, and PUE settled back to baseline.
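One way to picture a pre-approved envelope is as a set of bounds checked before any action runs. The sketch below is a minimal illustration of that idea: the `Envelope` fields, the limits, and the `dispatch` function are hypothetical and do not describe the platform's actual governance model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Envelope:
    """Pre-approved bounds for one kind of action (values are hypothetical)."""
    min_fan_speed: float
    max_fan_speed: float
    max_coolers_per_action: int


@dataclass(frozen=True)
class FanSpeedChange:
    cooler_id: str
    target_speed: float  # fraction of full speed, 0.0 to 1.0


def within_envelope(changes: list[FanSpeedChange], envelope: Envelope) -> bool:
    """True only if every proposed change stays inside the approved bounds."""
    if len(changes) > envelope.max_coolers_per_action:
        return False
    return all(
        envelope.min_fan_speed <= c.target_speed <= envelope.max_fan_speed
        for c in changes
    )


def dispatch(changes: list[FanSpeedChange], envelope: Envelope) -> str:
    # Inside the envelope: apply automatically. Outside: hand to an operator.
    return "execute" if within_envelope(changes, envelope) else "escalate"


envelope = Envelope(min_fan_speed=0.3, max_fan_speed=0.8, max_coolers_per_action=4)
rebalance = [FanSpeedChange(f"irc-{n}", 0.5) for n in (1, 2, 3)]
decision = dispatch(rebalance, envelope)
```

Keeping the check separate from the action means anything outside the bounds is escalated to a person rather than silently applied.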

Reasoning across domains

This is what reasoning across domains makes possible. The platform held power, cooling, and operations in a single picture, so a rising number in one domain could be answered with a precise, considered action in another.

The Outcome

What Changed When Cerbrec Ran Across the Operation

From 800 Alerts to a Ranked List That Means Something

Before Cerbrec, a large portion of each shift went into investigation: pulling logs, opening dashboards, correlating signals that should have been connected but weren't. After deployment, that burden shifted to the platform. Incident triage time dropped by 25 to 35 percent across monitored systems. The 800-plus raw alerts that had defined a shift became a short, ranked list of incidents that warranted attention, each arriving with its probable cause already attached.
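To make the idea of collapsing raw alerts into a ranked list concrete, the sketch below groups alerts by probable cause and orders the resulting incidents by highest severity, then by alert count. The field names, the severity scale, and the grouping key are assumptions for illustration; the source does not describe the platform's actual ranking logic.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    source: str
    probable_cause: str  # assumed to be attached upstream by a diagnostic step
    severity: int        # hypothetical scale: 1 (low) to 5 (critical)


def rank_incidents(alerts: list[Alert]) -> list[tuple[str, int, int]]:
    """Group alerts by probable cause; rank by max severity, then alert count."""
    grouped: dict[str, list[Alert]] = defaultdict(list)
    for alert in alerts:
        grouped[alert.probable_cause].append(alert)
    incidents = [
        (cause, max(a.severity for a in group), len(group))
        for cause, group in grouped.items()
    ]
    # Highest severity first, then largest group, then cause name for stability.
    return sorted(incidents, key=lambda i: (-i[1], -i[2], i[0]))


alerts = [
    Alert("crah-03", "rack-17 hot spot", 4),
    Alert("irc-02", "rack-17 hot spot", 3),
    Alert("pdu-12", "rack-17 hot spot", 2),
    Alert("switch-05", "flapping uplink", 2),
]
ranked = rank_incidents(alerts)
```

Here four raw alerts become two incidents, with the three that share a cause merged into a single entry at the top of the list.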

A Compliance Trail That Builds Itself

Every diagnostic action, every recommendation, every governance decision, and every operator response was logged automatically from day one. What had previously required manual reconstruction before every audit now existed as a continuous, automatically generated record. The audit trail was no longer something that had to be built. It was already there.

What This Deployment Demonstrated

The gap between a reactive data center and an intelligent one is not a hardware problem or a staffing problem. It is an intelligence problem. The signals had always been there, scattered across systems that were never built to share them. What Cerbrec added was a single intelligence orchestration layer that could read them together.

Before Cerbrec: Engineers ignored automated alerts and relied on passed-down workaround strategies
With Cerbrec: Alerts ranked automatically by priority, each with probable cause already attached

Before Cerbrec: Legacy classification system too generic to be trusted
With Cerbrec: Hyper-personalized, contextual alarm classifications that account for equipment history and configuration

Before Cerbrec: Cooling responded broadly to localized heat, wasting energy across the room
With Cerbrec: Cooling load balanced precisely across nearby assets, same outcome with less overhead

Before Cerbrec: Compliance records rebuilt manually for every audit
With Cerbrec: Full audit trail logged automatically from day one

Get Started

See What Cerbrec Delivers in Your Environment

This deployment is one example of how the Cerbrec Intelligence Operations Platform performs in a data center environment. See the full data center solutions page or speak to the Cerbrec team about your specific operation.
