Huawei Cloud's own network monitor outperformed humans


SIGCOMM 2024: Huawei Cloud has developed a network monitoring tool that, when used in production in three of its own regions, observed more of its infrastructure than existing tools and revealed problems that had previously eluded human effort.

The tool is called RD-Probe and is described in a paper [PDF] presented Tuesday at the SIGCOMM 2024 conference in Sydney.

The paper explains that network monitoring is important but difficult to achieve at hyperscale. The authors, some from Huawei and others from the School of Computer Science at Peking University, cite AWS research [PDF] that says Amazon's cloud has 1,087 intra-region link path combinations and 10,176 inter-region link path combinations. The paper also reveals that Huawei Cloud's data center network includes over 100,000 switches and one million servers. Monitoring all that infrastructure and all those paths is hard: in a virtualized environment that uses randomness for load balancing, it is very difficult to collect enough data about what's happening at Layer 2.
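To get a feel for why randomized load balancing makes full coverage hard, consider a deliberately simplified model that is not taken from the paper: assume each probe lands on one of n paths uniformly at random, independently of the others. Under that assumption, the expected number of probes needed to touch every path at least once follows the classic coupon-collector result, n times the n-th harmonic number. The function names below are hypothetical and exist only for illustration.

```python
import random


def expected_probes_to_cover(n_paths: int) -> float:
    """Expected number of uniform random probes needed to hit every
    path at least once (coupon collector: n * H_n).

    Assumption: each probe picks a path uniformly and independently.
    Real load balancers hash on packet headers, so this is only a
    rough model.
    """
    harmonic = sum(1.0 / k for k in range(1, n_paths + 1))
    return n_paths * harmonic


def simulate_probes_to_cover(n_paths: int, rng: random.Random) -> int:
    """Count random probes sent until every path has been seen once."""
    seen = set()
    probes = 0
    while len(seen) < n_paths:
        seen.add(rng.randrange(n_paths))  # path chosen by the "load balancer"
        probes += 1
    return probes


if __name__ == "__main__":
    rng = random.Random(0)
    # Path counts quoted in the article for intra- and inter-region links.
    for n in (1_087, 10_176):
        print(n, round(expected_probes_to_cover(n)), simulate_probes_to_cover(n, rng))
```

Even in this idealized model, covering every path takes several times more probes than there are paths, and that is before accounting for the far larger number of physical ports in a network of the size described.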

RD-Probe is Huawei Cloud's attempt to solve that problem. The tool's developers decided to monitor each physical Layer 2 port, because doing so lets them observe the runtime status of switch fabrics. Monitoring at Layer 3 alone, the authors write, would leave some ports unmonitored.
