
Observability That Helps On-Call

By Hoang-Long Nguyen · April 17, 2026 · Observability, OpenTelemetry, Alerts, On-call

Dashboards and alerts should reduce decision time, not decorate a wall of screens.


Good observability is not the maximum number of charts. It is the minimum path from symptom to decision.

Make the first screen operational

The first dashboard should answer three questions: is the user path failing, where is the failure concentrated, and did something recently change? Everything else can be one click away.
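As a rough illustration, the three questions can be reduced to one small summary computed from raw inputs. This is a minimal sketch, not a prescribed implementation: the `Request` record, the `region` dimension, the `first_screen` function, and the 30-minute lookback window are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Request:
    region: str  # hypothetical dimension; could be service, version, tenant
    ok: bool     # whether the user-path request succeeded


def first_screen(requests, changes, now, window_s=1800):
    """Summarize the three first-screen questions.

    requests: list of Request for the user path being watched
    changes:  list of (unix_timestamp, description) for deploys, flag flips
    now:      current unix timestamp
    window_s: lookback window for "recent" changes (assumed default: 30 min)
    """
    total = len(requests)
    failures = [r for r in requests if not r.ok]

    # 1. Is the user path failing?
    error_rate = len(failures) / total if total else 0.0

    # 2. Where is the failure concentrated? Take the region with most failures.
    top = Counter(r.region for r in failures).most_common(1)
    hotspot = top[0] if top else None

    # 3. Did something recently change? Keep changes inside the window.
    recent = [desc for ts, desc in changes if 0 <= now - ts <= window_s]

    return {"error_rate": error_rate, "hotspot": hotspot, "recent_changes": recent}
```

Everything that does not feed one of these three fields is a candidate for a secondary dashboard.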

Tie alerts to ownership

An alert without an owner becomes ambient stress. Route each alert to the team that can act on it, include the runbook, and state alongside the threshold why crossing it is worth interrupting someone.
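One way to enforce this is to treat owner, runbook, and rationale as required fields and reject alerts that lack them before they can page anyone. The sketch below assumes a hypothetical `Alert` record and `validate` helper; it is not tied to any particular alerting tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    name: str
    owner: str       # team that can act on the page
    runbook: str     # link or path to the response steps
    threshold: float
    rationale: str   # why crossing the threshold justifies an interruption


def validate(alert):
    """Return a list of problems; an empty list means the alert may page."""
    problems = []
    if not alert.owner.strip():
        problems.append("missing owner")
    if not alert.runbook.strip():
        problems.append("missing runbook")
    if not alert.rationale.strip():
        problems.append("threshold has no rationale")
    return problems
```

A check like this could run in review or CI over the alert definitions, so an ownerless alert fails before it reaches a pager.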

Prefer traces for unknown paths

Metrics tell you that something moved. Traces often explain where it moved. For distributed systems, that difference is the gap between guessing and debugging.
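To make the difference concrete, here is a minimal sketch of how a trace localizes latency: subtract each span's child time from its own duration, then see which operation holds the remainder. The `Span` record and both functions are hypothetical, and the calculation assumes child spans run sequentially; parallel children would need interval arithmetic instead.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]  # None for the root span
    name: str
    duration_ms: float


def self_times(spans):
    """Return {span name: time spent in that span excluding its children}.

    Assumes children of a span do not overlap in time.
    """
    # Total duration of the direct children of each span.
    child_total = defaultdict(float)
    for s in spans:
        if s.parent_id is not None:
            child_total[s.parent_id] += s.duration_ms

    # Self time per span, summed across spans that share a name.
    totals = defaultdict(float)
    for s in spans:
        own = s.duration_ms - child_total.get(s.span_id, 0.0)
        totals[s.name] += max(own, 0.0)
    return dict(totals)


def slowest(spans):
    """Name of the operation with the most self time in the trace."""
    times = self_times(spans)
    return max(times, key=times.get)
```

A latency metric on the root request would only show that the total went up; the self-time breakdown points at the specific hop that accounts for it.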