We are configuring log alerts so that we are notified of any failure, for example:
- When connectivity between the application and the database is broken
- When communication between the Temporal services (frontend, matching, history, worker) is not working
Are there any standard log messages for these failures that we can monitor to detect them?
The recommended way is to use server metrics.
Temporal server emits the following metrics:
- "persistence_errors" is a good indicator of connection issues between the server and the persistence store.
- "client_errors" is a good indicator of connection issues between the different server roles.
You can build alerts on these metrics, for example in Grafana.
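
To make the idea concrete, here is a minimal sketch of the check such an alert performs: compare two scrapes of the server's metrics and flag any watched error counter that has increased. It assumes the metrics are exposed in the Prometheus text format. The label names in the usage example, the helper names `sum_counters` and `firing_alerts`, and the threshold are all hypothetical. In practice you would normally express this as an alert rule in Grafana or your metrics backend rather than in custom code.

```python
import re
from collections import defaultdict

# Metric names taken from the answer above; the exact exported names can
# differ depending on how the metrics reporter is configured.
WATCHED = ("persistence_errors", "client_errors")

# Matches "name{labels} value" or "name value" in Prometheus text format.
# Simplification: label values containing "}" are not handled.
_SAMPLE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{[^}]*\})?\s+(\S+)')


def sum_counters(scrape_text, names=WATCHED):
    """Sum all samples of each watched metric across its label sets."""
    totals = defaultdict(float)
    for line in scrape_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP/TYPE lines
            continue
        match = _SAMPLE.match(line)
        if match and match.group(1) in names:
            totals[match.group(1)] += float(match.group(2))
    return dict(totals)


def firing_alerts(previous, current, threshold=0.0):
    """Return the metrics whose total grew by more than `threshold`."""
    alerts = []
    for name in sorted(current):
        delta = current[name] - previous.get(name, 0.0)
        if delta > threshold:
            alerts.append(name)
    return alerts
```

For example, given two scrapes taken a minute apart (sample values are made up):

```python
before = 'persistence_errors{operation="op_a"} 3\nclient_errors 1\n'
after = 'persistence_errors{operation="op_a"} 7\nclient_errors 1\n'
print(firing_alerts(sum_counters(before), sum_counters(after)))
```

Only the persistence counter grew between the two scrapes, so only that metric is reported.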