Ruchir
July 7, 2022, 11:57am
Hi,
I upgraded my Temporal server from 1.14.0 to 1.16.2 on GKE. After the upgrade, when I try to connect to the metrics endpoint of the frontend service, I get an ‘unable to connect to host’ error, and Prometheus is unable to scrape any metrics from the endpoint. Downgrading to 1.14.0 or 1.15.0 resolves the issue.
We also upgraded the Temporal SDK, but that causes no problems, and the SDK metrics are exposed as usual.
Is this a bug, or are we missing something in the upgrade?
That is not expected. However, you cannot upgrade directly from 1.14 to 1.16. You must go through 1.15 first: upgrade from 1.14 to 1.15, then from 1.15 to 1.16.
We are seeing the same issue. We upgraded from 1.15 to 1.16.3, and all the server metrics stopped showing up. On closer inspection, we noticed that the Prometheus endpoint is no longer open on any of the Temporal servers. Any idea how we can resolve this?
In 1.16.3:
bash-5.1# netstat -lntup
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.1:4289 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:3212 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:15021 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:15021 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:15021 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:15021 0.0.0.0:* LISTEN -
tcp 0 0 127.0.0.1:3855 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:15090 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:15090 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:15090 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:15090 0.0.0.0:* LISTEN -
tcp 0 0 127.0.0.1:15000 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:15001 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:15001 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:15001 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:15001 0.0.0.0:* LISTEN -
tcp 0 0 127.0.0.1:3771 0.0.0.0:* LISTEN -
tcp 0 0 127.0.0.1:4955 0.0.0.0:* LISTEN -
tcp 0 0 127.0.0.1:15004 0.0.0.0:* LISTEN -
tcp 0 0 127.0.0.1:4956 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:6973 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:15006 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:15006 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:15006 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:15006 0.0.0.0:* LISTEN -
tcp 0 0 127.0.0.1:3198 0.0.0.0:* LISTEN -
tcp 0 0 127.0.0.1:3199 0.0.0.0:* LISTEN -
tcp 0 0 ::1:4289 :::* LISTEN -
tcp 0 0 :::45091 :::* LISTEN -
tcp 0 0 :::15020 :::* LISTEN -
tcp 0 0 :::9101 :::* LISTEN -
tcp 0 0 :::2702 :::* LISTEN 12/temporal-server
tcp 0 0 :::2703 :::* LISTEN 12/temporal-server
tcp 0 0 ::1:3855 :::* LISTEN -
tcp 0 0 ::1:3771 :::* LISTEN -
udp 0 0 0.0.0.0:52747 0.0.0.0:* -
udp 0 0 0.0.0.0:56021 0.0.0.0:* -
udp 0 0 0.0.0.0:56585 0.0.0.0:* -
udp 0 0 0.0.0.0:57014 0.0.0.0:* -
udp 0 0 0.0.0.0:57498 0.0.0.0:* -
udp 0 0 0.0.0.0:59258 0.0.0.0:* -
udp 0 0 0.0.0.0:514 0.0.0.0:* -
udp 0 0 127.0.0.1:8125 0.0.0.0:* -
udp 0 0 0.0.0.0:36415 0.0.0.0:* -
udp 0 0 0.0.0.0:36542 0.0.0.0:* -
udp 0 0 0.0.0.0:36819 0.0.0.0:* -
udp 0 0 0.0.0.0:37409 0.0.0.0:* -
udp 0 0 0.0.0.0:38755 0.0.0.0:* -
udp 0 0 0.0.0.0:39911 0.0.0.0:* -
udp 0 0 0.0.0.0:40844 0.0.0.0:* -
udp 0 0 0.0.0.0:41231 0.0.0.0:* -
udp 0 0 0.0.0.0:45547 0.0.0.0:* -
udp 0 0 0.0.0.0:46073 0.0.0.0:* -
udp 0 0 0.0.0.0:46656 0.0.0.0:* -
udp 0 0 0.0.0.0:47792 0.0.0.0:* -
udp 0 0 0.0.0.0:47956 0.0.0.0:* -
udp 0 0 :::514 :::* -
And in 1.15:
bash-5.1# netstat -lntup
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
tcp 0 0 127.0.0.1:15000 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:15001 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:15001 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:15001 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:15001 0.0.0.0:* LISTEN -
tcp 0 0 127.0.0.1:3771 0.0.0.0:* LISTEN -
tcp 0 0 127.0.0.1:4955 0.0.0.0:* LISTEN -
tcp 0 0 127.0.0.1:15004 0.0.0.0:* LISTEN -
tcp 0 0 127.0.0.1:4956 0.0.0.0:* LISTEN -
tcp 0 0 127.0.0.1:15933 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:6973 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:15006 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:15006 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:15006 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:15006 0.0.0.0:* LISTEN -
tcp 0 0 127.0.0.1:3198 0.0.0.0:* LISTEN -
tcp 0 0 127.0.0.1:4287 0.0.0.0:* LISTEN -
tcp 0 0 127.0.0.1:3199 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:3212 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:15021 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:15021 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:15021 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:15021 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:15090 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:15090 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:15090 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:15090 0.0.0.0:* LISTEN -
tcp 0 0 ::1:3771 :::* LISTEN -
tcp 0 0 ::1:15933 :::* LISTEN -
tcp 0 0 ::1:4287 :::* LISTEN -
tcp 0 0 :::9090 :::* LISTEN 13/temporal-server
tcp 0 0 :::45091 :::* LISTEN -
tcp 0 0 :::2602 :::* LISTEN 13/temporal-server
tcp 0 0 :::2603 :::* LISTEN 13/temporal-server
tcp 0 0 :::15020 :::* LISTEN -
tcp 0 0 :::9101 :::* LISTEN -
udp 0 0 0.0.0.0:35608 0.0.0.0:* -
udp 0 0 0.0.0.0:38415 0.0.0.0:* -
udp 0 0 0.0.0.0:41993 0.0.0.0:* -
udp 0 0 0.0.0.0:42347 0.0.0.0:* -
udp 0 0 0.0.0.0:52241 0.0.0.0:* -
udp 0 0 0.0.0.0:58273 0.0.0.0:* -
udp 0 0 0.0.0.0:59665 0.0.0.0:* -
udp 0 0 0.0.0.0:514 0.0.0.0:* -
udp 0 0 127.0.0.1:8125 0.0.0.0:* -
udp 0 0 :::514 :::* -
As you can see, in 1.15 the temporal-server process listens on three ports. In 1.16 it listens on only two and, crucially, port 9090 is not open.
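For anyone who wants to repeat this comparison without reading the listings by eye, here is a minimal sketch that extracts the TCP ports owned by temporal-server from `netstat -lntup` output. The function name `temporal_ports` and the two sample strings are only illustrative. The samples are trimmed to a few lines copied from the output above.

```python
def temporal_ports(netstat_output, program="temporal-server"):
    """Return the set of TCP ports whose owning process is `program`."""
    ports = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expect: Proto Recv-Q Send-Q Local Foreign State PID/Program
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue
        # The last column looks like "13/temporal-server" when the owner is visible.
        if not fields[-1].endswith("/" + program):
            continue
        # The local address looks like ":::9090" or "0.0.0.0:3212"; the port follows the last colon.
        ports.add(int(fields[3].rsplit(":", 1)[1]))
    return ports


# Trimmed samples taken from the listings in this thread.
V115 = """\
tcp 0 0 0.0.0.0:3212 0.0.0.0:* LISTEN -
tcp 0 0 :::9090 :::* LISTEN 13/temporal-server
tcp 0 0 :::2602 :::* LISTEN 13/temporal-server
tcp 0 0 :::2603 :::* LISTEN 13/temporal-server
"""

V116 = """\
tcp 0 0 0.0.0.0:3212 0.0.0.0:* LISTEN -
tcp 0 0 :::2702 :::* LISTEN 12/temporal-server
tcp 0 0 :::2703 :::* LISTEN 12/temporal-server
"""

if __name__ == "__main__":
    for version, output in (("1.15", V115), ("1.16.3", V116)):
        ports = temporal_ports(output)
        print(version, sorted(ports), "metrics port 9090 open:", 9090 in ports)
```

Note that netstat only shows the PID/Program column for processes the current user is allowed to inspect, so a port whose owner is displayed as `-` is skipped by this sketch.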
The per-service metrics endpoint has been deprecated since v1.11.0 (see Release v1.11.0 · temporalio/temporal · GitHub).
It was removed from the code in 1.16.0.
Please use the global config for metrics instead. For details, see: Temporal Cluster configuration reference | Temporal Documentation
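As a rough illustration of what moving to the global config could look like, here is a minimal sketch of the relevant part of the server configuration file. It assumes the Prometheus settings live under `global.metrics.prometheus` with `listenAddress` and `timerType` keys, and it reuses port 9090 only because that is the port shown in the 1.15 listing above. Please check the key names and values against the configuration reference for your server version before relying on it.

```yaml
# Sketch only: key names are assumed from the configuration reference,
# and the address is an example value, not a required one.
global:
  metrics:
    prometheus:
      timerType: "histogram"
      listenAddress: "0.0.0.0:9090"
```

If your configuration previously had a `metrics` block under an individual service such as `services.frontend`, that is presumably the per-service setting that no longer takes effect in 1.16, so the block would need to move to the `global` section.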