Our worker services are failing to start, with errors like:
{"level":"fatal","ts":"2023-01-09T23:45:59.585Z","msg":"error creating sdk client","service":"worker","error":"failed reaching server: last connection error: connection error: desc = \"transport: Error while dialing dial tcp 10.0.11.207:7233: connect: connection refused\"","logging-call-
That looks like an old IP address of a frontend service. The actual running frontend is on a different IP address.
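One way to confirm that the address in the error really is dead is to pull it out of the log line and attempt a plain TCP connect from the worker host. This is only a rough sketch, assuming Python is available there; the helper names `extract_dial_target` and `is_reachable` are made up for illustration and are not part of Temporal or tctl.

```python
import re
import socket

# Matches the "dial tcp HOST:PORT" fragment that appears in the connection error.
# Only IPv4 addresses are handled in this sketch.
DIAL_RE = re.compile(r"dial tcp (\d{1,3}(?:\.\d{1,3}){3}):(\d+)")


def extract_dial_target(log_line):
    """Return (host, port) from a 'dial tcp host:port' error, or None if absent."""
    match = DIAL_RE.search(log_line)
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def is_reachable(host, port, timeout=2.0):
    """Attempt a TCP connect; True only if something accepts the connection."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unroutable addresses.
        return False
```

Calling `extract_dial_target` on the error text above yields the `10.0.11.207` / `7233` pair, which can then be passed to `is_reachable` and compared against the result for the frontend's current address. A successful connect only shows that something is listening on the port, not that it is a healthy frontend.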
Running the following tctl command indicates that 356 services are registered (only 3 are actually running, heh). I think these are stale registrations left over from my trial and error while getting the services stood up:
tctl admin membership list_db | grep role | wc -l
356
Is there some way to “purge” the list of registered servers, so that the worker can connect to one that’s actually alive? Or do I need to wait for them to time out?
Many thanks