How to implement liveness and readiness in a Python Temporal application

I have a Python worker_client.py that creates a Temporal client using await Client.connect() and starts the worker using await worker.run().
I start the worker using the command python worker_client.py. This application is deployed in a Kubernetes environment. I want to implement liveness and readiness probes for this application, similar to the Actuator we use in Java applications.
How can I implement this functionality?

It depends on what the definition of liveness is, but there is nothing special about Temporal here vs any other Python application. So you should be able to use the same approach you would for any other Python environment. If the worker has not raised an error out of run, it is “alive” (but it does internally retry on failure, and only raises exceptions in certain situations after a while).

Yes, that is the problem. The application still runs even when the worker cannot connect to Temporal. We don’t have any API available inside our application other than the worker client. It would be great if you could provide an example or reference showing how to implement readiness and liveness for this worker client.

Do you think we can use the following Kubernetes configuration?

readinessProbe:
  tcpSocket:
    port: 8080

You’ll want to make an endpoint available; then, when implementing it, you can attempt a client call in your health check via await my_client.service_client.check_health(). Temporal does not do anything Kubernetes-specific, so implement a health check the same way as you do for any other Python process (and just make the client call inside the health check).

Only if you plan on opening a TCP socket when starting your process and closing it when unhealthy. There is no Temporal-specific Kubernetes health check; you may have to make your own endpoint.
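If it helps, here is a minimal sketch of such an endpoint. It assumes aiohttp as the HTTP framework and port 8080, neither of which is Temporal-specific, and a placeholder server address; the probe handler just forwards to check_health():

import asyncio

from aiohttp import web
from temporalio.client import Client


async def main() -> None:
    # Address and namespace are placeholders for your own configuration
    client = await Client.connect("localhost:7233")

    async def healthz(request: web.Request) -> web.Response:
        try:
            healthy = await client.service_client.check_health()
        except Exception as err:
            return web.Response(status=503, text=f"unhealthy: {err}")
        if not healthy:
            return web.Response(status=503, text="temporal not serving")
        return web.Response(text="ok")

    app = web.Application()
    app.add_routes([web.get("/healthz", healthz)])
    runner = web.AppRunner(app)
    await runner.setup()
    site = web.TCPSite(runner, "0.0.0.0", 8080)
    await site.start()
    await asyncio.Event().wait()  # keep serving until the process is stopped


if __name__ == "__main__":
    asyncio.run(main())

You would then point an httpGet readiness probe at /healthz instead of the tcpSocket probe above.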

Thanks @Chad_Retz.
With this check_health() call, I can check whether the worker is ready to serve the request.
If the WorkflowService is up, does this mean that the worker is registered and ready to serve?

The problem that I am facing is that my microservice shows that the service is up even when the worker is not running/registered with Temporal.

I want to fail my microservice when the worker is not connected to Temporal after x retries.

Usually yes, if the worker is also running (i.e. it hasn’t returned from the run call or exited the async with). There may be cases of a worker failing but a client succeeding, but this is very rare and logs/metrics would show why. The worker is built to recover.

Is your worker running and the client can connect but the worker is not processing work? This would be rare and worth investigating deeply.

Are there any logs? Is the worker.run asyncio call running?

Whether a worker fails after a certain amount of time or number of retries is an internal behavior that you cannot configure. But we do offer metrics (see the samples for how to configure them, and the Temporal SDK metrics reference in the Temporal documentation for the request failure metrics offered; you’d likely want temporal_long_request_failure).
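As a rough sketch of the metrics setup (the bind address and port are arbitrary examples, not defaults), you create a telemetry-configured Runtime and pass it to Client.connect; workers built from that client then emit the SDK metrics:

from temporalio.client import Client
from temporalio.runtime import PrometheusConfig, Runtime, TelemetryConfig


async def connect_with_metrics() -> Client:
    # Expose Prometheus metrics, including temporal_long_request_failure,
    # at http://0.0.0.0:9464/metrics (address/port chosen for illustration)
    runtime = Runtime(
        telemetry=TelemetryConfig(metrics=PrometheusConfig(bind_address="0.0.0.0:9464"))
    )
    return await Client.connect("localhost:7233", runtime=runtime)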

The await Client.connect() was successful but the await worker.run() failed with the following exception:

{"time":"2024-03-14 02:29:10 PM GMT", "stack_trace":"None", "log_level":"INFO", "logger_name":"temporalio.worker._worker", "message":"Beginning worker shutdown, will wait 0:00:00 before cancelling activities", "thread_id":"139692215798720", "thread_priority":"NA"}

Traceback (most recent call last):
  File "/usr/lib64/python3.11/site-packages/temporalio/worker/_activity.py", line 180, in run
    raise RuntimeError("Activity worker failed") from err
RuntimeError: Activity worker failed

The service was running after this, but the worker was no longer polling. I want to implement a health check for this scenario.

  1. I want the service to self-heal. It should keep on trying to register the worker.
  2. In the worst case, I want to mark my service as unable to serve (i.e. it is down).

If run fails, this is fatal; you should not consider the worker healthy and should not continue running the service.

You can try to completely recreate/restart the worker after it fails if you want, but a failure coming from run is fatal for the existing worker instance. Workers are only good for one run call.
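If you do want to attempt that, a rough sketch (the retry count and backoff are arbitrary choices, and the task queue, workflows, and activities are whatever your application registers):

import asyncio
import logging
from typing import Sequence

from temporalio.client import Client
from temporalio.worker import Worker


async def run_worker_with_retries(
    client: Client,
    task_queue: str,
    workflows: Sequence[type],
    activities: Sequence,
    max_attempts: int = 5,
) -> None:
    for attempt in range(1, max_attempts + 1):
        # A worker instance is only good for one run() call, so build a new one
        worker = Worker(
            client, task_queue=task_queue, workflows=workflows, activities=activities
        )
        try:
            await worker.run()
            return  # run() returned normally, i.e. shutdown was requested
        except Exception:
            logging.exception("worker run failed, attempt %d/%d", attempt, max_attempts)
            await asyncio.sleep(min(2 ** attempt, 30))
    # Give up so the process exits and Kubernetes can restart the pod
    raise SystemExit("worker failed after retries; marking service as down")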

If worker.run is not running (i.e. it has failed), you should consider the service as down.

Agreed, but it didn’t fail my service. It should have failed it. I am running only a worker inside it.
Can I check this situation with the call await my_client.service_client.check_health() you mentioned earlier?

Did worker.run return/raise? Or did it hang trying to shutdown activities?

That would depend on why the worker is shutting down. If it’s shutting down due to server access issues, then the client calls should also fail during service access issues. Are there logs before “Beginning worker shutdown”?

Note, when this message occurs, it’s either because you have stopped the worker (e.g. cancelled the asyncio call) or because of a worker failure which should be logged above it. But we do not complete the run call until all activities have been sent cancels and then completed. If you have an activity that does not complete when cancelled (e.g. doesn’t listen to heartbeat), the worker may never return from this call.
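For illustration, an activity shaped roughly like this (the name and the fake unit of work are made up) heartbeats so cancellation can be delivered, and exits promptly when cancelled, which lets the run call complete shutdown:

import asyncio

from temporalio import activity


@activity.defn
async def long_running_activity(iterations: int) -> int:
    done = 0
    for _ in range(iterations):
        # Heartbeating is what allows cancellation to be delivered to the activity
        activity.heartbeat(done)
        try:
            await asyncio.sleep(1)  # stand-in for one unit of real work
        except asyncio.CancelledError:
            # Async activities receive cancellation as asyncio cancellation;
            # clean up quickly here so worker shutdown is not blocked
            raise
        done += 1
    return done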

I tried the following example. It always returns True even when the worker is not running. I want it to fail when the worker is not running.

client = await Client.connect(
    TEMPORAL_CONSTANTS.HOST + ":" + TEMPORAL_CONSTANTS.PORT,
    namespace=TEMPORAL_CONSTANTS.NAMESPACE,
    tls=False,
)
output = await client.service_client.check_health()
print("RESPONSE : ", output)


Yes, the worker didn’t return from the run call in our case. Hence the microservice didn’t shut down.

How are you determining whether the worker is “running”? If the worker run call has not returned, it is “running”. If it is failing and retrying it logs as such. If it is at its maximum concurrent limit of work, it will not poll (metrics can show available slots, and if they are 0 it does not poll). If easy enough, can you provide a replication of the situation where the worker seems to not be “running”?

What I meant is, I didn’t start the worker. The Temporal Server was up and running. I just created the sample code I shared above to check the health. It returned true even when the worker was not running.

You need to combine a client check with a check that worker.run is still running, to confirm both that the server is reachable and that the worker is running.

It would be great if you could provide some snippets. I think it is just returning True. It is not checking whether the worker is available to pick up the request.

@Chad_Retz I think I found an issue. The worker is not properly shutting down if there is an error while polling the activity task.

It gets hung in finalize_shutdown.

This is what happened in our case. The microservice was up but the worker got hung.

It looks like an issue with Temporal. I found these references:

from temporalio.client import Client
from temporalio.worker import Worker

worker_running = False


async def run_worker(worker: Worker) -> None:
    global worker_running
    worker_running = True
    try:
        await worker.run()
    finally:
        # Reset the flag whether run() returned or raised
        worker_running = False


async def check_health(client: Client) -> None:
    # Fail readiness if the worker task has stopped for any reason
    if not worker_running:
        raise RuntimeError("worker not running")
    # Also verify the Temporal frontend is reachable
    await client.service_client.check_health()

Something like that.
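The important detail in that sketch is that the flag is reset in the finally block, so whether run() raises or returns, the next readiness probe fails; the check_health() call then separately confirms the server is reachable. You would typically start run_worker with asyncio.create_task alongside whatever HTTP server backs the probe endpoint.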

What was the polling error? Is that error not also occurring from a client health check?

We are getting the same error mentioned in this issue:

It shows a PermissionDenied error while polling. The activity worker fails, but the worker gets hung.
temporal_sdk_core::worker::activities::activity_task_poller_stream: Error while polling for activity tasks error=Status { code: PermissionDenied, message: "Request unauthorized." }

Do you also get this error while checking health using the same client you started the worker with? If so, that should properly fail your health check, correct? I will see about prioritizing fixing the worker shutdown issue.

Do you have any timelines for fixing this worker shutdown issue?