Activity duplicated according to worker logs

We are using Temporal in a self-hosted environment on Kubernetes. The key versions are:

  1. Temporal server 1.22.3
  2. Postgres Database version 14.11
  3. The application is using the Java SDK.

The application is a payment system where the workers communicate with downstream services for order management and with a payment gateway.

A workflow is created for a payment authorization that is tied to a user’s specific cart. Each log shows a user’s id and a cart id.

The problem we have is this:

  1. Two simultaneous payments are made and these create Workflow A and Workflow B for user A and user B respectively.

  2. According to the console logs of the Kubernetes pods, one of the activities from Workflow B shows up in Workflow A at around the same time that it is executed in Workflow B.

  3. It’s the same specific activity in the situations where we’ve observed it occurring. For example, the workflow defines
    - Activity 1
    - Activity 2
    - Activity 3

We’re seeing only Activity 3 being duplicated. We only see this duplication in the Kubernetes logs. It doesn’t happen very often and it’s hard to reproduce.

Note that we don’t see anything unusual in the Temporal console or in the output of this CLI command:

temporal workflow trace --workflow-id

We do currently have the number of activity pollers set too high, which is causing RESOURCE_EXHAUSTED errors in the worker logs. We’re in the process of rolling out a change that fixes this by reducing the number of activity pollers.

I’m after any suggestions on how this may be occurring, or on which internal Temporal logs could help us debug the issue.

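Check the ActivityTaskStarted.attempt and ActivityTaskStarted.lastFailure fields in the workflow history for that activity. An attempt greater than 1 means the activity was retried, and lastFailure records why the previous attempt failed.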
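
To make that check concrete, here is a minimal Java sketch of the logic. It deliberately does not call the Temporal SDK: StartedEvent is a hypothetical stand-in class holding only the fields of interest from an ActivityTaskStarted history event, and the workflow ids, activity names and failure text are made-up sample values. In practice you would read the same fields from the real workflow history.

```java
import java.util.List;
import java.util.stream.Collectors;

public class RetryCheck {

    // Hypothetical stand-in for the relevant fields of an ActivityTaskStarted event.
    public static final class StartedEvent {
        public final String workflowId;
        public final String activityType;
        public final int attempt;        // attempt numbering starts at 1
        public final String lastFailure; // empty string when there was no earlier failure

        public StartedEvent(String workflowId, String activityType,
                            int attempt, String lastFailure) {
            this.workflowId = workflowId;
            this.activityType = activityType;
            this.attempt = attempt;
            this.lastFailure = lastFailure;
        }
    }

    // Keeps only the started events that are retries, i.e. attempt > 1.
    public static List<StartedEvent> retried(List<StartedEvent> events) {
        return events.stream()
                .filter(e -> e.attempt > 1)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up sample data for illustration only.
        List<StartedEvent> events = List.of(
                new StartedEvent("workflow-A", "Activity 3", 1, ""),
                new StartedEvent("workflow-B", "Activity 3", 1, ""),
                new StartedEvent("workflow-B", "Activity 3", 2, "sample timeout"));

        for (StartedEvent e : retried(events)) {
            System.out.println(e.workflowId + " " + e.activityType
                    + " attempt=" + e.attempt + " lastFailure=" + e.lastFailure);
        }
    }
}
```

If Activity 3 shows an attempt greater than 1 in either workflow, the extra log lines may simply be a retry of that activity rather than one workflow running the other's activity.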