Error while cancelling the workflow with long running activity

Hello All,

this issue started popping up in our production servers, with the error when we try to cancel the workflow

BadRequestCancelActivityAttributes: invalid history builder state for action: add-activitytask-cancel-requested-event

we are using a long-running activity with context cancel from heartbeat to exit the workflow

like this example

but when we try to cancel the workflow, the mentioned error stated poping up

we are using

wfClient.CancelWorkflow(ctx, workflowID, "")

to cancel

we can see that the heartbeat is stopped and the activity also, but the workflow still showing as running
was working fine

History paths for an affected

  1. WorkflowExecutionStarted
  2. WorkflowTaskScheduled
  3. WorkflowTaskStarted
  4. WorkflowTaskCompleted
  5. ActivityTaskScheduled
  6. WorkflowExecutionCancelRequested
  7. WorkflowTaskScheduled
  8. ActivityTaskStarted
  9. ActivityTaskCompleted
 10. WorkflowTaskTimedOut
 11. WorkflowTaskScheduled
 12. WorkflowTaskStarted
 13. WorkflowTaskFailed ( with above error)

the issue is on

go client SDK 1.5
temporal: 1.7.1

what could be a reason? please shed some light.

Thanks in advance,
Junaid

Error is returned here:

the error means client workflow logic / SDK is trying to cancel an activity which does not exists.

can you plz provide the details of your workflow execution history and error log from both client and server?

most of time, this error will be caused by nondeterministic workflow logic change.

What version of the Go SDK are you using? The were fixes to the cancellation logic in the latest releases.

Thanks for the quick reply,

we are using sdk@v1.5.0

and there was no client error from this function call

CancelWorkflow(ctx context.Context, workflowID string, runID string) error

and error from the server looks like canceling getting retried

with logs

Debug  2021-04-05 23:45:43.000 IST.  "RequestCancelActivity"
Info   2021-04-05 23:45:43.000 IST.  "Task processing failed with error"
Debug  2021-04-05 23:45:53.000 IST   "RequestCancelActivity"
Info   2021-04-05 23:45:53.000 IST   "Task processing failed with error" 

one entry looks like this “Task processing failed with error”

TaskQueue: "matchCreate_tq"
timestamp: "2021-04-05T18:28:26.830056353Z"
caller: "log/with_logger.go:60"
WorkerID: "6@3845d77bccb7@"
message: "Task processing failed with error"
WorkerType: "WorkflowWorker"
Namespace: "default"
Error: "BadRequestCancelActivityAttributes: invalid history builder state for action: add-activitytask-cancel-requested-event"

every 10 sec is getting retried

adding the workflow history history in json format

Note: we can see that the underline activity exited but no the workflow

Regards,
Junaid

HI Junaid - if you happen to be able to post your workflow code (or a subset of it that triggers the bug) I can use that together with the workflow history to repro it and issue another fix.

Thanks for the bug report!

Sorry for the delayed response. @Spencer_Judge

the code format

the workflow client code

wfopts := client.StartWorkflowOptions{
	TaskQueue:   temporalTaskQueue,
	ID:          "CreateMatches_" + tournament.Id,
	RetryPolicy: &temporal.RetryPolicy{InitialInterval: time.Second, BackoffCoefficient: 1.2},
}

wr, err := s.wfClient.ExecuteWorkflow(ctx, wfopts, workflowCreateMatches, mProfiles, houseId, tournament.Id, tournament.T3Timestamp)
if err != nil {
	log.Error(ctx, "error starting create MatchCreation workflow:%w", err, log.KV("tourId", tournament.Id))
	return errors.WithStack(err)
}
log.Debug(ctx, "scheduled matchmaking workflow", log.KV("WorkflowID", wr.GetID()),
	log.KV("RunID", wr.GetRunID()), log.KV("tourId", tournament.Id))

var wErr error
wErr = wr.Get(ctx, &wErr)
if wErr != nil {
	return fmt.Errorf("TriggerCreateMatches  failed: %w", wErr)
}

the workflow code

func workflowCreateMatches(ctx workflow.Context, mProfiles []MatchProfile, houseId, tourId string, t3Time time.Time) error {
	// should not use custom logger
	logger := workflow.GetLogger(ctx)
	logger.Debug("starting match creation workflow", tourId)

	actx := workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
		ScheduleToCloseTimeout: 1 * 24 * 7 * 52 * time.Hour,
		HeartbeatTimeout:       createAndAssignMatchesActivityTimeOut,
		//WaitForCancellation:    true,
		RetryPolicy: &temporal.RetryPolicy{InitialInterval: time.Second, BackoffCoefficient: 1},
	})

	err := workflow.ExecuteActivity(actx, new(DirectorSvc).CreateAndAssignMatches, mProfiles, houseId, tourId, t3Time).Get(ctx, nil)
	if err != nil {
		if temporal.IsCanceledError(err) {
			return nil
		}
		return fmt.Errorf("failed to exicute CreateMatches: %w", err)
	}

	return nil
}

the activity code

func (d *DirectorSvc) CreateAndAssignMatches(ctx context.Context, mProfiles []MatchProfile, houseId, tourId string, t3Time time.Time) error {
heartBeatTime := time.Time{}
for {
	if time.Now().After(t3Time) {
		log.Info(ctx, "tournament time ended match assignment stopped", log.KV("tourId", tourId), log.KV("t3Time", t3Time))
		return nil
	}

	select {
	case <-ctx.Done():
		log.Info(ctx, "context is cancelled, tournament ended match assignment stopped", log.KV("tourId", tourId))
		return nil
	case <-time.After(2 * time.Second):
		{

			//temporal internally throttle RecordHeartbeat , we don't need to make it explicitly
			activity.RecordHeartbeat(ctx, "")
			if time.Now().After(heartBeatTime.Add(1 * time.Minute)) {
				log.Debug(ctx, "HeartBeating....", log.KV("tourId", tourId))
				heartBeatTime = time.Now()
			}
         //custom logic
		}

	}
}
return nil

}

	err := s.wfClient.CancelWorkflow(ctx, workflowID, "")
if err != nil {
	log.Error(ctx, "Unable to cancel createMatches workflow", err, log.KV("tourId", tournament.Id))
	return
}

we are calling CancelWorkflow around time t3Time (parameter to activity) this is also the time when the underline activity has an exit point. ( i suspect this have something to do with the issue)

hope this will help to fix the issue if any.

Thanks and regards,
Junaid