Panic in a "can't fail" Activity

DemetriusNunes_Tino · October 13, 2023, 9:44pm

Hi there, I’m not a “native” go developer, but my team is using the Temporal.io Go SDK.

One question to a more hard-core go audience:
If I’m running the type of activity that simply can’t fail (it only fails if there’s a transient/temporary environmental issue, a very common example would be querying a close-by database), wouldn’t it make the workflow and activity code more readable and simpler to just panic in this situation and let Temporal issue a retry? (I’m assuming in most of these cases using an infinite retry policy)

I’m aware of the issues of using panic in normal Go processes, but within the Temporal confines, I have the platform providing the guarantee of durable execution, so I would see no drawback in using panic in this kind of environment.

In my view, it’s similar to letting the process crash in the Erlang world and letting the environment self-heal.

Am I totally off track here?

Thank you.

netixen · October 16, 2023, 10:28pm

Hey!

I’m not affiliated with Temporal and my experience is limited to a handful of small workflows. However, since no one has addressed your query yet, I’d like to share my perspective.

In my opinion, it’s perfectly fine to use wrappers for “retry forever” activities, especially when they run sequentially and the workflow cannot advance without them. This approach also allows me to maintain Activity options close to the respective activity, while simultaneously abstracting away the boilerplate code.

It’s crucial to be aware of the NonRetryableErrorTypes parameter in the retry policy. If Temporal will continuously retry that activity, you should either specify which errors are terminal or utilize the temporal.NewNonRetryableApplicationError() API.

Here’s an illustrative example:

func Workflow(ctx workflow.Context, accId string) error {
	var (
		account   = mustGetAccount(ctx, AccId)
		someState = mustGetSomeState(ctx, account.Data)
		moreState = mustGetMoreState(ctx, account.MoreData)
	)

	// Business logic

	return nil
}

func mustGetAccount(ctx workflow.Context, accId string) (account Account) {

	ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
		StartToCloseTimeout: ShortLivedTimeout,
		RetryPolicy: &temporal.RetryPolicy{
			MaximumAttempts:        0, // retry forever
			NonRetryableErrorTypes: []string{"NonRetryableError"},
		},
	})

	err := workflow.ExecuteActivity(ctx, DB.GetAccount, accId).Get(ctx, &account)
	if err != nil {
		workflow.GetLogger(ctx).Error("DB.GetAccount failed",
			"accId", accId,
			"error", err)
		// will only panic if the error is non retryable
		panic(err)
	}

	return
}

I hope this helps! If there’s something crucial I’ve overlooked, I trust the Temporal team to provide further insights.

maxim · October 17, 2023, 5:41am

Panicking in the activity code is OK as Temporal converts the panic into a PanicError, which is reported to the service.

Panicking in the workflow code, like in the sample of @netixen, is not advised. Workflow code that panics will be blocked (by retrying workflow task) until the code is fixed not to panic.

netixen · October 17, 2023, 2:05pm

Good to know about the workflow task. Thanks @maxim.

Would recovering from specific error types like in the revised code below be fine in practice or still a bad pattern:

func recoverNonRetryable(wfErr *error) {
	r := recover()
	if r == nil || wfErr == nil {
		return
	}

	if pErr, ok := r.(error); ok {
		if IsNonRetryableError(pErr) {
			*wfErr = temporal.NewNonRetryableApplicationError(pErr.Error(), "MyNonRetryableError", nil)
			return
		}
	}

	panic(r)
}


func Workflow(ctx workflow.Context, accId string) error {
    defer recoverNonRetryable(&wfErr)
	var (
		account   = mustGetAccount(ctx, AccId)
		someState = mustGetSomeState(ctx, account.Data)
		moreState = mustGetMoreState(ctx, account.MoreData)
	)

	// Business logic

	return nil
}

func mustGetAccount(ctx workflow.Context, accId string) (account Account) {

	ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
		StartToCloseTimeout: ShortLivedTimeout,
		RetryPolicy: &temporal.RetryPolicy{
			MaximumAttempts:        0, // retry forever
			NonRetryableErrorTypes: []string{"NonRetryableError"},
		},
	})

	err := workflow.ExecuteActivity(ctx, DB.GetAccount, accId).Get(ctx, &account)
	if err != nil {
		workflow.GetLogger(ctx).Error("DB.GetAccount failed",
			"accId", accId,
			"error", err)
		// will only panic if the error is non retryable
		panic(err)
	}

	return
}

maxim · October 17, 2023, 4:45pm

Nonretryable and retryable errors have a different meaning than panic of a workflow code. Both errors fail the workflow, but a retryable error causes a new workflow execution if the workflow has an associated retry policy (which is not the case by default). The panic doesn’t fail a workflow but causes it to get stuck until the issue is fixed.

Theodore_Felix_Leo · November 25, 2023, 2:50pm

is PanicError from Activity execution treated as retryable by default?

maxim · November 25, 2023, 5:40pm

Yes. All activity errors with the exception of an ApplicationError created through temporal.NewNonRetryableApplicationError are retryable by default.

Topic		Replies	Views
Questions around activity errors, retry, and more complex error handling scenarios Community Support go-sdk , retries , error-handling	9	3765	July 23, 2021
Fail Workflow After Max Retries Help Community Support java-sdk	2	529	January 20, 2023
Workflow Retry - Workflow should skip Activities which are successful in previous run Developer Corner java-sdk	8	1021	October 18, 2024
Worker does not start activity after restart Community Support go-sdk , retries , worker	17	3383	May 24, 2021
What to do when an activity cannot proceed without re-running previously completed activities? Community Support python-sdk	4	69	December 20, 2024

Panic in a "can't fail" Activity

Related topics