Panic in a "can't fail" Activity

Hi there, I’m not a “native” go developer, but my team is using the Temporal.io Go SDK.

One question to a more hard-core go audience:
If I’m running the type of activity that simply can’t fail (it only fails if there’s a transient/temporary environmental issue, a very common example would be querying a close-by database), wouldn’t it make the workflow and activity code more readable and simpler to just panic in this situation and let Temporal issue a retry? (I’m assuming in most of these cases using an infinite retry policy)

I’m aware of the issues of using panic in normal Go processes, but within the Temporal confines, I have the platform providing the guarantee of durable execution, so I would see no drawback in using panic in this kind of environment.

In my view, it’s similar to letting the process crash in the Erlang world and letting the environment self-heal.

Am I totally off track here?

Thank you.

Hey!

I’m not affiliated with Temporal and my experience is limited to a handful of small workflows. However, since no one has addressed your query yet, I’d like to share my perspective.

In my opinion, it’s perfectly fine to use wrappers for “retry forever” activities, especially when they run sequentially and the workflow cannot advance without them. This approach also allows me to maintain Activity options close to the respective activity, while simultaneously abstracting away the boilerplate code.

It’s crucial to be aware of the NonRetryableErrorTypes parameter in the retry policy. If Temporal will continuously retry that activity, you should either specify which errors are terminal or utilize the temporal.NewNonRetryableApplicationError() API.

Here’s an illustrative example:

func Workflow(ctx workflow.Context, accId string) error {
	var (
		account   = mustGetAccount(ctx, AccId)
		someState = mustGetSomeState(ctx, account.Data)
		moreState = mustGetMoreState(ctx, account.MoreData)
	)

	// Business logic

	return nil
}

func mustGetAccount(ctx workflow.Context, accId string) (account Account) {

	ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
		StartToCloseTimeout: ShortLivedTimeout,
		RetryPolicy: &temporal.RetryPolicy{
			MaximumAttempts:        0, // retry forever
			NonRetryableErrorTypes: []string{"NonRetryableError"},
		},
	})

	err := workflow.ExecuteActivity(ctx, DB.GetAccount, accId).Get(ctx, &account)
	if err != nil {
		workflow.GetLogger(ctx).Error("DB.GetAccount failed",
			"accId", accId,
			"error", err)
		// will only panic if the error is non retryable
		panic(err)
	}

	return
}

I hope this helps! If there’s something crucial I’ve overlooked, I trust the Temporal team to provide further insights.

Panicking in the activity code is OK as Temporal converts the panic into a PanicError, which is reported to the service.

Panicking in the workflow code, like in the sample of @netixen, is not advised. Workflow code that panics will be blocked (by retrying workflow task) until the code is fixed not to panic.

1 Like

Good to know about the workflow task. Thanks @maxim.

Would recovering from specific error types like in the revised code below be fine in practice or still a bad pattern:

func recoverNonRetryable(wfErr *error) {
	r := recover()
	if r == nil || wfErr == nil {
		return
	}

	if pErr, ok := r.(error); ok {
		if IsNonRetryableError(pErr) {
			*wfErr = temporal.NewNonRetryableApplicationError(pErr.Error(), "MyNonRetryableError", nil)
			return
		}
	}

	panic(r)
}


func Workflow(ctx workflow.Context, accId string) error {
    defer recoverNonRetryable(&wfErr)
	var (
		account   = mustGetAccount(ctx, AccId)
		someState = mustGetSomeState(ctx, account.Data)
		moreState = mustGetMoreState(ctx, account.MoreData)
	)

	// Business logic

	return nil
}

func mustGetAccount(ctx workflow.Context, accId string) (account Account) {

	ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
		StartToCloseTimeout: ShortLivedTimeout,
		RetryPolicy: &temporal.RetryPolicy{
			MaximumAttempts:        0, // retry forever
			NonRetryableErrorTypes: []string{"NonRetryableError"},
		},
	})

	err := workflow.ExecuteActivity(ctx, DB.GetAccount, accId).Get(ctx, &account)
	if err != nil {
		workflow.GetLogger(ctx).Error("DB.GetAccount failed",
			"accId", accId,
			"error", err)
		// will only panic if the error is non retryable
		panic(err)
	}

	return
}

Nonretryable and retryable errors have a different meaning than panic of a workflow code. Both errors fail the workflow, but a retryable error causes a new workflow execution if the workflow has an associated retry policy (which is not the case by default). The panic doesn’t fail a workflow but causes it to get stuck until the issue is fixed.

is PanicError from Activity execution treated as retryable by default?

Yes. All activity errors with the exception of an ApplicationError created through temporal.NewNonRetryableApplicationError are retryable by default.