Best practices for providing UI updates on workflow status?

tl;dr - is there a Temporal pattern for notifying frontends of workflow state transitions, or does one need to use some other pub/sub-style service to push the state changes to clients?

I’m working through the NestJS sample app and noticed a race condition where the UI can report a purchase as cancelled even though the backend completes the purchase, because the UI isn’t actually reporting on the true state of the workflow.

The Temporal backend workflow in the example gives customers a 5-second window to cancel an order before completing the transaction. The UI uses its own 5-second timer to tell the user when the order is cancellable, rather than treating the Temporal backend as the authoritative source of truth on whether the transaction is completed.

I’m trying to figure out how to provide the UI with push-like (long poll?) notifications of workflow state transitions, so the Temporal backend can be the single source of truth and the frontend doesn’t need to hackily reimplement parts of the workflow to guess at the workflow state.

Is there a way to stuff a condition into a query to do a long-poll response in a Temporal-acceptable way, or does one need to wire up an action that pushes the update to a Redis (or similar) pub/sub provider?

Currently there’s no built-in way to do this with Temporal alone.
There’s a sample I’ve created that uses Redis streams to provide a way to subscribe to query results; you can extend that with HTTP long polling or WebSockets to relay the updates to your UI.


hey Don - i actually wrote that sample and yeah it’s a known problem with that code - i actually do need to wire up the queries so at least you can accomplish polling (not long polling, which i don’t think is possible in a serverless context)

basically… can you tell me how you subscribe to your push provider from the frontend? i can try to design to that, or make a new sample. interested in improving this as it is a known weakness

Hi swyx - I think it’s totally ok to just put a comment somewhere in index.tsx that says in a real world application we’d subscribe to workflow updates as shown in the query-subscriptions sample, but for simplicity here we’re just using a timer to mock the update messages we should be getting from the workflow. That lets you keep this sample simple and focused on how to connect a frontend to temporal.

As long as you’re here, I’ll maybe hijack my own thread and ask about the choice to use a NestJS API layer as a proxy or relay between the frontend client and the Temporal server. Super naive question here, but is there a reason for not having the frontend client talk directly to the Temporal server?

You could use a signal plus a timer: the signal cancels the workflow, and when the signal is processed you cancel the timer.

I’ll just plug this here @Rob_Temporal and I used this technique to provide updates for the “ui” layer:

That synchronous-proxy example definitely helps me understand how returning from a workflow can trigger UI updates - thanks.

The sequence diagram makes me wonder: are there any restrictions on how long Temporal will allow a connection like this to stay open? And is there any way to reconnect to the return value of a workflow if the connection drops (say, due to networking issues)?

Are you referring to the UpdateOrderWorkflow1.Result() which is “going back” to the UI? That’s essentially this line: samples-go/main.go at 77728cf7c38570898b2c90bf6eb0720c7f5fb30d · temporalio/samples-go · GitHub

Which is just sdk-go/internal_workflow_client.go at 63bd73834e34543f79eb598b2d60a87964424132 · temporalio/sdk-go · GitHub
and the docs say:
// Get will fill the workflow execution result to valuePtr,
// if workflow execution is a success, or return corresponding
// error. This is a blocking API.

But whether your context expires or the network drops, you can still just re-establish the connection and grab the result by calling sdk-go/client.go at 63bd73834e34543f79eb598b2d60a87964424132 · temporalio/sdk-go · GitHub

wr := c.GetWorkflow(ctx, workflowID, "")
var result string
err := wr.Get(ctx, &result)

where “” refers to the latest runID.

@nadilas thanks, this is super helpful - brand new to wrapping my head around Temporal and benefiting tons from your comments. I’ll probably try to work up a TypeScript version of this just to make sure I actually understand what’s involved.

Happy to help :slightly_smiling_face:

On a similar note, though maybe not specific to UI, I have just made a PR demonstrating another approach to request/response: Request/response handling sample by cretz · Pull Request #161 · temporalio/samples-go · GitHub. This uses a long-running workflow’s ability to execute activities on another task queue to push information out of the workflow.


The idea of using “ephemeral” workers in end-user clients is quite interesting. It makes me want to have [Feature Request] Enable in-browser usage of gRPC client · Issue #207 · temporalio/sdk-typescript · GitHub all the more.

@Chad_Retz just to make sure I understand your choice around the response channel (map[string]chan<- *Response): you set it up because you share a Requester across multiple requests via flag.Args?

Note, I don’t necessarily support ephemeral workers, it just happens to be ephemeral because the CLI is short lived. From the README:

In this particular example, we have abstracted both concepts out into a Requester. This requester can be reused although the sample just shows it used ephemerally as part of the request CLI.

In a daemonized app/service, one might reuse the Requester.

It’s because the Requester is meant to be library code unaware of its caller. The CLI just so happens to use arguments to execute it, but the sample is meant to demonstrate a general-purpose library of request/response.

Thinking about UI updates: do queries count against the 50,000-event cap per workflow?

A centralized poller like @Chad_Retz shared makes a ton of sense, but as a temporal-newbie I can’t tell whether to be worried about hitting event caps or DB size caps as a result of turning on polling.

Queries are executed out-of-band and therefore not a part of history.

To clarify that example and the README, the query-based poller is the inefficient way to get responses. Callback activities, while a bit more complex, are the more performant way; I just show both side-by-side for comparison. Yes, callback activities affect history, but with such a long-running workflow acting as a request handler, continue-as-new should probably be used anyway.

Are you able to add any more discussion of the callback approach to your readme? As a temporal-newbie coming from other languages trying to read through the go sample, it’s fairly easy to understand the polling but the callback approach is not something I’m confident I’m rocking yet.

The comments in the code are enough to convince me there are non-trivial edge cases I need to worry about with the callback approach, but not enough to give me confidence I understand them well enough to make sure I’m handling them correctly in another implementation in another language.

Sure! If you don’t mind, can you comment on the PR which parts may be unclear so I can expand on them?

Agreed, hence the quotes around my “ephemeral”. And even though they might not be as short-lived as the CLI, I’d use a shared client worker, say behind a Next.js app. My “surprise” came from never having considered this before.

You put that much more eloquently than I did, that is what I meant as well.

Awesome example. :+1: