Dynamic rate limiting

I’m designing a web crawler, and one feature I’d like to implement is a per-domain rate limit that’s separate from the existing workflow or activity rate limits and is configurable dynamically.

I realise I could deploy a new worker for the domain; however, as a single worker can make many thousands of outbound HTTP requests, I’m looking for alternatives to this.

In the past I’ve used a throttler as middleware within the HTTP transport to rate limit per domain; perhaps this could be implemented via WorkflowInterceptors?

Is there an ideal “Temporal” approach here? I think I’m looking for a per-domain task queue rate limit but I’m not sure if it’s possible or how to achieve it using the Go SDK.

Thanks, Jon

Out of the box, Temporal provides rate limiting of a task queue. If a specific activity needs to be throttled, it should listen on a separate task queue.
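For reference, a minimal Go SDK sketch of task queue rate limiting (the task queue name and rate are illustrative):

```go
package main

import (
	"log"

	"go.temporal.io/sdk/client"
	"go.temporal.io/sdk/worker"
)

func main() {
	// Connect to the Temporal service with default local options.
	c, err := client.Dial(client.Options{})
	if err != nil {
		log.Fatalln("unable to create Temporal client:", err)
	}
	defer c.Close()

	// Rate-limit activity dispatch for this task queue. The limit is enforced
	// by the server across all workers polling the same queue.
	w := worker.New(c, "crawl-example-com", worker.Options{
		TaskQueueActivitiesPerSecond: 10, // illustrative value
	})
	// w.RegisterWorkflow(...) and w.RegisterActivity(...) would go here.

	if err := w.Run(worker.InterruptCh()); err != nil {
		log.Fatalln("worker exited:", err)
	}
}
```

Because the limit applies to the task queue as a whole, it acts as a cap across every worker polling that queue.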

I’m not sure if task queue rate limiting fits your use case. The missing piece of information is how many domains your crawler is expected to process. If the number of domains is bounded to something like 1k, then the task queue approach is the best one. If you need to process millions of web domains, then having millions of activity workers would be overkill.

For a very large number of domains, I would make each activity throttle its requests and limit the number of activities per domain through workflow code. I would consider having a workflow execution (instance) per domain in this case.

Hi Maxim, to clarify my requirements: I need to crawl a low number of domains (e.g. below your 1k bound), and the domains need to be determined at runtime.

Could I use a rate limiter such as uber-go/ratelimit (a blocking leaky-bucket implementation for Go) to block in the main per-domain workflow (the spider or crawler) without breaking workflow determinism?

This could ensure that only a bounded number of child workflows are started without requiring changes to the deployment of workflow executors, and Temporal’s task queue rate limit will ensure that the true maximum number of activities (e.g. outbound HTTP requests) can’t be exceeded.

Will this cause issues with replay? I’m assuming Temporal’s SDK still steps through the activity calls during replay (returning results from the event history), so the rate limit would apply while replaying.

Could “continue as new” help here, limiting the impact of replaying to the number of events in that invocation of the workflow?
I’ll need to use “continue as new” anyway to avoid going over the 50k event limit, as domains can have many more than 50k URLs.
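For context, the shape I have in mind is roughly the following sketch, with a hypothetical CrawlBatchWorkflow that processes a bounded batch of URLs per run and continues as new with the rest of the frontier:

```go
package crawl

import (
	"go.temporal.io/sdk/workflow"
)

// CrawlBatchWorkflow is a hypothetical crawler workflow that processes a
// bounded batch of URLs per run and then continues-as-new with the rest of
// the frontier, so each run's history stays well under the event limit.
func CrawlBatchWorkflow(ctx workflow.Context, frontier []string) error {
	const batchSize = 1000 // illustrative

	batch := frontier
	if len(batch) > batchSize {
		batch = frontier[:batchSize]
	}

	// ... execute crawl activities for `batch` here ...

	remaining := frontier[len(batch):]
	if len(remaining) == 0 {
		return nil
	}
	// Start a fresh run with an empty history, carrying the remaining URLs forward.
	return workflow.NewContinueAsNewError(ctx, CrawlBatchWorkflow, remaining)
}
```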

Thanks, Jon.

I think yes, more reading required.

Could I use a rate limiter such as uber-go/ratelimit (a blocking leaky-bucket implementation for Go) to block in the main per-domain workflow (the spider or crawler) without breaking workflow determinism?

I don’t think it is going to work, as such a library probably uses system time and native Go goroutine sleep instead of workflow.Sleep.
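A deterministic alternative is to do the throttling in workflow code with workflow.Sleep. A minimal sketch, assuming a hypothetical CrawlURL activity and an illustrative one-request-per-second budget:

```go
package crawl

import (
	"time"

	"go.temporal.io/sdk/workflow"
)

// CrawlURL is a hypothetical activity that fetches a single URL.
func CrawlURL(url string) error { return nil }

// ThrottledCrawlWorkflow spaces out activity starts with workflow.Sleep, which
// is recorded as a timer in history and is therefore deterministic on replay,
// unlike a wall-clock leaky-bucket limiter.
func ThrottledCrawlWorkflow(ctx workflow.Context, urls []string) error {
	ao := workflow.ActivityOptions{StartToCloseTimeout: time.Minute}
	ctx = workflow.WithActivityOptions(ctx, ao)

	const perRequestDelay = time.Second // crude 1 request/second budget, illustrative

	for _, u := range urls {
		if err := workflow.ExecuteActivity(ctx, CrawlURL, u).Get(ctx, nil); err != nil {
			return err
		}
		if err := workflow.Sleep(ctx, perRequestDelay); err != nil {
			return err
		}
	}
	return nil
}
```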

I need to crawl a low number of domains (e.g. below your 1k bound), and the domains need to be determined at runtime.

Then I would create a workflow per domain and let it start, from an activity, a worker that polls a task queue dedicated to that domain. That worker would use task queue rate limiting. The actual crawl activity invocations can be done from a child workflow that calls continue-as-new periodically.
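One way to sketch the worker-from-an-activity part (the activity names, per-domain task queue naming, and rate are all illustrative): a long-running activity hosts a rate-limited worker for the domain’s task queue and shuts it down when the calling workflow cancels it.

```go
package crawl

import (
	"context"
	"time"

	"go.temporal.io/sdk/activity"
	"go.temporal.io/sdk/client"
	"go.temporal.io/sdk/worker"
)

// CrawlURL is a hypothetical activity that fetches one URL for the domain.
func CrawlURL(ctx context.Context, url string) error { return nil }

// RunDomainWorker is a hypothetical activity that runs a rate-limited worker
// for a single domain's task queue until the activity is cancelled.
func RunDomainWorker(ctx context.Context, domain string) error {
	c, err := client.Dial(client.Options{})
	if err != nil {
		return err
	}
	defer c.Close()

	w := worker.New(c, "crawl-"+domain, worker.Options{
		TaskQueueActivitiesPerSecond: 10, // per-domain limit, illustrative value
	})
	w.RegisterActivity(CrawlURL)

	if err := w.Start(); err != nil {
		return err
	}
	defer w.Stop()

	// Heartbeat so Temporal knows the activity (and its embedded worker) is
	// alive, and stop the worker when the calling workflow cancels the activity.
	ticker := time.NewTicker(30 * time.Second)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return ctx.Err()
		case <-ticker.C:
			activity.RecordHeartbeat(ctx)
		}
	}
}
```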

Understood, I think I could cheat and use workflow.SideEffect to wrap the calls to the limiter, a crude approach which should not impact replay.

I like this approach. It’s more complex in a GitOps deployment model (which we have), but we can automate the Git operations from within a workflow and then poll to check that the new worker is available before continuing.

Thanks for your help as ever 🙂

Understood, I think I could cheat and use workflow.SideEffect to wrap the calls to the limiter, a crude approach which should not impact replay.

Don’t do this as it is going to trip the deadlock detector.

I like this approach. It’s more complex in a GitOps deployment model (which we have), but we can automate the Git operations from within a workflow and then poll to check that the new worker is available before continuing.

When I say “start a worker per domain” I don’t mean that you have to start a worker process per domain. You can have many workers in a single process.

Understood.

Interesting, for some reason I thought it wasn’t desirable to have more than one worker per process.

To clarify, do you mean it’s safe to start a Worker instance from within a Workflow, calling wf.Run() with a workflow.Context to handle interrupts, with the required activities registered with it at runtime?

To clarify, do you mean it’s safe to start a Worker instance from within a Workflow, calling wf.Run() with a workflow.Context to handle interrupts, with the required activities registered with it at runtime?

No, as the worker implementation is not “workflow safe”. You want to start a worker from an activity instead.

Interesting, this is extremely flexible.

My assumptions:

  1. The activity should be called asynchronously?
  2. It’s the responsibility of the workflow that calls the activity to signal to the worker inside the activity to stop (during cancellation or when work is complete)?
  3. The activity should send heartbeats to indicate to Temporal that it is running and not stuck/deadlocked?
  4. That Temporal will handle restarting the worker etc if the activity is rescheduled/redeployed elsewhere?

  1. The Temporal Go SDK always calls activities asynchronously; ExecuteActivity returns a Future.
  2. I would recommend using the Session feature to route multiple activities to the same process. This way you can have one activity start a worker and another activity stop it (see the sketch after this list).
  3. The session itself is already heartbeating internally, so you can get a notification through the session context if the process is down.
  4. If you use a session, then the workflow has to recreate the session.
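
A minimal sketch of the session-based variant, assuming hypothetical StartDomainWorker and StopDomainWorker activities (workers hosting session activities need EnableSessionWorker set in worker.Options):

```go
package crawl

import (
	"time"

	"go.temporal.io/sdk/workflow"
)

// StartDomainWorker and StopDomainWorker are hypothetical activities that
// start and stop a worker for the domain's task queue on the host they run on.
func StartDomainWorker(domain string) error { return nil }
func StopDomainWorker(domain string) error  { return nil }

// PerDomainCrawlWorkflow uses the Session feature so that both activities are
// routed to the same worker process.
func PerDomainCrawlWorkflow(ctx workflow.Context, domain string) error {
	so := &workflow.SessionOptions{
		CreationTimeout:  time.Minute,
		ExecutionTimeout: 24 * time.Hour, // illustrative crawl duration
	}
	sessionCtx, err := workflow.CreateSession(ctx, so)
	if err != nil {
		return err
	}
	defer workflow.CompleteSession(sessionCtx)

	ao := workflow.ActivityOptions{
		StartToCloseTimeout: 24 * time.Hour,
		HeartbeatTimeout:    time.Minute,
	}
	sessionCtx = workflow.WithActivityOptions(sessionCtx, ao)

	// Start a worker for this domain's task queue on the session host.
	if err := workflow.ExecuteActivity(sessionCtx, StartDomainWorker, domain).Get(sessionCtx, nil); err != nil {
		return err
	}

	// ... drive the crawl here, e.g. via a child workflow that continues-as-new ...

	// Stop the worker on the same host once the crawl is done.
	return workflow.ExecuteActivity(sessionCtx, StopDomainWorker, domain).Get(sessionCtx, nil)
}
```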

Thanks @maxim, that’s all clear and really helpful.
