How can we build a web-crawler-like workflow?

Let’s say we have an activity that returns some data. Based on that data, we may need to create new activities or do nothing.

Once every activity completes, the whole workflow completes.

Pseudocode might look like this:

init queue with activity 1 inputs;
while queue is not empty:
    dequeue an input and execute activity 1;
    get data from activity 1;
    if data is not null:
        for each item in data:
            create a new activity 1 input;
            enqueue it;

execute activity 2;
execute activity 3;

Thanks

Which SDK are you using? You can use programming language constructs to check activity results. For your case it seems that all activities are executed synchronously (you wait for them to complete to get the result).
These examples in Go and Java show simple data-based decisions.
For your case you have to make sure that the insertion order is maintained in the structure you loop over, in order to preserve workflow determinism. Hope this gets you started.
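A minimal sketch of what this could look like with the Go SDK, assuming a hypothetical `FetchPage` activity in the role of “activity 1” that returns newly discovered URLs (the names, timeout, and missing dedup/error handling are placeholders, not a definitive implementation):

```go
package crawl

import (
	"context"
	"time"

	"go.temporal.io/sdk/workflow"
)

// FetchPage is a stand-in for "activity 1": it takes one URL and returns
// any newly discovered URLs (possibly none).
func FetchPage(ctx context.Context, url string) ([]string, error) {
	// ... real crawling logic goes here ...
	return nil, nil
}

// CrawlWorkflow drives the queue-based loop from the pseudocode above.
func CrawlWorkflow(ctx workflow.Context, seeds []string) error {
	ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
		StartToCloseTimeout: time.Minute,
	})

	// A plain slice used as a FIFO queue: insertion order is preserved,
	// which keeps the loop deterministic on replay.
	queue := append([]string(nil), seeds...)

	for len(queue) > 0 {
		url := queue[0]
		queue = queue[1:]

		// Execute "activity 1" synchronously and wait for its result.
		var discovered []string
		if err := workflow.ExecuteActivity(ctx, FetchPage, url).Get(ctx, &discovered); err != nil {
			return err
		}

		// Enqueue any new work the activity produced.
		queue = append(queue, discovered...)
	}

	// Once the queue drains, run "activity 2" / "activity 3" here, e.g.
	// workflow.ExecuteActivity(ctx, SomeFollowUpActivity).Get(ctx, nil).
	return nil
}
```

Because the activities run one at a time and the slice is consumed in insertion order, replay produces the same command sequence every time.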

Hi @tihomir, thanks for your prompt reply. I understand how a decision-based workflow works, but I don’t know whether it’s correct to maintain a local queue in the workflow and create new activities based on the results of current activities.

I use Go, and I plan to use a list as the queue.

I believe your approach should be fine. Depending on the number of activities that you execute inside your loop, you probably want to make sure you don’t reach the 50K workflow history limit. See more info here.
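If the queue can keep growing, one common way to stay under that limit is to cap the number of activity executions per run and restart the workflow with the remaining queue via continue-as-new. A rough sketch, reusing the names from the example above (the cutoff value is an arbitrary placeholder):

```go
// CrawlWorkflowBounded is the same loop, but it caps the number of activity
// executions per run and continues-as-new with the leftover queue, so no
// single run's event history grows without bound.
func CrawlWorkflowBounded(ctx workflow.Context, queue []string) error {
	ctx = workflow.WithActivityOptions(ctx, workflow.ActivityOptions{
		StartToCloseTimeout: time.Minute,
	})

	const maxActivitiesPerRun = 1000 // placeholder; keep well under the history limit

	for i := 0; len(queue) > 0; i++ {
		if i >= maxActivitiesPerRun {
			// Start a fresh run with whatever work is still queued.
			return workflow.NewContinueAsNewError(ctx, CrawlWorkflowBounded, queue)
		}

		url := queue[0]
		queue = queue[1:]

		var discovered []string
		if err := workflow.ExecuteActivity(ctx, FetchPage, url).Get(ctx, &discovered); err != nil {
			return err
		}
		queue = append(queue, discovered...)
	}
	return nil
}
```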