This concerns the PHP implementation. I’m at my wits’ end here, so I’m looking for help anywhere I can think of. If anyone could reproduce this problem on their end, it would be really helpful. At least I would know this is not an issue with my specific setup.
A very detailed description of the issue:
Short version:
Sending a heartbeat from an activity method fails, at random times, with “heartbeat on non running activity”. The failed state for heartbeats seems to be set when RoadRunner launches (the rr serve command). If the activity worker launches in a ‘failed’ heartbeat state, it keeps failing on every heartbeat until it, or the workflow’s rr serve instance, is killed and restarted. I’m still not clear on which of the two actually fixes it. The more RoadRunner instances (rr serve) I run, the higher the chance of hitting this error.
The failing heartbeats throw: 'activity_pool_get_activity_context: heartbeat on non running activity' on tcp://127.0.0.1:6001
Very rarely, the worker instead enters a different state in which heartbeats fail with a socket error. I don’t know if it’s related: socket_send(): unable to write to socket [32]: Broken pipe
<?php

declare(strict_types=1);

namespace App\Command;

use Carbon\CarbonInterval;
use Temporal\Activity\ActivityOptions;
use Temporal\Workflow;
use Temporal\Workflow\WorkflowInterface;

#[WorkflowInterface]
class TemporalTestWorkflow
{
    #[Workflow\WorkflowMethod]
    public function launchActivity()
    {
        $options = ActivityOptions::new()
            ->withStartToCloseTimeout(CarbonInterval::seconds(2))
            ->withTaskQueue('hyperwallet_command_bus');

        Workflow::newUntypedActivityStub($options)->execute('testActivity');
    }
}
<?php

declare(strict_types=1);

namespace App;

use Temporal\Activity;
use Temporal\Activity\ActivityInterface;
use Temporal\Activity\ActivityMethod;

#[ActivityInterface]
class TestActivity
{
    #[ActivityMethod]
    public function testActivity()
    {
        Activity::heartbeat('test'); // This causes failure.
    }
}
Wait, do you not have an activity interface, just the impl?
How are you registering the activity with the worker? This is confusing. Your activity needs an activity interface; see the PHP samples.
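For reference, a minimal sketch of the interface/implementation split being asked about, applied to the `testActivity` method from the post. The interface name `TestActivityInterface` is made up for illustration, and the explicit `name:` is only there so the untyped stub's `execute('testActivity')` call would still match; whether this changes the heartbeat behaviour is not established in this thread.

```php
<?php

declare(strict_types=1);

namespace App;

use Temporal\Activity;
use Temporal\Activity\ActivityInterface;
use Temporal\Activity\ActivityMethod;

// Hypothetical interface name; the attributes move from the class to here.
#[ActivityInterface]
interface TestActivityInterface
{
    // Explicit activity name so execute('testActivity') on the untyped stub matches.
    #[ActivityMethod(name: 'testActivity')]
    public function testActivity(): void;
}

class TestActivity implements TestActivityInterface
{
    public function testActivity(): void
    {
        // Same single heartbeat as in the original report.
        Activity::heartbeat('test');
    }
}
```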
OK, I will try to test with the Go and Java SDKs and see what happens. I’m wondering why you would heartbeat just once and then complete the activity, but I guess it could catch an edge case where the activity completion is registered before the heartbeat.
Do you have a concrete test with a long-running activity that heartbeats, one that you use to test cancellations for example, or is this just a quick test that heartbeats once?
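A long-running activity of the kind asked about could look roughly like the sketch below. The names `LongRunningActivityInterface`, `LongRunningActivity` and `longRunningActivity` are hypothetical, and `sleep(1)` is only a stand-in for a slice of real work; the one SDK call used is `Activity::heartbeat()`, the same one as in the report.

```php
<?php

declare(strict_types=1);

namespace App;

use Temporal\Activity;
use Temporal\Activity\ActivityInterface;
use Temporal\Activity\ActivityMethod;

// Hypothetical names throughout; this is a sketch, not code from the thread.
#[ActivityInterface]
interface LongRunningActivityInterface
{
    #[ActivityMethod(name: 'longRunningActivity')]
    public function longRunningActivity(int $steps): int;
}

class LongRunningActivity implements LongRunningActivityInterface
{
    public function longRunningActivity(int $steps): int
    {
        for ($step = 1; $step <= $steps; $step++) {
            sleep(1); // stand-in for one slice of real work

            // Report progress after every slice, so many heartbeats are sent
            // while the activity is still running.
            Activity::heartbeat($step);
        }

        return $steps;
    }
}
```

Note that the 2-second start-to-close timeout used in the workflow from the post would have to be raised for an activity like this, since it runs for roughly `$steps` seconds.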
Regarding Dynamic activities registration at runtime and worker splitting - #2 by maxim
Yes, your workflow is using an untyped stub to invoke the activity, but the worker that listens on your “hyperwallet_command_bus” task queue still has to register that activity. And I believe it needs to have an activity interface.
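As a rough sketch of that registration step: the helper name `registerCommandBusWorker` is hypothetical, and `registerActivityImplementations()` is assumed to be available in the SDK version in use (newer SDK releases may prefer a different registration method). The task queue name and the two classes are the ones from the post.

```php
<?php

declare(strict_types=1);

use App\Command\TemporalTestWorkflow;
use App\TestActivity;
use Temporal\WorkerFactory;

// Hypothetical helper: registers the workflow and the activity on the same
// task queue that the workflow's ActivityOptions point at.
function registerCommandBusWorker(WorkerFactory $factory): void
{
    $worker = $factory->newWorker('hyperwallet_command_bus');

    $worker->registerWorkflowTypes(TemporalTestWorkflow::class);

    // Assumption: this method exists in the installed SDK version.
    $worker->registerActivityImplementations(new TestActivity());
}

// In the RoadRunner worker script, after Composer's autoloader is loaded:
//   $factory = WorkerFactory::create();
//   registerCommandBusWorker($factory);
//   $factory->run();
```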
I can confirm I get the same error with the workflow and the activity on the same RoadRunner instance (basically the same service) and the same queue.
Thanks! Looking forward to that. I think using the same queue and the same RoadRunner instance will make the testing easier. Please note that you sometimes need to restart the RoadRunner instance; the chance of getting the “bad instance” is about 1 in 3 in my case.
I tried to replicate this issue locally with 10000 concurrent workflows and resetting the activity pool as it goes… and nothing.
I simply can’t reproduce this issue. Based on the code I see, the only way for this condition to trigger is if the activity pool is dropped or replaced (somehow) while the activity is pushing its data via RPC.
Can we try to focus on the “dropped pipe” and other issues first? I want to make sure there are no other environment-specific errors that might be causing this weird side effect.
Hello, just as I posted on the original GitHub issue, I cannot reproduce it anymore with 2.6.3 myself. I think it’s fixed!! Thank you so much, guys. This goes for both the dropped pipe and the heartbeat on non running activity. If anyone else is having this issue, update RoadRunner to 2.6.3.