Hello All,
I am trying to play around activity Heatbeat feature
I am facing an issue with
activity.RecordHeartbeat()
In this go-example
I did some additional changes
changed activity option to:
ao := workflow.ActivityOptions{
ScheduleToStartTimeout: time.Minute,
StartToCloseTimeout: time.Minute * 30,
HeartbeatTimeout: 30 * time.Second,
WaitForCancellation: true,
RetryPolicy: &temporal.RetryPolicy{InitialInterval: 1 * time.Second},
}
and I Replaced the “ActivityToBeCanceled” (added some logging related to heartBeat)
func (a *Activities) ActivityToBeCanceled(ctx context.Context) (string, error) {
logger := activity.GetLogger(ctx)
logger.Info("activity started, to cancel workflow, use 'go run cancelactivity/cancel/main.go " +
"-w <WorkflowID>' or CLI: 'tctl wf cancel -w <WorkflowID>' to cancel")
a.test++
heartBeatTime := time.Time{}
for {
select {
case <-time.After(1 * time.Second):
seq := fmt.Sprintf("heat beat det %v", time.Now())
activity.RecordHeartbeat(ctx, seq)
if time.Now().After(heartBeatTime.Add(10 * time.Second)) {
fmt.Printf("HeartBeating....%v: \n", time.Now())
heartBeatTime = time.Now()
if activity.HasHeartbeatDetails(ctx) {
var prev string
if err := activity.GetHeartbeatDetails(ctx, &prev); err != nil {
clog.Error(ctx, "GetHeartbeatDetails fail", err)
}
fmt.Printf("value of heart beat det : %v \n", prev)
} else {
fmt.Printf("has heartbeat false %v:\n ", time.Now())
}
}
case <-ctx.Done():
logger.Info("context is cancelled")
return "I am canceled by Done", nil
}
}
}
when I am running this I am getting logs
1 2020-08-29T13:30:20.574+0530 INFO log/with_logger.go:60 activity started, to cancel workflow,
2 HeartBeating....: 2020-08-29 13:30:21.593334 +0530 IST m=+5.168517102
3 has heartbeat false 2020-08-29 13:30:21.593361 +0530 IST m=+5.168544344:
4 HeartBeating....: 2020-08-29 13:30:31.620144 +0530 IST m=+15.195262496
5 has heartbeat false 2020-08-29 13:30:31.620178 +0530 IST m=+15.195296394:
6 HeartBeating....: 2020-08-29 13:30:41.638978 +0530 IST m=+25.214031810
7 has heartbeat false 2020-08-29 13:30:41.639046 +0530 IST m=+25.214100403:
8 HeartBeating....: 2020-08-29 13:30:51.664958 +0530 IST m=+35.239948039
9 has heartbeat false 2020-08-29 13:30:51.664988 +0530 IST m=+35.239977791:
10 2020-08-29T13:30:52.605+0530 INFO log/with_logger.go:60 activity started, to cancel workflow,
11 HeartBeating....: 2020-08-29 13:30:53.617806 +0530 IST m=+37.192782776
12 value of heart beat det: 2020-08-29 13:30:21.579813 +0530 IST m=+5.154996146
HeartBeating....: 2020-08-29 13:31:01.692941 +0530 IST m=+45.267865867
has heartbeat false 2020-08-29 13:31:01.692974 +0530 IST m=+45.267899475:
HeartBeating....: 2020-08-29 13:31:03.64545 +0530 IST m=+47.220362588
value of heart beat det: 2020-08-29 13:30:21.579813 +0530 IST m=+5.154996146
HeartBeating....: 2020-08-29 13:31:11.717037 +0530 IST m=+55.291897364
has heartbeat false 2020-08-29 13:31:11.717057 +0530 IST m=+55.291917021:
HeartBeating....: 2020-08-29 13:31:13.679608 +0530 IST m=+57.254456094
value of heart beat det: 2020-08-29 13:30:21.579813 +0530 IST m=+5.154996146
HeartBeating....: 2020-08-29 13:31:21.746301 +0530 IST m=+65.321096978
has heartbeat false 2020-08-29 13:31:21.746335 +0530 IST m=+65.321130773:
HeartBeating....: 2020-08-29 13:31:23.703096 +0530 IST m=+67.277879083
value of heart beat det: 2020-08-29 13:30:21.579813 +0530 IST m=+5.154996146
2020-08-29T13:31:25.628+0530 INFO log/with_logger.go:60 activity started, to cancel workflow,
HeartBeating....: 2020-08-29 13:31:26.642248 +0530 IST m=+70.217012505
value of heart beat det: 2020-08-29 13:30:53.606229 +0530 IST m=+37.181206030
HeartBeating....: 2020-08-29 13:31:31.774526 +0530 IST m=+75.349257482
has heartbeat false 2020-08-29 13:31:31.774553 +0530 IST m=+75.349283756:
HeartBeating....: 2020-08-29 13:31:33.723904 +0530 IST m=+77.298622057
value of heart beat det: 2020-08-29 13:30:21.579813 +0530 IST m=+5.154996146
I suspect only the first heartbeat is taking into consideration ( the one that fires when activity start , means it behaves like the heartbeat is not working ) we can see activity retrying because of heartbeat timeout, even in the next retry we could see the heartbeat (the argument (the passed value as details) is the last activity start time, not the last heartbeat time)
line number “12” in the logs prints time
"020-08-29 13:30:21.579813" ( the first try start time)
but it should be near to
"2020-08-29 13:30:51.66495". (the first try end time )
?
what is going on here, is it expected behavior?
I needed to run a long activity that should retry immediately when server worker restarts ( so I need to add retry option, but this behavior blocking me
any help would be really helpful
thanks in advance,
Junaid