Temporal Performance

Hi,

Lately we have been running performance tests. Workflow runs that used to complete very quickly are now taking significantly longer.

On further investigation, I noticed the error logs below.

Tasks are left stuck at “ActivityTaskScheduled”. From the logs I can see that activity processing starts, but then there are a lot of gRPC errors (e.g. the exception in oDataRead() shown in the log below); ultimately the activity retries and times out.
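For reference, this is roughly how I am confirming that the runs stall at that event, by dumping the event history of one of the slow executions. This is only a minimal sketch assuming the Go SDK; the server address and workflow ID are placeholders for our actual values.

```go
package main

import (
	"context"
	"fmt"
	"log"

	enumspb "go.temporal.io/api/enums/v1"
	"go.temporal.io/sdk/client"
)

func main() {
	// Placeholder target; in our setup this points at the frontend service.
	c, err := client.Dial(client.Options{HostPort: "temporal-frontend:7233"})
	if err != nil {
		log.Fatalf("unable to create Temporal client: %v", err)
	}
	defer c.Close()

	// Walk the event history of one slow run; the last few events show
	// whether it ever progressed past ActivityTaskScheduled.
	iter := c.GetWorkflowHistory(context.Background(), "my-workflow-id", "",
		false, enumspb.HISTORY_EVENT_FILTER_TYPE_ALL_EVENT)
	for iter.HasNext() {
		event, err := iter.Next()
		if err != nil {
			log.Fatalf("history iteration failed: %v", err)
		}
		fmt.Printf("%d\t%s\n", event.GetEventId(), event.GetEventType())
	}
}
```

In every case I checked, the history ends with the ActivityTaskScheduled event and nothing after it.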

Logs from the Temporal history pod in the backend Kubernetes cluster:

Logs from the Temporal matching pod:

Is this caused by resource contention within the Temporal cluster? I am not so sure, because in Kubernetes all the Temporal pods appear to be running and healthy, with no restarts or degradation.

Could you please let me know what the issue might be here? Do I need to tune any configuration?
If not, how can I debug and pinpoint what exactly is causing this?
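For what it's worth, here is roughly how I have been inspecting the stuck activities to see the attempt count and last failure on each. Again, a minimal sketch assuming the Go SDK, with the server address, namespace, and workflow ID as placeholders:

```go
package main

import (
	"context"
	"fmt"
	"log"

	"go.temporal.io/sdk/client"
)

func main() {
	// Placeholder connection details for our cluster.
	c, err := client.Dial(client.Options{
		HostPort:  "temporal-frontend:7233",
		Namespace: "default",
	})
	if err != nil {
		log.Fatalf("unable to create Temporal client: %v", err)
	}
	defer c.Close()

	// DescribeWorkflowExecution reports pending activities, including how many
	// attempts have been made and the last failure message for each one.
	resp, err := c.DescribeWorkflowExecution(context.Background(), "my-workflow-id", "")
	if err != nil {
		log.Fatalf("describe failed: %v", err)
	}
	for _, pa := range resp.GetPendingActivities() {
		fmt.Printf("activity=%s state=%s attempt=%d lastFailure=%s\n",
			pa.GetActivityType().GetName(), pa.GetState(),
			pa.GetAttempt(), pa.GetLastFailure().GetMessage())
	}
}
```

The Temporal CLI and Web UI show the same pending-activity information; the attempt counts keep climbing while the last failure is always the gRPC error above.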

Hi Team,

Any insights here?