We have a Kubernetes deployment where the Temporal server (Frontend: 3 pods, 4 cores / 4 GB each; History: 5 pods, 8 cores / 8 GB each; Matching: 3 pods, 4 cores / 4 GB each; 4K shards; matching queue partitions set to 4) and our microservices (hosting the workers) run in the same cluster. We have 12 pods of our service and observed that temporal_worker_task_slots_available gets emitted from only a few of those 12. The behavior is intermittent, and at times when the metric is not being emitted from any of the pods, workflows still run successfully. We use code similar to GitHub - applicaai/spring-boot-starter-temporal (the driver making it convenient to use Temporal with Spring Boot) to initialize the workers, with all the startup code in a library. Below are my queries:
Has anyone faced a similar issue where slots are emitted intermittently, or any similar issue when using the above-mentioned starter project?
Even when temporal_worker_task_slots_available was not being emitted from any pod at all, how is it possible that workflows were running successfully without any errors?
observed that temporal_worker_task_slots_available get emitted only from few pods
One maybe obvious thing to check is whether all your pods are running Java SDK 1.8.0 or greater.
Another is, if you are checking with Grafana, to make sure your query isn't filtering on a label value that excludes some of the results:
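Since this metric is produced by the SDK's metrics scope, it can also be worth verifying that every pod actually wires a stats reporter into its service stubs; a pod with no metrics scope configured will run workflows fine but emit nothing. A minimal sketch, assuming Micrometer with a Prometheus registry (the pattern from the Java SDK metrics samples):

```java
import com.uber.m3.tally.RootScopeBuilder;
import com.uber.m3.tally.Scope;
import com.uber.m3.util.Duration;
import io.micrometer.prometheus.PrometheusConfig;
import io.micrometer.prometheus.PrometheusMeterRegistry;
import io.temporal.common.reporter.MicrometerClientStatsReporter;
import io.temporal.serviceclient.WorkflowServiceStubs;
import io.temporal.serviceclient.WorkflowServiceStubsOptions;

public class MetricsWiring {
  public static WorkflowServiceStubs createStubs() {
    // Registry that Prometheus scrapes; expose registry.scrape() on an HTTP endpoint.
    PrometheusMeterRegistry registry = new PrometheusMeterRegistry(PrometheusConfig.DEFAULT);

    // Tally scope that bridges SDK metrics (including
    // temporal_worker_task_slots_available) into Micrometer, reported every 10s.
    Scope scope = new RootScopeBuilder()
        .reporter(new MicrometerClientStatsReporter(registry))
        .reportEvery(Duration.ofSeconds(10));

    // Without setMetricsScope here, worker metrics are never emitted from this pod.
    return WorkflowServiceStubs.newServiceStubs(
        WorkflowServiceStubsOptions.newBuilder()
            .setMetricsScope(scope)
            .build());
  }
}
```

If the starter library constructs the stubs for you, check whether it exposes a hook to pass this scope in.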
still workflows run successfully.
For the workflows that are completing, can you check the "identity" value of their first WorkflowTaskStarted event in history and see if it's the identity of any of the workers in pods that are not emitting this metric?
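That check can also be scripted against the raw gRPC history API. A hedged sketch (namespace and workflow id are placeholders you'd supply):

```java
import io.temporal.api.common.v1.WorkflowExecution;
import io.temporal.api.enums.v1.EventType;
import io.temporal.api.history.v1.HistoryEvent;
import io.temporal.api.workflowservice.v1.GetWorkflowExecutionHistoryRequest;
import io.temporal.api.workflowservice.v1.GetWorkflowExecutionHistoryResponse;
import io.temporal.serviceclient.WorkflowServiceStubs;

public class IdentityCheck {
  // Prints the worker identity recorded on the first WorkflowTaskStarted event.
  public static void printFirstTaskIdentity(WorkflowServiceStubs service,
                                            String namespace, String workflowId) {
    GetWorkflowExecutionHistoryRequest request =
        GetWorkflowExecutionHistoryRequest.newBuilder()
            .setNamespace(namespace)
            .setExecution(WorkflowExecution.newBuilder()
                .setWorkflowId(workflowId)
                .build())
            .build();
    GetWorkflowExecutionHistoryResponse response =
        service.blockingStub().getWorkflowExecutionHistory(request);
    for (HistoryEvent event : response.getHistory().getEventsList()) {
      if (event.getEventType() == EventType.EVENT_TYPE_WORKFLOW_TASK_STARTED) {
        // This identity should match one of your worker pods.
        System.out.println(event.getWorkflowTaskStartedEventAttributes().getIdentity());
        return;
      }
    }
  }
}
```

The same information is visible in the web UI's event history view if you'd rather not write code for it.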
You can set a custom identity via WorkflowClientOptions if that makes it easier.
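For example, a short sketch of setting the identity when building the client; using the pod name as the identity is just an assumption here (in Kubernetes it could come from the HOSTNAME env var):

```java
import io.temporal.client.WorkflowClient;
import io.temporal.client.WorkflowClientOptions;
import io.temporal.serviceclient.WorkflowServiceStubs;

public class CustomIdentity {
  public static WorkflowClient createClient(WorkflowServiceStubs service, String podName) {
    // Tag every poller/worker started from this client with a recognizable
    // identity, e.g. the Kubernetes pod name, so it is easy to correlate the
    // WorkflowTaskStarted identity in history with a specific pod.
    WorkflowClientOptions options = WorkflowClientOptions.newBuilder()
        .setIdentity(podName)
        .build();
    return WorkflowClient.newInstance(service, options);
  }
}
```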
Thanks @tihomir for the clarification, and will keep an eye on the Spring Boot support for Temporal.
Also, when I checked on the identity part, I can see the number of pollers in the web UI equals the number of worker pods, but somehow the slots metric is not being emitted by all of those pods in Prometheus.