The first bottleneck to expect is the throughput of a single task queue. Task queues are not autoscaled yet, so make sure the task queues used by your example have enough partitions. Partition counts are configured through dynamic config.
The Java client might also need its number of polling threads adjusted to increase throughput. For the workflow task queue, adjust WorkerFactoryOptions.workflowHostLocalPollThreadCount. For the activity task queue, adjust WorkerOptions.activityPollThreadCount.
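Why poller count matters can be seen in a toy model. This is a stdlib-only sketch, not the client SDK: it assumes each poll is a blocking round trip that returns at most one task. Under that assumption a single poller thread caps throughput at roughly one task per round trip, however fast tasks are processed, and adding pollers raises that cap. The class and method names (PollerThroughputModel, drain) and the latency values are hypothetical illustrations.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical toy model (not the Temporal Java SDK): each poll is a blocking
 * round trip of pollLatencyMs that delivers at most one task.
 */
public class PollerThroughputModel {

    /** Drains totalTasks using pollerCount threads and returns the elapsed milliseconds. */
    static long drain(int pollerCount, int totalTasks, long pollLatencyMs) {
        AtomicInteger remaining = new AtomicInteger(totalTasks);
        AtomicInteger processed = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(pollerCount);
        ExecutorService pool = Executors.newFixedThreadPool(pollerCount);
        long start = System.nanoTime();
        for (int i = 0; i < pollerCount; i++) {
            pool.execute(() -> {
                try {
                    // Each iteration models one poll: claim a task, then pay the round-trip latency.
                    while (remaining.getAndDecrement() > 0) {
                        Thread.sleep(pollLatencyMs);
                        processed.incrementAndGet();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    done.countDown();
                }
            });
        }
        try {
            done.await();
            pool.shutdown();
            pool.awaitTermination(1, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while draining", e);
        }
        if (processed.get() != totalTasks) {
            throw new IllegalStateException("expected " + totalTasks + " tasks, got " + processed.get());
        }
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Same backlog and per-poll latency; only the number of poller threads changes.
        for (int pollers : new int[] {1, 2, 4}) {
            System.out.printf("%d poller(s): %d ms for 40 tasks%n", pollers, drain(pollers, 40, 10));
        }
    }
}
```

In this model the drain time for a fixed backlog shrinks roughly in proportion to the poller count. In a real deployment the gain eventually flattens once the task queue's partitions or the worker's execution slots become the limiting factor.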