If it is related to the fix (so child workflow initiated, started, and completed in the same workflow task), I cannot confirm that. In fact, from my understanding of the underlying algorithm involved, this has been an issue since the .NET SDK’s inception, though I would have to test on older versions to be sure. For most users it is rare for a child workflow to complete so fast, and/or for the parent workflow’s task processing to be slow enough, that the entire child workflow’s lifecycle lands in one workflow task, so that may be why you were not seeing it before.
It is technically possible though I see nothing obvious. Some questions to help us figure this out:
Is this a regular occurrence? Is it even possible to replicate it reliably? If so, can the replication be reduced to something simple enough that we could replicate it on our side? Understood if replication involves a loop or racy situations.
Does the exact same history fail in latest but pass in 1.4.0? The way you would test this is to take a workflow history from a 1.4.0 workflow that succeeded and that you believe would fail in 1.7.0. Then run it through the WorkflowReplayer in 1.4.0 and confirm it does not fail, and run it through the WorkflowReplayer in 1.7.0 and confirm it does fail. If the exact same history fails in 1.7.0 but passes in 1.4.0, then there definitely was some change. Note, you can’t do the inverse (e.g. grab history in 1.7.0 and run it in 1.4.0) as history is mostly only forward compatible.
Can you provide a history dump of the failing workflow history? We just want to confirm whether it is the same start-and-complete-same-task issue. This can be provided via a ticket if you are a cloud user, or if not, via DM to me, cretz, in our public Slack, or via email attachment to me, with my address being first name at temporal.io.
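
For the replay comparison in the second question, here is a minimal sketch of what the check might look like. The file name `history.json`, the workflow ID `my-workflow-id`, and the `MyParentWorkflow` class are placeholders I made up for illustration; you would register your real workflow class and point it at a history exported from a 1.4.0 run (for example via the UI download or `temporal workflow show --output json`). Build it once against SDK 1.4.0 and once against 1.7.0 and compare the results.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;
using Temporalio.Common;
using Temporalio.Worker;
using Temporalio.Workflows;

// Placeholder path: JSON history exported from a workflow that ran on 1.4.0.
var historyJson = await File.ReadAllTextAsync("history.json");

// Register the same workflow code that produced the history.
var replayer = new WorkflowReplayer(
    new WorkflowReplayerOptions().AddWorkflow<MyParentWorkflow>());

// Collect the failure instead of throwing so both SDK versions
// can be compared with the same program.
var result = await replayer.ReplayWorkflowAsync(
    WorkflowHistory.FromJson("my-workflow-id", historyJson),
    throwOnReplayFailure: false);

Console.WriteLine(result.ReplayFailure == null
    ? "Replay succeeded"
    : $"Replay failed: {result.ReplayFailure}");

// Hypothetical stand-in only; replace with your actual parent workflow,
// otherwise the replay will not match the recorded history.
[Workflow]
public class MyParentWorkflow
{
    [WorkflowRun]
    public Task RunAsync() => Task.CompletedTask;
}
```

If this reports success on 1.4.0 and a failure on 1.7.0 for the same file, that is the signal that something changed between the versions.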