Bug: workflow_sandbox imports and re-initializes extension modules

Hello,

Python 3.11.6

cryptography==41.0.7 installed using pip
cffi==1.16.0
pip 23.3.1

I am trying to use the fernet library in a Temporal worker and keep getting the following error because PyO3 modules are being initialized twice. It seems that Temporal is attempting to import and re-initialize extension modules.

Stack trace:

> File "/crypto_utils.py", line 28, in _decrypt
> return fernet.decrypt(to_decrypt).decode()
> ^^^^^^^^^^^^^^^^^^^^^^^^^^
> File "/lib/python3.11/site-packages/cryptography/fernet.py", line 91, in decrypt
> return self._decrypt_data(data, timestamp, time_info)
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> File "/lib/python3.11/site-packages/cryptography/fernet.py", line 152, in _decrypt_data
> self._verify_signature(data)
> File "/lib/python3.11/site-packages/cryptography/fernet.py", line 131, in _verify_signature
> h = HMAC(self._signing_key, hashes.SHA256())
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> File "/lib/python3.11/site-packages/cryptography/utils.py", line 48, in _extract_buffer_length
> from cryptography.hazmat.bindings._rust import _openssl
> File "/lib/python3.11/site-packages/temporalio/worker/workflow_sandbox/_importer.py", line 441, in __call__
> return self.current(*args, **kwargs)
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> File "/lib/python3.11/site-packages/temporalio/worker/workflow_sandbox/_importer.py", line 234, in _import
> mod = importlib.__import__(name, globals, locals, fromlist, level)
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> File "", line 1283, in __import__
> File "", line 1204, in _gcd_import
> File "", line 1176, in _find_and_load
> File "", line 1147, in _find_and_load_unlocked
> File "", line 676, in _load_unlocked
> File "", line 573, in module_from_spec
> File "", line 1233, in create_module
> File "", line 241, in _call_with_frames_removed
> ImportError: PyO3 modules may only be initialized once per interpreter process

It is intentional that workflows re-import modules.

Are you using this fernet library in a workflow? Workflow files are re-imported on every run for sandbox reasons (see the docs and README). You should move the workflow to a separate file that does not use this library. If you are using the library inside a workflow and you know you are using it deterministically, you can mark it as a passthrough module (see the docs and README).

Hey Chad,

Really appreciate your help here. I am using the newest version of the cryptography.fernet library in a custom data converter so I can support encryption (I have sensitive data that cannot show up in the Temporal UI).

I am passing through the cryptography library in the data converter using the code below. SensitiveBaseModel uses the fernet library for encryption/decryption:

with workflow.unsafe.imports_passed_through():
    from src.crypto_utils import SensitiveBaseModel

class PydanticJSONPayloadConverter(JSONPlainPayloadConverter):
    """Pydantic JSON payload converter.

    This extends the :py:class:`JSONPlainPayloadConverter` to override
    :py:meth:`to_payload` using the Pydantic encoder.
    """

    def to_payload(self, value: Any) -> Optional[Payload]:
        if isinstance(value, SensitiveBaseModel):
            value.encrypt()

        return Payload(
            metadata={"encoding": self.encoding.encode()},
            data=json.dumps(
                value, separators=(",", ":"), sort_keys=True, default=to_jsonable_python
            ).encode(),
        )

    def from_payload(
        self,
        payload: Payload,
        type_hint: Optional[Type] = None,
    ) -> Any:
        """See base class."""
        try:
            obj = json.loads(payload.data, cls=self._decoder)
            if type_hint:
                obj = value_to_type(type_hint, obj, self._custom_type_converters)

            if isinstance(obj, SensitiveBaseModel):
                obj.decrypt()

            return obj
        except json.JSONDecodeError as err:
            raise RuntimeError(str(err))

My understanding is that data converters get instantiated in every workflow run.

This seems to be a deeper issue with imports that rely on Rust extension modules. As more core Python libraries switch to Rust bindings, this could become a bigger problem.

Any advice here? I am happy to give more details or information if needed

For some reason this is not treated as passed through when it is re-imported. The goal of passthrough is to import only once, via the system importer.

This is correct (for payload converters, not the entire data converter)

Yes, but passing through the import is meant to alleviate this

I will set aside some time to investigate. This requires debugging why the sandbox is re-importing something that is being passed through. A standalone reproduction may help, but is probably not needed.

Thanks for the help. The following code would likely be a simple way to reproduce the issue in a workflow, unless the issue is specific to data converter instantiation:

with workflow.unsafe.imports_passed_through():
    from cryptography.fernet import Fernet

key = Fernet.generate_key()
f = Fernet(key)
token = f.encrypt(b"my deep dark secret")
f.decrypt(token)

Using the encryption sample as a base, I was not able to replicate with the 38.x version of the cryptography library, but after updating to 41.x I was able to.

This is happening because the cryptography library does a runtime import here and here. Libraries that perform imports while the workflow is running go through the sandboxed importer by design. with workflow.unsafe.imports_passed_through(): only applies to code that runs inside it, not to code that runs later.
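To make the semantics concrete, here is a rough stdlib-only sketch of why a scoped import hook cannot see runtime imports. The imports_passed_through context manager and passed_through set below are hypothetical stand-ins for illustration, not temporalio's actual implementation (which is considerably more involved):

```python
# Sketch (stdlib only): a scoped import hook records imports executed
# WHILE the block is active; imports deferred into function bodies run
# later, after the hook is gone, and are never seen.
import builtins
from contextlib import contextmanager

passed_through = set()  # hypothetical registry of passed-through modules

@contextmanager
def imports_passed_through():
    """Record module names imported while the block is active."""
    real_import = builtins.__import__

    def tracking_import(name, *args, **kwargs):
        passed_through.add(name)
        return real_import(name, *args, **kwargs)

    builtins.__import__ = tracking_import
    try:
        yield
    finally:
        builtins.__import__ = real_import

def uses_runtime_import():
    # This import executes when the function is CALLED, which may be
    # long after the context manager above has exited.
    import json
    return json.dumps({"ok": True})

with imports_passed_through():
    import base64  # executed inside the block -> recorded

uses_runtime_import()  # executed outside the block -> not recorded

print("base64" in passed_through)  # True
print("json" in passed_through)    # False
```

This mirrors why cryptography 41.x breaks: its deferred `from cryptography.hazmat.bindings._rust import _openssl` runs during workflow execution, outside any passthrough block, so the sandboxed importer handles it and re-initializes the PyO3 module.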

Options in order of most preferable:

  • Do not put encryption in a deterministic payload converter; that is not where it belongs. Put it in a payload codec instead (see the aforementioned encryption sample). If you need partial encryption, a payload converter can mark which parts need to be encrypted/decrypted so the payload codec knows (and vice-versa). Alternatively, you can use a local activity to decrypt the single part inside the workflow, but be advised that the result is stored in history; many users don't decrypt the data they need until they are inside an activity.
  • Run your decryption code inside with workflow.unsafe.imports_passed_through(): since it is doing runtime imports.
  • As seen in the README, you can mark the entire cryptography library as passthrough at the worker level, e.g. when creating the worker: workflow_runner=temporalio.worker.workflow_sandbox.SandboxedWorkflowRunner(restrictions=temporalio.worker.workflow_sandbox.SandboxRestrictions.default.with_passthrough_modules("cryptography"))
  • Add import cryptography.hazmat.backends.openssl.backend and import cryptography.hazmat.bindings._rust inside your with workflow.unsafe.imports_passed_through() block.
  • Disable the sandbox
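To illustrate the first option, here is a minimal sketch of keeping serialization (payload converter) and encryption (payload codec) as separate, composable steps. All class and method names are illustrative stand-ins, not temporalio's API, and base64 stands in for Fernet purely so the example is self-contained:

```python
# Sketch of option 1: the converter handles values <-> bytes, the codec
# handles bytes <-> encrypted bytes. Only the converter runs inside the
# workflow sandbox, so the codec can freely use Rust-backed libraries.
import base64
import json

class JSONConverter:
    """Deterministic step, runs in the sandbox: must not import
    Rust extension modules at runtime."""

    def to_bytes(self, value) -> bytes:
        return json.dumps(value, separators=(",", ":"), sort_keys=True).encode()

    def from_bytes(self, data: bytes):
        return json.loads(data)

class EncryptionCodec:
    """Non-deterministic step, runs outside the sandbox: free to use
    cryptography/Fernet here."""

    def encode(self, data: bytes) -> bytes:
        return base64.b64encode(data)  # stand-in for fernet.encrypt(data)

    def decode(self, data: bytes) -> bytes:
        return base64.b64decode(data)  # stand-in for fernet.decrypt(data)

converter, codec = JSONConverter(), EncryptionCodec()

# Outbound: convert the value first, then encrypt the resulting bytes.
wire = codec.encode(converter.to_bytes({"secret": "hi"}))
# Inbound: decrypt first, then convert back to a value.
value = converter.from_bytes(codec.decode(wire))
print(value)  # {'secret': 'hi'}
```

Because the codec only ever sees opaque bytes after serialization, the encryption library is never imported inside the sandbox and the PyO3 double-initialization never occurs.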

Excellent, thank you so much for the help. Your advice worked for me.