Introduce execution graph checksums #142

@zxqfd555

Is your feature request related to a problem? Please describe.
Pathway supports data persistence. For this feature to work, the graph must not change between reruns of the program. Otherwise, the rerun program sees a checkpoint from a different graph, tries to parse it, sometimes succeeds, and then fails with an unclear error message.

Such failures look like framework bugs, but in reality they are caused by changing the graph between two persistent runs. To avoid spending time debugging them, it would be good to calculate a hash/checksum of the Pathway execution graph, so that at restart the engine can detect the change in the graph and fail explicitly instead of trying to load data in a format that is known in advance to be incorrect.
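To make the idea concrete, here is a minimal sketch of how such a checksum could be computed. It is not Pathway's actual graph representation: the `Node` class, its fields, and `graph_checksum` are hypothetical names, and the sketch assumes that nodes are listed in a deterministic order (for example, construction order) and that operator parameters are JSON-serializable.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    # Hypothetical stand-in for one operator in the execution graph.
    operator: str          # operator kind, e.g. "read", "filter", "reduce"
    params: tuple = ()     # (name, value) pairs describing the operator
    inputs: tuple = ()     # indices of upstream nodes in the node list


def graph_checksum(nodes: list[Node]) -> str:
    """Return a SHA-256 hex digest over a canonical encoding of the graph.

    Assumes `nodes` is in a deterministic order, so that the same program
    always produces the same sequence of records.
    """
    hasher = hashlib.sha256()
    for index, node in enumerate(nodes):
        record = json.dumps(
            {
                "index": index,
                "operator": node.operator,
                "params": [list(pair) for pair in node.params],
                "inputs": list(node.inputs),
            },
            sort_keys=True,
        )
        hasher.update(record.encode("utf-8"))
        hasher.update(b"\n")  # separator so adjacent records cannot merge
    return hasher.hexdigest()


original = [
    Node("read", params=(("path", "input.csv"),)),
    Node("filter", params=(("column", "value"),), inputs=(0,)),
]
modified = [
    Node("read", params=(("path", "input.csv"),)),
    Node("filter", params=(("column", "other"),), inputs=(0,)),
]
same_graph = graph_checksum(original) == graph_checksum(list(original))
changed_graph = graph_checksum(original) != graph_checksum(modified)
```

Any change to an operator, its parameters, or its wiring changes the digest, which is the property the restart check needs. What exactly should count as "the graph" (for instance, whether connector paths are included) is an open design question that this sketch does not settle.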

Describe the solution you'd like
The calculated checksum is added to the metadata of the persistent dump and is verified when the program loads it.

Compatibility
To make the change backward-compatible, the checksum can be written as a separate key-value pair in the selected backend, without changing the format of the existing entries. On start, if the pair is absent, the engine only calculates the checksum and saves it. In principle, anyone who wants to override the check (not recommended, and it won't be documented) can just drop the pair, which may be useful in certain tests.
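The start-up logic described here could look roughly like the sketch below. The backend is modeled as a plain mutable mapping, and `CHECKSUM_KEY`, `GraphChangedError`, and `verify_or_store_checksum` are hypothetical names chosen for illustration, not existing Pathway APIs.

```python
from collections.abc import MutableMapping

# Hypothetical key under which the checksum lives, separate from all
# existing entries so that their format stays untouched.
CHECKSUM_KEY = "graph_checksum"


class GraphChangedError(RuntimeError):
    """Raised when the stored checksum does not match the current graph."""


def verify_or_store_checksum(backend: MutableMapping, current: str) -> bool:
    """Check `current` against the stored checksum.

    Returns False if no checksum was stored yet (it is saved now), True if
    the stored checksum matches, and raises GraphChangedError otherwise.
    """
    stored = backend.get(CHECKSUM_KEY)
    if stored is None:
        # Old dump or first run: nothing to verify, only save the checksum.
        backend[CHECKSUM_KEY] = current
        return False
    if stored != current:
        raise GraphChangedError(
            "The execution graph has changed since the persisted state was "
            f"written (stored checksum {stored}, current checksum {current})."
        )
    return True


backend = {"existing_entry": b"unchanged"}
first_run = verify_or_store_checksum(backend, "abc123")   # pair absent, saved
second_run = verify_or_store_checksum(backend, "abc123")  # pair matches

del backend[CHECKSUM_KEY]                                  # manual override
after_override = verify_or_store_checksum(backend, "def456")
```

Because the check touches only its own key, dumps written before the change load as before, and deleting the key resets the check, as described above.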
