Description
Is your feature request related to a problem? Please describe.
Pathway supports data persistence. For this feature to work, the graph must not change between reruns of the program; otherwise, the rerun sees a checkpoint from a different graph, tries to parse it, sometimes succeeds, and then fails with an unclear error message.
Such failures can look like framework bugs when they are actually caused by changing the graph between two persistent runs. To avoid spending time debugging them, it would be good to calculate a hash/checksum of the Pathway execution graph so that, at restart, the engine can detect the change and fail explicitly instead of trying to load data in a format that is known to be incompatible.
Describe the solution you'd like
The calculated checksum is added to the metadata of the persistent dump and is verified when the program loads it.
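As a rough illustration of what "a checksum of the graph" could mean, here is a minimal sketch. It assumes the graph can be described as a list of plain dicts (operator name, input node indices, parameters); that representation and the `graph_checksum` name are hypothetical and are not Pathway's actual internals.

```python
import hashlib
import json


def graph_checksum(nodes):
    """Return a SHA-256 hex digest of a graph description.

    `nodes` is a hypothetical representation: a list of dicts such as
    {"op": "filter", "inputs": [0], "params": {"column": "x"}}.
    The list order is assumed to be the (stable) order in which nodes
    were added to the graph.
    """
    # Serialize canonically: sorted keys and fixed separators, so that
    # dict key order does not influence the digest.
    canonical = json.dumps(nodes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


example_graph = [
    {"op": "read_csv", "inputs": [], "params": {"path": "input/"}},
    {"op": "filter", "inputs": [0], "params": {"column": "x"}},
]
example_checksum = graph_checksum(example_graph)
```

Any change to an operator, its inputs, or its parameters yields a different digest, while rerunning the unchanged program yields the same one.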
Compatibility
To make the change backward-compatible, the checksum can be written as a separate key-value pair in the selected backend, without changing the format of the existing entries. On start, if the pair is absent, the engine only calculates the checksum and saves it. In principle, anyone who wants to override the check (not recommended, and it won't be documented) can just drop the pair; this may be useful in certain tests.
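The start-up logic described above could look roughly like the sketch below. The backend is modeled as a plain dict-like key-value store, and the key name `graph_checksum`, the `GraphChangedError` exception, and the `verify_or_store_checksum` function are all hypothetical names chosen for illustration.

```python
GRAPH_CHECKSUM_KEY = "graph_checksum"  # hypothetical key name


class GraphChangedError(RuntimeError):
    """Raised when the stored checksum does not match the current graph."""


def verify_or_store_checksum(backend, current_checksum):
    """Check the graph checksum against a dict-like persistence backend.

    Returns True if a stored checksum was found and matched, and False
    if no checksum was stored yet (it is then saved for future runs).
    """
    stored = backend.get(GRAPH_CHECKSUM_KEY)
    if stored is None:
        # Old dump without a checksum, or the pair was dropped on
        # purpose: only record the checksum, do not fail.
        backend[GRAPH_CHECKSUM_KEY] = current_checksum
        return False
    if stored != current_checksum:
        raise GraphChangedError(
            "The computation graph changed since the persisted state was "
            f"written (stored checksum {stored}, current {current_checksum})."
        )
    return True
```

With this shape, dumps written before the change keep loading, because the missing pair is treated as "nothing to verify yet", and deleting the pair re-arms that path for tests.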