Description
Steps to reproduce
Currently, window functions operating on append-only collections can consume unbounded memory.
For an example, see the Discord discussion about exporting data into VictoriaMetrics. This happens even when user-defined functions (UDFs) are explicitly marked as deterministic.
The root cause is that window functions internally rely on Python UDFs that are not marked as deterministic. As a result, the pipeline treats parts of the computation as non-deterministic and retains additional state to allow for potential recomputation.
Because of this, memory usage is not optimized as well as it could be: the system keeps historical data that is, in fact, unnecessary.
To fix this, the UDFs used inside window functions should be:
- extracted from inline lambdas into dedicated implementations, and
- explicitly marked as deterministic.
This would allow the pipeline to safely apply more aggressive memory optimizations.
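To illustrate why the flag matters, here is a minimal toy model in plain Python. It is not Pathway's implementation: `ToyUdfOperator`, `window_start` and `retained_entries` are hypothetical names, and the only assumption carried over is that results of a function not marked as deterministic are memoized until the row is deleted, which never happens in an append-only collection.

```python
from typing import Any, Callable, Dict, Hashable


class ToyUdfOperator:
    """Toy stand-in for an operator that applies a UDF to keyed rows."""

    def __init__(self, fn: Callable[[Any], Any], deterministic: bool = False) -> None:
        self.fn = fn
        self.deterministic = deterministic
        # Per-row results, kept only when recomputation could give a different value.
        self.memo: Dict[Hashable, Any] = {}

    def insert(self, key: Hashable, value: Any) -> Any:
        result = self.fn(value)
        if not self.deterministic:
            # The result must be remembered so a later deletion can retract
            # exactly the value that was emitted.
            self.memo[key] = result
        return result

    def delete(self, key: Hashable, value: Any) -> Any:
        if self.deterministic:
            # Safe to recompute: same input, same output.
            return self.fn(value)
        return self.memo.pop(key)


def window_start(t: int, duration: int = 10) -> int:
    """Dedicated, named function (instead of an inline lambda) assigning a tumbling window."""
    return t - t % duration


def retained_entries(deterministic: bool, n_rows: int) -> int:
    """Feed n_rows append-only rows and report how many results stay in memory."""
    op = ToyUdfOperator(window_start, deterministic=deterministic)
    for key in range(n_rows):
        op.insert(key, key)
    return len(op.memo)
```

In this model, `retained_entries(False, n)` grows linearly with the number of rows, while `retained_entries(True, n)` stays at zero, which is the behavior the proposed change aims for.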
Relevant log output
There are no specific error logs.
However, when running the user-provided code with a PythonConnector and logging memory usage, the memory footprint grows indefinitely. This is not the desired behavior: memory usage should remain stable over time.
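For reference, memory growth of this kind can be observed with a small helper such as the sketch below. It is not the code used in the original report: `rss_bytes` and `looks_unbounded` are hypothetical names, the `/proc/self/statm` read is Linux-only, and the 1.5x threshold is an arbitrary assumption.

```python
import os
from typing import Sequence


def rss_bytes() -> int:
    """Resident set size of the current process in bytes (Linux only)."""
    with open("/proc/self/statm") as f:
        # Second field of statm is the number of resident pages.
        resident_pages = int(f.read().split()[1])
    return resident_pages * os.sysconf("SC_PAGE_SIZE")


def looks_unbounded(samples: Sequence[float], ratio: float = 1.5) -> bool:
    """Heuristic: is the average of the later samples much larger than the earlier ones?"""
    if len(samples) < 4:
        return False  # too few samples to say anything
    half = len(samples) // 2
    early = sum(samples[:half]) / half
    late = sum(samples[half:]) / (len(samples) - half)
    return late > early * ratio
```

Calling `rss_bytes()` periodically while the pipeline runs (for example from a background thread) and passing the collected values to `looks_unbounded` gives a rough signal of whether memory is stable or growing.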
What did you expect to happen?
The process should have stable memory consumption when using window functions on append-only collections.
Version
0.27.1
Docker Versions (if used)
No response
OS
Linux
On which CPU architecture did you run Pathway?
None