
[Bug]: Window functions in Pathway must internally use deterministic UDFs #148

@zxqfd555

Description


Steps to reproduce

Currently, window functions operating on append-only collections can consume unbounded memory.

For an example, see the Discord discussion about exporting data into Victoria Metrics. The memory growth happens even though the user's own UDFs are explicitly marked as deterministic.

The root cause is that window functions internally rely on Python UDFs that are not marked as deterministic. As a result, the pipeline treats parts of the computation as non-deterministic and memoizes their results, so that the exact value emitted for a row can be retracted if that row is later deleted.

On an append-only collection, rows are never deleted, so this memoized state is never released. The system keeps historical data that is, in fact, unnecessary, and memory grows with the number of processed rows.
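The mechanism can be illustrated with a toy model in plain Python. `ToyUdfNode`, `window_start`, and `feed_append_only` are hypothetical and are not Pathway's internal code; the model only assumes that results of a non-deterministic UDF are kept until the corresponding row is deleted, while a deterministic UDF can simply be recomputed when a retraction is needed.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Hashable


@dataclass
class ToyUdfNode:
    """Hypothetical model of an incremental operator applying a UDF per row."""
    fn: Callable[..., Any]
    deterministic: bool
    # Emitted values kept so that a later deletion retracts exactly them.
    memo: Dict[Hashable, Any] = field(default_factory=dict)

    def insert(self, key: Hashable, *args: Any) -> Any:
        result = self.fn(*args)
        if not self.deterministic:
            # A second call might return something different, so the
            # emitted value has to be remembered until the row is deleted.
            self.memo[key] = result
        return result

    def delete(self, key: Hashable, *args: Any) -> Any:
        if self.deterministic:
            # Same arguments give the same result: recomputing is safe.
            return self.fn(*args)
        # Memoized state is released only here, on deletion.
        return self.memo.pop(key)


def window_start(t: int) -> int:
    """Named helper: start of the 10-unit window containing t."""
    return t - t % 10


def feed_append_only(node: ToyUdfNode, rows: int) -> None:
    """Append-only input: rows are inserted and never deleted."""
    for row_id in range(rows):
        node.insert(row_id, row_id)


# Same logic twice: an unflagged inline lambda vs. a flagged named function.
inline = ToyUdfNode(lambda t: t - t % 10, deterministic=False)
flagged = ToyUdfNode(window_start, deterministic=True)
feed_append_only(inline, 10_000)
feed_append_only(flagged, 10_000)
print(len(inline.memo), len(flagged.memo))  # 10000 0
```

Since no row is ever deleted, `inline.memo` only grows, whereas the flagged node keeps no per-row state at all.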

To fix this, the UDFs used inside window functions should be:

  • extracted from inline lambdas into dedicated implementations, and
  • explicitly marked as deterministic.

This would allow the pipeline to skip memoizing these results and safely apply more aggressive memory optimizations.
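For window assignment, the refactoring could look like the sketch below. The helper names are hypothetical and Pathway's actual internal helpers may differ. What matters is that the logic lives in named, pure functions with no hidden state, randomness, or clock reads, which is what makes marking them deterministic safe. In Pathway, a UDF is flagged with `@pw.udf(deterministic=True)`; the commented lines assume that form and are not executed here.

```python
from typing import List


def tumbling_window_start(time: int, duration: int, origin: int = 0) -> int:
    """Start of the tumbling window [start, start + duration) containing `time`."""
    if duration <= 0:
        raise ValueError("duration must be positive")
    # Python's % is non-negative for a positive divisor, so this also
    # works for times before `origin`.
    return time - (time - origin) % duration


def sliding_window_starts(time: int, hop: int, duration: int, origin: int = 0) -> List[int]:
    """Starts of all sliding windows [start, start + duration) containing `time`."""
    if hop <= 0 or duration <= 0:
        raise ValueError("hop and duration must be positive")
    start = time - (time - origin) % hop  # latest window start <= time
    starts = []
    while start + duration > time:  # this window still covers `time`
        starts.append(start)
        start -= hop
    return starts[::-1]  # ascending order


# With Pathway installed, each helper could be wrapped once, explicitly
# (hypothetical wrapper name, shown for illustration only):
#
#   import pathway as pw
#
#   @pw.udf(deterministic=True)
#   def tumbling_window_start_udf(time: int, duration: int, origin: int) -> int:
#       return tumbling_window_start(time, duration, origin)

print(tumbling_window_start(7, 5))     # 5
print(sliding_window_starts(7, 2, 5))  # [4, 6]
```

Keeping the pure function separate from the UDF wrapper also makes it easy to unit-test the window arithmetic without running a pipeline.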

Relevant log output

There are no specific error logs.

However, when the user-provided code is run with a PythonConnector and memory usage is logged, the memory footprint grows without bound. This is not the desired behavior: memory usage should remain stable over time.
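The user's script is not included here, so the following is only a generic, Linux-specific sketch of this kind of memory logging. It reads the process's current resident set size from `/proc/self/status` after each batch of rows. `current_rss_kib` and `log_rss` are hypothetical helpers, and the "leaky" step is a stand-in that retains per-row state, not Pathway itself.

```python
import time
from typing import Callable, List


def current_rss_kib() -> int:
    """Current resident set size of this process in KiB (Linux only)."""
    with open("/proc/self/status") as status:
        for line in status:
            if line.startswith("VmRSS:"):
                # The line looks like "VmRSS:     123456 kB".
                return int(line.split()[1])
    raise RuntimeError("VmRSS not found in /proc/self/status")


def log_rss(step: Callable[[int], None], batches: int, batch_size: int) -> List[int]:
    """Feed consecutive row ids to `step` and record RSS after each batch."""
    samples: List[int] = []
    row_id = 0
    for batch in range(batches):
        for _ in range(batch_size):
            step(row_id)
            row_id += 1
        rss = current_rss_kib()
        samples.append(rss)
        print(f"{time.strftime('%H:%M:%S')} batch={batch} rss={rss} KiB")
    return samples


# Stand-in for a pipeline that keeps per-row state forever:
# every row keeps a 1 KiB zero-filled buffer alive.
retained: List[bytearray] = []
samples = log_rss(lambda i: retained.append(bytearray(1024)), batches=5, batch_size=10_000)
```

For a stable pipeline the logged RSS should plateau after warm-up; a steady climb like the one produced here is the symptom described in this issue.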

What did you expect to happen?

The process should have stable memory consumption when using window functions on append-only collections.

Version

0.27.1

Docker Versions (if used)

No response

OS

Linux

On which CPU architecture did you run Pathway?

None

Metadata



Labels

bug (Something isn't working)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests
