Skip to content

Advanced Interval Analysis #14515

Open
Open
@ozankabak

Description

@ozankabak

Is your feature request related to a problem or challenge?

The interval arithmetic library in DataFusion works via two fundamental methods: evaluate_bounds for bottom-up evaluation, and propagate_constraints for top-down propagation. We apply each traversal once to update bounds on columns. However, when there are complex expressions where symbols (e.g. columns) appear more than once, this will give overly pessimistic bounds, resulting in missed opportunities in data pruning and optimizations.

Describe the solution you'd like

If we use Timothy Hickey's interval narrowing approach and utilize a single update_bounds API (instead of the two APIs we have today), we can utilize a queue of nodes to only narrow intervals of nodes that keep shrinking. Simply stated, the update_bounds function is a simultaneous computation of both evaluation and propagation logics -- it updates intervals of the parent and the children nodes at the same time.

This will allow us to arrive at more precise bounds without unnecessarily traversing the expression DAG too many times.

Describe alternatives you've considered

We can apply evaluate_bounds and propagate_constraints in a loop until column bounds converge to a fixed point (to some tolerance), but that would be inefficient (albeit easy).

Additional context

Efficient Implementation of Interval Arithmetic Narrowing Using IEEE Arithmetic, Timothy J. Hickey and Qun Ju

Metadata

Metadata

Assignees

No one assigned

    Labels

    enhancementNew feature or request

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions