[Dask] Race condition in finding ports #5865

@adfea9c0

Description

Dask LightGBM will sometimes try to bind to ports that were previously free, but are now used by a different program.

Specifically, it seems that the Python code in LGBMDaskRegressor [1] finds open ports, saves the port numbers, and then immediately closes the sockets. After that, the C++ layer tries to bind to the same ports again [2]. This fails if another program binds to one of those ports between these two steps.

Most of my runs succeed, but I've run into the 'LightGBMError: Binding port blah failed' error a handful of times now, and I'm fairly confident the race condition above is the cause.

[1]

def _find_n_open_ports(n: int) -> List[int]:

[2]
if (listener_->Bind(port)) {
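
For illustration, the find-then-release pattern described above can be sketched with the standard library alone. This is a minimal sketch, not LightGBM's actual code: the name `find_n_open_ports_sketch` is hypothetical, and the body only assumes that ports are discovered by binding to port 0 and then closing the sockets.

```python
import socket
from typing import List


def find_n_open_ports_sketch(n: int) -> List[int]:
    """Hypothetical helper: return n port numbers that were free when probed."""
    sockets = []
    for _ in range(n):
        s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        # Binding to port 0 asks the OS to pick any currently free port.
        s.bind(("127.0.0.1", 0))
        sockets.append(s)

    ports = []
    for s in sockets:
        ports.append(s.getsockname()[1])
        # Once the socket is closed, nothing reserves the port any more.
        # Any other process may bind to it before the caller does.
        s.close()
    return ports
```

The returned numbers are only a snapshot: they were free at probe time, and nothing guarantees they still are when a later `Bind(port)` call runs.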

Reproducible example

It's hard to reproduce this reliably since it's a race condition. I hope my description of the issue suffices; let me know if I can do more to help.
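
The suspected window can at least be simulated deterministically with plain sockets, without LightGBM or Dask. The sketch below is an assumption-laden stand-in, not a reproduction of the real failure: `probe_free_port` and `simulate_race` are hypothetical names, the "intruder" socket plays the role of the other program, and the final bind stands in for the C++ layer's bind attempt.

```python
import socket


def probe_free_port() -> int:
    """Hypothetical probe: find a free port, then release it immediately."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]


def simulate_race() -> bool:
    """Return True if the late bind fails because the port was taken."""
    port = probe_free_port()

    # Another program grabs the port in the gap between probe and use.
    intruder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    intruder.bind(("127.0.0.1", port))
    intruder.listen(1)
    try:
        # Stand-in for the later bind on the saved port number.
        late = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            late.bind(("127.0.0.1", port))
        except OSError:
            return True  # address already in use
        finally:
            late.close()
        return False
    finally:
        intruder.close()


if __name__ == "__main__":
    print("late bind failed:", simulate_race())
```

In a real run the intruder is an unrelated process and the gap is short, which would explain why the failure shows up only occasionally.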

Environment info

I'm using LightGBM 3.3.2 on Dask.
