Description
Dask LightGBM will sometimes try to bind to ports that were previously free, but are now used by a different program.
Specifically, it seems that the Python code behind DaskLGBMRegressor [1] finds open ports, saves the port numbers, and then immediately closes the sockets. After that, the C++ layer tries to reopen each port [2]. This can go wrong when another program binds to one of those ports between these two steps.
I would say most of my runs succeed, but I've run into the 'LightGBMError: Binding port blah failed' error a handful of times now, and I'm fairly confident the above race condition is the issue.
[1] LightGBM/python-package/lightgbm/dask.py, line 86 in d0dfcee: def _find_n_open_ports(n: int) -> List[int]:
[2] LightGBM/src/network/linkers_socket.cpp, line 128 in d0dfcee: if (listener_->Bind(port)) {
Reproducible example
It's hard to reproduce this reliably since it's effectively a race condition. I hope my description of the issue suffices; let me know if I can do more to help.
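That said, the suspected sequence can be simulated deterministically with plain sockets. The snippet below is only a rough sketch of the pattern described above and does not use LightGBM or Dask at all: `find_open_port` and `simulate_race` are hypothetical names I made up for illustration, the "intruder" socket stands in for the other program, and the final bind stands in for the C++ layer.

```python
import socket


def find_open_port() -> int:
    """Ask the OS for a free port, record it, then release it.

    Hypothetical helper that mimics the find-then-close pattern
    described above; it is not LightGBM's actual code.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        return s.getsockname()[1]


def simulate_race() -> bool:
    """Return True if the late bind fails because the port was taken."""
    # Step 1: a port is found and immediately closed again.
    port = find_open_port()

    # In the gap, another program binds to the same port.
    intruder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    intruder.bind(("127.0.0.1", port))
    intruder.listen(1)

    # Step 2: the late bind (standing in for the C++ layer) now fails.
    late = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        late.bind(("127.0.0.1", port))
        return False
    except OSError:
        return True
    finally:
        late.close()
        intruder.close()


if __name__ == "__main__":
    print("late bind failed:", simulate_race())
```

In a real run the gap between the two steps is short, which would explain why the failure only shows up occasionally.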
Environment info
I'm using LightGBM 3.3.2 on Dask.