An issue to converge on the best design for the Multiprocessing API before releasing the documentation (and a new GeoUtils minor version!) that describes it for users.
Discussion started in #669 and #661 with @vschaffn and @adebardo.
Some ideas:
- When the output is a raster: Return a `Raster` object (unloaded) that opens `config.outfile` at the end of the call? This would make it easy to chain operations, and keep the same syntax as without Multiprocessing!
config1 = MultiprocConfig(chunk_size=200, outfile="reproj.tif")
config2 = MultiprocConfig(chunk_size=200, outfile="prox.tif")
rst = Raster(starting_file)
rst_reproj = rst.reproject(config=config1)
rst_reproj_prox = rst_reproj.proximity(config=config2)
- When the output is not a raster: The object (subsampled array, interpolated array) is expected to fit in memory, so we return it directly?
- Default to a temporary filepath for a call of `MultiprocConfig(chunk_size=200)` without `outfile=`, so that users can simply pass the same config argument everywhere, defining only the chunk size, which makes chaining operations more practical? (We can probably use Python's `tempfile` for this?)
config = MultiprocConfig(chunk_size=200)
rst = Raster(starting_file)
rst_reproj = rst.reproject(config=config)
rst_reproj_prox = rst_reproj.proximity(config=config)
- Add multiprocessing configuration to `geoutils.config` (see https://geoutils.readthedocs.io/en/stable/config.html) so that users can define a global parameter, and don't even have to pass a config argument if they don't want to:
gu.config["mp.chunksizes"] = (200, 200)
rst = Raster(starting_file)
rst_reproj = rst.reproject()
rst_reproj_prox = rst_reproj.proximity()
This is for the `Raster.function(config=)` API.
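To make the last two ideas concrete, here is a minimal sketch of how the fallbacks could resolve. Everything in it is hypothetical: `SketchConfig` is a stand-in for `MultiprocConfig` (not the GeoUtils class), `GLOBAL_CONFIG` and its `"mp.chunk_size"` key stand in for a `geoutils.config` entry, and `resolve_outfile` / `resolve_config` are illustrative names only.

```python
import os
import tempfile
from dataclasses import dataclass
from typing import Optional

# Hypothetical stand-in for a global setting stored in geoutils.config.
GLOBAL_CONFIG = {"mp.chunk_size": None}


@dataclass
class SketchConfig:
    """Hypothetical stand-in for MultiprocConfig (not the GeoUtils class)."""

    chunk_size: int
    outfile: Optional[str] = None

    def resolve_outfile(self) -> str:
        """Return the user-defined outfile, or a fresh temporary path."""
        if self.outfile is not None:
            return self.outfile
        # mkstemp creates the file and returns an open descriptor; it is
        # closed here so that a raster writer can reopen the path later.
        fd, path = tempfile.mkstemp(suffix=".tif")
        os.close(fd)
        return path


def resolve_config(config: Optional[SketchConfig] = None) -> Optional[SketchConfig]:
    """Explicit config first, then the global setting, else no multiprocessing."""
    if config is not None:
        return config
    chunk_size = GLOBAL_CONFIG["mp.chunk_size"]
    if chunk_size is not None:
        return SketchConfig(chunk_size=chunk_size)
    return None
```

One design point this sketch assumes: the temporary path is generated per call rather than once per config object. If the same config is passed to chained operations, a single shared temporary file would mean the second operation writes to the file that its unloaded input still reads from. Note also that files created with `mkstemp` are not deleted automatically, so cleanup would need to be decided separately.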
I think we can also take notes of our ideas here on the API of the different `map_overlap` functions (that will be public) while integrating them for various uses into xDEM and GeoUtils 🙂
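As a starting note for that discussion, here is a small sketch of the general overlap idea in one dimension, using plain Python lists. It is not the GeoUtils implementation: `map_overlap_1d`, `moving_max` and the `depth` parameter are hypothetical names, and a real raster version would work on 2D tiles read from and written to file.

```python
from typing import Callable, List


def map_overlap_1d(
    func: Callable[[List[float]], List[float]],
    data: List[float],
    chunk_size: int,
    depth: int,
) -> List[float]:
    """Apply func chunk by chunk, padding each chunk with `depth` neighbours."""
    out: List[float] = []
    for start in range(0, len(data), chunk_size):
        stop = min(start + chunk_size, len(data))
        # Extend the chunk on both sides, clipped to the data bounds.
        lo = max(start - depth, 0)
        hi = min(stop + depth, len(data))
        result = func(data[lo:hi])
        # Trim the overlap so that only the chunk's own cells are kept.
        out.extend(result[start - lo : start - lo + (stop - start)])
    return out


def moving_max(values: List[float]) -> List[float]:
    """Maximum over a 3-cell window (needs 1 neighbour on each side)."""
    n = len(values)
    return [max(values[max(i - 1, 0) : min(i + 2, n)]) for i in range(n)]


data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
chunked = map_overlap_1d(moving_max, data, chunk_size=4, depth=1)
```

With an overlap at least as large as the neighbourhood the function needs, the chunked result matches the result computed on the full array. With no overlap, cells at chunk edges are computed from incomplete neighbourhoods. That suggests the overlap depth is one parameter the public API would need to expose or derive per operation.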