-
Notifications
You must be signed in to change notification settings - Fork 24
Move filter.py
from xDEM to GeoUtils and integrate into raster
class
#691
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
|
||
:returns: the filtered array (same shape as input). | ||
""" | ||
return _nan_safe_filter(array, scipy.ndimage.median_filter, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
EDIT: If I'm not mistaken, the implementation in xDEM doesn't fully work, and here neither.
In xDEM, replacing by inf
:
array = [1, 2, 3]
array2 = [1, 2, 3, np.inf, np.inf, np.inf, np.inf]
np.median(array)
Out[16]: np.float64(2.0)
np.median(array2)
Out[17]: np.float64(inf)
Here, replacing by 0:
array3 = [1, 2, 3, 0, 0, 0, 0]
np.median(array3)
Out[19]: np.float64(0.0)
There is no way to ever retrieve 2 with a division of 0 by something else, as is done in _nan_safe_filter
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we simply call generic_filter
with nanmean
and nanmedian
inside?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be the simplest solution, but from my experience generic_filter
is really slow. So we might consider other options in the future. At least, we could compare our implementation to that in the tests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes that's unfortunately the case, see here (46 times slower to use generic_filter
for median
): https://scipy-user.scipy.narkive.com/YyMHNgYg/scipy-ndimage-filters-median-filter-with-nan-values
The other option is to use Numba as default, which should be fast for any custom operation like this.
This would mirror the behaviour of xDEM in terrain attributes, where I'm adding the option of using either a Numba or Scipy engine. (See here, PR not finalized yet: GlacioHack/xdem#486)
For this PR, we can stick to SciPy only and generic_filter
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's also a way to write a short C filter to pass to SciPy, which is usually much faster.
See: http://notmatthancock.github.io/2020/04/19/scipy-generic-filter-low-level-callable.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually my bad, for the nanmean
we had a working filter already using the trick to replace by the zeros, and then dividing by the sum of valid pixels only, but I don't think this how it behaves exactly in the implementation of this PR.
We actually have an old working duplicate here: https://github.com/GlacioHack/xdem/blob/e1e2065ea52a7db3c8e88045d743aa1f6efd08bb/xdem/spatialstats.py#L2563
In short, to move forward in this PR, we need to remove the _nan_safe_filter()
function (as this trick only works for the mean, it's not generalizable), re-implement the mean only with this trick.
And, for the median, simply use generic_filter(np.nanmedian)
(which is quite slow, but there are no other choices I think).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great addition, thanks a lot for all the work! 😄
It would be great to add more quantitative tests to ensure the results are as expected with/without NaNs. If I didn't make a mistake in my comment above, the mean/median filters might be erroneous right now.
Maybe we should add a set of test manually checking the operation in a given window, for each filter type, to ensure there is no error?
Example for the mean:
arr = np.random.normal((5, 5))
arr_filtered = _filter(arr, method="mean", size=3)
assert np.nanmean(arr) == arr_filtered[2, 2]
They should all be easy to test manually like this, except for the gaussian
one, that needs to use the logic in the StackOverflow post to correct the division factors.
My last comment: Should we add the size/radius
parameter in Raster.filter()
's parameters directly, to make it more easy to grasp from the user perspective? We also need to specify in the docstring that all other kwargs refer to scipy.ndimage
filters 😉.
I've been working on a very similar problems during the past week in GlacioHack/xdem#486, so while I still know the intricacies of the topic, here's how we could add Numba filters in GeoUtils to dramatically improve the speed for from typing import Literal
import numpy as np
from numba import jit, prange
from scipy.ndimage import generic_filter
@jit(parallel=True)
def median_filter_numba(dem: np.ndarray, window_size: int):
# Get input shapes
N1, N2 = dem.shape
# Define ranges to loop through given padding
row_range = N1 - window_size + 1
col_range = N2 - window_size + 1
# Allocate output array
outputs = np.full((row_range, col_range), fill_value=np.nan, dtype=np.float32)
# Loop over every pixel concurrently by using prange
for row in prange(row_range):
for col in prange(col_range):
outputs[row, col] = np.nanmedian(dem[row: row+window_size, col:col+window_size])
return outputs
def median_filter(dem: np.ndarray, window_size: int, engine: Literal["scipy", "numba"]):
# Using SciPy, options already include edge behaviour
if engine == "scipy":
return generic_filter(dem, np.nanmedian, size=window_size, mode="constant", cval=np.nan)
# Using Numba, need to define edge behaviour ourselves
elif engine == "numba":
# We pad the input array with NaNs for half the window size
hw = int((window_size - 1) / 2)
dem = np.pad(dem, pad_width=((hw, hw), (hw, hw)), constant_values=np.nan)
return median_filter_numba(dem, window_size)
############
# TEST SPEED
############
from time import time
rnd = np.random.default_rng(42)
nb_it = 10 # Several iterations to be more precise measuring CPU time
window_size = 11
sizes = [50, 100, 500, 1000] # Array sizes
# Call Numba once in the void to trigger JIT compile
_ = median_filter(np.ones((10, 10)), window_size=3, engine="numba")
list_df = []
for s in sizes:
dem = rnd.normal(size=(s, s))
# Get computation times
t1 = time()
for i in range(nb_it):
filt1 = median_filter(dem, window_size=window_size, engine="scipy")
t2 = time()
for i in range(nb_it):
filt2 = median_filter(dem, window_size=window_size, engine="numba")
t3 = time()
# Check output is equal
assert np.allclose(filt1, filt2)
# Print elapsed times
print(f"Size: {s}")
print(f"Elapsed SciPy: {t2 - t1}")
print(f"Elapsed Numba: {t3 - t2}")
Quite a striking improvement, factor of 100! 😉 |
Resolves #690.
Context
The current
filter.py
module in xDEM provides a collection of filters that are highly useful for digital elevation models (DEMs). However, these filtering utilities are not intrinsically tied to elevation data—they are generally applicable to any geospatial raster data. Given this, it makes more architectural sense for them to live in GeoUtils.Changes
filter.py
module fromxdem/
togeoutils/filters.py
.gaussian_filter_cv
to avoid dependence on opencv.tests/test_filter.py
in xDEM totests/test_filters.py
in GeoUtils._nan_safe_filter
common for gaussian, mean and median filter, to deal with arrays containing nan values._filter
function mapping the available filters.Raster.filter
method that extracts the array with Nans, uses the_filter
function, and rebuild the Raster data.Tests
Raster.filter
method.Documentation
filter
section inraster_class.md
.Note
When this ticket is approved, a linked PR to remove filters in xDEM will be opened.