Move filter.py from xDEM to GeoUtils and integrate into raster class #691

Open
wants to merge 4 commits into base: main

Conversation

vschaffn
Contributor

@vschaffn vschaffn commented Apr 28, 2025

Resolves #690.

Context

The current filter.py module in xDEM provides a collection of filters that are highly useful for digital elevation models (DEMs). However, these filtering utilities are not intrinsically tied to elevation data—they are generally applicable to any geospatial raster data. Given this, it makes more architectural sense for them to live in GeoUtils.

Changes

  • Moved the filter.py module from xdem/ to geoutils/filters.py.
  • Removed gaussian_filter_cv to avoid a dependency on OpenCV.
  • Moved tests/test_filter.py in xDEM to tests/test_filters.py in GeoUtils.
  • Added a new _nan_safe_filter function, shared by the gaussian, mean, and median filters, to handle arrays containing NaN values.
  • Added a generic _filter function mapping the available filters.
  • Added the Raster.filter method, which extracts the array with NaNs, applies the _filter function, and rebuilds the Raster data.
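As a rough illustration of the dispatch pattern described in the list above (a sketch under assumed names: the `_FILTERS` mapping and the exact signature of `_filter` are not taken from the PR's code):

```python
import numpy as np
import scipy.ndimage

# Hypothetical mapping of filter names to SciPy implementations
_FILTERS = {
    "gaussian": scipy.ndimage.gaussian_filter,
    "mean": scipy.ndimage.uniform_filter,
    "median": scipy.ndimage.median_filter,
}


def _filter(array: np.ndarray, method: str, **kwargs) -> np.ndarray:
    """Apply the filter named by `method`, forwarding kwargs to SciPy."""
    if method not in _FILTERS:
        raise ValueError(f"Unknown filter method: {method}")
    return _FILTERS[method](array, **kwargs)


arr = np.arange(25, dtype=float).reshape(5, 5)
out = _filter(arr, method="median", size=3)
```

A `Raster.filter` method could then extract the masked array as NaNs, call `_filter`, and wrap the result back into a raster, as the last bullet describes.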

Tests

  • Added new tests for the Raster.filter method.

Documentation

  • Added a filter section in raster_class.md.

Note

Once this PR is approved, a linked PR removing the filters from xDEM will be opened.


:returns: the filtered array (same shape as input).
"""
return _nan_safe_filter(array, scipy.ndimage.median_filter, **kwargs)
Member

@rhugonnet rhugonnet May 1, 2025


EDIT: If I'm not mistaken, the implementation in xDEM doesn't fully work, and neither does the one here.

In xDEM, replacing by inf:

array = [1, 2, 3]
array2 = [1, 2, 3, np.inf, np.inf, np.inf, np.inf]
np.median(array)
Out[16]: np.float64(2.0)
np.median(array2)
Out[17]: np.float64(inf)

Here, replacing by 0:

array3 = [1, 2, 3, 0, 0, 0, 0]
np.median(array3)
Out[19]: np.float64(0.0)

There is no way to recover the correct median of 2 by dividing 0 by anything, as is done in _nan_safe_filter.

Member


Should we simply call generic_filter with nanmean and nanmedian inside?
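A minimal sketch of that suggestion, assuming NaN padding at the edges (illustrative, not the final implementation):

```python
import numpy as np
from scipy.ndimage import generic_filter

arr = np.array([[1.0, 2.0, 3.0],
                [4.0, np.nan, 6.0],
                [7.0, 8.0, 9.0]])

# generic_filter passes each window's values to the callable;
# np.nanmedian ignores NaNs, and padding with NaN keeps the edge
# windows NaN-aware too.
filtered = generic_filter(arr, np.nanmedian, size=3, mode="constant", cval=np.nan)
```

The center pixel then becomes the median of its eight valid neighbours, with the NaN simply excluded rather than poisoning the result.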

Member


This would be the simplest solution, but in my experience generic_filter is really slow, so we might consider other options in the future. At the least, we could compare our implementation against it in the tests.

Member

@rhugonnet rhugonnet May 5, 2025


Yes that's unfortunately the case, see here (46 times slower to use generic_filter for median): https://scipy-user.scipy.narkive.com/YyMHNgYg/scipy-ndimage-filters-median-filter-with-nan-values

The other option is to use Numba by default, which should be fast for any custom operation like this.
This would mirror the behaviour of xDEM's terrain attributes, where I'm adding the option of using either a Numba or SciPy engine (see here; PR not finalized yet: GlacioHack/xdem#486).

For this PR, we can stick to SciPy only and generic_filter.

Member


There's also a way to write a short C filter to pass to SciPy, which is usually much faster.
See: http://notmatthancock.github.io/2020/04/19/scipy-generic-filter-low-level-callable.html

Member


Actually my bad: for the nanmean we already had a working filter using the trick of replacing NaNs with zeros and then dividing by the sum of valid pixels only, but I don't think that is exactly how it behaves in this PR's implementation.
We actually have an old working duplicate here: https://github.com/GlacioHack/xdem/blob/e1e2065ea52a7db3c8e88045d743aa1f6efd08bb/xdem/spatialstats.py#L2563

In short, to move forward in this PR, we need to remove the _nan_safe_filter() function (the trick only works for the mean, so it's not generalizable) and re-implement only the mean with it.
For the median, simply use generic_filter(np.nanmedian) (which is quite slow, but I think there is no other choice).
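The zero-fill mean trick from the old spatialstats duplicate can be sketched as follows (an illustrative re-implementation, not the linked code verbatim; the helper name `nan_safe_mean_filter` is hypothetical): filter both a zero-filled copy and the validity mask, then divide, so each window is normalized by its count of valid pixels only.

```python
import numpy as np
from scipy.ndimage import uniform_filter


def nan_safe_mean_filter(array: np.ndarray, size: int = 3) -> np.ndarray:
    """NaN-aware mean filter: window sums of valid values / valid counts."""
    nan_mask = np.isnan(array)
    filled = np.where(nan_mask, 0.0, array)  # NaNs contribute 0 to the sum
    valid = (~nan_mask).astype(float)        # 1 where valid, 0 where NaN
    # uniform_filter divides both quantities by the same window area,
    # so the normalization cancels in the ratio below.
    sums = uniform_filter(filled, size=size)
    counts = uniform_filter(valid, size=size)
    with np.errstate(invalid="ignore", divide="ignore"):
        return sums / counts  # NaN only where a window has no valid pixel


arr = np.array([[1.0, 2.0, 3.0],
                [4.0, np.nan, 6.0],
                [7.0, 8.0, 9.0]])
out = nan_safe_mean_filter(arr, size=3)
```

This is exactly why the trick does not generalize to the median: a sum can be corrected by a count afterwards, but an order statistic over a window polluted with zeros cannot.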

Member

@rhugonnet rhugonnet left a comment


Great addition, thanks a lot for all the work! 😄

It would be great to add more quantitative tests to ensure the results are as expected with/without NaNs. If I didn't make a mistake in my comment above, the mean/median filters might be erroneous right now.

Maybe we should add a set of tests manually checking the operation in a given window, for each filter type, to ensure there is no error?

Example for the mean:

arr = np.random.normal(size=(5, 5))

arr_filtered = _filter(arr, method="mean", size=3)
assert np.isclose(np.nanmean(arr[1:4, 1:4]), arr_filtered[2, 2])

They should all be easy to test manually like this, except for the gaussian one, that needs to use the logic in the StackOverflow post to correct the division factors.
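The division-factor correction mentioned above for the gaussian case follows the same normalization logic as the mean trick: smooth a zero-filled copy and divide by the smoothed validity mask. A sketch, with the helper name `nan_safe_gaussian` being hypothetical:

```python
import numpy as np
from scipy.ndimage import gaussian_filter


def nan_safe_gaussian(array: np.ndarray, sigma: float) -> np.ndarray:
    """NaN-aware gaussian smoothing via weight renormalization."""
    nan_mask = np.isnan(array)
    filled = np.where(nan_mask, 0.0, array)
    # Smoothed validity mask = the fraction of gaussian weight that fell
    # on valid pixels; dividing by it corrects the bias from the zeros.
    weights = gaussian_filter((~nan_mask).astype(float), sigma=sigma)
    smoothed = gaussian_filter(filled, sigma=sigma)
    with np.errstate(invalid="ignore", divide="ignore"):
        return smoothed / weights


# Sanity check: a constant field with a NaN hole should stay constant.
arr = np.full((5, 5), 3.0)
arr[2, 2] = np.nan
smoothed = nan_safe_gaussian(arr, sigma=1.0)
```

A manual test like the constant-field check above is one way to verify the correction without reproducing the full weight algebra.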

My last comment: should we add the size/radius parameter directly in Raster.filter()'s parameters, to make it easier to grasp from the user's perspective? We also need to specify in the docstring that all other kwargs refer to scipy.ndimage filters 😉.

@rhugonnet
Member

rhugonnet commented May 7, 2025

I've been working on a very similar problem during the past week in GlacioHack/xdem#486, so while I still know the intricacies of the topic, here's how we could add Numba filters in GeoUtils to dramatically improve the speed of the median (and we could re-use the same code structure for nmad, nanmax, and any other nanfunc):

from typing import Literal
import numpy as np
from numba import jit, prange
from scipy.ndimage import generic_filter

@jit(parallel=True)
def median_filter_numba(dem: np.ndarray, window_size: int):

    # Get input shapes
    N1, N2 = dem.shape

    # Define ranges to loop through given padding
    row_range = N1 - window_size + 1
    col_range = N2 - window_size + 1

    # Allocate output array
    outputs = np.full((row_range, col_range), fill_value=np.nan, dtype=np.float32)

    # Loop over every pixel concurrently by using prange
    for row in prange(row_range):
        for col in prange(col_range):

            outputs[row, col] = np.nanmedian(dem[row: row+window_size, col:col+window_size])

    return outputs


def median_filter(dem: np.ndarray, window_size: int, engine: Literal["scipy", "numba"]):

    # Using SciPy, options already include edge behaviour
    if engine == "scipy":
        return generic_filter(dem, np.nanmedian, size=window_size, mode="constant", cval=np.nan)
    # Using Numba, need to define edge behaviour ourselves
    elif engine == "numba":
        # We pad the input array with NaNs for half the window size
        hw = int((window_size - 1) / 2)
        dem = np.pad(dem, pad_width=((hw, hw), (hw, hw)), constant_values=np.nan)
        return median_filter_numba(dem, window_size)


############
# TEST SPEED
############

from time import time
rnd = np.random.default_rng(42)
nb_it = 10  # Several iterations to be more precise measuring CPU time
window_size = 11
sizes = [50, 100, 500, 1000]  # Array sizes

# Call Numba once in the void to trigger JIT compile
_ = median_filter(np.ones((10, 10)), window_size=3, engine="numba")

list_df = []
for s in sizes:
    dem = rnd.normal(size=(s, s))

    # Get computation times
    t1 = time()
    for i in range(nb_it):
        filt1 = median_filter(dem, window_size=window_size, engine="scipy")
    t2 = time()
    for i in range(nb_it):
        filt2 = median_filter(dem, window_size=window_size, engine="numba")
    t3 = time()

    # Check output is equal
    assert np.allclose(filt1, filt2)

    # Print elapsed times
    print(f"Size: {s}")
    print(f"Elapsed SciPy: {t2 - t1}")
    print(f"Elapsed Numba: {t3 - t2}")
Output:

Size: 50
Elapsed SciPy: 0.24158883094787598
Elapsed Numba: 0.005011320114135742
Size: 100
Elapsed SciPy: 0.9490830898284912
Elapsed Numba: 0.012121915817260742
Size: 500
Elapsed SciPy: 23.553598880767822
Elapsed Numba: 0.3076012134552002
Size: 1000
Elapsed SciPy: 94.9611337184906
Elapsed Numba: 1.2290620803833008

Quite a striking improvement, nearly a factor of 100! 😉

Labels
None yet
3 participants