
disable npp in multistream context #3338


Open
wants to merge 3 commits into base: 4.x

Conversation

awarebayes

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • [x] I agree to contribute to the project under Apache 2 License.
  • [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • [x] The PR is proposed to the proper branch
  • [x] There is a reference to the original bug report and related work
  • [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • [x] The feature is well documented and sample code can be built with the project CMake

This PR is to match #3330

The issue is simple to reproduce: the code from this popular tutorial can be used; just replace cuda::resize with cuda::warpAffine and pass some random transformation matrix along with it.
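For illustration, a minimal multi-stream reproduction along those lines might look like the following (sizes, thread count and the affine matrix are placeholders, not taken from the tutorial):

// Hedged reproduction sketch: several host threads, each with its own
// cv::cuda::Stream, all calling cv::cuda::warpAffine concurrently.
#include <opencv2/core/cuda.hpp>
#include <opencv2/cudawarping.hpp>
#include <thread>
#include <vector>

int main()
{
    // 2x3 affine matrix standing in for "some random matrix"
    cv::Mat M = (cv::Mat_<double>(2, 3) << 1, 0, 10,
                                           0, 1, 10);

    auto work = [&M]()
    {
        cv::cuda::Stream stream;                        // per-thread, non-default stream
        cv::cuda::GpuMat src(2000, 2000, CV_8UC1), dst; // contents are irrelevant for timing
        for (int i = 0; i < 100; ++i)
            cv::cuda::warpAffine(src, dst, M, src.size(), cv::INTER_LINEAR,
                                 cv::BORDER_CONSTANT, cv::Scalar(), stream);
        stream.waitForCompletion();
    };

    std::vector<std::thread> pool;
    for (int t = 0; t < 4; ++t)
        pool.emplace_back(work);
    for (auto& th : pool)
        th.join();
    return 0;
}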

To the best of my knowledge, it does not make sense to use NPP in a multi-stream context, and the simple warp implementation works just fine.

As referenced in the issue, you can see that nppSetStream makes a lot of ioctl calls, and thus this function should only be used with the default stream context.

Alternatively, there could be a static compile-time warning in the NppWarp class, parametrized on calling it with StreamNull, using static dispatch. I can probably try pulling that off; it may result in messier code, but the warning would be issued at compile time.

@cudawarped
Contributor

cudawarped commented Aug 26, 2022

@awarebayes Whilst I am no fan of NPP, I believe it should offer the best performance going forward on newer architectures, compared with the hand-rolled OpenCV code, which was optimized for hardware from 10 years ago. Now, if all the OpenCV CUDA routines could be updated for new hardware and CUDA SDK features when they are released, then I would change my mind, but OpenCV is open source and the CUDA modules are in the contrib repository, so this is not realistic.

With that out of the way, would it not be better to replace the existing global-stream versions (e.g. nppiWarpAffine_8u_C1R)
with the newer ones which accept an NppStreamContext (e.g. nppiWarpAffine_8u_C1R_Ctx)? I ran a quick test on a 2000x2000 matrix using these changes, and the performance of the NPP and OpenCV hand-rolled versions is about the same. I think these changes and a #define for the CUDA version will be sufficient. Then, if that works, all of the NPP routines could be updated to make proper use of streams.

A further improvement might be to have an additional flag which lets the user decide whether they want to use the NPP routines or not.

@awarebayes
Author

@cudawarped Should I be the one to do it, i.e. replace it with the context version?

@cudawarped
Contributor

@awarebayes It's your pull request; I'm merely suggesting an alternative approach which, in my opinion, may be better. Ultimately it's up to the maintainers to decide. If you do apply the suggested changes, make sure to include the #define to only apply the update for CUDA toolkit >= 10.1 (see the release notes).

First I would check that the changes I suggested are as quick as your change for your specific problem.

@awarebayes
Author

I implemented compile-time static dispatch to select either the _Ctx or the non-_Ctx methods.

Tested: it does not perform as well as OpenCV's, because it uses cv::cuda::GpuMat::setTo to fill the image with the border color.

As for the details of the implementation:
I basically check whether the library version supports the _Ctx functions in a #define canUseContext.
If it does, I concatenate _Ctx to the given function names in the function table; if it doesn't, I concatenate nothing to the names.
Then I need to issue a call to one of two different versions. I use static dispatch and Alexandrescu's Int2Type idiom with class templates to select the required version at compile time.

So everything is static and should not affect performance.
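For illustration, the dispatch pattern being described boils down to something like this (self-contained sketch; canUseContext is hard-coded here, and the two call_impl overloads just print which path would be taken rather than calling NPP):

// Hedged sketch of the compile-time dispatch described above; not the actual patch.
#include <iostream>

// In the real change this would test NPP_VERSION; hard-coded so the sketch builds standalone.
#define canUseContext 1

template <int I> struct Int2Type { enum { value = I }; };   // Alexandrescu's idiom

static void call_impl(Int2Type<0>) { std::cout << "legacy path: nppSetStream + non-_Ctx call\n"; }
static void call_impl(Int2Type<1>) { std::cout << "new path: pass an NppStreamContext to the _Ctx call\n"; }

int main()
{
    // The overload is selected at compile time; the other branch is never taken.
    call_impl(Int2Type<canUseContext>());
}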

@awarebayes
Author

You may not like my use of concatenation macros; it can be avoided.

@@ -193,7 +260,8 @@ void cv::cuda::warpAffine(InputArray _src, OutputArray _dst, InputArray _M, Size
CV_Assert( interpolation == INTER_NEAREST || interpolation == INTER_LINEAR || interpolation == INTER_CUBIC );
CV_Assert( borderMode == BORDER_REFLECT101 || borderMode == BORDER_REPLICATE || borderMode == BORDER_CONSTANT || borderMode == BORDER_REFLECT || borderMode == BORDER_WRAP );

-    _dst.create(dsize, src.type());
+    if (_dst.size() != dsize)
+        _dst.create(dsize, src.type());
Contributor

I think this is redundant as void cv::cuda::GpuMat::create(int _rows, int _cols, int _type) should automatically handle this for you in addition to checking the type.
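For illustration (not from the patch), create is already a no-op when the requested geometry matches the existing allocation:

// Hedged illustration of the behaviour described above.
#include <opencv2/core/cuda.hpp>

int main()
{
    cv::cuda::GpuMat dst;
    dst.create(480, 640, CV_8UC1);   // first call allocates device memory
    dst.create(480, 640, CV_8UC1);   // same rows/cols/type: returns early, no reallocation
}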

Author

As far as I can remember, I checked the profiler and there may have been some overhead; I am not sure...

call_impl(src, dst, coeffs, interpolation, stream, Int2Type<canUseContext>());
}

template <int I>
Contributor

Can we use cudev::Int2Type?

Author

sure thing

@cudawarped
Contributor

@alalek I'm not great with templates; is there a better way of implementing this change with them? I'm asking because, if this change is suitable, it could probably be applied to all the CUDA NPP routines.

Just out of interest, how long does support need to be offered for older CUDA releases? NppStreamContext (_Ctx) was added 3 years ago, so if support only has to cover 3 years, could the routines be updated to just use the newer NppStreamContext versions instead of both?

@cudawarped
Contributor

@awarebayes where did you get to with this?

I finally had a chance to take a look and I'm not sure it makes sense to use templates in this case. I may be wrong, but the code we are trying to template is not another type but historic/unsafe functions, which should probably be removed in the future and are only included for users who are unable to upgrade to the latest version of CUDA.

I think something similar to

#if USE_NPP_STREAM_CONTEXT
would suffice; what do you think?
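A hedged sketch of what that guard could look like (USE_NPP_STREAM_CONTEXT comes from the comment above; callNppWarp and the surrounding plumbing are illustrative, not merged code, and nppGetStreamContext is assumed to be available from CUDA 10.1 onward):

// Illustrative only: preprocessor guard selecting the stream-context NPP path.
#include <cuda_runtime.h>
#include <npp.h>

#ifndef NPP_VERSION
#define NPP_VERSION (NPP_VERSION_MAJOR * 1000 + NPP_VERSION_MINOR * 100 + NPP_VERSION_BUILD)
#endif

#define USE_NPP_STREAM_CONTEXT (NPP_VERSION >= (10 * 1000 + 1 * 100 + 0))

void callNppWarp(/* src/dst/coeffs/etc. elided */ cudaStream_t stream)
{
#if USE_NPP_STREAM_CONTEXT
    NppStreamContext ctx{};
    nppGetStreamContext(&ctx);     // fill in device properties for the context
    ctx.hStream = stream;          // point NPP at the caller's stream
    // ...call the *_Ctx variant, passing ctx as the last argument...
#else
    nppSetStream(stream);          // legacy global-stream API (the ioctl-heavy path)
    // ...call the non-_Ctx variant...
#endif
}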

@awarebayes
Author

I am in favor of removing the old code and just going with the new version.

Ifdefs can solve anything, but they are just macros and do not provide static dispatch.

@awarebayes
Author

There is an obvious problem for most people if using the new NPP versions breaks older code, welp :(

@cudawarped
Contributor

Ifdefs can solve anything, but they are just macros and do not provide static dispatch.

Can you explain further what you mean? Both the templated and the #define version have one code path, determined at compile time based on the version of NPP being used. There is currently no flag to choose the older unsafe version.

There is an obvious problem for most people if using the new NPP versions breaks older code, welp :(

This could be an issue, but isn't that an issue a user may unfortunately face every time they change the version of the CUDA SDK they are using?

@@ -45,8 +45,19 @@
using namespace cv;
using namespace cv::cuda;

+#define canUseContext NPP_VERSION >= (10 * 1000 + 1 * 100 + 0)
Contributor

please use uppercase for macros, e.g. NPP_CONTEXT_AVAILABLE
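For illustration, the suggested rename could look like this (the wrapping parentheses additionally keep the macro safe to use inside larger expressions); this reflects the reviewer's suggestion, not merged code:

#define NPP_CONTEXT_AVAILABLE (NPP_VERSION >= (10 * 1000 + 1 * 100 + 0))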
