
disable npp in multistream context #3338


Open
wants to merge 3 commits into base: 4.x

Conversation

awarebayes

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • [x] I agree to contribute to the project under Apache 2 License.
  • [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • [x] The PR is proposed to the proper branch
  • [x] There is a reference to the original bug report and related work
  • [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • [x] The feature is well documented and sample code can be built with the project CMake

This PR is to match #3330

The issue is simple to reproduce: the code from this popular tutorial can be used; just replace cuda::resize with cuda::warpAffine and pass some random transformation matrix along with it.
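For illustration, a minimal multi-stream reproduction along those lines might look like the following (sizes, thread count and the affine matrix are placeholders, not taken from the tutorial):

// Hedged reproduction sketch: several host threads, each with its own
// cv::cuda::Stream, all calling cv::cuda::warpAffine concurrently.
#include <opencv2/core/cuda.hpp>
#include <opencv2/cudawarping.hpp>
#include <thread>
#include <vector>

int main()
{
    // 2x3 affine matrix standing in for "some random matrix"
    cv::Mat M = (cv::Mat_<double>(2, 3) << 1, 0, 10,
                                           0, 1, 10);

    auto work = [&M]()
    {
        cv::cuda::Stream stream;                        // per-thread, non-default stream
        cv::cuda::GpuMat src(2000, 2000, CV_8UC1), dst; // contents are irrelevant for timing
        for (int i = 0; i < 100; ++i)
            cv::cuda::warpAffine(src, dst, M, src.size(), cv::INTER_LINEAR,
                                 cv::BORDER_CONSTANT, cv::Scalar(), stream);
        stream.waitForCompletion();
    };

    std::vector<std::thread> pool;
    for (int t = 0; t < 4; ++t)
        pool.emplace_back(work);
    for (auto& th : pool)
        th.join();
    return 0;
}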

To the best of my knowledge, it does not make sense to use NPP in a multi-stream context, and the simple warp implementation works just fine.

As referenced in the issue, you can see that nppSetStream makes a lot of ioctl calls, and thus this function should only be used with the default stream context.

Alternatively, there could be a static compile-time warning in the NppWarp class, parametrized on calling it with StreamNull, using static dispatch. I can probably try pulling that off; it may result in messier code, but the warning would be issued at compile time.

@cudawarped
Contributor

cudawarped commented Aug 26, 2022

@awarebayes Whilst I am no fan of NPP, I believe it should offer the best performance going forward on newer architectures, compared with the hand-rolled OpenCV code, which was optimized for hardware from 10 years ago. Now, if all the OpenCV CUDA routines could be updated for new hardware and CUDA SDK features when they are released, then I would change my mind, but OpenCV is open source and the CUDA modules are in the contrib repository, so this is not realistic.

With that out of the way, would it not be better to replace the existing global-stream versions (e.g. nppiWarpAffine_8u_C1R)
with the newer ones which accept an NppStreamContext (e.g. nppiWarpAffine_8u_C1R_Ctx)? I ran a quick test on a 2000x2000 matrix using these changes, and the performance of the NPP and OpenCV hand-rolled versions is about the same. I think these changes and a #define for the CUDA version will be sufficient. Then, if that works, all of the NPP routines could be updated to make proper use of streams.

A further improvement might be to have an additional flag which lets the user decide whether they want to use the NPP routines or not.

@awarebayes
Author

@cudawarped Should I be the one to do it, i.e. replace it with the context version?

@cudawarped
Contributor

@awarebayes It's your pull request; I'm merely suggesting an alternative approach which, in my opinion, may be better. Ultimately it's up to the maintainers to decide. If you do apply the suggested changes, make sure to include the #define to only apply the update for CUDA toolkit >= 10.1 (see the release notes).

First I would check that the changes I suggested are as quick as your change for your specific problem.

@awarebayes
Author

I implemented compile-time static dispatch to select either the _Ctx or the non-_Ctx methods.

Tested: it does not perform as well as OpenCV's, because it uses cv::cuda::GpuMat::setTo to fill the image with the border color.

As for the details of the implementation:
I basically check whether the library version supports the _Ctx functions in a #define canUseContext.
If it does, I concatenate _Ctx to the given function names in the function table; if it doesn't, I concatenate nothing to the names.
Then I need to issue a call to one of two different versions. I use static dispatch and Alexandrescu's Int2Type idiom with class templates to select the required version at compile time.

So everything is static and should not affect performance.
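For illustration, the dispatch pattern being described boils down to something like this (self-contained sketch; canUseContext is hard-coded here, and the two call_impl overloads just print which path would be taken rather than calling NPP):

// Hedged sketch of the compile-time dispatch described above; not the actual patch.
#include <iostream>

// In the real change this would test NPP_VERSION; hard-coded so the sketch builds standalone.
#define canUseContext 1

template <int I> struct Int2Type { enum { value = I }; };   // Alexandrescu's idiom

static void call_impl(Int2Type<0>) { std::cout << "legacy path: nppSetStream + non-_Ctx call\n"; }
static void call_impl(Int2Type<1>) { std::cout << "new path: pass an NppStreamContext to the _Ctx call\n"; }

int main()
{
    // The overload is selected at compile time; the other branch is never taken.
    call_impl(Int2Type<canUseContext>());
}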

@awarebayes
Author

You may not like my use of concatenation macros; it can be avoided.

@@ -193,7 +260,8 @@ void cv::cuda::warpAffine(InputArray _src, OutputArray _dst, InputArray _M, Size
CV_Assert( interpolation == INTER_NEAREST || interpolation == INTER_LINEAR || interpolation == INTER_CUBIC );
CV_Assert( borderMode == BORDER_REFLECT101 || borderMode == BORDER_REPLICATE || borderMode == BORDER_CONSTANT || borderMode == BORDER_REFLECT || borderMode == BORDER_WRAP );

-    _dst.create(dsize, src.type());
+    if (_dst.size() != dsize)
+        _dst.create(dsize, src.type());
Contributor

I think this is redundant as void cv::cuda::GpuMat::create(int _rows, int _cols, int _type) should automatically handle this for you in addition to checking the type.
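For illustration (not from the patch), create is already a no-op when the requested geometry matches the existing allocation:

// Hedged illustration of the behaviour described above.
#include <opencv2/core/cuda.hpp>

int main()
{
    cv::cuda::GpuMat dst;
    dst.create(480, 640, CV_8UC1);   // first call allocates device memory
    dst.create(480, 640, CV_8UC1);   // same rows/cols/type: returns early, no reallocation
}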

Author

As far as I can remember, I checked the profiler and there may have been some overhead; I am not sure...

call_impl(src, dst, coeffs, interpolation, stream, Int2Type<canUseContext>());
}

template <int I>
Contributor

Can we use cudev::Int2Type?

Author

sure thing

@cudawarped
Contributor

@alalek I'm not great with templates; is there a better way of implementing this change with them? I'm asking because, if this change is suitable, it could probably be applied to all the CUDA NPP routines.

Just out of interest, how long does support need to be offered for older CUDA releases? NppStreamContext (_Ctx) was added 3 years ago, so if support only has to cover 3 years, could the routines be updated to just use the newer NppStreamContext versions instead of both?

@cudawarped
Contributor

@awarebayes where did you get to with this?

I finally had a chance to take a look and I'm not sure it makes sense to use templates in this case. I may be wrong, but the code we are trying to template is not another type but historic/unsafe functions, which should probably be removed in the future and are only included for users who are unable to upgrade to the latest version of CUDA.

I think something similar to

#if USE_NPP_STREAM_CONTEXT
would suffice; what do you think?
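A hedged sketch of what that guard could look like (USE_NPP_STREAM_CONTEXT comes from the comment above; callNppWarp and the surrounding plumbing are illustrative, not merged code, and nppGetStreamContext is assumed to be available from CUDA 10.1 onward):

// Illustrative only: preprocessor guard selecting the stream-context NPP path.
#include <cuda_runtime.h>
#include <npp.h>

#ifndef NPP_VERSION
#define NPP_VERSION (NPP_VERSION_MAJOR * 1000 + NPP_VERSION_MINOR * 100 + NPP_VERSION_BUILD)
#endif

#define USE_NPP_STREAM_CONTEXT (NPP_VERSION >= (10 * 1000 + 1 * 100 + 0))

void callNppWarp(/* src/dst/coeffs/etc. elided */ cudaStream_t stream)
{
#if USE_NPP_STREAM_CONTEXT
    NppStreamContext ctx{};
    nppGetStreamContext(&ctx);     // fill in device properties for the context
    ctx.hStream = stream;          // point NPP at the caller's stream
    // ...call the *_Ctx variant, passing ctx as the last argument...
#else
    nppSetStream(stream);          // legacy global-stream API (the ioctl-heavy path)
    // ...call the non-_Ctx variant...
#endif
}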

@awarebayes
Author

I am in favor of removing the old code and just going with the new version.

Ifdefs can solve anything, but they are just macros and do not provide static dispatch.

@awarebayes
Author

There is an obvious problem for most people if using the new NPP versions breaks older code, welp :(

@cudawarped
Contributor

Ifdefs can solve anything, but they are just macros and do not provide static dispatch.

Can you explain further what you mean? Both the templated and the #define version have one code path, determined at compile time based on the version of NPP being used. There is currently no flag to choose the older unsafe version.

There is an obvious problem for most people if using the new NPP versions breaks older code, welp :(

This could be an issue, but isn't that an issue a user may unfortunately face every time they change the version of the CUDA SDK they are using?

@@ -45,8 +45,19 @@
using namespace cv;
using namespace cv::cuda;

+#define canUseContext NPP_VERSION >= (10 * 1000 + 1 * 100 + 0)
Contributor

please use uppercase for macros, e.g. NPP_CONTEXT_AVAILABLE
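For illustration, the suggested rename could look like this (the wrapping parentheses additionally keep the macro safe to use inside larger expressions); this reflects the reviewer's suggestion, not merged code:

#define NPP_CONTEXT_AVAILABLE (NPP_VERSION >= (10 * 1000 + 1 * 100 + 0))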
