Conversation
paddle/fluid/framework/op_registry.h
Why do you add REGISTER_OP_KERNEL_WITH_LAYOUT? I could not find any usage of it.
Following your earlier suggestion to split the pull request into a few smaller parts, this one contains only the changes required to support MKLDNN operators; it does not include an example of how this #define is used. Could you have a look at this code (#usage)?
paddle/fluid/framework/op_registry.h
Why do you add USE_OP_DEVICE_KERNEL_EXTEND? I could not find any usage of it.
What do you mean by "EXTEND"? Is that specifically for MKLDNN?
Or can you use some existing macro instead?
Do you have any comment about adding #define USE_OP_DEVICE_KERNEL_EXTEND? @jacquesqiao
Yes, that is specific to MKLDNN. It is the way to distinguish MKLDNN kernels from plain CPU kernels. The functions involved differ slightly, so I cannot reuse any existing macro; this gives us the ability to register new MKLDNN operators that support the layout.
After discussing with @tensor-tang, we think that currently all MKLDNN kernels have only one layout, MKLDNN, so we can use LIBRARY to mark and distinguish them: MKLDNN kernels can be chosen by this library flag, and data transform can use tensor.layout_ to decide whether a transform is needed.
It seems that currently all our work can be done without registering the layout in op_kernel_registry, so can we just use LIBRARY for now and move forward on our work?
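The LIBRARY-flag approach described above can be illustrated with a small standalone sketch. All names here (UseMKLDNNKernel, NeedTransform) are illustrative assumptions, not Paddle's actual API:

```cpp
#include <cassert>

// Minimal sketch of the idea above, assuming MKLDNN kernels are marked
// with a library flag and data transforms are decided by comparing
// tensor layouts. All names are illustrative.
enum class DataLayout { kNHWC, kNCHW, kMKLDNN };
enum class LibraryType { kPlain = 0, kMKLDNN = 1, kCUDNN = 2 };

// Kernel selection keys on the library flag instead of a registered layout.
bool UseMKLDNNKernel(LibraryType lib) { return lib == LibraryType::kMKLDNN; }

// A transform is needed whenever data crosses between an MKLDNN layout and
// a plain layout, which tensor.layout_ is enough to detect.
bool NeedTransform(DataLayout from, DataLayout to) { return from != to; }
```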
paddle/fluid/framework/tensor.h
I think it's not a good way to let Tensor inherit from MKLDNNTensor (Tensor : public MKLDNNTensor).
Maybe

class Tensor {
#ifdef PADDLE_WITH_MKLDNN
  mkl_fields;
  void mkldnn_method();
#endif
};

is a better way.
@jacquesqiao and @tensor-tang, the inheritance was removed from the code, and all the methods were moved from mkldnn_tensor to the tensor file.
Why do you modify the parameter order of strides and paddings? Could pool_mkldnn_op.cc remain as before?
I modified these lines of code because cpplint reported that they must be changed. I didn't otherwise need to make these changes; it is likely related to the compiler version I am using.
How about get_data_from_tensor be GetDataFromTensor? Since we use Camel Case in naming.
How about TransDataLayoutMkldnn be TransDataLayoutMKLDNN? Since @wangkuiyi suggested in #3337 (comment).
How about to_mkldnn_data_type be ToMKLDNNDataType? Since we use Camel Case in naming.
It seems the function name is not appropriate according to this comment.
Done. I renamed the function to TransDataLayoutFromMKLDNN.
Is there any better idea that does not use #ifdef PADDLE_WITH_MKLDNN, since Paddle itself supports kMKLDNN? As you can see:
enum class LibraryType {
kPlain = 0,
kMKLDNN = 1,
kCUDNN = 2,
};
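Since kMKLDNN is an unconditional member of LibraryType, one possible direction is to branch at runtime on the library type instead of guarding with #ifdef. This is a hedged sketch under that assumption; PickTransformPath is an illustrative name, not Paddle's actual code:

```cpp
#include <cassert>
#include <string>

// Illustrative sketch only: a runtime branch on LibraryType can replace a
// compile-time #ifdef guard, because the enum value exists unconditionally.
enum class LibraryType { kPlain = 0, kMKLDNN = 1, kCUDNN = 2 };

// Instead of
//   #ifdef PADDLE_WITH_MKLDNN
//     ... mkldnn-specific path ...
//   #endif
// pick the path from the kernel's library flag:
std::string PickTransformPath(LibraryType lib) {
  return lib == LibraryType::kMKLDNN ? "mkldnn_transform" : "plain_transform";
}
```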
That's a temporary solution. I had to use this #ifdef flag because I'm waiting for an mkldnn flag (i.e. a global flag); the PaddlePaddle team is working on making this flag global for all MKLDNN operators. Please have a look at this issue, where the topic was discussed.
Thanks @mozga-intel, I understand that it's a temporary solution, but I am afraid that #10765 is not trying to deal with this one.
Maybe you can directly remove the #if here, since cudnn does not have a global flag either.
Any better solution? It seems to have some duplicated logic with:
if (NeedTransformLayout(lout, lin)) {
#ifdef PADDLE_WITH_MKLDNN
if (lin == DataLayout::kMKLDNN || lout == DataLayout::kMKLDNN) {
...
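One way the duplicated check could be folded into a single predicate is sketched below. This is only an illustration of the refactoring idea, with hypothetical names and a kAnyLayout case assumed for completeness; it is not the actual Paddle implementation:

```cpp
#include <cassert>

// Sketch: fold the MKLDNN special case into the layout predicate itself,
// so call sites do not repeat the #ifdef'd comparison.
enum class DataLayout { kNHWC, kNCHW, kMKLDNN, kAnyLayout };

// True when either side of the transform uses the MKLDNN layout.
bool InvolvesMKLDNN(DataLayout lout, DataLayout lin) {
  return lin == DataLayout::kMKLDNN || lout == DataLayout::kMKLDNN;
}

bool NeedTransformLayout(DataLayout lout, DataLayout lin) {
  if (lin == lout) return false;
  // kAnyLayout is compatible with everything, so no transform is needed.
  if (lin == DataLayout::kAnyLayout || lout == DataLayout::kAnyLayout)
    return false;
  return true;  // covers plain<->plain and MKLDNN<->plain mismatches alike
}
```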
please merge the latest code to pass the
Add an MKLDNN layout in Paddle so that an MKLDNN-friendly memory layout can be used in MKLDNN-enabled OP kernels. Before this commit, NCHW was hardcoded in all MKLDNN op kernels; as a result, a non-optimized execution path was selected in the MKLDNN primitives, which brought worse performance. Besides the framework change, three MKLDNN OP kernels were updated to use the new MKLDNN layout: conv, pool2d, and batch_norm. Other MKLDNN OP kernels also need to be updated in a similar way to achieve the best performance.
@luotao1 Done.
Thanks for your great work @mozga-intel, but a little question: @luotao1 please help check whether that is acceptable.
// Fix me: here just change the default layout to kNCHW
// it doesn't fix the real issue, i.e. feeder should set up tensor layout
// according to actual input data
DataLayout layout_ = DataLayout::kNCHW;
@mozga-intel @tensor-tang
Discussed with @qingqing01, it's acceptable to modify the default layout.
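The FIXME quoted above says that the feeder, rather than the Tensor default, should eventually set the layout according to the actual input data. A hedged sketch of that intent follows; Tensor and FeedImage here are illustrative stand-ins, not Paddle's real API:

```cpp
#include <cassert>

// Illustrative sketch: the default layout stays kNCHW, but a feeder would
// override it according to the actual input data.
enum class DataLayout { kNHWC, kNCHW, kMKLDNN };

struct Tensor {
  DataLayout layout_ = DataLayout::kNCHW;  // default changed to kNCHW
};

// A feeder sets the layout from the data it actually reads, instead of
// relying on the Tensor default.
Tensor FeedImage(bool channels_last) {
  Tensor t;
  t.layout_ = channels_last ? DataLayout::kNHWC : DataLayout::kNCHW;
  return t;
}
```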
@luotao1 @tensor-tang @jacquesqiao Thanks very much, great work.
This PR contains the changes required to support MKLDNN memory layouts, improving the performance of MKLDNN computations.
This implementation differs from the one proposed in #10291. It is based mainly on Brian Liu's implementation, which is a simplified version of what I proposed previously.
The latest version of the pull request is split into a few parts of code (information).
This pull request contains:
Pull-request is related to: