Implementation of MKLDNN LRN #9123
luotao1 merged 4 commits into PaddlePaddle:develop from tpatejko:tpatejko/mkldnn-lrn
Conversation
There are two algorithms, ACROSS_CHANNELS and WITHIN_CHANNEL, but TestLRNMKLDNNOp here only tests the default one.
Thanks for this remark, @luotao1. PaddlePaddle seems to implement LRN that works across channels. For MKLDNN, the WITHIN_CHANNEL option does not work in the backward pass.
I decided to remove the "algorithm" attribute and fix the LRN algorithm to ACROSS_CHANNELS.
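For context, the ACROSS_CHANNELS variant normalizes each element by a sum of squares over a window of neighboring channels. A naive sketch (the helper name is illustrative, not the MKLDNN primitive; the parameter convention follows the common Caffe-style formula and may differ slightly from Paddle's exact definition):

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Naive ACROSS_CHANNELS LRN for one spatial position with C channels:
//   out[c] = in[c] / (k + alpha / n * sum_{c' in window(c)} in[c']^2)^beta
// The window spans n channels centered on c, clipped at the borders.
std::vector<float> lrn_across_channels(const std::vector<float>& in, int n,
                                       float k, float alpha, float beta) {
  const int C = static_cast<int>(in.size());
  std::vector<float> out(C);
  for (int c = 0; c < C; ++c) {
    const int lo = std::max(0, c - n / 2);
    const int hi = std::min(C - 1, c + n / 2);
    float sum_sq = 0.0f;
    for (int j = lo; j <= hi; ++j) sum_sq += in[j] * in[j];
    out[c] = in[c] / std::pow(k + alpha / n * sum_sq, beta);
  }
  return out;
}
```

The WITHIN_CHANNEL variant would instead sum over a spatial neighborhood inside a single channel, which is the mode MKLDNN cannot differentiate in the backward pass.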
paddle/fluid/operators/lrn_op.cc
The GetExpectedKernelType of LRNOp and LRNGradOp are the same; could these functions be fused?
I moved the content of this method to a standalone function in an anonymous namespace and reuse it in both LRNOp and LRNGradOp.
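The refactoring described can be sketched as follows (the struct names, return type, and kernel-selection logic are illustrative placeholders, not the actual Paddle code):

```cpp
#include <string>

namespace {
// Shared helper in an anonymous namespace: both the forward op and the
// gradient op delegate to it instead of duplicating the same logic.
std::string GetExpectedKernelType(bool use_mkldnn) {
  return use_mkldnn ? "MKLDNN_FP32" : "PLAIN_CPU_FP32";
}
}  // namespace

struct LRNOp {
  std::string KernelType(bool use_mkldnn) const {
    return GetExpectedKernelType(use_mkldnn);  // reuse, no duplication
  }
};

struct LRNGradOp {
  std::string KernelType(bool use_mkldnn) const {
    return GetExpectedKernelType(use_mkldnn);  // same helper as LRNOp
  }
};
```

The anonymous namespace gives the helper internal linkage, so it stays private to lrn_op.cc while both operator classes share it.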
LGTM. @tensor-tang Could you help review lrn_mkldnn_op.cc?
```cpp
auto input_data = x->data<T>();
auto output_data = out->mutable_data<T>(ctx.GetPlace());
mid->mutable_data<T>(ctx.GetPlace());
```
What is this MidOut for in MKL-DNN implementations?
According to the documentation of the LRN operator, it's an intermediate result of LRN that is computed in the forward pass and used in the backward pass. In practice, the naive CPU implementation sets it to the value of the attribute k, and the MidOut tensor is validated by the LRN unit tests.
It's not really needed by the MKLDNN primitives. However, it's part of the LRN operator's interface, so I decided to fill it the same way as the naive CPU implementation does.
If it's not needed by the MKLDNN primitives, you can do the same as sequence_pool_op, where "MaxIndex" is only used when "pooltype" == "MAX":
Paddle/paddle/fluid/operators/sequence_pool_op.cc, lines 30 to 34 in 0821ee7
```cpp
const float k = ctx.Attr<float>("k");
auto e_mid = framework::EigenTensor<T, 4>::From(*mid);
e_mid = e_mid.constant(k);
```
What is e_mid, and why do we need it?
It's an Eigen TensorMap that is mapped onto the data of the MidOut tensor. It's used to fill MidOut with the value of attribute k.
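In effect, the assignment is a constant fill of the tensor's buffer. A plain-C++ analogue (a hypothetical helper with a raw pointer standing in for the tensor data, not the actual Eigen API):

```cpp
#include <algorithm>
#include <cstddef>

// MidOut is not consumed by the MKLDNN primitives, so the kernel just
// fills its buffer with the attribute k, mirroring what
// e_mid = e_mid.constant(k) does through the Eigen map.
void FillMidOut(float* mid_data, std::size_t size, float k) {
  std::fill_n(mid_data, size, k);
}
```

The Eigen map avoids any copy: `From(*mid)` wraps the existing allocation, and assigning `constant(k)` writes k into every element in place.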
```cpp
auto workspace_md = forward_pd->workspace_primitive_desc();
auto workspace_memory = std::make_shared<mkldnn::memory>(workspace_md);
dev_ctx.SetBlob(key_workspace_memory, workspace_memory);
```
This workspace_memory would always be allocated in the forward pass, which is not best for performance.
@tensor-tang, you are right: it is allocated in the forward pass and used in the backward pass. What is your suggestion? Would you like to have it allocated only once, when workspace memory is not yet present in the device context?
I think you could try to get it first, then create it only if it's empty.
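The suggested get-then-create pattern could look like this sketch, with a `std::map` standing in for the device context's GetBlob/SetBlob storage (all names and types are illustrative):

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <vector>

using Workspace = std::shared_ptr<std::vector<float>>;

// Stand-in for the device context's keyed blob storage.
using BlobMap = std::map<std::string, Workspace>;

// Try to fetch the cached workspace first; allocate and store it only on
// the first call, so subsequent forward passes reuse the same buffer.
Workspace GetOrCreateWorkspace(BlobMap& dev_ctx, const std::string& key,
                               std::size_t size) {
  auto it = dev_ctx.find(key);  // GetBlob equivalent
  if (it != dev_ctx.end()) return it->second;
  auto ws = std::make_shared<std::vector<float>>(size);
  dev_ctx[key] = ws;            // SetBlob equivalent
  return ws;
}
```

With this shape, the allocation cost is paid once per key rather than on every forward pass.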
```cpp
dev_ctx.SetBlob(key_workspace_memory, workspace_memory);

auto forward_op = mkldnn::lrn_forward{*forward_pd, *src_memory,
                                      *workspace_memory, dst_memory};
```
BTW, this memory and function are used only in the training phase. Maybe you should add a TODO to optimize this once you can get the phase, as a reminder.
@tensor-tang, good point! Do you know how to check whether we are in the training phase or the scoring phase?
You can follow the same approach as batch_norm_op, which uses is_test:
Paddle/paddle/fluid/operators/batch_norm_op.cc, lines 175 to 182 in 0821ee7
@luotao1 @tensor-tang, I do see your point, but in the case of LRN, is_test would become part of the user interface not only for the MKLDNN LRN but also for the other implementations of LRN. Do you think adding such an attribute would be justified for the other implementations of the LRN operator?
You could make the same case for other operators as well. We could end up in a situation where each operator sets its own local is_test attribute to track the phase, even though this information is in fact global (global in the sense of being maintained at the network level).
Would it make more sense to maintain phase information in the network executor?
I could add a TODO comment saying that the blob should not be allocated when PaddlePaddle is in the testing phase.
> Do you think adding such attribute would be justified for other implementations of LRN operator?

Yes, "is_test" is also suitable for the other implementations of the LRN operator.
> You could make such case for other operators as well. We could end up in the situation when each operator sets its own local is_test attribute to maintain information about phase, when this information is in fact global (global as being maintained on the network level). Would it make more sense to maintain phase information in network executor?

Currently, framework/prune.cc uses this attribute in the inference phase:
Paddle/paddle/fluid/framework/prune.cc, lines 187 to 197 in 0821ee7
> I could add TODO comment saying that the blob should not be allocated when PaddlePaddle is in testing phase.

Yes, you can either add a TODO comment or fix it in this PR.
@luotao1, thanks for your reply. I will add the attribute to the operator.
But would it be more useful to maintain testing/training information at the network level instead of keeping it at the operator level?
We maintain testing/training information at the network level as well:
Paddle/python/paddle/fluid/framework.py, lines 959 to 982 in 0821ee7
But if the operator level doesn't have is_test information, how could this op behave differently in the test and train phases?
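A minimal sketch of how a per-operator is_test attribute could change the kernel's behavior: the workspace, which exists only to feed the backward pass, is skipped entirely at inference time (the function name and types are illustrative, not the actual Paddle code):

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// When is_test is true there is no backward pass, so the LRN workspace
// (kept around only for gradient computation) need not be allocated.
std::shared_ptr<std::vector<float>> MaybeAllocWorkspace(bool is_test,
                                                        std::size_t size) {
  if (is_test) return nullptr;  // inference: skip the allocation
  return std::make_shared<std::vector<float>>(size);
}
```

This is the kind of branch an op kernel can only take if the phase is visible at the operator level, which is the point being made above.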
I just added some changes regarding the is_test attribute and a unittest for it. Could these also be merged?
This PR contains an implementation of LRN optimized with MKLDNN. It also contains unittests for the operator.