Improve layer_norm speed #9355
Merged
panyx0718 merged 2 commits into PaddlePaddle:develop from Mar 26, 2018
Transformer step time on a single device drops from 0.157 to 0.125.
chengduoZH (Contributor) reviewed on Mar 25, 2018 and left a comment:
That is great!!
The functions in Eigen are too extensive and are very slow in many places.
    #ifdef PADDLE_WITH_CUDA
    template <typename T>
    class RowwiseMean2D<platform::CUDADeviceContext, T> {
Contributor
I think it might be better to write this function in math_function.
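For readers unfamiliar with the technique, here is a minimal CPU-only sketch of how a row-wise mean can be expressed as a matrix-vector product with a constant vector of 1 / right, which lets a tuned GEMV routine do the reduction. The names `MatVec` and `RowwiseMean` are hypothetical and are not Paddle's API; the actual `RowwiseMean2D` specialization in this PR may be implemented differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not Paddle's API): y = A * x for a row-major
// (rows x cols) matrix A stored in a flat vector.
std::vector<float> MatVec(const std::vector<float>& a, std::size_t rows,
                          std::size_t cols, const std::vector<float>& x) {
  assert(a.size() == rows * cols);
  assert(x.size() == cols);
  std::vector<float> y(rows, 0.0f);
  for (std::size_t r = 0; r < rows; ++r) {
    float acc = 0.0f;
    for (std::size_t c = 0; c < cols; ++c) {
      acc += a[r * cols + c] * x[c];
    }
    y[r] = acc;
  }
  return y;
}

// Row-wise mean of a (left x right) matrix, written as a product with a
// constant vector whose entries are all 1 / right.
std::vector<float> RowwiseMean(const std::vector<float>& input,
                               std::size_t left, std::size_t right) {
  assert(right > 0);
  const std::vector<float> divisor(right, 1.0f / static_cast<float>(right));
  return MatVec(input, left, right, divisor);
}
```

Placing a helper like this in a shared math header, as suggested above, would let other operators reuse the same reduction instead of each defining its own.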
    template <typename T>
    class ColwiseSum2D<platform::CUDADeviceContext, T> {
     public:
      ColwiseSum2D(int left, int right, const platform::DeviceContext& dev_ctx)
Contributor
ColwiseSum is used in lstm_op, gru_op, sequence_expand_op and lstmp_op, so the performance of those ops might be improved too.
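As an illustration of why a faster column-wise sum could carry over to other operators, here is a minimal CPU-only sketch that writes the reduction as a transposed matrix-vector product with a vector of ones. `MatTVec` and `ColwiseSum` are hypothetical names used only for this example; they do not reflect how Paddle's `ColwiseSum` or `ColwiseSum2D` is actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not Paddle's API): y = A^T * x for a row-major
// (rows x cols) matrix A stored in a flat vector.
std::vector<float> MatTVec(const std::vector<float>& a, std::size_t rows,
                           std::size_t cols, const std::vector<float>& x) {
  assert(a.size() == rows * cols);
  assert(x.size() == rows);
  std::vector<float> y(cols, 0.0f);
  for (std::size_t r = 0; r < rows; ++r) {
    for (std::size_t c = 0; c < cols; ++c) {
      y[c] += a[r * cols + c] * x[r];
    }
  }
  return y;
}

// Column-wise sum of a (left x right) matrix, written as A^T * ones.
std::vector<float> ColwiseSum(const std::vector<float>& input,
                              std::size_t left, std::size_t right) {
  const std::vector<float> ones(left, 1.0f);
  return MatTVec(input, left, right, ones);
}
```

Any operator that reduces a gradient over the batch dimension, for example to form a bias gradient, could call the same routine.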
chengduoZH (Contributor) approved these changes on Mar 26, 2018 and left a comment:
This PR can be merged first; the comments can be addressed in the next PR.
blacksheep-Aristotle pushed a commit to blacksheep-Aristotle/Paddle that referenced this pull request on Nov 22, 2024:
* Fix exitcode bug
* Fix `track_case_status` func match bug
* Fix return code
* Fix print_info func with exit -6
* set output format of fail tests modify verification check failed
Overall, layer_norm forward and backward are 3x to 4x faster.
The pre-commit hook also automatically reformatted some code.
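For context on what is being sped up, here is a minimal CPU reference sketch of a layer_norm forward pass over a (left x right) matrix, where each row is normalized by its own mean and variance. It is an assumption-laden illustration: `LayerNormForward` is a hypothetical name, the learned scale and bias are omitted, and the code does not reproduce the kernels changed in this PR.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference implementation (not Paddle's kernel):
// y[r][c] = (x[r][c] - mean[r]) / sqrt(var[r] + epsilon),
// with mean and (biased) variance taken over each row.
// Scale and bias are left out to keep the sketch short.
std::vector<float> LayerNormForward(const std::vector<float>& x,
                                    std::size_t left, std::size_t right,
                                    float epsilon = 1e-5f) {
  assert(right > 0);
  assert(x.size() == left * right);
  std::vector<float> y(x.size());
  for (std::size_t r = 0; r < left; ++r) {
    const float* row = x.data() + r * right;

    // Row-wise mean.
    float mean = 0.0f;
    for (std::size_t c = 0; c < right; ++c) mean += row[c];
    mean /= static_cast<float>(right);

    // Row-wise variance around that mean.
    float var = 0.0f;
    for (std::size_t c = 0; c < right; ++c) {
      const float d = row[c] - mean;
      var += d * d;
    }
    var /= static_cast<float>(right);

    const float inv_std = 1.0f / std::sqrt(var + epsilon);
    for (std::size_t c = 0; c < right; ++c) {
      y[r * right + c] = (row[c] - mean) * inv_std;
    }
  }
  return y;
}
```

The two row-wise reductions (mean and variance) are the steps that a helper such as `RowwiseMean2D` would serve in the forward pass, while column-wise sums typically arise in the backward pass for the scale and bias gradients.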