Adding the implementation for rmsprop operator #4565
kavyasrinet merged 6 commits into PaddlePaddle:develop from
Conversation
dzhwinter
left a comment
Good job! There are a few small pieces that need fixing.
paddle/operators/rmsprop_op.cc
Outdated
```cpp
    : OpProtoAndCheckerMaker(proto, op_checker) {
  AddInput("Param", "Input parameter");
  AddInput("Grad", "Input gradient");
  AddInput("Moment", "Second moment");
```
There is a typo here: it should be the momentum. Also, the comment is not helpful; we should make comments self-explanatory, in the format (type): comment, e.g. (tensor): blabla.
This is a good point, will fix this.
paddle/operators/rmsprop_op.cc
Outdated
```
RMSprop

MomentOut = decayRate * Moment + (1 - decayRate) * Grad * Grad
ParamOut = Param - learningRate * Grad / (sqrt(MomentOut) + epsilon)
```
The old version of Paddle, TensorFlow, and Caffe2 have all implemented the RMSProp algorithm, and they all follow the parameter names from the paper's formulas. Users are used to seeing the same names across different frameworks.
https://caffe2.ai/docs/operators-catalogue.html#rmsprop
tensorflow
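As a rough sketch, the no-momentum variant quoted in the diff above can be reproduced in NumPy. This mirrors the two formulas from the diff; the function name and the default hyperparameter values are illustrative choices, not part of the operator:

```python
import numpy as np

def rmsprop_step(param, grad, moment, decay_rate=0.9,
                 learning_rate=0.01, epsilon=1e-6):
    """One RMSProp update (no momentum). Defaults are illustrative."""
    # MomentOut = decayRate * Moment + (1 - decayRate) * Grad * Grad
    moment_out = decay_rate * moment + (1 - decay_rate) * grad * grad
    # ParamOut = Param - learningRate * Grad / (sqrt(MomentOut) + epsilon)
    param_out = param - learning_rate * grad / (np.sqrt(moment_out) + epsilon)
    return param_out, moment_out
```

With a zero-initialized moment, the first step divides by roughly `(1 - decay_rate) ** 0.5 * |grad|`, which is why epsilon is needed for stability when gradients are near zero.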
paddle/operators/rmsprop_op.h
Outdated
```cpp
class RmspropOpKernel : public framework::OpKernel<T> {
 public:
  void Compute(const framework::ExecutionContext& ctx) const override {
    auto param_out = ctx.Output<Tensor>("ParamOut");
```
Here please use auto* since param_out is a pointer. The auto keyword always hides the real type; making the pointer explicit is clearer for users who read the code.
I see. I had assumed for now that auto would resolve this by itself, but I see the point of making it more understandable for readers. Will fix.
```python
param = np.random.random((123, 321)).astype("float32")
grad = np.random.random((123, 321)).astype("float32")
moment = np.zeros((123, 321)).astype("float32")
learning_rate = np.array([0.01]).astype("float32")
```
why is 'learning_rate' not an attribute?
As Abhinav commented above, I think I will retain this as an input for now, given that we are doing the same in all other PRs too.
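For context, test inputs like the ones quoted above are typically paired with reference outputs computed directly in NumPy. The snippet below is a sketch of that pattern; the epsilon and decayRate values here are illustrative, not taken from the actual test file:

```python
import numpy as np

# Inputs mirroring the quoted test setup
param = np.random.random((123, 321)).astype("float32")
grad = np.random.random((123, 321)).astype("float32")
moment = np.zeros((123, 321)).astype("float32")
learning_rate = np.array([0.01]).astype("float32")

# Illustrative attribute values (assumptions, not from the PR)
epsilon = 1e-6
decay_rate = 0.9

# Reference outputs computed with the operator's formulas
moment_out = decay_rate * moment + (1 - decay_rate) * grad * grad
param_out = param - learning_rate * grad / (np.sqrt(moment_out) + epsilon)
```

The operator's outputs are then compared element-wise against `param_out` and `moment_out`.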
```
http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf)
does not have the epsilon attribute. It is added here for numerical stability
to avoid division by zero.
```
We'd better use the commonly used parameter names, since this is a popular optimizer and users are used to the same names across frameworks.
https://github.com/tensorflow/tensorflow/blob/994226a4a992c4a0205bca9e2f394cb644775ad7/tensorflow/core/ops/training_ops.cc#L1281
https://caffe2.ai/docs/operators-catalogue.html#rmsprop
Adding the implementation for RMSprop:

```
MeanSquareOut = decay * MeanSquare + (1 - decay) * Grad * Grad
MomentOut = momentum * Moment + LearningRate * Grad / sqrt(MeanSquareOut + epsilon)
ParamOut = Param - MomentOut
```
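The three update equations in the PR description can be sketched in NumPy as follows. Variable names mirror the formulas; the function name and default hyperparameter values are illustrative assumptions:

```python
import numpy as np

def rmsprop_momentum_step(param, grad, mean_square, moment,
                          decay=0.9, momentum=0.0,
                          learning_rate=0.01, epsilon=1e-6):
    """One RMSProp-with-momentum update. Defaults are illustrative."""
    # MeanSquareOut = decay * MeanSquare + (1 - decay) * Grad * Grad
    mean_square_out = decay * mean_square + (1 - decay) * grad * grad
    # MomentOut = momentum * Moment + LearningRate * Grad / sqrt(MeanSquareOut + epsilon)
    moment_out = momentum * moment + \
        learning_rate * grad / np.sqrt(mean_square_out + epsilon)
    # ParamOut = Param - MomentOut
    param_out = param - moment_out
    return param_out, mean_square_out, moment_out
```

With momentum set to 0, this reduces to the plain RMSProp rule discussed earlier in the thread, except that epsilon is added inside the square root rather than outside, matching the description above.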