Add soft-label support for cross-entropy operator. #4081
qingqing01 merged 6 commits into PaddlePaddle:develop
Conversation
qingqing01
left a comment
Need to update and fix conflicts.
paddle/operators/cross_entropy_op.cc
Outdated
- auto *X = ctx.Input<Tensor>("X");
- auto *label = ctx.Input<Tensor>("label");
+ auto *x = ctx.Input<Tensor>("X");
+ auto *label = ctx.Input<Tensor>("Label");
Please add a not-null check for Input(X) and Input(Label). Thanks!
  namespace operators {

- class OnehotCrossEntropyOp : public framework::OperatorWithKernel {
+ class CrossEntropyOp : public framework::OperatorWithKernel {
@luotao1 changed CrossEntropyOp to OnehotCrossEntropyOp. Please use the new name.
Since it supports not only one-hot cross-entropy but also soft-label cross-entropy, it would be better to use CrossEntropyOp instead of OnehotCrossEntropyOp.
paddle/operators/cross_entropy_op.cc
Outdated
  // normal cross entropy
  PADDLE_ENFORCE_EQ(x->dims()[0], label->dims()[0]);
  }
  ctx.Output<Tensor>("Y")->Resize({x->dims()[0]});
Now `Output<framework::LoDTensor>` must be used for outputs in both the forward and backward InferShape.
@qingqing01 Should the output Y be renamed to Out to follow the new naming convention? (Previously, when Yiqun used Y as an output name in the FC operator, he was also asked to change it to Out. Should this be unified?)
After discussing with @Xreki, we both prefer "Loss" as the output name rather than "Out". I think "Loss" is more meaningful than "Out".
paddle/operators/cross_entropy_op.cc
Outdated
  void InferShape(const framework::InferShapeContext &ctx) const override {
- auto dX = ctx.Output<Tensor>(framework::GradVarName("X"));
- auto X = ctx.Input<Tensor>("X");
+ auto dx = ctx.Output<Tensor>(framework::GradVarName("X"));
Use Output<framework::LoDTensor> here.
- auto dX = ctx.Output<Tensor>(framework::GradVarName("X"));
- auto X = ctx.Input<Tensor>("X");
+ auto dx = ctx.Output<Tensor>(framework::GradVarName("X"));
+ auto x = ctx.Input<Tensor>("X");
Also add a not-null check for Input(X). Thanks!
- Y[i] = -log(X[i][j])
+ The second input (Label tensor) supports two kinds of shapes:
+ 1) Rank(Label) = 1, Label[i] indicates the class index for sample i:
+    Y[i] = -log(X[i, Label[i]])
Please add blank lines before and after the formula.
+ 2) Rank(Label) = 2, Label[i, j] indicates the soft label of class j
+    for sample i:
+    Y[i] = \sum_j{-Label[i, j] * log(X[i, j])}
Please add blank lines before and after the formula.
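The two cases quoted above can be checked numerically. Below is a minimal NumPy sketch of both formulas (function and variable names are illustrative, not from the PR); a one-hot rank-2 label should reproduce the rank-1 hard-label result:

```python
import numpy as np

def hard_label_ce(x, label):
    # Case 1: Rank(Label) = 1, label[i] is the class index of sample i.
    # Y[i] = -log(X[i, Label[i]])
    return -np.log(x[np.arange(x.shape[0]), label])

def soft_label_ce(x, label):
    # Case 2: Rank(Label) = 2, label[i, j] is the soft label of class j.
    # Y[i] = sum_j(-Label[i, j] * log(X[i, j]))
    return -np.sum(label * np.log(x), axis=1)

x = np.array([[0.7, 0.2, 0.1],
              [0.1, 0.8, 0.1]])
hard = hard_label_ce(x, np.array([0, 1]))
one_hot = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0]])
print(np.allclose(hard, soft_label_ce(x, one_hot)))  # True
```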
paddle/operators/cross_entropy_op.cu
Outdated
  template <typename T>
- __host__ __device__ T clipping_log(const T x) {
+ __host__ __device__ T tolerable_value(const T x) {
Include paddle/platform/hostdevice.h, then use HOSTDEVICE:
HOSTDEVICE T tolerable_value(const T x) {

I have a question here: if this function uses __host__ __device__ in its declaration, why do we need to implement it again in *.cc?
If we switch to HOSTDEVICE, then according to its definition:
#ifdef __CUDACC__
#define HOSTDEVICE __host__ __device__
#define HOST __host__
#else
#define HOSTDEVICE
#define HOST
#endif
CPU and GPU can indeed share this tolerable_value.
I see, got it. HOSTDEVICE expands to nothing in the CPU build.
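For reference, the numeric effect of tolerable_value can be sketched in Python (the 1e20 clamp constant is an assumption modeled on the discussion, not quoted from the PR):

```python
import math

APPRO_INF = 1e20  # assumed finite stand-in for infinity

def tolerable_value(x):
    # Clamp +/-inf (e.g. log(0) == -inf) to a large finite value so the
    # downstream arithmetic stays well defined on both CPU and GPU.
    if math.isinf(x):
        return APPRO_INF if x > 0 else -APPRO_INF
    return x

print(tolerable_value(3.5))            # 3.5 -- finite values pass through
print(tolerable_value(float('-inf')))  # -1e+20
```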
  self.check_output()

  def test_check_grad(self):
      self.check_grad(['X'], 'Y', max_relative_error=0.05)
Could the max_relative_error be tuned smaller?
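For context on what max_relative_error bounds: the gradient check compares the analytic gradient against central finite differences. A hedged NumPy sketch of that comparison for the soft-label case (not the actual framework checker; names are illustrative):

```python
import numpy as np

def soft_ce(x, label):
    # Y[i] = -sum_j label[i, j] * log(x[i, j])
    return -np.sum(label * np.log(x), axis=1)

rng = np.random.RandomState(0)
x = rng.uniform(0.1, 1.0, (3, 4))
x /= x.sum(axis=1, keepdims=True)
label = rng.uniform(0.1, 1.0, (3, 4))
label /= label.sum(axis=1, keepdims=True)

analytic = -label / x  # dY[i]/dx[i, j] for the loss above
eps = 1e-5
numeric = np.zeros_like(x)
for i in range(x.shape[0]):
    for j in range(x.shape[1]):
        xp = x.copy(); xp[i, j] += eps
        xm = x.copy(); xm[i, j] -= eps
        numeric[i, j] = (soft_ce(xp, label)[i] - soft_ce(xm, label)[i]) / (2 * eps)

max_rel_err = np.max(np.abs(analytic - numeric) / np.maximum(np.abs(analytic), 1e-8))
print(max_rel_err < 0.05)  # True: well inside the 0.05 tolerance
```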
When the label becomes soft, there is no discretization step anymore. Can we call Eigen directly?
xinghai-sun
left a comment
All done. Thanks.
@lcy-seso I think that is also feasible, if we are willing to tolerate two branches of the same if: one going through CUDA code and the other through Eigen.
caffe2 splits it into two branches. The more important question may be which of the two is more computationally efficient; that is unknown for now.
  // TODO(qingqing) define CUDA_1D_KERNEL_LOOP macro in a common file.
  // CUDA_1D_KERNEL_LOOP(i, N) {
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < N;
       i += blockDim.x * gridDim.x) {
A small question for @qingqing01:
- The for loop inside this kernel does not produce wrong results, but it feels logically odd.
- There are always some surplus threads in the grid that do not align exactly with the input data; the for loop effectively skips those unaligned parts.
- After a single increment of i += blockDim.x * gridDim.x, the loop variable i already exceeds the total number of threads, so in practice this kernel never iterates more than once and computes only one position of the output vector. Logically it is equivalent to returning directly when i >= batch_size. What is the reason for writing it as a loop?
With the grid/thread setup below, the for loop is indeed unnecessary. But if the grid below were set to a fixed number, i.e. a fixed total number of threads were launched, the for loop would be useful: one thread might then compute multiple outputs. Since this kernel already handles the boundary, it needs no further change.
int block = 512;
int grid = (n + block - 1) / block;

Got it. Indeed, this cross-entropy kernel is quite simple and special, and the grid size is already computed to cover the input.
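The point about a fixed grid can be illustrated by simulating the grid-stride loop sequentially (block/grid sizes below are illustrative, not from the PR):

```python
def grid_stride_indices(block_dim, grid_dim, n):
    """Collect the indices all threads would process with
    for (i = blockIdx*blockDim + threadIdx; i < n; i += blockDim*gridDim)."""
    stride = block_dim * grid_dim
    out = []
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            i = block_idx * block_dim + thread_idx
            while i < n:
                out.append(i)  # this thread handles element i
                i += stride    # then jumps ahead by the whole grid
    return sorted(out)

# With a small fixed grid (8 threads total for n = 10), some threads cover
# two elements, yet every index in [0, n) is visited exactly once.
print(grid_stride_indices(block_dim=4, grid_dim=2, n=10) == list(range(10)))  # True
```

When the grid is sized as `grid = (n + block - 1) / block`, each thread's first index is already unique and the loop body runs at most once, which is the observation made in the thread above.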
paddle/operators/cross_entropy_op.cc
Outdated
  auto *label = ctx.Input<Tensor>("Label");

  PADDLE_ENFORCE_EQ(x->dims().size(), 2, "X's rank must be 2.");
  PADDLE_ASSERT(label->dims().size() == 1 || label->dims().size() == 2);
As discussed this morning, we should also use a rank-2 label for the int label. Please help to modify it. If so, there is no way to tell normal cross-entropy from soft cross-entropy by rank alone. Can we switch to an attribute?
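The attribute-based dispatch suggested here could look like the following sketch, where both variants take a rank-2 label and a boolean attribute selects the branch (the attribute name `soft_label` is an assumption, not from the PR):

```python
import numpy as np

def cross_entropy(x, label, soft_label=False):
    # Rank is no longer enough to tell the two cases apart once the
    # int label is also rank 2 ([batch, 1]), so dispatch on an attribute.
    if soft_label:
        # label: float, shape [batch, class_num]
        return -np.sum(label * np.log(x), axis=1)
    # label: int, shape [batch, 1]
    idx = label.reshape(-1)
    return -np.log(x[np.arange(x.shape[0]), idx])

x = np.array([[0.3, 0.7],
              [0.6, 0.4]])
hard = cross_entropy(x, np.array([[1], [0]]))
soft = cross_entropy(x, np.array([[0.0, 1.0], [1.0, 0.0]]), soft_label=True)
print(np.allclose(hard, soft))  # True: one-hot soft labels match hard labels
```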
paddle/operators/cross_entropy_op.cu
Outdated
  using Tensor = framework::Tensor;

  template <typename T>
  HOSTDEVICE T tolerable_value(const T x) {
As @lcy-seso said, both the CPU and GPU kernels can share this common function if we use HOSTDEVICE. Use this function in paddle/operators/cross_entropy_op.h and delete the duplicate definition in paddle/operators/cross_entropy_op.cc.
  sum += label[i * D + j] * log(X[i * D + j]);
  }
  Y[i] = -tolerable_value(sum);
  }
Put tolerable_value after log:
for (int j = 0; j < D; j++) {
  sum += -label[i * D + j] * tolerable_value(log(X[i * D + j]));
}
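The reason for clipping each log term rather than the accumulated sum can be seen numerically; a small sketch (tolerable_value and the 1e20 clamp are assumptions modeled on the discussion above):

```python
import math

APPRO_INF = 1e20  # assumed clamp constant

def tolerable_value(x):
    if math.isinf(x):
        return APPRO_INF if x > 0 else -APPRO_INF
    return x

def safe_log(v):
    # math.log(0.0) raises in Python, so map it to -inf explicitly.
    return math.log(v) if v > 0 else -math.inf

label = [0.0, 1.0]
x = [0.0, 1.0]  # a zero probability in a class with zero label weight

# Clipping only the final sum is too late: 0 * -inf is already NaN.
late = -sum(l * safe_log(v) for l, v in zip(label, x))
# Clipping each log term first keeps every product finite.
early = sum(-l * tolerable_value(safe_log(v)) for l, v in zip(label, x))
print(math.isnan(late), early)  # True 0.0
```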
paddle/operators/cross_entropy_op.cc
Outdated
  CrossEntropy Operator.

  The second input (Label tensor) supports two kinds of shapes:
  1) Rank(Label) = 1, Label[i] indicates the class index for sample i:
If the int label's rank is modified, these doc comments also need to be updated.
  for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < N;
       i += blockDim.x * gridDim.x) {
    T sum = static_cast<T>(0);
    for (int j = 0; j < D; j++) {
Please add a TODO note about optimizing this kernel.
paddle/operators/cross_entropy_op.h
Outdated
  T sum = static_cast<T>(0);
  for (int j = 0; j < class_num; ++j) {
    sum += label_data[index] * std::log(x_data[index]);
  y_data[i] = -tolerable_value(sum);
Apply tolerable_value to each std::log term:
sum += -label_data[index] * tolerable_value(std::log(x_data[index]));
Merging this PR; if any question remains, we will fix it later.
Resolve #4080
Resolve #3898