ctx->SetOutputDim("F1-Score", {1});
}

framework::DataType IndicateDataType(
IndicateDataType is a protected member function.
protected:
 framework::DataType IndicateDataType(...) {
 }

tag_single = -1;
} else {
PADDLE_THROW("Unknown chunk scheme.");
}
Should we define a struct for these arguments and move their initialization code into a separate member function?
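One possible shape for the suggested struct is sketched below. `ChunkSchemeArgs` and `ParseChunkScheme` are hypothetical names. Only the IOB and IOBES schemes described in the doc string are filled in: IOB uses B = 0 and I = 1, and the IOBES values assume B, I, E, S map to 0 through 3 in that order. A value of -1 marks a tag the scheme does not use, following the `tag_single = -1` line in the hunk, and `std::invalid_argument` stands in for `PADDLE_THROW`.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical grouping of the per-scheme arguments; -1 means "tag unused".
struct ChunkSchemeArgs {
  int num_tag_types;
  int tag_begin;
  int tag_inside;
  int tag_end;
  int tag_single;
};

// Hypothetical helper that keeps all scheme-specific initialization in one
// place. The IOBES order B, I, E, S -> 0, 1, 2, 3 is an assumption.
ChunkSchemeArgs ParseChunkScheme(const std::string& chunk_scheme) {
  if (chunk_scheme == "IOB") {
    return ChunkSchemeArgs{2, 0, 1, -1, -1};
  }
  if (chunk_scheme == "IOBES") {
    return ChunkSchemeArgs{4, 0, 1, 2, 3};
  }
  // Stand-in for PADDLE_THROW("Unknown chunk scheme.") in the operator.
  throw std::invalid_argument("Unknown chunk scheme: " + chunk_scheme);
}
```

The kernel would then call the helper once and read the fields, instead of assigning five locals in each `if` branch.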
paddle/operators/chunk_eval_op.cc
Chunk evaluator is used to evaluate segment labelling accuracy for a
sequence. It calculates precision, recall and F1 scores for the chunk detection.
To use chunk evaluator, several concepts need to be clarified firstly.
[Chunk type] is the type of the whole chunk and a chunk consists of one or several words. (For example in NER, ORG for organization name, PER for person name etc.)
Add an empty line before lines 81 and 82.
Give the full name for NER (named entity recognition).
paddle/operators/chunk_eval_op.cc
IOBES Four labels for chunk type X, B-X for chunk beginning, I-X for chunk inside, E-X for chunk end and S-X for single word chunk.

To make it clear, let's illustrate by an NER example.
Assuming that there are three named entity types including ORG, PER and LOC which are called 'chunk type' here,
paddle/operators/chunk_eval_op.cc
tagType = label % numTagType
chunkType = label / numTagType
otherChunkType = numChunkTypes
The meanings of numTagType and numChunkTypes are clear here, but it would be better to explain them again.
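A minimal sketch of the mapping in the hunk above, with both counts passed in explicitly. `DecodedLabel` and `DecodeLabel` are hypothetical names. With IOB (two tag types) and three chunk types, it reproduces the I-LOC example from the doc string: label id 5 gives tagType 1 and chunkType 2, and label 6 gives chunkType 3, which equals numChunkTypes and therefore marks the O label.

```cpp
#include <cassert>

// Hypothetical container for one decoded label id.
struct DecodedLabel {
  int tag_type;    // position inside the chunk, e.g. 0 = B, 1 = I under IOB
  int chunk_type;  // kind of chunk, e.g. 0 = ORG, 1 = PER, 2 = LOC
  bool is_other;   // true for the O label, which belongs to no chunk
};

// tagType = label % numTagTypes, chunkType = label / numTagTypes.
// A chunk type equal to num_chunk_types is the "other" chunk type.
DecodedLabel DecodeLabel(int label, int num_tag_types, int num_chunk_types) {
  assert(num_tag_types > 0 && label >= 0);
  DecodedLabel d;
  d.tag_type = label % num_tag_types;
  d.chunk_type = label / num_tag_types;
  d.is_other = d.chunk_type == num_chunk_types;
  return d;
}
```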
paddle/operators/chunk_eval_op.h
tag_end, tag_single, excluded_chunk_types);
}
*precision_data =
!num_output_segments ? 0 : (T)num_correct / num_output_segments;
(T)num_correct -> static_cast<T>(num_correct)
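With the cast applied, all three metrics could be computed as in the sketch below. It is not the operator's code: `ChunkMetrics` and `ComputeChunkMetrics` are hypothetical, and F1 is taken as the harmonic mean 2PR / (P + R). F1 is set to 0 when there are no correct chunks; otherwise both precision and recall are positive, so the denominator cannot be zero.

```cpp
#include <cassert>
#include <cstdint>

template <typename T>
struct ChunkMetrics {
  T precision;
  T recall;
  T f1;
};

// Hypothetical helper: precision = correct / predicted chunks,
// recall = correct / gold chunks, F1 = harmonic mean of the two.
template <typename T>
ChunkMetrics<T> ComputeChunkMetrics(int64_t num_correct,
                                    int64_t num_output_segments,
                                    int64_t num_label_segments) {
  ChunkMetrics<T> m;
  // static_cast makes the integer-to-floating-point conversion explicit.
  m.precision = !num_output_segments
                    ? T(0)
                    : static_cast<T>(num_correct) / num_output_segments;
  m.recall = !num_label_segments
                 ? T(0)
                 : static_cast<T>(num_correct) / num_label_segments;
  // num_correct > 0 implies precision > 0 and recall > 0.
  m.f1 = !num_correct ? T(0)
                      : 2 * m.precision * m.recall / (m.precision + m.recall);
  return m;
}
```

For example, 2 correct chunks out of 3 predicted and 4 gold chunks give precision 2/3, recall 1/2, and F1 4/7.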
paddle/operators/chunk_eval_op.cc
Chunk evaluator is used to evaluate segment labelling accuracy for a
sequence. It calculates precision, recall and F1 scores for the chunk detection.
To use chunk evaluator, several concepts need to be clarified firstly.
[Chunk type] is the type of the whole chunk and a chunk consists of one or several words. (For example in NER, ORG for organization name, PER for person name etc.)
We need to explain the meaning of 'chunk' before 'chunk type':
[Chunk] is a contiguous span of tokens in a sentence. For example, "a yellow dog" is a chunk of the sentence "I have a yellow dog." A chunk can be a noun phrase, a person name, an organization name, and so on.
paddle/operators/chunk_eval_op.cc
Chunk evaluator is used to evaluate segment labelling accuracy for a
sequence. It calculates precision, recall and F1 scores for the chunk detection.
To use chunk evaluator, several concepts need to be clarified firstly.
[Chunk type] is the type of the whole chunk and a chunk consists of one or several words. (For example in NER, ORG for organization name, PER for person name etc.)
the whole chunk -> a chunk?
paddle/operators/chunk_eval_op.cc
sequence. It calculates precision, recall and F1 scores for the chunk detection.
To use chunk evaluator, several concepts need to be clarified firstly.
[Chunk type] is the type of the whole chunk and a chunk consists of one or several words. (For example in NER, ORG for organization name, PER for person name etc.)
[Tag type] indicates the position of a word in a chunk. (B for begin, I for inside, E for end, S for single)
paddle/operators/chunk_eval_op.cc
"IOB" so tagType has two values: 0 for B and 1 for I.
Here we will use I-LOC to explain the above mapping rules in detail.
For I-LOC, the label id is 5, so we can get tagType=1 and chunkType=2, which means I-LOC is a part of NER chunk LOC
and the tag is I.
How about giving an example here?
Steven B-PER 2
Paul I-PER 3
Jobs I-PER 3
works O 6
for O 6
Baidu B-ORG 0
Inc. I-ORG 1
at O 6
Beijing B-LOC 4
of I-LOC 5
China I-LOC 5
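Using the label ids in this example (IOB, with ORG = 0, PER = 1, LOC = 2, and O = 6), the chunks can be recovered with a small extractor. The sketch below is hypothetical and IOB-only. It represents a chunk as an inclusive token range, and it treats an I-X that does not continue an open X chunk as the start of a new chunk; that is a common convention, but it is an assumption about how the operator handles such sequences.

```cpp
#include <cassert>
#include <vector>

// A chunk as an inclusive token range plus its chunk type id.
struct Segment {
  int begin;  // index of the first token in the chunk
  int end;    // index of the last token in the chunk (inclusive)
  int type;   // chunk type id, e.g. 0 = ORG, 1 = PER, 2 = LOC
  bool operator==(const Segment& other) const {
    return begin == other.begin && end == other.end && type == other.type;
  }
};

// Hypothetical IOB-only extractor. Label ids follow the doc string:
// tagType = label % 2 (0 = B, 1 = I), chunkType = label / 2,
// and chunkType == num_chunk_types marks the O label.
std::vector<Segment> ExtractIOBSegments(const std::vector<int>& labels,
                                        int num_chunk_types) {
  const int kNumTagTypes = 2;
  const int kTagBegin = 0;
  std::vector<Segment> segments;
  bool in_chunk = false;
  Segment current{0, 0, 0};
  for (int i = 0; i < static_cast<int>(labels.size()); ++i) {
    const int tag = labels[i] % kNumTagTypes;
    const int type = labels[i] / kNumTagTypes;
    const bool is_other = type == num_chunk_types;
    // Close the open chunk on O, on B-X, or on an I-X of a different type.
    if (in_chunk && (is_other || tag == kTagBegin || type != current.type)) {
      segments.push_back(current);
      in_chunk = false;
    }
    if (is_other) continue;
    if (in_chunk) {
      current.end = i;  // I-X continues the open X chunk
    } else {
      current = Segment{i, i, type};  // B-X, or a stray I-X, opens a chunk
      in_chunk = true;
    }
  }
  if (in_chunk) segments.push_back(current);
  return segments;
}
```

For the sentence above, the ids `{2, 3, 3, 6, 6, 0, 1, 6, 4, 5, 5}` yield PER over tokens 0 to 2, ORG over tokens 5 to 6, and LOC over tokens 8 to 10.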
void EvalOneSeq(const int* output, const int* label, int length,
std::vector<Segment>& output_segments,
std::vector<Segment>& label_segments,
output_segments and label_segments are not used outside of EvalOneSeq, so why not define them inside EvalOneSeq and remove them from the argument list?
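A sketch of the suggested refactor, in which the two segment vectors become locals of `EvalOneSeq` instead of out-parameters. The signature is hypothetical: segment extraction is passed in as a callable so the sketch does not depend on a particular labeling scheme, and `ChunkCounts` stands in for however the operator accumulates its counters. A predicted chunk counts as correct only if a gold chunk has the same begin, end, and type, and chunks whose type is listed in `excluded_chunk_types` are not counted on either side.

```cpp
#include <cassert>
#include <set>
#include <vector>

// A chunk as an inclusive token range plus its chunk type id.
struct Segment {
  int begin;
  int end;
  int type;
};

// Hypothetical running counters for the evaluation.
struct ChunkCounts {
  long num_output_segments = 0;
  long num_label_segments = 0;
  long num_correct = 0;
};

// Hypothetical signature: the segment vectors are locals, not out-parameters.
// `extract(labels, length)` returns the chunks of one sequence.
template <typename Extract>
void EvalOneSeq(const int* output, const int* label, int length,
                Extract extract, const std::set<int>& excluded_chunk_types,
                ChunkCounts* counts) {
  const std::vector<Segment> output_segments = extract(output, length);
  const std::vector<Segment> label_segments = extract(label, length);
  auto counted = [&excluded_chunk_types](const Segment& s) {
    return excluded_chunk_types.count(s.type) == 0;
  };
  for (const Segment& out : output_segments) {
    if (!counted(out)) continue;
    ++counts->num_output_segments;
    // A predicted chunk is correct only if a gold chunk matches it exactly.
    for (const Segment& gold : label_segments) {
      if (out.begin == gold.begin && out.end == gold.end &&
          out.type == gold.type) {
        ++counts->num_correct;
        break;
      }
    }
  }
  for (const Segment& gold : label_segments) {
    if (counted(gold)) ++counts->num_label_segments;
  }
}
```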
lcy-seso left a comment
The code in this PR LGTM (it comes from the original chunk evaluator), but the documentation needs refinement. I think we can merge the code and ask someone who is familiar with sequence tagging tasks and good at English writing to help refine the doc.
paddle/operators/chunk_eval_op.cc
AddInput("Label", "(Tensor, default: Tensor<int>) Labels of the data.");
AddOutput(
"Precision",
"(float) The precision ratio of the predictions on current data.");
The evaluated precision (also called positive predictive value) of chunks on the given mini-batch.
paddle/operators/chunk_eval_op.cc
"Precision",
"(float) The precision ratio of the predictions on current data.");
AddOutput("Recall",
"(float) The recall ratio of the predictions on current data.");
The evaluated recall (true positive rate or sensitivity) of chunks on the given mini-batch.
I think we should tell users that this evaluation is performed on the given mini-batch, not on all the data tested so far. If this behavior changes later, make sure to update the doc.
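To make the distinction concrete: metrics over all data seen so far are not the average of the per-mini-batch metrics, because they must be computed from summed counts. The sketch below is illustrative only; `BatchCounts`, `Precision`, and `AccumulatedPrecision` are hypothetical names, and only precision is shown (recall and F1 behave the same way).

```cpp
#include <cassert>
#include <vector>

// Hypothetical per-mini-batch chunk counts.
struct BatchCounts {
  long num_correct;
  long num_output_segments;
};

// Precision of a single mini-batch.
double Precision(long num_correct, long num_output_segments) {
  return num_output_segments == 0
             ? 0.0
             : static_cast<double>(num_correct) / num_output_segments;
}

// Precision over everything seen so far: sum the counts, then divide once.
double AccumulatedPrecision(const std::vector<BatchCounts>& batches) {
  long correct = 0;
  long output = 0;
  for (const BatchCounts& b : batches) {
    correct += b.num_correct;
    output += b.num_output_segments;
  }
  return Precision(correct, output);
}
```

With mini-batches of (1 correct, 2 predicted) and (3 correct, 3 predicted), the per-batch precisions are 0.5 and 1.0, while the precision over both is 4 / 5 = 0.8, not their mean 0.75.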
paddle/operators/chunk_eval_op.cc
AddOutput("Recall",
"(float) The recall ratio of the predictions on current data.");
AddOutput("F1-Score",
"(float) The F1-Score of the predictions on current data.");
The evaluated F1-Score on the given mini-batch.
paddle/operators/chunk_eval_op.cc
framework::OpAttrChecker *op_checker)
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("Inference",
"(Tensor, default: Tensor<int>) Predictions from the network.");
Add a "." after (Tensor, default: Tensor<int>). The same applies below.
(Tensor, default: Tensor<int>). Predictions from the network.
paddle/operators/chunk_eval_op.cc
: OpProtoAndCheckerMaker(proto, op_checker) {
AddInput("Inference",
"(Tensor, default: Tensor<int>) Predictions from the network.");
AddInput("Label", "(Tensor, default: Tensor<int>) Labels of the data.");
paddle/operators/chunk_eval_op.cc
"(float) The F1-Score of the predictions on current data.");
AddAttr<int>("num_chunk_types", "(int) The number of chunk type.");
AddAttr<std::string>("chunk_scheme",
"(string, default IOB) The label scheme.")
The labeling scheme indicating how chunks are encoded, including IOB, x, x, x (list all the supported schemes). It would be better to add a reference here that explains how these schemes label chunks.

"excluded_chunk_types",
"(list<int>) A list<int> indicating chunk types not to be counted.")
.SetDefault(std::vector<int>{});
AddComment(R"DOC(
Here, I think it would be much better to explain what a chunk is first, for example:
Chunks are about character spans. In the sequence tagging problem, chunks are sequences of tokens (words or other units) and tags (tag labels, categories).
paddle/operators/chunk_eval_op.cc
.SetDefault(std::vector<int>{});
AddComment(R"DOC(
Chunk evaluator is used to evaluate segment labelling accuracy for a
sequence. It calculates precision, recall and F1 scores for the chunk detection.
the chunk detection --> chunks the model predicts.
paddle/operators/chunk_eval_op.cc
AddComment(R"DOC(
Chunk evaluator is used to evaluate segment labelling accuracy for a
sequence. It calculates precision, recall and F1 scores for the chunk detection.
To use chunk evaluator, several concepts need to be clarified firstly.
We first introduce some related concepts.
paddle/operators/chunk_eval_op.cc
.SetDefault("IOB");
AddAttr<std::vector<int>>(
"excluded_chunk_types",
"(list<int>) A list<int> indicating chunk types not to be counted.")
- indicating chunk types that are not counted.
- This explanation is hard to understand for users.
Done. See below for details.
resolves #4749