@guoshengCS Thanks for the testing and for writing the documentation.
| this->storeLocalValues(); | ||
| std::vector<std::string> buffers; | ||
| paddle::str::split(name, '.', &buffers); | ||
| auto it = this->values_.find(buffers[buffers.size() - 1]); |
Change buffers[buffers.size() - 1] to buffers.back().
|
|
||
| private: | ||
| void storeLocalValues() const { | ||
| CHECK_GT(numOutputSegments_, 0); |
Change CHECK_GT to CHECK_GE. numOutputSegments_ can be 0 in practice, for example when the label sequence is O O O O.
| private: | ||
| void storeLocalValues() const { | ||
| CHECK_GT(numOutputSegments_, 0); | ||
| CHECK_GT(numLabelSegments_, 0); |
Change CHECK_GT to CHECK_GE.
| void storeLocalValues() const { | ||
| CHECK_GT(numOutputSegments_, 0); | ||
| CHECK_GT(numLabelSegments_, 0); | ||
| double precision = (double)numCorrect_ / numOutputSegments_; |
Change it to double precision = !numOutputSegments_ ? 0 : (double)numCorrect_ / numOutputSegments_;
| CHECK_GT(numOutputSegments_, 0); | ||
| CHECK_GT(numLabelSegments_, 0); | ||
| double precision = (double)numCorrect_ / numOutputSegments_; | ||
| double recall = (double)numCorrect_ / numLabelSegments_; |
Change it to double recall = !numLabelSegments_ ? 0 : (double)numCorrect_ / numLabelSegments_;
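The zero guards suggested in the comments above can be sketched as follows. This is a Python illustration, not the C++ evaluator code; the function name safe_precision_recall and its parameters are hypothetical stand-ins for numCorrect_, numOutputSegments_ and numLabelSegments_.

```python
def safe_precision_recall(num_correct, num_output_segments, num_label_segments):
    """Return (precision, recall), falling back to 0.0 for a zero denominator."""
    # An all-O output sequence produces zero output segments, so guard the division.
    precision = num_correct / num_output_segments if num_output_segments else 0.0
    # An all-O label sequence produces zero label segments, so guard here as well.
    recall = num_correct / num_label_segments if num_label_segments else 0.0
    return precision, recall


print(safe_precision_recall(0, 0, 3))  # no predicted chunks at all
print(safe_precision_recall(2, 4, 5))  # ordinary case
```

With the guards in place, the CHECK_GE assertions only need to reject negative counts.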
| To make it clear, let's illustrate by a NER example. | ||
| Assuming that there are two named entity types including ORG and PER which are called 'chunk type' here, | ||
| if 'IOB' scheme were used, the label set will be extended to a set including B-ORG, I-ORG, B-PER, I-PER and O, | ||
| in which B-ORG for begining of ORG and I-ORG for end of ORG. |
Change "end of ORG" to "inside of ORG".
| .. code-block:: python | ||
|
|
||
| 'plain' means the whole chunk must contain exactly the same chunk label. | ||
| Realizing that the number of is chunk type is 2 and number of tag type is 2, it is easy to validate this. |
We should change the example so that the number of chunk types differs from the number of tag types. Here both are 2, so the example may not help users clear up a misunderstanding about which is which.
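One way to build such an example is sketched below: three chunk types under the IOB scheme (two tag types), which gives 2*3+1 = 7 labels. The helper build_label_dict, the LOC chunk type, and the id layout (chunk index times number of tag types, plus tag index, with O last) are assumptions made for illustration. The tag order per scheme assumes the columns of the tag-type table in the reviewed docstring are Begin, Inside, End, Single.

```python
# Tag names per scheme, ordered by their assumed tag-type id
# (IOB: B=0, I=1; IOE: I=0, E=1; IOBES: B=0, I=1, E=2, S=3).
SCHEME_TAGS = {
    "IOB": ["B", "I"],
    "IOE": ["I", "E"],
    "IOBES": ["B", "I", "E", "S"],
}


def build_label_dict(scheme, chunk_types):
    """Build a label-to-id dict for the given scheme and chunk types."""
    tags = SCHEME_TAGS[scheme]
    label_dict = {}
    for chunk_index, chunk_type in enumerate(chunk_types):
        for tag_index, tag in enumerate(tags):
            # Assumed layout: id = chunk_index * num_tag_types + tag_index.
            label_dict[f"{tag}-{chunk_type}"] = chunk_index * len(tags) + tag_index
    # The outside label O takes the last id, num_tag_types * num_chunk_types.
    label_dict["O"] = len(tags) * len(chunk_types)
    return label_dict


labels = build_label_dict("IOB", ["ORG", "PER", "LOC"])
print(labels)
```

Here the number of tag types (2) and the number of chunk types (3) differ, so swapping them by mistake would produce visibly different ids.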
|
|
||
| For each label in the label sequence, we have: | ||
| To use chunk evaluator, the construction of label dict should obey the following rules: | ||
| (1) Use one of the listed labelling schemes. These schemes differ in ways indicating chunk boundry. |
I think we should define "chunk type", "tag type" before the following table. And we'd better have a running example to show how to label the words using different schemes. In fact, the following table is the protocol for assigning tag types, not the definition of the schemes. Therefore, I think we also need another table for the definitions.
| The total number of different labels is numTagType*numChunkTypes+1. | ||
| We support 4 labelling scheme. | ||
| The tag type for each of the scheme is shown as follows: | ||
| (2) Map can be done correctly by the listed equations. |
Change "Map" to "Mapping".
Change "can be done" to "is done". "Can be done" may mislead users into thinking this is only one of several feasible options, but it is the only one, because we hard-coded it.
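The hard-coded mapping discussed here can be sketched in Python as below. The equations are not quoted in this thread, so the modulo and integer-division form used here is an assumption, as are the function name decode_label and its parameters.

```python
def decode_label(label, num_tag_types, num_chunk_types):
    """Split a label id into (tag_type, chunk_type, is_other)."""
    # Assumed equations: tag type is the remainder, chunk type the quotient.
    tag_type = label % num_tag_types
    chunk_type = label // num_tag_types
    # The outside label O is assumed to decode to chunk_type == num_chunk_types.
    is_other = chunk_type == num_chunk_types
    return tag_type, chunk_type, is_other


# NER example with IOB (2 tag types) and chunk types ORG, PER (2 chunk types),
# assuming the ids B-ORG=0, I-ORG=1, B-PER=2, I-PER=3, O=4.
for label in range(5):
    print(label, decode_label(label, num_tag_types=2, num_chunk_types=2))
```

Because the decoding is fixed, a label dict that assigns ids in any other order would be silently misread, which is why the wording should say the mapping "is done" this way.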
| IOB 0 1 - - | ||
| IOE - 0 1 - | ||
| IOBES 0 1 2 3 | ||
| Continue the NER example, and the label dict should like this to satify above equations: |
pkuyym
left a comment
1. Override getTypeImpl instead of getType.
2. I think holding precision, recall and F1-score in a unified map would make the code cleaner and easier to maintain, and the extra computation cost is trivial.
3. Revise the document following the review comments.
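Suggestion 2 can be sketched as follows, in Python rather than the evaluator's C++. The function names and the map keys "precision", "recall" and "F1-score" are hypothetical. The lookup by the last dot-separated component mirrors the split-and-find code in the diff above.

```python
def store_local_values(num_correct, num_output_segments, num_label_segments):
    """Compute all three metrics once and keep them in a single dict."""
    precision = num_correct / num_output_segments if num_output_segments else 0.0
    recall = num_correct / num_label_segments if num_label_segments else 0.0
    total = precision + recall
    # F1 is the harmonic mean of precision and recall; 0.0 when both are zero.
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "F1-score": f1}


def get_value(values, name):
    """Look up a metric by the last dot-separated component of its name."""
    key = name.split(".")[-1]
    return values.get(key)


values = store_local_values(2, 4, 4)
print(get_value(values, "chunk.F1-score"))
```

Keeping every metric in one map means adding a new one only touches the function that fills the map, not each accessor.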
pengli09
left a comment
LGTM except getNames(). Please consult other members to make the final decision.
|
Please refer to http://www.paddlepaddle.org/develop/doc_cn/howto/dev/write_docs_cn.html and check whether the generated documentation is formatted correctly.
| IOB Two labels for chunk type X, B-X for chunk begining and I-X for chunk inside. | ||
| IOE Two labels for chunk type X, E-X for chunk ending and I-X for chunk inside. | ||
| IOBES Four labels for chunk type X, B-X for chunk begining, I-X for chunk inside, E-X for chunk end and S-X for single word chunk. | ||
| .. code-block:: python |
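Line 366 can be removed; a single .. code-block:: python is enough. The same applies below.
Please build the documentation and check that it renders correctly. At a rough glance, there appear to be some problems.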
| The construction of label dict should obey the following rules: | ||
| (1) Use one of the listed labelling schemes. These schemes differ in ways indicating chunk boundry. | ||
|
|
||
| .. code-block:: python |
Why is the code-block language python? This looks like plain text.
..[SPACE]code-block:: [language]
[EMPTY_LINE]
[SPACE][SPACE][SPACE]Your texts.
.. code-block:: text

   abc
   def
* fix update time * add rerun * fix permission error * fix delete container * fix mlu env * fix mlu ci error * fix cleanup * fix cleanup
fixes #2078