Conversation
```
schduled_type: is the type of the decay. It supports constant, linear,
exponential, and inverse_sigmoid right now.
a: parameter of the decay (MUST BE DOUBLE)
b: parameter of the decay (MUST BE DOUBLE)
```
Move the comments on lines 12-15 below the init function on line 18, because these three parameters only appear at initialization time.
```
Get the schedule sampling rate. Usually not needed to be called by the users
'''

def getScheduleRate(self):
```
Move the comment on line 33 below line 36, and likewise elsewhere. Otherwise, if someone later inserts a function in between, it will be unclear which function the comment refers to.
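The docstring placement the reviewer asks for can be sketched as follows. The class and parameter names follow the quoted snippets; the method bodies are placeholders, not the PR's actual implementation:

```python
class RandomScheduleGenerator:
    def __init__(self, schedule_type, a, b):
        """
        schedule_type: the type of the decay; supports constant, linear,
                       exponential, and inverse_sigmoid.
        a: parameter of the decay (must be a double)
        b: parameter of the decay (must be a double)
        """
        # The parameter comments live here, right where the
        # parameters first appear.
        self.schedule_type = schedule_type
        self.a = a
        self.b = b

    def getScheduleRate(self):
        """
        Get the schedule sampling rate. Usually not needed to be
        called by the users.
        """
        pass  # placeholder: compute the rate from schedule_type, a, b
```

Keeping each docstring inside the function it documents means the association survives even if someone inserts another function between them.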
lcy-seso
left a comment
Some small modifications.
```python
if __name__ == "__main__":
    schedule_generator = RandomScheduleGenerator("linear", 0.1, 500000)
    true_token_flag = schedule_generator.processBatch(5)
    pdb.set_trace()
```
Please delete the debug-related code.
```
@@ -0,0 +1,56 @@
import numpy as np
import math
import pdb
```
Please remove the debug module.
```python
decoder_state=decoder_mem)

gru_out_memory = paddle.layer.memory(
    name='gru_out', size=target_dict_dim)  # , boot_with_const_id=0)
```
Please remove the useless comment.
lcy-seso
left a comment
Scheduled sampling should not be used in generation. Multiplex layer should only be created in training.
```python
src_embedding = paddle.layer.embedding(
    input=src_word_id,
    size=word_vector_dim,
    param_attr=paddle.attr.ParamAttr(name='_source_language_embedding'))
```
Since the parameter name _source_language_embedding is not used explicitly anywhere else, it can be removed to avoid such hard-coding.
```python
    return data_reader


def seqToseq_net(source_dict_dim, target_dict_dim, is_generating=False):
```
Please document the parameters as in random_schedule_generator.py.
```python
        input=backward_first)

    def gru_decoder_with_attention_train(enc_vec, enc_proj, true_word,
                                         true_token_flag):
```
Please document the parameters as in random_schedule_generator.py.
```python
        return out

    def gru_decoder_with_attention_test(enc_vec, enc_proj, current_word):
```
Please document the parameters as in random_schedule_generator.py.
```python
    param_attr=paddle.attr.ParamAttr(name='_target_language_embedding'))

current_word = paddle.layer.multiplex(
    input=[true_token_flag, true_word, generated_word_emb])
```
This layer should not be created in generation, because in generation the generated word is always used.
The multiplex layer is inside the function gru_decoder_with_attention_train, which is only called during training.
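As a plain-NumPy sketch of what this selection does (a simplified model of the behavior discussed in the thread, not Paddle's implementation; the helper name `multiplex` is ours): the first input carries one integer id per row, and each output row is copied from the candidate tensor that id selects, so a flag of 0 picks the ground-truth word and 1 picks the generated word.

```python
import numpy as np

def multiplex(ids, *candidates):
    """Row-wise selection: out[i] = candidates[ids[i]][i]."""
    out = np.empty_like(candidates[0])
    for i, k in enumerate(ids):
        out[i] = candidates[k][i]
    return out

true_word = np.array([[1.0], [2.0], [3.0]])      # ground-truth embeddings
generated = np.array([[10.0], [20.0], [30.0]])   # generated-word embeddings
flags = np.array([0, 1, 0])                      # 0 -> true word, 1 -> generated
mixed = multiplex(flags, true_word, generated)
```

In generation there is nothing to select between (the generated word is always fed back), which is why the layer only belongs in the training branch.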
```python
    size=target_dict_dim,
    embedding_name='_target_language_embedding',
    embedding_size=word_vector_dim)
group_inputs.append(trg_embedding)
```
In generation, the target embedding is unknown, so this configuration is not reasonable.
The type of trg_embedding is GeneratedInputV2. It shares the target-language embedding matrix with the one used during training; it does not use ground-truth target words as inputs.
I have revised the comments and added the documentation in README.md.
```python
"""
The decoder step for training.
:param enc_vec: the encoder vector for attention
:type enc_vec: Layer
:param enc_proj: the encoder projection for attention
:type enc_proj: Layer
:param true_word: the ground-truth target word
:type true_word: Layer
:param true_token_flag: the flag of using the ground-truth target word
:type true_token_flag: Layer
:return: the softmax output layer
:rtype: Layer
"""
```
scheduled_sampling/README.md (outdated)
```
- Inverse sigmoid decay: `epsilon_i=k/(k+exp(i/k))`, where `k>1` and `k` likewise controls the magnitude of the decay.

## Model Implementation
Since Scheduled Sampling is an improvement on the sequence-to-sequence model, its overall implementation framework is quite similar to that of the sequence-to-sequence model. To keep this article focused, only the parts related to Scheduled Sampling are described here; see `scheduled_sampling.py` for the complete code.
```
The parts related to scheduled sampling include:
- how the sampling probability decays
- how the multiplex layer is used

Both need to be explained. What is the principle for setting the hyperparameters of the functions that produce the sampling probability?
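To make the decay concrete, here is a sketch of the four schedules the docstring names. The exact parameterization in the PR's random_schedule_generator.py may differ; the inverse-sigmoid formula follows the README's `epsilon_i = k/(k+exp(i/k))` with `k = b`, while the roles of `a` and `b` in the other schedules are our assumptions:

```python
import math

# epsilon(d): probability of feeding the ground-truth word after d
# training instances. All four are non-increasing in d.
schedules = {
    # always use ground truth with probability a
    "constant": lambda a, b, d: a,
    # decay linearly from 1.0, never dropping below the floor a
    "linear": lambda a, b, d: max(a, 1.0 - d / b),
    # decay as a**(d/b), assuming 0 < a < 1
    "exponential": lambda a, b, d: a ** (d / b),
    # README formula epsilon_i = k/(k + exp(i/k)) with k = b > 1
    "inverse_sigmoid": lambda a, b, d: b / (b + math.exp(d / b)),
}

# e.g. the PR's linear schedule with a=0.75, b=1000000:
rate_start = schedules["linear"](0.75, 1000000, 0)        # starts at 1.0
rate_late = schedules["linear"](0.75, 1000000, 2000000)   # clipped at the floor 0.75
```

Under this reading, `a` sets the floor (or base) of the decay and `b` sets its time scale, which is the kind of explanation the review asks to be spelled out in the README.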
Here the data reader is wrapped so that `true_token_flag`, sampled from `RandomScheduleGenerator`, is added as another data input that controls which element is used for decoding.

```python
schedule_generator = RandomScheduleGenerator("linear", 0.75, 1000000)
```
How were the two values 0.75 and 1000000 chosen? Please explain this in the README; otherwise users can hardly tell where these settings come from.

The text above mentions that the hyperparameters need to be tuned by the user. These two values will be replaced after tuning, with a note that they are the tuned results.
```python
    indexes = (numbers >= rate).astype('int32').tolist()
    self.data_processed_ += batch_size
    return indexes
```
Pasting a code snippet like this is no better than reading the code directly; please explain how to use it and how to set the initial parameters.
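To illustrate how the quoted lines are used, a standalone sketch (assuming, as in the snippet, that `rate` is the current sampling rate and that draws come from NumPy's uniform sampler over [0, 1)):

```python
import numpy as np

def process_batch(rate, batch_size, rng=np.random):
    """Draw one 0/1 flag per batch element.

    0 -> feed the ground-truth word at this step,
    1 -> feed the previously generated word.
    The higher the rate, the more ground-truth words are used.
    """
    numbers = rng.random_sample(batch_size)
    return (numbers >= rate).astype('int32').tolist()

always_true = process_batch(1.0, 5)  # rate 1.0: every flag is 0 (ground truth)
always_gen = process_batch(0.0, 5)   # rate 0.0: every flag is 1 (generated)
```

Each draw below the rate keeps the ground-truth word, so as the rate decays over training, generated words are fed back more and more often.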
scheduled_sampling/README.md (outdated)
```python
    self.data_processed_ += batch_size
    return indexes
```
The `__init__` method defines several different decay probabilities, and the `processBatch` method samples according to that probability, ultimately deciding whether the true element or the generated element is used for decoding.
Pasting a code snippet and appending a sentence like this is not an effective explanation; it is no different from reading the code directly, and it leaves the reader full of questions.

- "The `__init__` method defines several different decay probabilities" → defines several what? How should one choose among them? How should the parameters be set? Please connect this properly with the introduction above; the reference is unclear.
- "the `processBatch` method samples according to that probability" → does "that probability" refer to the one defined in `__init__` in the previous sentence? `__init__` accepts hyperparameters; how does the sampling probability change?
- "ultimately deciding whether the true element or the generated element is used for decoding" → decided how?
scheduled_sampling/README.md (outdated)
```
The `__init__` method defines several different decay probabilities, and the `processBatch` method samples according to that probability, ultimately deciding whether the true element or the generated element is used for decoding.

Here the data reader is wrapped so that `true_token_flag`, sampled from `RandomScheduleGenerator`, is added as another data input that controls which element is used for decoding.
```
- "Here the data reader is wrapped" → please expand this by a couple of sentences. Why is the reader wrapped? Please don't make readers figure it out themselves.
- "controls which element is used for decoding" → no "decoding" process is involved here; generating the whole sequence is usually what is called decoding.

Please submit a PR for the v2 API of the multiplex layer; otherwise this example will not run after being merged.
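What "wrapping the reader" amounts to can be sketched generically. The reader protocol here, a function returning an iterable of sample tuples, follows Paddle v2's reader convention; the tuple layout and the `schedule_generator` interface are assumptions for illustration:

```python
def gen_schedule_data(reader, schedule_generator):
    """Wrap an existing reader so that each sample also carries the
    sampled true_token_flag used to select the decoder input."""
    def data_reader():
        for src_ids, trg_ids, trg_ids_next in reader():
            # one 0/1 flag per target position:
            # 0 -> ground-truth word, 1 -> generated word
            flags = schedule_generator.processBatch(len(trg_ids))
            yield src_ids, trg_ids, trg_ids_next, flags
    return data_reader

# Tiny demo with a stand-in generator that always picks ground truth.
class _AlwaysTrue:
    def processBatch(self, n):
        return [0] * n

wrapped = gen_schedule_data(lambda: [([1, 2], [3, 4, 5], [4, 5, 6])], _AlwaysTrue())
sample = next(iter(wrapped()))
```

The wrapped reader yields a fourth field per sample, which the data feeder can map to the true_token_flag input layer during training.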
```
@@ -1 +1,164 @@
TBD
# Scheduled Sampling
```
The title should be changed to Chinese; likewise for all occurrences of "Scheduled Sampling".

This example does not need a standard Chinese translation; English is fine. I have not yet come across a widely accepted Chinese translation.
scheduled_sampling/README.md (outdated)
```
## Overview
The training objective of a sequence generation task is to maximize the probability of the target sequence given the source input. During training, the model takes the true elements of the target sequence as the input at each decoding step and maximizes the probability of the next element. During generation, the element decoded in the previous step is used as the current input when generating the next element. Clearly, the probability distribution of the decoder's input data differs between training and generation. If a wrong element is generated early in the sequence, later input states are affected, and the error keeps accumulating as generation proceeds.
Scheduled Sampling is a method for resolving this mismatch between the input-data distributions at training and generation time. Early in training it mainly uses the true elements as decoder inputs, quickly guiding the model from its randomly initialized state to a reasonable state. As training proceeds, it gradually uses more generated elements as decoder inputs, to resolve the distribution mismatch.
```
scheduled_sampling/README.md (outdated)
```
# Scheduled Sampling

## Overview
The training objective of a sequence generation task is to maximize the probability of the target sequence given the source input. During training, the model takes the true elements of the target sequence as the input at each decoding step and maximizes the probability of the next element. During generation, the element decoded in the previous step is used as the current input when generating the next element. Clearly, the probability distribution of the decoder's input data differs between training and generation. If a wrong element is generated early in the sequence, later input states are affected, and the error keeps accumulating as generation proceeds.
```
- If there is a training objective, the generation objective should also be stated. Alternatively, both objectives can be left out; I suggest removing this part and only discussing the different data distributions at training and generation time.
- Is "If a wrong element is generated early in the sequence, ..." the reason for introducing Scheduled Sampling? If not, it can be removed.
scheduled_sampling/README.md (outdated)
```
## Overview
Scheduled Sampling is a method for resolving the mismatch between the input-data distributions at training and generation time. Early in training it mainly uses the true elements as decoder inputs, quickly guiding the model from its randomly initialized state to a reasonable state. As training proceeds, it gradually uses more generated elements as decoder inputs, to resolve the distribution mismatch.
```
- "Early in training it mainly uses the true elements as decoder inputs": "the true elements" should be "the true elements of the target sequence".
- Change "以将" to "可以将".
- Write "随着训练的进行,该方法XXX" with a comma after the opening clause (mind sentence breaks throughout the text).
scheduled_sampling/README.md (outdated)
```
## Algorithm Overview
Scheduled Sampling is mainly applied to the training of Sequence to Sequence models; it is not needed in the generation stage.
```
- Suggested wording: "主要应用在序列到序列模型的训练阶段,生成阶段不需要使用。" (mainly applied in the training stage of sequence-to-sequence models; not needed in the generation stage).
- Change "Sequence to Sequence" to "序列到序列" throughout.
```
## Algorithm Overview
```
The algorithm overview would be best accompanied by a figure; the current description will leave novice users quite confused.
If you are not going to finish this PR, please tell me, and I will do it myself.
I will finish it ASAP. Sorry for the delay. @lcy-seso
@wwhu You're welcome. I think the work is almost finished; after some small modifications we can merge it first and then keep refining it. Thanks for your work.
lcy-seso
left a comment
Need to fix a small bug due to the updates of PaddlePaddle.
```
@@ -1 +1,164 @@
TBD
# Scheduled Sampling
```
This example does not need a standard Chinese translation; English is fine. I have not yet come across a widely accepted Chinese translation.
```python
    return cost
else:
    trg_embedding = paddle.layer.GeneratedInputV2(
```
There is a small issue here: Paddle was recently upgraded, and the V2 suffix of GeneratedInputV2 and StaticInputV2 is no longer needed. Please replace them all with GeneratedInput and StaticInput; otherwise an error will be raised.
lcy-seso
left a comment
Almost LGTM, I will further refactor and validate this demo.
resolve #11
Note: This model may encounter a "Floating point exception" after training on several mini-batches.
Scheduled sampling needs to use the API multiplex_layer (PaddlePaddle/Paddle#1753), which has not been implemented in the current Paddle version. I implemented this layer in my repository (https://github.com/wwhu/Paddle/blob/ss-dev/python/paddle/trainer_config_helpers/layers.py). I will post a PR to the official Paddle repository after I write the unit test for it.