Add audio data provider and a simplified DeepSpeech2 model configuration.#55
lcy-seso merged 7 commits into PaddlePaddle:develop
…2 model configuration. Bug exists when run training.
        feeding=feeding)
    args.num_passes -= 1
    # other passes without sortagrad
    trainer.train(
If args.use_sortagrad is true, trainer.train will be called twice. However, the second trainer.train call gets stuck (no progress, no error). Does trainer.train not support being called multiple times?
I think finer-grained control over the training process needs to be exposed to users so they can steer the training flow more flexibly. For example, in this DS2 case the training data needs to change during training. In other cases, parts of the model need to be frozen for a while or trained alternately (e.g. GAN).
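A minimal sketch of one way the sorted-then-shuffled flow might be folded into a single reader, so that training only needs to be started once. This is not the code in this PR: `make_sortagrad_reader`, the `(duration, payload)` sample layout, and the `seed` parameter are hypothetical names used for illustration, and the sketch assumes the training loop calls the reader once per pass.

```python
import random


def make_sortagrad_reader(samples, seed=0):
    """Return a reader: a zero-argument callable that yields samples.

    samples is a list of (duration_seconds, payload) tuples (assumed layout).
    The first iteration yields samples shortest first; every later
    iteration yields them in a freshly shuffled order.
    """
    rng = random.Random(seed)
    state = {"epoch": 0}

    def reader():
        if state["epoch"] == 0:
            # First pass: sort by duration (the sortagrad pass).
            ordered = sorted(samples, key=lambda s: s[0])
        else:
            # Other passes: plain shuffling, no sorting.
            ordered = list(samples)
            rng.shuffle(ordered)
        state["epoch"] += 1
        for sample in ordered:
            yield sample

    return reader
```

Whether this removes the hang reported above depends on the trainer itself; the sketch only shows that the data order can change between passes without a second training call.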
deep_speech_2/model.py
Outdated
    forward = paddle.layer.recurrent_group(
        step=__simple_rnn_step__, input=input)
    return forward
This has been fixed. Need to be updated.
deep_speech_2/train.py
Outdated
    if isinstance(event, paddle.event.EndPass):
        result = trainer.test(reader=test_batch_reader, feeding=feeding)
        print "Pass: %d, TestCost: %s" % (event.pass_id, result.cost)
        with gzip.open("params.tar.gz", 'w') as f:
Save the trained model under a name that includes the pass number; otherwise, each later save overwrites the earlier ones.
Since overfitting rarely happens in DS2, it is not necessary to save multiple models with the pass index. Currently the latest one will be enough. Might add it in the future if necessary.
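For reference, a minimal sketch of what per-pass saving could look like if it is added later. The helper names, the `params_pass_%05d.tar.gz` file pattern, and the `write_fn` callback are assumptions made for illustration; `write_fn` stands in for whatever call currently writes the parameters into `params.tar.gz`.

```python
import gzip
import os


def checkpoint_path(output_dir, pass_id):
    """Build a file name that embeds the pass index (hypothetical pattern)."""
    return os.path.join(output_dir, "params_pass_%05d.tar.gz" % pass_id)


def save_checkpoint(output_dir, pass_id, write_fn):
    """Write one gzip-compressed checkpoint per pass and return its path.

    write_fn(f) receives an open binary file object and writes the
    parameters into it; it is a placeholder for the real serialization call.
    """
    if not os.path.isdir(output_dir):
        os.makedirs(output_dir)
    path = checkpoint_path(output_dir, pass_id)
    with gzip.open(path, "wb") as f:
        write_fn(f)
    return path
```

Keeping only the latest file, as the PR does now, corresponds to always writing to the same path.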
    size=dict_size + 1,
    blank=dict_size,
    norm_by_times=True)
    # max decoder
If max_id is not needed in training, I think it should be put into a testing branch.
Will refactor this part later, together with beam search decoder.
    rnn_size=args.rnn_layer_size)

    # load parameters
    parameters = paddle.parameters.Parameters.from_tar(
Save / Load models according to the pass index.
Since overfitting rarely happens in DS2, it is not necessary to save multiple models with the pass index. Currently the latest one will be enough. Might add it in the future if necessary.
deep_speech_2/librispeech.py
Outdated
    URL_TEST = "http://www.openslr.org/resources/12/test-clean.tar.gz"
    URL_DEV = "http://www.openslr.org/resources/12/dev-clean.tar.gz"
    URL_TRAIN = "http://www.openslr.org/resources/12/train-clean-100.tar.gz"
deep_speech_2/audio_data_utils.py
Outdated
    class DataGenerator(object):
        """
        DataGenerator provides basic audio data preprocessing pipeline, and offer
        return target_dir


    def create_manifest(data_dir, manifest_path):
What does "manifest" mean? Please add some comments.
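To make the question concrete: the hunk above does not show what the manifest contains, so the following is only a sketch under the assumption that a manifest is an index file with one JSON object per line describing one utterance. The field names `audio_filepath`, `duration`, and `text`, and the helpers `write_manifest` and `read_manifest`, are hypothetical.

```python
import json


def write_manifest(entries, manifest_path):
    """Write one JSON object per line, one line per utterance."""
    with open(manifest_path, "w") as f:
        for entry in entries:
            f.write(json.dumps(entry) + "\n")


def read_manifest(manifest_path, max_duration=float("inf")):
    """Load manifest entries, dropping utterances longer than max_duration."""
    kept = []
    with open(manifest_path) as f:
        for line in f:
            entry = json.loads(line)
            if entry["duration"] <= max_duration:
                kept.append(entry)
    return kept
```

A docstring on create_manifest describing the actual file layout would answer the question directly.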
    fc = paddle.layer.fc(
        input=rnn_group_output,
        size=dict_size + 1,
        act=paddle.activation.Linear(),
Note, the activation should be softmax in inference mode.
The current code only contains the Best Path Decoder, which does not require a softmax activation.
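A small self-contained sketch of why best path decoding gives the same result with or without softmax: the decoder only takes the per-frame argmax, and softmax does not change which entry of a frame is largest. The function names and the toy scores are illustrative and not taken from the PR; the blank index follows the `blank=dict_size` convention shown in the hunk.

```python
import math


def softmax(row):
    """Numerically stable softmax over one frame of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def best_path_decode(frames, blank):
    """Greedy CTC decoding: per-frame argmax, merge repeats, drop blanks."""
    ids = [max(range(len(row)), key=row.__getitem__) for row in frames]
    decoded = []
    previous = None
    for token in ids:
        # Keep a token only when it starts a new run and is not the blank.
        if token != previous and token != blank:
            decoded.append(token)
        previous = token
    return decoded
```

A decoder that combines these scores with other probabilities, such as the beam search decoder mentioned elsewhere in this review, would typically need normalized outputs, which is where the softmax suggestion becomes relevant.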
deep_speech_2/requirements.sh
Outdated
    pip install soundfile

    # For Ubuntu only
    apt-get install libsndfile1
This needs to be explained in the documentation.
The Python part could instead be provided as a requires.txt.
        stride_ms=10.0,
        window_ms=20.0,
        max_frequency=None):
    self.__max_duration__ = max_duration
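In Python it is better not to use __XX__ to name your own functions and variables, because names such as __init__ and __del__ follow Python's built-in naming convention; the same applies below. To define a private function or variable, a single leading underscore is enough. As I recall, __XX is private and cannot be accessed by subclasses, while _XX is also private but can be accessed by subclasses.
This currently follows Paddle's Python conventions. Could we discuss it with everyone before changing it?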
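A short illustration of the three naming styles discussed above, using made-up class and attribute names. It shows standard Python behavior: a single leading underscore is only a convention, two leading underscores trigger name mangling, and names with both leading and trailing double underscores are not mangled at all.

```python
class Base(object):
    def __init__(self):
        # Single underscore: private by convention, visible to subclasses.
        self._stride_ms = 10.0
        # Two leading underscores: stored as _Base__window_ms (name mangling).
        self.__window_ms = 20.0
        # Dunder style: no mangling, but it looks like a built-in special name.
        self.__max_duration__ = 20.0


class Child(Base):
    def stride(self):
        return self._stride_ms

    def window(self):
        # Inside Child this is mangled to _Child__window_ms, which was
        # never set, so the lookup raises AttributeError.
        return self.__window_ms
```

So the `__XX` form is hidden from subclasses through renaming rather than true access control; the attribute remains reachable as `_Base__window_ms`.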
    self.__stride_ms__ = stride_ms
    self.__window_ms__ = window_ms
    self.__max_frequency__ = max_frequency
    self.__random__ = random.Random(RANDOM_SEED)
RANDOM_SEED is not important; I suggest not exposing it.
        norm_by_times=True)
    # max decoder
    max_id = paddle.layer.max_id(input=fc)
    return cost, max_id
You could return either cost or max_id depending on whether this is training or inference. For inference, cost should not be required, right?
Will refactor this part later when merging with beam search decoder.
        "--use_gpu", default=True, type=bool, help="Use gpu or not.")
    parser.add_argument(
        "--use_sortagrad", default=False, type=bool, help="Use sortagrad or not.")
    parser.add_argument(
Line 20 already defines "--trainer" with the same help message. Is this a duplicate definition?
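As a quick way to check a suspected duplicate: with its default settings, argparse rejects a second add_argument call that reuses an existing option string by raising argparse.ArgumentError. The sketch below uses the "--trainer" string quoted in the comment; the helper names, defaults, and help text are placeholders, not the PR's actual arguments.

```python
import argparse


def build_parser():
    """Build a parser with one option, standing in for the real script."""
    parser = argparse.ArgumentParser(description="hypothetical training options")
    parser.add_argument(
        "--trainer", default=4, type=int, help="Trainer number.")
    return parser


def is_duplicate(parser, flag):
    """Return True if adding `flag` conflicts with an existing option."""
    try:
        parser.add_argument(flag, default=1, type=int, help="Trainer number.")
    except argparse.ArgumentError:
        return True
    return False
```

If the script in this PR starts without such an error, the two definitions presumably use different option strings and only share the help text.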
xinghai-sun
left a comment
Done. Thanks for the review!
xinghai-sun
left a comment
Thanks for the review.
2. Fix incorrect batch-norm usage in RNN. 3. Fix overlapping train/dev/test manifests. 4. Update README.md and requirements.txt. 5. Expose more arguments to users in argparser. 6. Update all other details.
Now the model runs smoothly, with good convergence and reasonable decoding results.
resolved issue 2226
resolved issue 2231