Rewrite REAME.md for DS2 and update examples. #246
xinghai-sun merged 10 commits into PaddlePaddle:develop from
Conversation
luotao1
left a comment
- The logic of Readme.md is clear, but there are quite a few small English grammar issues that will need another refining pass later.
- Also, the readmes under the other models are all in Chinese, so deep speech2 should get a Chinese introduction as well. It would help domestic users get started quickly and make it easier for everyone to contribute.
| ## Prerequisites
| - Only support Python 2.7
| - PaddlePaddle the latest version (please refer to the [Installation Guide](https://github.com/PaddlePaddle/Paddle#installation))
PaddlePaddle: the latest version
This is a temporary note. Since none of the officially released versions works with ds2, we cannot pin a version number here. Any suggestions? Writing a commit hash seems even less appropriate.
Besides, we could additionally provide a docker image that bundles the ds dependencies.
kuke
left a comment
Some minor problems, almost LGTM
deep_speech_2/test.py
Outdated
| "otherwise, it resumes from the pre-trained model.")
| add_arg('lang_model_path', str,
| 'lm/data/common_crawl_00.prune01111.trie.klm',
| 'model_zoo/lm/common_crawl_00.prune01111.trie.klm',
deep_speech_2/tools/tune.py
Outdated
| "Filepath of vocabulary.")
| add_arg('lang_model_path', str,
| 'lm/data/common_crawl_00.prune01111.trie.klm',
| 'model_zoo/lm/common_crawl_00.prune01111.trie.klm',
deep_speech_2/infer.py
Outdated
| "Filepath of vocabulary.")
| add_arg('lang_model_path', str,
| 'lm/data/common_crawl_00.prune01111.trie.klm',
| 'model_zoo/lm/common_crawl_00.prune01111.trie.klm',
| --infer_manifest='data/librispeech/manifest.test-clean' \
| --mean_std_path='data/librispeech/mean_std.npz' \
| --vocab_path='data/librispeech/vocab.txt' \
| --model_path='checkpoints/libri/params.latest.tar.gz' \
In some other files, this path is 'checkpoints/librispeech/*. Please unify them.
Done. Unified with 'checkpoints/libri/*'
deep_speech_2/README.md
Outdated
| Several shell scripts provided in `./examples` will help us to quickly give it a try, for most major modules, including data preparation, model training, case inference and model evaluation, with a few public dataset (e.g. [LibriSpeech](http://www.openslr.org/12/), [Aishell](https://github.com/kaldi-asr/kaldi/tree/master/egs/aishell)). Reading these examples will also help us understand how to make it work with our own data.
|
| ### Preparing Data
| Some of the scripts in `./examples` are configured with 8 GPUs. If you don't have 8 GPUs available, please modify `CUDA_VISIBLE_DEVICE` and `--trainer_count`. If you don't have any GPU available, please set `--use_gpu` to False to use CPUs instead.
CUDA_VISIBLE_DEVICE -> CUDA_VISIBLE_DEVICES
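Beyond the spelling fix, a launcher could keep `--trainer_count` consistent with whatever `CUDA_VISIBLE_DEVICES` exposes. A minimal sketch of that idea; the helper name is hypothetical and not part of the repo:

```python
import os


def trainer_count_from_env(default=8):
    """Return the number of GPUs exposed via CUDA_VISIBLE_DEVICES.

    A hypothetical helper: if the variable is unset, fall back to the
    script's default; otherwise count the comma-separated device ids.
    """
    devices = os.environ.get("CUDA_VISIBLE_DEVICES")
    if devices is None:
        return default  # variable unset: use the configured default
    return len([d for d in devices.split(",") if d.strip()])
```

For example, with `CUDA_VISIBLE_DEVICES=0,1` this returns 2, which could then be passed as `--trainer_count`.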
deep_speech_2/README.md
Outdated
| - [Questions and Help](#questions-and-help)
|
| ## Prerequisites
| - Only support Python 2.7
deep_speech_2/README.md
Outdated
| ### Preparing for Training
| To use your custom data, you only need to generate such manifest files to summarize the dataset. Given such summarized manifests, training, inference and all other modules can be aware of where to access the audio files, as well as their meta data including the transcription labels.
|
| For how to generate such manifest files, please refer to `data/librispeech/librispeech.py`, which download and generate manifests for LibriSpeech dataset.
--> will download and generate
deep_speech_2/README.md
Outdated
| # DeepSpeech2 on PaddlePaddle
|
| >TODO: to be updated, since the directory hierarchy was changed.
| *DeepSpeech2 on PaddlePaddle* is an open-source implementation of end-to-end Automatic Speech Recognition (ASR) engine, based on [Baidu's Deep Speech 2 paper](http://proceedings.mlr.press/v48/amodei16.pdf), with [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) platform. Our vision is to empower both industrial application and academic research on speech recognition, via an easy-to-use, efficient and scalable implementation, including training, inferencing & testing module, distributed [PaddleCloud](https://github.com/PaddlePaddle/cloud) training, and demo deployment. Besides, several pre-trained models for both English and Mandarin are also released.
training, inferencing --> training, inference?
deep_speech_2/README.md
Outdated
|
| ## Installation
|
| Please install the [prerequisites](#prerequisites) above before moving on.
--> Please make sure the above prerequisites have been satisfied before moving on.
"Install the prerequisites" may not be proper phrasing.
deep_speech_2/README.md
Outdated
| ## Getting Started
|
| ## Usage
| Several shell scripts provided in `./examples` will help us to quickly give it a try, for most major modules, including data preparation, model training, case inference and model evaluation, with a few public dataset (e.g. [LibriSpeech](http://www.openslr.org/12/), [Aishell](https://github.com/kaldi-asr/kaldi/tree/master/egs/aishell)). Reading these examples will also help us understand how to make it work with our own data.
us --> you
https://github.com/kaldi-asr/kaldi/tree/master/egs/aishell --> http://www.openslr.org/33
our --> your
deep_speech_2/README.md
Outdated
| sh run_data.sh
| ```
|
| `run_data.sh` will download dataset, generate manifests, collect normalizer' statistics and build vocabulary. Once the data preparation is done, we will find the data (only part of LibriSpeech) downloaded in `~/.cache/paddle/dataset/speech/libri` and the corresponding manifest files generated in `./data/tiny` as well as a mean stddev file and a vocabulary file. It has to be run for the very first time we run this dataset and is reusable for all further experiments.
normalizer' --> normalizer
we --> you
normalizer's
done.
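On the normalizer's statistics mentioned above: what `run_data.sh` collects amounts to per-dimension feature mean and stddev saved as an `.npz` file. A hedged numpy sketch of that pattern; the key names `mean` and `std` are illustrative assumptions, not necessarily the exact layout of DS2's `mean_std.npz`:

```python
import numpy as np


def save_mean_std(feature_frames, npz_path):
    """Compute per-dimension mean/stddev over stacked feature frames
    (shape [num_frames, feature_dim]) and save them as an .npz file."""
    np.savez(npz_path,
             mean=feature_frames.mean(axis=0),
             std=feature_frames.std(axis=0))


def normalize(features, npz_path):
    """Z-score features per dimension using the saved statistics."""
    stats = np.load(npz_path)
    return (features - stats["mean"]) / (stats["std"] + 1e-14)
```

Saving the statistics once and reusing them is why the data preparation step only has to run the first time a dataset is used.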
deep_speech_2/README.md
Outdated
| sh run_train.sh
| ```
|
| `run_train.sh` will start a training job, with training logs printed to stdout and model checkpoint of every pass/epoch saved to `./checkpoints/tiny`. We can resume the training from these checkpoints, or use them for inference, evaluation and deployment.
resume the training is not rigorous.
"resume training" is ok, e.g. https://cn.mathworks.com/help/nnet/ug/resume-training-from-a-checkpoint-network.html
deep_speech_2/README.md
Outdated
| ### Preparing for Training
| To use your custom data, you only need to generate such manifest files to summarize the dataset. Given such summarized manifests, training, inference and all other modules can be aware of where to access the audio files, as well as their meta data including the transcription labels.
|
| For how to generate such manifest files, please refer to `data/librispeech/librispeech.py`, which download and generate manifests for LibriSpeech dataset.
download --> downloads
generate --> generates
manifests --> manifest files
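To make the manifest discussion concrete: a DS2 manifest is a JSON-lines file, one utterance per line. A minimal sketch of writing one; the field names (`audio_filepath`, `duration`, `text`) follow the common DS2 convention but are an assumption here, so verify them against `data/librispeech/librispeech.py` before relying on them:

```python
import json


def write_manifest(entries, manifest_path):
    """Write one JSON object per line, summarizing each utterance.

    entries: iterable of (audio_path, duration_seconds, transcript).
    """
    with open(manifest_path, "w") as f:
        for audio_path, duration, transcript in entries:
            f.write(json.dumps({
                "audio_filepath": audio_path,
                "duration": duration,
                "text": transcript,
            }) + "\n")
```

Any custom dataset summarized this way can then be fed to training, inference and the other modules unchanged.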
| ```
| python train.py --help
| ```
| or refer to `example/librispeech/run_train.sh`.
run_train.sh doesn't include all arguments like init_model_path
deep_speech_2/README.md
Outdated
| sh run.sh
| cd ..
| ```
| Six optional augmentation components are provided for us to configured and inserted into the processing pipeline.
Six optional augmentation components are provided which can be configured and inserted into the processing pipeline
--> Six optional augmentation components are provided to be selected, configured and inserted into the processing pipeline.
deep_speech_2/README.md
Outdated
| ```
| And then in another console, start the demo's client:
|
| Now, in the client console, press the `whitespace` key, hold, and start speaking. Until we finish our utterance, we release the key to let the speech-to-text results shown in the console. To quit the client, just press `ESC` key.
Here, we can paste the information message after starting client console.
deep_speech_2/README.md
Outdated
|
| Now, in the client console, press the `whitespace` key, hold, and start speaking. Until we finish our utterance, we release the key to let the speech-to-text results shown in the console. To quit the client, just press `ESC` key.
|
| Notice that `deploy/demo_client.py` must be run in a machine with a microphone device, while `deploy/demo_server.py` could be run in one without any audio recording hardware, e.g. any remote server machine. Just be careful to set the `host_ip` and `host_port` argument with the actual accessible IP address and port, if the server and client are running with two separate machines. Nothing should be done if they are running in one single machine.
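On the `host_ip`/`host_port` point above: when server and client run on separate machines, a small connectivity check before launching `deploy/demo_client.py` can save debugging time. This helper is an assumed illustration, not part of the repo:

```python
import socket


def server_reachable(host_ip, host_port, timeout=2.0):
    """Return True if something accepts TCP connections at
    (host_ip, host_port) within `timeout` seconds, else False."""
    try:
        sock = socket.create_connection((host_ip, host_port),
                                        timeout=timeout)
        sock.close()
        return True
    except (socket.error, OSError):
        return False
```

If this returns False, check that `demo_server.py` was started with a `host_ip` actually reachable from the client machine (not `127.0.0.1`) and that no firewall blocks the port.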
| if not (os.path.exists(filepath) and md5file(filepath) == md5sum):
| print("Downloading %s ..." % url)
| os.system("wget -c " + url + " -P " + target_dir)
| ret = os.system("wget -c " + url + " -P " + target_dir)
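The diff above replaces a fire-and-forget `os.system` call with one whose exit status is captured. A self-contained sketch of the full pattern, assuming the captured status is then checked; the `md5file` helper below is a stand-in for the repo's own:

```python
import os
from hashlib import md5


def md5file(path):
    """Hex MD5 digest of a file, read in 1 MiB chunks."""
    h = md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def download(url, md5sum, target_dir):
    """Download `url` into `target_dir` unless a file with the
    expected MD5 already exists; raise if wget reports failure
    instead of silently leaving a truncated file behind."""
    filepath = os.path.join(target_dir, url.split("/")[-1])
    if not (os.path.exists(filepath) and md5file(filepath) == md5sum):
        print("Downloading %s ..." % url)
        ret = os.system("wget -c " + url + " -P " + target_dir)
        if ret != 0:
            raise IOError("wget failed with exit status %d" % ret)
    return filepath
```

Checking `ret` means a network hiccup surfaces as an exception at download time rather than as a corrupt-archive error much later in data preparation.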
deep_speech_2/README.md
Outdated
| Several shell scripts provided in `./examples` will help us to quickly give it a try, for most major modules, including data preparation, model training, case inference and model evaluation, with a few public dataset (e.g. [LibriSpeech](http://www.openslr.org/12/), [Aishell](http://www.openslr.org/33)). Reading these examples will also help you to understand how to make it work with your own data.
|
| ### Preparing Data
| Some of the scripts in `./examples` are configured with 8 GPUs. If you don't have 8 GPUs available, please modify `CUDA_VISIBLE_DEVICES` and `--trainer_count`. If you don't have any GPU available, please set `--use_gpu` to False to use CPUs instead.
Resolve #245 and #235