Rewrite REAME.md for DS2 and update examples. #246
xinghai-sun merged 10 commits into PaddlePaddle:develop from
Conversation
luotao1
left a comment
- The logic of Readme.md is clear, but there are quite a few small English grammar issues that will need another refining pass later.
- Also, the readmes under the other models are all in Chinese, so deep speech2 should get a Chinese introduction as well. It would help domestic users get started quickly and make it easier for everyone to contribute.
| ## Prerequisites
| - Only support Python 2.7
| - PaddlePaddle the latest version (please refer to the [Installation Guide](https://github.com/PaddlePaddle/Paddle#installation))
PaddlePaddle: the latest version
This is a temporary note. Since none of the officially released versions works with ds2, we cannot pin a version number here. Any suggestions? Writing a commit hash seems even less appropriate.
Besides, we could additionally provide a docker image that bundles the ds dependencies.
kuke
left a comment
Some minor problems, almost LGTM
deep_speech_2/test.py
Outdated
| "otherwise, it resumes from the pre-trained model.")
| add_arg('lang_model_path', str,
| 'lm/data/common_crawl_00.prune01111.trie.klm',
| 'model_zoo/lm/common_crawl_00.prune01111.trie.klm',
deep_speech_2/tools/tune.py
Outdated
| "Filepath of vocabulary.")
| add_arg('lang_model_path', str,
| 'lm/data/common_crawl_00.prune01111.trie.klm',
| 'model_zoo/lm/common_crawl_00.prune01111.trie.klm',
deep_speech_2/infer.py
Outdated
| "Filepath of vocabulary.")
| add_arg('lang_model_path', str,
| 'lm/data/common_crawl_00.prune01111.trie.klm',
| 'model_zoo/lm/common_crawl_00.prune01111.trie.klm',
| --infer_manifest='data/librispeech/manifest.test-clean' \
| --mean_std_path='data/librispeech/mean_std.npz' \
| --vocab_path='data/librispeech/vocab.txt' \
| --model_path='checkpoints/libri/params.latest.tar.gz' \
In some other files, this path is 'checkpoints/librispeech/*. Please unify them.
Done. Unified with 'checkpoints/libri/*'
deep_speech_2/README.md
Outdated
| Several shell scripts provided in `./examples` will help us to quickly give it a try, for most major modules, including data preparation, model training, case inference and model evaluation, with a few public dataset (e.g. [LibriSpeech](http://www.openslr.org/12/), [Aishell](https://github.com/kaldi-asr/kaldi/tree/master/egs/aishell)). Reading these examples will also help us understand how to make it work with our own data.
|
| ### Preparing Data
| Some of the scripts in `./examples` are configured with 8 GPUs. If you don't have 8 GPUs available, please modify `CUDA_VISIBLE_DEVICE` and `--trainer_count`. If you don't have any GPU available, please set `--use_gpu` to False to use CPUs instead.
CUDA_VISIBLE_DEVICE -> CUDA_VISIBLE_DEVICES
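Beyond the spelling fix, a launcher could keep `--trainer_count` consistent with whatever `CUDA_VISIBLE_DEVICES` exposes. A minimal sketch of that idea; the helper name is hypothetical and not part of the repo:

```python
import os


def trainer_count_from_env(default=8):
    """Return the number of GPUs exposed via CUDA_VISIBLE_DEVICES.

    A hypothetical helper: if the variable is unset, fall back to the
    script's default; otherwise count the comma-separated device ids.
    """
    devices = os.environ.get("CUDA_VISIBLE_DEVICES")
    if devices is None:
        return default  # variable unset: use the configured default
    return len([d for d in devices.split(",") if d.strip()])
```

For example, with `CUDA_VISIBLE_DEVICES=0,1` this returns 2, which could then be passed as `--trainer_count`.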
deep_speech_2/README.md
Outdated
| - [Questions and Help](#questions-and-help)
|
| ## Prerequisites
| - Only support Python 2.7
deep_speech_2/README.md
Outdated
| ### Preparing for Training
| To use your custom data, you only need to generate such manifest files to summarize the dataset. Given such summarized manifests, training, inference and all other modules can be aware of where to access the audio files, as well as their meta data including the transcription labels.
|
| For how to generate such manifest files, please refer to `data/librispeech/librispeech.py`, which download and generate manifests for LibriSpeech dataset.
--> will download and generate
deep_speech_2/README.md
Outdated
| # DeepSpeech2 on PaddlePaddle
|
| >TODO: to be updated, since the directory hierarchy was changed.
| *DeepSpeech2 on PaddlePaddle* is an open-source implementation of end-to-end Automatic Speech Recognition (ASR) engine, based on [Baidu's Deep Speech 2 paper](http://proceedings.mlr.press/v48/amodei16.pdf), with [PaddlePaddle](https://github.com/PaddlePaddle/Paddle) platform. Our vision is to empower both industrial application and academic research on speech recognition, via an easy-to-use, efficient and scalable implementation, including training, inferencing & testing module, distributed [PaddleCloud](https://github.com/PaddlePaddle/cloud) training, and demo deployment. Besides, several pre-trained models for both English and Mandarin are also released.
training, inferencing --> training, inference?
deep_speech_2/README.md
Outdated
|
| ## Installation
|
| Please install the [prerequisites](#prerequisites) above before moving on.
--> Please make sure the above prerequisites have been satisfied before moving on.
"Install the prerequisites" may not be proper phrasing.
deep_speech_2/README.md
Outdated
| ## Getting Started
|
| ## Usage
| Several shell scripts provided in `./examples` will help us to quickly give it a try, for most major modules, including data preparation, model training, case inference and model evaluation, with a few public dataset (e.g. [LibriSpeech](http://www.openslr.org/12/), [Aishell](https://github.com/kaldi-asr/kaldi/tree/master/egs/aishell)). Reading these examples will also help us understand how to make it work with our own data.
us --> you
https://github.com/kaldi-asr/kaldi/tree/master/egs/aishell --> http://www.openslr.org/33
our --> your
deep_speech_2/README.md
Outdated
| sh run_data.sh
| ```
|
| `run_data.sh` will download dataset, generate manifests, collect normalizer' statistics and build vocabulary. Once the data preparation is done, we will find the data (only part of LibriSpeech) downloaded in `~/.cache/paddle/dataset/speech/libri` and the corresponding manifest files generated in `./data/tiny` as well as a mean stddev file and a vocabulary file. It has to be run for the very first time we run this dataset and is reusable for all further experiments.
normalizer' --> normalizer
we --> you
normalizer's
done.
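On the normalizer's statistics mentioned above: what `run_data.sh` collects amounts to per-dimension feature mean and stddev saved as an `.npz` file. A hedged numpy sketch of that pattern; the key names `mean` and `std` are illustrative assumptions, not necessarily the exact layout of DS2's `mean_std.npz`:

```python
import numpy as np


def save_mean_std(feature_frames, npz_path):
    """Compute per-dimension mean/stddev over stacked feature frames
    (shape [num_frames, feature_dim]) and save them as an .npz file."""
    np.savez(npz_path,
             mean=feature_frames.mean(axis=0),
             std=feature_frames.std(axis=0))


def normalize(features, npz_path):
    """Z-score features per dimension using the saved statistics."""
    stats = np.load(npz_path)
    return (features - stats["mean"]) / (stats["std"] + 1e-14)
```

Saving the statistics once and reusing them is why the data preparation step only has to run the first time a dataset is used.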
deep_speech_2/README.md
Outdated
| sh run_train.sh
| ```
|
| `run_train.sh` will start a training job, with training logs printed to stdout and model checkpoint of every pass/epoch saved to `./checkpoints/tiny`. We can resume the training from these checkpoints, or use them for inference, evaluation and deployment.
resume the training is not rigorous.
"resume training" is ok, e.g. https://cn.mathworks.com/help/nnet/ug/resume-training-from-a-checkpoint-network.html
deep_speech_2/README.md
Outdated
| ### Preparing for Training
| To use your custom data, you only need to generate such manifest files to summarize the dataset. Given such summarized manifests, training, inference and all other modules can be aware of where to access the audio files, as well as their meta data including the transcription labels.
|
| For how to generate such manifest files, please refer to `data/librispeech/librispeech.py`, which download and generate manifests for LibriSpeech dataset.
download --> downloads
generate --> generates
manifests --> manifest files
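To make the manifest discussion concrete: a DS2 manifest is a JSON-lines file, one utterance per line. A minimal sketch of writing one; the field names (`audio_filepath`, `duration`, `text`) follow the common DS2 convention but are an assumption here, so verify them against `data/librispeech/librispeech.py` before relying on them:

```python
import json


def write_manifest(entries, manifest_path):
    """Write one JSON object per line, summarizing each utterance.

    entries: iterable of (audio_path, duration_seconds, transcript).
    """
    with open(manifest_path, "w") as f:
        for audio_path, duration, transcript in entries:
            f.write(json.dumps({
                "audio_filepath": audio_path,
                "duration": duration,
                "text": transcript,
            }) + "\n")
```

Any custom dataset summarized this way can then be fed to training, inference and the other modules unchanged.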
| ```
| python train.py --help
| ```
| or refer to `example/librispeech/run_train.sh`.
run_train.sh doesn't include all arguments like init_model_path
deep_speech_2/README.md
Outdated
| sh run.sh
| cd ..
| ```
| Six optional augmentation components are provided for us to configured and inserted into the processing pipeline.
Six optional augmentation components are provided which can be configured and inserted into the processing pipeline
--> Six optional augmentation components are provided to be selected, configured and inserted into the processing pipeline.
deep_speech_2/README.md
Outdated
| ```
| And then in another console, start the demo's client:
|
| Now, in the client console, press the `whitespace` key, hold, and start speaking. Until we finish our utterance, we release the key to let the speech-to-text results shown in the console. To quit the client, just press `ESC` key.
Here, we can paste the information message after starting client console.
deep_speech_2/README.md
Outdated
|
| Now, in the client console, press the `whitespace` key, hold, and start speaking. Until we finish our utterance, we release the key to let the speech-to-text results shown in the console. To quit the client, just press `ESC` key.
|
| Notice that `deploy/demo_client.py` must be run in a machine with a microphone device, while `deploy/demo_server.py` could be run in one without any audio recording hardware, e.g. any remote server machine. Just be careful to set the `host_ip` and `host_port` argument with the actual accessible IP address and port, if the server and client are running with two separate machines. Nothing should be done if they are running in one single machine.
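On the `host_ip`/`host_port` point above: when server and client run on separate machines, a small connectivity check before launching `deploy/demo_client.py` can save debugging time. This helper is an assumed illustration, not part of the repo:

```python
import socket


def server_reachable(host_ip, host_port, timeout=2.0):
    """Return True if something accepts TCP connections at
    (host_ip, host_port) within `timeout` seconds, else False."""
    try:
        sock = socket.create_connection((host_ip, host_port),
                                        timeout=timeout)
        sock.close()
        return True
    except (socket.error, OSError):
        return False
```

If this returns False, check that `demo_server.py` was started with a `host_ip` actually reachable from the client machine (not `127.0.0.1`) and that no firewall blocks the port.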
| if not (os.path.exists(filepath) and md5file(filepath) == md5sum):
| print("Downloading %s ..." % url)
| os.system("wget -c " + url + " -P " + target_dir)
| ret = os.system("wget -c " + url + " -P " + target_dir)
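The diff above replaces a fire-and-forget `os.system` call with one whose exit status is captured. A self-contained sketch of the full pattern, assuming the captured status is then checked; the `md5file` helper below is a stand-in for the repo's own:

```python
import os
from hashlib import md5


def md5file(path):
    """Hex MD5 digest of a file, read in 1 MiB chunks."""
    h = md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def download(url, md5sum, target_dir):
    """Download `url` into `target_dir` unless a file with the
    expected MD5 already exists; raise if wget reports failure
    instead of silently leaving a truncated file behind."""
    filepath = os.path.join(target_dir, url.split("/")[-1])
    if not (os.path.exists(filepath) and md5file(filepath) == md5sum):
        print("Downloading %s ..." % url)
        ret = os.system("wget -c " + url + " -P " + target_dir)
        if ret != 0:
            raise IOError("wget failed with exit status %d" % ret)
    return filepath
```

Checking `ret` means a network hiccup surfaces as an exception at download time rather than as a corrupt-archive error much later in data preparation.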
deep_speech_2/README.md
Outdated
| Several shell scripts provided in `./examples` will help us to quickly give it a try, for most major modules, including data preparation, model training, case inference and model evaluation, with a few public dataset (e.g. [LibriSpeech](http://www.openslr.org/12/), [Aishell](http://www.openslr.org/33)). Reading these examples will also help you to understand how to make it work with your own data.
|
| ### Preparing Data
| Some of the scripts in `./examples` are configured with 8 GPUs. If you don't have 8 GPUs available, please modify `CUDA_VISIBLE_DEVICES` and `--trainer_count`. If you don't have any GPU available, please set `--use_gpu` to False to use CPUs instead.
Resolve #245 and #235