
Commit b6e79b4

Update README.md
1 parent 73464b1 commit b6e79b4

File tree

1 file changed: +14, -5 lines


‎README.md‎

Lines changed: 14 additions & 5 deletions
@@ -3,7 +3,7 @@ Vietnamese-Speech-Recognition

 # Introduction

-In this repo, I focused on building end-to-end speech recognition pipeline using [Quartznet](https://arxiv.org/abs/1910.10261) and [CTC decoder](https://github.com/parlance/ctcdecode) supported by beam search algorithm as well as language model.
+In this repo, I focus on building an end-to-end speech recognition pipeline using [Quartznet](https://arxiv.org/abs/1910.10261), [wav2vec2.0](https://arxiv.org/abs/2006.11477), and a [CTC decoder](https://github.com/parlance/ctcdecode) supported by a beam-search algorithm as well as a language model.

 # Setup

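As a side note on the decoding part of that pipeline: the linked ctcdecode package exposes a `CTCBeamDecoder` that combines beam search with an optional KenLM language model. The snippet below is only a minimal sketch of that usage; the label set, decoder weights, and the dummy acoustic output are illustrative assumptions, not values taken from this repo.

```
import torch
from ctcdecode import CTCBeamDecoder

# Toy label set -- in the real pipeline the vocabulary comes from the tokenizer.
labels = ["_", " ", "a", "b", "c"]  # index 0 is the CTC blank

decoder = CTCBeamDecoder(
    labels,
    model_path=None,   # point at a KenLM .arpa/.bin file to enable LM rescoring
    alpha=0.5,         # LM weight (only used when model_path is set)
    beta=1.0,          # word-insertion bonus (only used when model_path is set)
    beam_width=100,
    blank_id=0,
    log_probs_input=True,
)

# Dummy acoustic-model output with shape (batch, time, vocab); in practice this
# comes from Quartznet or wav2vec2.0.
log_probs = torch.randn(1, 50, len(labels)).log_softmax(dim=-1)
beam_results, beam_scores, timesteps, out_lens = decoder.decode(log_probs)

# Best hypothesis for the first utterance, mapped back to characters.
best = beam_results[0][0][: int(out_lens[0][0])]
print("".join(labels[int(i)] for i in best))
```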
@@ -13,7 +13,7 @@ Here I used [100h speech public dataset](https://institute.vinbigdata.org/events

 ```
 mkdir data/LJSpeech-1.1
-python data/custom.py
+python data/custom.py  # create the data format for training quartznet & wav2vec2.0
 ```

 And below is the folder that I used, note that `metadata.csv` has 2 columns, `file name` and `transcript`:
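The preprocessing itself lives in `data/custom.py`, which is not part of this diff. Purely as an illustration of the two-column `metadata.csv` layout described above, a loader along these lines could be used to sanity-check the data; the `|` delimiter and the `wavs/` subfolder are assumptions borrowed from the LJSpeech convention, not facts about this repo.

```
import csv
from pathlib import Path

# Illustrative check of the metadata described above: two columns, file name and transcript.
data_root = Path("data/LJSpeech-1.1")

with open(data_root / "metadata.csv", newline="", encoding="utf-8") as f:
    reader = csv.reader(f, delimiter="|")  # delimiter assumed to follow LJSpeech's format
    for file_name, transcript in reader:
        wav_path = data_root / "wavs" / f"{file_name}.wav"  # assumed audio layout
        if not wav_path.exists():
            print(f"missing audio for transcript: {wav_path}")
```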
@@ -61,12 +61,18 @@ For training the quartznet model, you can run:
 python3 tools/train.py --config configs/config.yaml
 ```

-And evaludate:
+And evaluate the quartznet model:

 ```
 python3 tools/evaluate.py --config configs/config.yaml
 ```

+Or, to fine-tune a wav2vec2.0 model from the Vietnamese pretrained [w2v2.0](https://huggingface.co/nguyenvulebinh/wav2vec2-base-vietnamese-250h) checkpoint:
+
+```
+python3 tools/fintune_w2v.py
+```
+
 ## Demo

 This time, I provide small code with streamlit for asr demo, you can run:
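`tools/fintune_w2v.py` itself is not shown in this diff. As a rough, hedged sketch of what fine-tuning from that Hugging Face checkpoint typically looks like with the `transformers` API: the dummy audio, the sample transcript, and the single training step below are placeholders, not this repo's code.

```
import torch
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

# Vietnamese pretrained checkpoint linked above.
ckpt = "nguyenvulebinh/wav2vec2-base-vietnamese-250h"
processor = Wav2Vec2Processor.from_pretrained(ckpt)
model = Wav2Vec2ForCTC.from_pretrained(ckpt)

# One dummy 16 kHz utterance and transcript -- stand-ins for the real 100h dataset.
speech = torch.randn(16000).numpy()
inputs = processor(speech, sampling_rate=16000, return_tensors="pt")
labels = processor.tokenizer("xin chào", return_tensors="pt").input_ids

# A single CTC training step; a real fine-tuning script would wrap this in a
# full training loop (Trainer, Lightning, etc.) over the prepared dataset.
model.train()
out = model(input_values=inputs.input_values, labels=labels)
out.loss.backward()
print(float(out.loss))
```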
@@ -76,11 +82,14 @@ stream run demo/app.py

 # Results

-I used wandb for logging results and antifacts during training, here are some visualizations after several epochs:
+I used wandb & tensorboard for logging results and artifacts during training; here are some visualizations after several epochs:
 ![image](https://user-images.githubusercontent.com/61444616/195522590-ae3267bf-0a15-4407-ab0f-4d1aca3b20d6.png)

-
 # References

 - Mainly based on [this implementation](https://github.com/oleges1/quartznet-pytorch)
 - The [paper](https://arxiv.org/abs/1910.10261)
+- Vietnamese ASR: [VietAI](https://github.com/vietai/ASR)
+- Lightning-Flash [repo](https://github.com/Lightning-AI/lightning-flash)
+- Tokenizer used from [youtokentome](https://github.com/VKCOM/YouTokenToMe)
+- Language model [KenLM](https://github.com/kpu/kenlm)
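On the logging mentioned under Results: the README only states that wandb and tensorboard were used, so the snippet below is just a generic sketch of dual logging with both libraries; the project name, run name, and metric values are made-up placeholders.

```
import wandb
from torch.utils.tensorboard import SummaryWriter

# Assumed project/run names -- not taken from this repo's config.
wandb.init(project="vietnamese-asr", name="quartznet-demo", mode="offline")
writer = SummaryWriter(log_dir="runs/quartznet-demo")

# Placeholder metrics standing in for real training results.
for epoch, (train_loss, val_wer) in enumerate([(1.8, 0.52), (1.2, 0.38)]):
    wandb.log({"train/loss": train_loss, "val/wer": val_wer}, step=epoch)
    writer.add_scalar("train/loss", train_loss, epoch)
    writer.add_scalar("val/wer", val_wer, epoch)

writer.close()
wandb.finish()
```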
