This is the Joey NMT branch that was used for the experiments in the ACL 2019 paper 'Self-Regulated Interactive Sequence-to-Sequence Learning' by Julia Kreutzer and Stefan Riezler. Instructions for replicating the experiments will be added soon.
The Joey NMT framework is developed for educational purposes. It aims to be a clean and minimalistic code base that helps novices answer the following questions:
- How to implement classic NMT architectures (RNN and Transformer) in PyTorch?
- What are the building blocks of these architectures and how do they interact?
- How to modify these blocks (e.g. deeper, wider, ...)?
- How to modify the training procedure (e.g. add a regularizer)?
In contrast to other NMT frameworks, we do not aim for state-of-the-art results or speed through engineering or training tricks, since this often goes hand in hand with an increase in code complexity and a decrease in readability.
However, Joey NMT re-implements baselines from major publications.
Joey NMT is developed by Joost Bastings (University of Amsterdam) and Julia Kreutzer (Heidelberg University).
We aim to implement the following features (aka the minimalist toolkit of NMT):
- Recurrent Encoder-Decoder with GRUs or LSTMs
- Transformer Encoder-Decoder
- Attention Types: MLP, Dot, Multi-Head, Bilinear
- Word-, BPE- and character-based input handling
- BLEU, ChrF evaluation
- Beam search with length penalty and greedy decoding
- Customizable initialization
- Attention visualization
- Learning curve plotting
[Work in progress: Transformer, Multi-Head and Dot attention are still missing.]
In order to keep the code clean and readable, we make use of:
- Style checks: Pylint, PEP8
- Typing
- Docstrings
[Work in progress!]
We will create dedicated material for teaching with Joey NMT. This will include:
- An overview and explanation of the code architecture.
- A tutorial on how to train and test a baseline model.
- A walk-through example of how to implement a modification of a baseline model.
[Work in progress!]
Joey NMT is built on PyTorch v0.4.1 and torchtext, and requires Python >= 3.6.
- Clone this repository:

  ```bash
  git clone https://github.com/joeynmt/joeynmt.git
  ```

- Install the requirements:

  ```bash
  cd joeynmt
  pip3 install -r requirements.txt
  ```

  (You might want to add `--user` for a local installation.)
Models are specified in configuration files, in simple YAML format. You can find examples in the `configs` directory.
Read the docs.
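As an illustrative sketch of what the data part of such a file might contain (an assumption-laden outline, not a copy of `configs/default.yaml`; key names not mentioned elsewhere in this README are guesses, and `configs/default.yaml` is the authoritative reference):

```yaml
# Hypothetical sketch of the data section of a Joey NMT config.
# Key names and values are illustrative; see configs/default.yaml.
data:
    src: "de"            # source language, i.e. file suffix
    trg: "en"            # target language, i.e. file suffix
    train: "data/train"  # expects data/train.de and data/train.en
    dev: "data/dev"
    test: "data/test"
    level: "bpe"         # word-, bpe- or character-based input handling
```

The training- and testing-specific options are sketched in the corresponding sections below.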
For training, run

```bash
python3 -m joeynmt train configs/default.yaml
```
This will train a model on the training data specified in the config (here: `default.yaml`), validate on validation data, and store model parameters, vocabularies, validation outputs and a small number of attention plots in the `model_dir` (also specified in the config).
The `validations.txt` file in the model directory reports the validation results at every validation point.
Models are saved whenever a new best validation score is reached, as `batch_no.ckpt`, where `batch_no` is the number of batches the model has been trained on so far.
Run

```bash
python3 scripts/plot_validation.py model_dir --plot_values bleu PPL --output_path my_plot.pdf
```

to plot curves of validation BLEU and PPL.
For training on a GPU, set `use_cuda` in the config file to `True`.
Note that pre-processing such as tokenization or BPE segmentation is not part of training, but has to be done manually beforehand.
Tip: To avoid accidentally overwriting trained models, set `overwrite: False` in the model configuration.
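Putting the training-related options mentioned above together, the relevant part of the config might look roughly like this (a sketch; only `model_dir`, `overwrite` and `use_cuda` are taken from this README, and their exact placement follows our reading of `configs/default.yaml`):

```yaml
# Hypothetical sketch; check configs/default.yaml for the real keys.
training:
    model_dir: "models/my_experiment"  # checkpoints, vocabularies, validation
                                       # outputs and attention plots go here
    overwrite: False                   # refuse to overwrite an existing model_dir
    use_cuda: True                     # set to False to train on CPU
```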
For testing, run

```bash
python3 -m joeynmt test configs/default.yaml --output_path out
```
This will generate translations for the validation and test sets (stored in `out.dev` and `out.test` if the optional `--output_path` is given) with the latest model in the `model_dir` (or a specific checkpoint set with `load_model`).
It will also evaluate the outputs with the `eval_metric`.
Note that post-processing such as detokenization or reversing BPE segmentation is not included in this step, but has to be done manually.
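Decoding behaviour can likewise be set in the config. As a hedged sketch (key names, key placement and the exact form of the length penalty are assumptions based on the benchmark settings below, not verbatim from the code):

```yaml
# Sketch of decoding options (key placement is an assumption; see the docs).
testing:
    beam_size: 5    # 1 amounts to greedy decoding
    alpha: 1.0      # length penalty strength used in the benchmarks below;
                    # assumed to follow the GNMT-style penalty
                    # ((5 + length) / 6) ** alpha
# To test a specific checkpoint rather than the latest one, point
# load_model to it, e.g. (hypothetical path):
# load_model: "models/my_experiment/36000.ckpt"
```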
Benchmarks on small models trained on GPU/CPU on standard data sets will be posted here.
- IWSLT15 En-Vi, word-based
- IWSLT14 De-En, word-based and with 32000 joint BPE
- WMT17 En-De and Lv-En, 32000 joint BPE
We compare against TensorFlow NMT on the IWSLT15 En-Vi data set as pre-processed by Stanford.
You can download the data with `scripts/get_iwslt15_envi.sh`, and then use `configs/iwslt_envi_luong.yaml` to replicate the experiment.
| Systems | tst2012 (dev) | test2013 (test) |
|---|---|---|
| TF NMT (greedy) | 23.2 | 25.5 |
| TF NMT (beam=10) | 23.8 | 26.1 |
| Joey NMT (greedy) | 23.2 | 25.8 |
| Joey NMT (beam=10, alpha=1.0) | 23.8 | 26.5 |
| (Luong & Manning, 2015) | - | 23.3 |
We also compare against xnmt, which uses different hyperparameters, so we use a different configuration for Joey NMT too: `configs/iwslt_envi_xnmt.yaml`.
| Systems | tst2012 (dev) | test2013 (test) |
|---|---|---|
| xnmt (beam=5) | 25.0 | 27.3 |
| Joey NMT (greedy) | 24.6 | 27.4 |
| Joey NMT (beam=5, alpha=1.0) | 24.9 | 27.7 |
We compare against the baseline scores reported in Wiseman & Rush (2016) (W&R) and Bahdanau et al. (2017) (B17), using tokenized, lowercased BLEU (computed with sacrebleu).
We train a word-based model of the same size and vocabulary as in W&R and B17.
The script to obtain and pre-process the data is the one published with W&R.
On a K40-GPU word-level training took <1h, beam search decoding for both dev and test <2min.
| Systems | level | dev | test | #params | Joey NMT config |
|---|---|---|---|---|---|
| W&R (greedy) | word | - | 22.53 | ||
| W&R (beam=10) | word | - | 23.87 | ||
| B17 (greedy) | word | - | 25.82 | ||
| B17 (beam=10) | word | - | 27.56 | ||
| Joey NMT (greedy) | word | 28.41 | 26.68 | 22.05M | |
| Joey NMT (beam=10, alpha=1.0) | word | 28.96 | 27.03 | 22.05M |
On CPU (`use_cuda: False`), everything is approximately 8-10x slower (training: 8h; beam search decoding for both dev and test: 19min; greedy decoding: 5min):
| Systems | level | dev | test | #params | Joey NMT config |
|---|---|---|---|---|---|
| Joey NMT (greedy) | word | 28.35 | 26.46 | 22.05M | |
| Joey NMT (beam=10, alpha=1.0) | word | 28.85 | 27.06 | 22.05M |
In addition, we compare to a BPE-based GRU model with a 32k joint vocabulary (Groundhog style).
Use `scripts/get_iwslt14_bpe.sh` to pre-process the data and `configs/iwslt14_deen_bpe.yaml` to train the model.
| Systems | level | dev | test | #params | Joey NMT config |
|---|---|---|---|---|---|
| Joey NMT (greedy) | bpe | 27.8 | - | 60.68M | |
| Joey NMT (beam=5, alpha=1.0) | bpe | 28.74 | 27.63 | 60.68M | |
We compare against the results for recurrent BPE-based models that were reported in the Sockeye paper.
We only consider the Groundhog setting here, where toolkits are used out-of-the-box for creating a Groundhog-like model (1 layer, LSTMs, MLP attention).
The data is pre-processed as described in the paper (see the code accompanying it).
Postprocessing is done with Moses' detokenizer, evaluation with sacrebleu.
Note that the scores reported for the other models might not reflect the current state of their code, but the state at the time of the Sockeye evaluation. Please also consider the difference in the number of parameters despite "the same" setup: our models are the smallest in number of parameters.
Groundhog setting: encoder rnn=500, lr=0.0003, bridge=True
| Systems | level | dev | test | #params | Joey NMT config |
|---|---|---|---|---|---|
| Sockeye (beam=5) | bpe | - | 23.18 | 87.83M | |
| OpenNMT-Py (beam=5) | bpe | - | 18.66 | 87.62M | |
| Joey NMT (beam=5) | bpe | 24.33 | 23.45 | 86.37M | configs/wmt_ende_default.yaml |
The Joey NMT model was trained for 4 days (14 epochs).
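In config terms, the Groundhog setting above corresponds roughly to the following sketch (all key names here are assumptions for illustration; `configs/wmt_ende_default.yaml` holds the actual keys and values):

```yaml
# Hypothetical mapping of the Groundhog setting onto config keys.
model:
    encoder:
        rnn_size: 500     # assumed key name for the encoder RNN size
    decoder:
        bridge: True      # initialize the decoder state from the encoder
training:
    learning_rate: 0.0003
```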
Groundhog setting: encoder rnn=500, lr=0.0003, bridge=True
| Systems | level | dev | test | #params | Joey NMT config |
|---|---|---|---|---|---|
| Sockeye (beam=5) | bpe | - | 14.40 | ? | |
| OpenNMT-Py (beam=5) | bpe | - | 9.98 | ? | |
| Joey NMT (beam=5) | bpe | 12.09 | 8.75 | 64.52M | configs/wmt_lven_default.yaml |
Since this codebase is supposed to stay clean and minimalistic, contributions addressing the following are welcome:
- Code correctness
- Code cleanliness
- Documentation quality
- Speed or memory improvements
Code extending the functionality beyond the basics will most likely not end up in the master branch, but we're curious to learn what you used Joey NMT for.
Here we'll collect projects and repositories that are based on Joey. If you used Joey for a project, publication or built some code on top of it, let us know and we'll link it here.
Projects:
- TBD
Please open an issue if you have questions or problems with the code.
For general questions, email us at joeynmt <at> gmail.com.
Joeys are infant marsupials.
