
Commit b10f4c6

Browse files
Update README
1 parent 819002c commit b10f4c6

File tree

1 file changed: +20 −21 lines

‎README.md

Lines changed: 20 additions & 21 deletions
@@ -19,13 +19,13 @@
# MDP Playground

A Python package to inject low-level dimensions of hardness into RL environments. There are toy environments to design and debug RL agents, and complex environment wrappers for Gym environments (including Atari and Mujoco) to test robustness to these dimensions in complex environments.

## Getting started

There are 4 parts to the package:
1) **Toy Environments**: The base toy Environment in [`mdp_playground/envs/rl_toy_env.py`](mdp_playground/envs/rl_toy_env.py) implements the toy environment functionality, including discrete and continuous environments, and is parameterised by a `config` dict which contains all the information needed to instantiate the required toy MDP. Please see [`example.py`](example.py) for some simple examples of how to use these. For further details, please refer to the documentation in [`mdp_playground/envs/rl_toy_env.py`](mdp_playground/envs/rl_toy_env.py).

2) **Complex Environment Wrappers**: Similar to the toy environments, these are parameterised by a `config` dict which contains all the information needed to inject the dimensions into Gym environments (tested with Atari, Mujoco and ProcGen). Please see [`example.py`](example.py) for some simple examples of how to use these. The generic Gym wrapper (for Atari, ProcGen, etc.) is in [`mdp_playground/envs/gym_env_wrapper.py`](mdp_playground/envs/gym_env_wrapper.py) and the Mujoco-specific wrapper is in [`mdp_playground/envs/mujoco_env_wrapper.py`](mdp_playground/envs/mujoco_env_wrapper.py).
3) **Experiments**: Experiments are launched using [`run_experiments.py`](run_experiments.py). Config files for experiments are located inside the [`experiments`](experiments) directory. Please read the [instructions](#running-experiments) below for details on how to launch experiments.
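The `config`-dict pattern described above can be sketched roughly as follows. This is a hypothetical illustration: the `RLToyEnv` import path and the specific keys (`state_space_type`, `delay`, `sequence_length`) are assumptions based on the file names in this README, so please treat [`example.py`](example.py) and the docstrings in [`mdp_playground/envs/rl_toy_env.py`](mdp_playground/envs/rl_toy_env.py) as authoritative.

```python
# Hypothetical sketch of the config-dict pattern; check example.py and
# mdp_playground/envs/rl_toy_env.py for the real keys and entry point.
config = {
    "state_space_type": "discrete",  # assumed key: toy envs are discrete or continuous
    "delay": 1,                      # assumed key: an injected dimension of hardness
    "sequence_length": 3,            # assumed key: another injected dimension
}

try:
    # Import path assumed from the file layout mentioned in this README.
    from mdp_playground.envs.rl_toy_env import RLToyEnv
    env = RLToyEnv(**config)
except ImportError:
    env = None  # package not installed; the sketch only shows the pattern

print(sorted(config))
```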
@@ -35,7 +35,7 @@ There are 4 parts to the package:

## Running experiments from the main paper

For reproducing experiments from the main paper, please continue reading.

For general install and usage instructions, please see [here](#installation).

### Installation for running experiments from the main paper

We recommend using `conda` to manage the virtual `Python` environments needed to run the experiments. Unfortunately, you will have to maintain two environments: one for the "older" **discrete toy** experiments and one for the "newer" **continuous and complex** experiments from the paper. As mentioned in the Appendix section **Tuned Hyperparameters** in the paper, this is because of issues with Ray, the library that we used for our baseline agents.
@@ -60,14 +60,15 @@ wget 'https://ray-wheels.s3-us-west-2.amazonaws.com/master/8d0c1b5e068853bf748f7
pip install ray-0.9.0.dev0-cp36-cp36m-manylinux1_x86_64.whl[rllib,debug]
```

We list here what the commands for the experiments from the main paper look like:
```bash
# For example, for the discrete toy experiments:
conda activate py36_toy_rl_disc_toy
python run_experiments.py -c experiments/dqn_del.py -e dqn_del

# Image representation experiments:
conda activate py36_toy_rl_disc_toy
python run_experiments.py -c experiments/dqn_image_representations.py -e dqn_image_representations
python run_experiments.py -c experiments/dqn_image_representations_sh_quant.py -e dqn_image_representations_sh_quant

# Continuous toy environments:
@@ -84,19 +85,17 @@ conda activate py36_toy_rl_cont_comp
python run_experiments.py -c experiments/dqn_qbert_del.py -e dqn_qbert_del
python run_experiments.py -c experiments/ddpg_halfcheetah_time_unit.py -e ddpg_halfcheetah_time_unit

# For the bsuite debugging experiment, please run the bsuite sonnet dqn agent on our toy environment while varying reward density. Commit https://github.com/deepmind/bsuite/commit/5116216b62ce0005100a6036fb5397e358652530 from the bsuite repo should work fine.
```

For plotting, please follow the instructions [here](#plotting).

## Installation

For reproducing experiments from the main paper, please see [here](#running-experiments-from-the-main-paper).

For continued usage of MDP Playground while it is in development, please continue reading.

### Production use
We recommend using `conda` to manage environments. After setting up the environment, you can install MDP Playground in two ways:

#### Manual
@@ -107,7 +106,7 @@ pip install -e .[extras]
This might be the preferred way if you want easy access to the included experiments.

#### From PyPI
Alternatively, MDP Playground can also be installed from PyPI. Just run:
```bash
pip install mdp_playground[extras]
```
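After installing, a guarded import can serve as a quick smoke test (a minimal sketch, not part of the official instructions):

```python
# Minimal post-install smoke test; guarded so it also runs where the
# package is absent.
try:
    import mdp_playground
    installed = True
except ImportError:
    installed = False

print("mdp_playground installed:", installed)
```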
@@ -119,21 +118,21 @@ You can run experiments using:
run-mdpp-experiments -c <config_file> -e <exp_name> -n <config_num>
```
The `exp_name` is a prefix for the filenames of CSV files where stats for the experiments are recorded. The CSV stats files will be saved to the current directory.<br>
Most of the command line arguments have defaults. Please refer to the documentation inside [`run_experiments.py`](run_experiments.py) for further details on the command line arguments. (Or run it with the `-h` flag to bring up help.)
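If you want to drive several runs from a script, the same invocation can be issued via `subprocess` (a sketch using the `-c` and `-e` flags described above; `dqn_seq_del` is one of the experiment configs in the `experiments` directory):

```python
import subprocess  # used by the commented-out run() call below

# -c: experiment config file, -e: prefix for the output CSV stats files.
cmd = [
    "python", "run_experiments.py",
    "-c", "experiments/dqn_seq_del.py",
    "-e", "dqn_seq_del",
]
print(" ".join(cmd))
# Inside a checkout of the repo, this would launch the experiment:
# subprocess.run(cmd, check=True)
```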

The config files for experiments from the [paper](https://arxiv.org/abs/1909.07750) are in the `experiments` directory.<br>
The name of the file corresponding to an experiment is formed as `<algorithm_name>_<dimension_names>.py` for the toy environments,<br>
and as `<algorithm_name>_<env>_<dimension_names>.py` for the complex environments.<br>
Some sample `algorithm_name`s are: `dqn`, `rainbow`, `a3c`, `ddpg`, `td3` and `sac`.<br>
Some sample `dimension_name`s are: `seq_del` (for **delay** and **sequence length** varied together), `p_r_noises` (for **P** and **R noises** varied together), `target_radius` (for varying **target radius**) and `time_unit` (for varying **time unit**).<br>
For example, for algorithm **DQN** when varying dimensions **delay** and **sequence length**, the corresponding experiment file is [`dqn_seq_del.py`](experiments/dqn_seq_del.py).
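The filename convention above is mechanical enough to capture in two small helpers (for illustration only; the repository does not necessarily provide such functions):

```python
def toy_experiment_file(algorithm_name: str, dimension_names: str) -> str:
    # Toy environments: <algorithm_name>_<dimension_names>.py
    return f"experiments/{algorithm_name}_{dimension_names}.py"

def complex_experiment_file(algorithm_name: str, env: str, dimension_names: str) -> str:
    # Complex environments: <algorithm_name>_<env>_<dimension_names>.py
    return f"experiments/{algorithm_name}_{env}_{dimension_names}.py"

print(toy_experiment_file("dqn", "seq_del"))           # experiments/dqn_seq_del.py
print(complex_experiment_file("dqn", "qbert", "del"))  # experiments/dqn_qbert_del.py
```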
The CSV stats files will be saved to the current directory and can be analysed in [`plot_experiments.ipynb`](plot_experiments.ipynb).
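The CSV stats can also be inspected without the notebook; here is a plain-Python sketch (the column names below are invented for illustration; the real ones are written by `run_experiments.py`):

```python
import csv
import io

# Stand-in for a stats file such as dqn_seq_del.csv; the column names are
# hypothetical and may differ from what run_experiments.py writes.
sample = io.StringIO(
    "training_iteration,episode_reward_mean\n"
    "1,10.5\n"
    "2,12.0\n"
)
rows = list(csv.DictReader(sample))
mean_reward = sum(float(r["episode_reward_mean"]) for r in rows) / len(rows)
print(mean_reward)  # 11.25
```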
132132

133133
## Plotting
134-
To plot results from experiments, run `jupyter-notebook` and open [`plot_experiments.ipynb`](plot_experiments.ipynb) in Jupyter. There are instructions within each of the cells on how to generate and save plots.
134+
To plot results from experiments, please be sure that you installed MDP Playground for production use manually (please see [here](#manual)) and then run `jupyter-notebook` and open [`plot_experiments.ipynb`](plot_experiments.ipynb) in Jupyter. There are instructions within each of the cells on how to generate and save plots.
135135

136-
We have provided a sample set of CSVs you could use in the supplementary material. There correspond to experiments from the main paper used for the spider plots for continuous environments (Figure 3b).
137136

## Documentation

The documentation can be found at: https://automl.github.io/mdp-playground/
