Merge Experimental #11

Merged: 2 commits merged on Jun 7, 2021.
87 changes: 48 additions & 39 deletions README.md
@@ -27,69 +27,39 @@ There are 4 parts to the package:

2) **Complex Environment Wrappers**: Similar to the toy environment, this is parameterised by a `config` dict which contains all the information needed to inject the dimensions into Atari or Mujoco environments. Please see [`example.py`](example.py) for some simple examples of how to use these. The Atari wrapper is in [`mdp_playground/envs/gym_env_wrapper.py`](mdp_playground/envs/gym_env_wrapper.py) and the Mujoco wrapper is in [`mdp_playground/envs/mujoco_env_wrapper.py`](mdp_playground/envs/mujoco_env_wrapper.py).
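A minimal usage sketch of the Atari wrapper follows this list.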

3) **Experiments**: Experiments are launched using [`run_experiments.py`](run_experiments.py). Config files for experiments are located inside the [`experiments`](experiments) directory. Please read the [instructions](#running-experiments) below for details on how to launch experiments.

4) **Analysis**: [`plot_experiments.ipynb`](plot_experiments.ipynb) contains code to plot the standard plots from the paper.
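
As a quick illustration of (2), here is a minimal sketch of wrapping an Atari environment. The config keys shown are assumptions based on the dimensions described in the paper; [`example.py`](example.py) remains the canonical reference for the exact keys and values.

```python
# A sketch, not verbatim from example.py: inject dimensions into an Atari env.
import gym
from mdp_playground.envs.gym_env_wrapper import GymEnvWrapper

config = {
    "seed": 0,
    "delay": 1,               # assumed key: delay rewards by 1 timestep
    "transition_noise": 0.1,  # assumed key: probability of a random transition
    "state_space_type": "discrete",
}
env = GymEnvWrapper(gym.make("QbertNoFrameskip-v4"), **config)

obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
```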

## Running experiments from the main paper
For reproducing experiments from the main paper, please continue reading.

For general instructions, please see [here](#installation).

### Installation for running experiments from the main paper
We recommend using `conda` to manage virtual Python environments for these experiments. Unfortunately, you will have to maintain two environments: one for the "older" **discrete toy** experiments and one for the "newer" **continuous and complex** experiments from the paper. As mentioned in the Appendix section **Tuned Hyperparameters** of the paper, this is because of issues with Ray, the library we used for our baseline agents.

Run the following commands to install for the discrete toy experiments:
```bash
conda create -n py36_toy_rl_disc_toy python=3.6
conda activate py36_toy_rl_disc_toy
cd mdp-playground
pip install -r requirements.txt
pip install -e .[extras_disc]
```

Run the following commands to install for the continuous and complex experiments. **IMPORTANT**: If you do not have MuJoCo, please ignore any mujoco-py-related installation errors below:
```bash
conda create -n py36_toy_rl_cont_comp python=3.6
conda activate py36_toy_rl_cont_comp
cd mdp-playground
pip install -r requirements.txt
pip install -e .[extras_cont]
wget 'https://ray-wheels.s3-us-west-2.amazonaws.com/master/8d0c1b5e068853bf748f72b1e60ec99d240932c6/ray-0.9.0.dev0-cp36-cp36m-manylinux1_x86_64.whl'
pip install ray-0.9.0.dev0-cp36-cp36m-manylinux1_x86_64.whl[rllib,debug]
```
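
As an optional sanity check (our suggestion, not part of the original instructions), you can verify that the pinned Ray dev wheel is the one that is active; the expected version string is inferred from the wheel filename above:

```bash
# Should print 0.9.0.dev0 if the wheel above was installed correctly
python -c "import ray; print(ray.__version__)"
```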


### Running experiments from the main paper
We list here the commands for the experiments from the main paper:
```bash
# Discrete toy environments:
# @@ -108,6 +78,8 @@
python run_experiments.py -c experiments/ddpg_move_to_a_point_irr_dims.py -e ddpg_move_to_a_point_irr_dims
python run_experiments.py -c experiments/ddpg_move_to_a_point_p_order_2.py -e ddpg_move_to_a_point_p_order_2

# Complex environments:
# The commands below run all the configs in an experiment serially.
# To parallelise on a cluster, pass the CLI argument -n <config_num> at the end of a command
# to run only that config; please refer to the documentation in run_experiments.py for details.
conda activate py36_toy_rl_cont_comp
python run_experiments.py -c experiments/dqn_qbert_del.py -e dqn_qbert_del
python run_experiments.py -c experiments/ddpg_halfcheetah_time_unit.py -e ddpg_halfcheetah_time_unit
# @@ -121,6 +93,43 @@
```
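
If you want to parallelise the configs of a single experiment over a cluster rather than run them serially, a loop over `-n` along the following lines could be used. This sketch is ours, not from the repository, and `NUM_CONFIGS=100` is a placeholder for the actual number of configs in the experiment file:

```bash
# Illustrative sketch: launch each config of one experiment separately.
# NUM_CONFIGS is a placeholder; check the experiment's config file for the real count.
NUM_CONFIGS=100
for i in $(seq 0 $((NUM_CONFIGS - 1))); do
    # On a real cluster, replace the backgrounded call with your job-submission command.
    python run_experiments.py -c experiments/dqn_qbert_del.py -e dqn_qbert_del -n "$i" &
done
wait
```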

The CSV stats files will be saved to the current directory and can be analysed in [`plot_experiments.ipynb`](plot_experiments.ipynb).


## Installation
For reproducing experiments from the main paper, please see [here](#running-experiments-from-the-main-paper).

### Production use
We recommend using `conda` to manage environments. After setting up the environment, you can install MDP Playground in two ways:
#### Manual
To install MDP Playground manually, clone the repository and run:
```bash
pip install -e .[extras]
```
This might be the preferred way if you want easy access to the included experiments.

#### From PyPI
MDP Playground is also on PyPI. Just run:
```bash
pip install mdp_playground[extras]
```
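
Either way, a quick import check (our suggestion, not from the original README) confirms that the package is importable:

```bash
# Exits silently on success
python -c "import mdp_playground"
```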


## Running experiments
You can run experiments using:
```bash
run-mdpp-experiments -c <config_file> -e <exp_name> -n <config_num>
```
The `exp_name` is a prefix for the filenames of CSV files where stats for the experiments are recorded. The CSV stats files will be saved to the current directory.<br>
Each of the command line arguments has defaults. Please refer to the documentation inside [`run_experiments.py`](run_experiments.py) for further details on the command line arguments. (Or run it with the `-h` flag to bring up help.)

The config files for experiments from the [paper](https://arxiv.org/abs/1909.07750) are in the experiments directory.<br>
The name of the file corresponding to an experiment is formed as: `<algorithm_name>_<dimension_names>.py`<br>
Some sample `algorithm_name`s are: `dqn`, `rainbow`, `a3c`, `a3c_lstm`, `ddpg`, `td3` and `sac`<br>
Some sample `dimension_name`s are: `seq_del` (for **delay** and **sequence length** varied together), `p_r_noises` (for **P** and **R noises** varied together),
`target_radius` (for varying **target radius**) and `time_unit` (for varying **time unit**)<br>
For example, for algorithm **DQN**, when varying the dimensions **delay** and **sequence length**, the corresponding experiment file is [`dqn_seq_del.py`](experiments/dqn_seq_del.py).
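
Putting the above together, an illustrative invocation (the flags mirror the usage template above; `-n 0` selects only the first config) would be:

```bash
# Example: run config 0 of the DQN delay + sequence-length experiment
run-mdpp-experiments -c experiments/dqn_seq_del.py -e dqn_seq_del -n 0
```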

The CSV stats files will be saved to the current directory and can be analysed in [`plot_experiments.ipynb`](plot_experiments.ipynb).

## Plotting
To plot results from experiments, run `jupyter-notebook` and open [`plot_experiments.ipynb`](plot_experiments.ipynb) in Jupyter. There are instructions within each of the cells on how to generate and save plots.

1 change: 1 addition & 0 deletions experiments/ddpg_halfcheetah_action_max.py
@@ -4,6 +4,7 @@
from ray import tune
from collections import OrderedDict
num_seeds = 5
timesteps_total = 3000000


var_env_configs = OrderedDict(
1 change: 1 addition & 0 deletions experiments/ddpg_halfcheetah_time_unit.py
@@ -4,6 +4,7 @@
from ray import tune
from collections import OrderedDict
num_seeds = 5
timesteps_total = 3000000


var_env_configs = OrderedDict(
1 change: 1 addition & 0 deletions experiments/sac_halfcheetah_action_max.py
@@ -4,6 +4,7 @@
from ray import tune
from collections import OrderedDict
num_seeds = 5
timesteps_total = 3000000


var_env_configs = OrderedDict(
1 change: 1 addition & 0 deletions experiments/sac_halfcheetah_time_unit.py
@@ -4,6 +4,7 @@
from ray import tune
from collections import OrderedDict
num_seeds = 5
timesteps_total = 3000000


var_env_configs = OrderedDict(
1 change: 1 addition & 0 deletions experiments/sac_halfcheetah_time_unit_config_processor.py
@@ -4,6 +4,7 @@
from collections import OrderedDict
from mdp_playground.config_processor import *
num_seeds = 5
timesteps_total = 3000000


var_env_configs = OrderedDict(
1 change: 1 addition & 0 deletions experiments/td3_halfcheetah_action_max.py
@@ -4,6 +4,7 @@
from ray import tune
from collections import OrderedDict
num_seeds = 5
timesteps_total = 3000000


var_env_configs = OrderedDict(
1 change: 1 addition & 0 deletions experiments/td3_halfcheetah_time_unit.py
@@ -4,6 +4,7 @@
from ray import tune
from collections import OrderedDict
num_seeds = 5
timesteps_total = 3000000


var_env_configs = OrderedDict(
123 changes: 69 additions & 54 deletions mdp_playground/config_processor/config_processor.py
@@ -86,8 +86,57 @@ def process_configs(
*variable_configs, overwrite=False
)

varying_configs = []
separate_var_configs = []
# ###IMP Currently num_configs has to be equal for all 3 cases below:
# grid (i.e. var), random and sobol #TODO Not sure how to solve this #config
# setup problem. Could take Cartesian product of all 3 but that may lead to
# too many configs and Cartesian product of dicts is a pain.
if "var_configs" in dir(config):
    separate_var_configs.append(
        get_list_of_varying_configs(config.var_configs, mode="grid")
    )
if "sobol_configs" in dir(config):
    separate_var_configs.append(
        get_list_of_varying_configs(
            config.sobol_configs, mode="sobol", num_configs=config.num_configs
        )
    )
if "random_configs" in dir(config):
    separate_var_configs.append(
        get_list_of_varying_configs(
            config.random_configs, mode="random", num_configs=config.num_configs
        )
    )
# print("VARYING_CONFIGS:", varying_configs)

num_configs_ = max(
    [len(separate_var_configs[i]) for i in range(len(separate_var_configs))]
)
for i in range(num_configs_):
    to_combine = [
        separate_var_configs[j][i] for j in range(len(separate_var_configs))
    ]
    # overwrite = False because the keys in different modes of
    # config generation need to be disjoint
    varying_configs.append(deepmerge_multiple_dicts(*to_combine, overwrite=False))

# #hack ####TODO Remove extra pre-processing done here and again below:
pre_final_configs = combined_processing(
    config.env_config,
    config.agent_config,
    config.model_config,
    config.eval_config,
    varying_configs=copy.deepcopy(varying_configs),
    framework=framework,
    algorithm=config.algorithm,
)

if "timesteps_total" in dir(config):
    hacky_timesteps_total = config.timesteps_total  # hack
else:
    hacky_timesteps_total = pre_final_configs[-1]["timesteps_total"]

config_algorithm = config.algorithm  # hack
# sys.exit(0)
@@ -137,40 +186,6 @@ def process_configs(
+ ". Available options are: ray and stable_baselines."
)


# varying_configs is a list of dict of dicts with a specific structure.
final_configs = combined_processing(
@@ -876,28 +891,28 @@ def combined_processing(*static_configs, varying_configs, framework="ray", algorithm
"fcnet_activation": "relu",
}

# TODO Find a better way to enforce these?? Especially problematic for TD3
# because then more values for target_noise_clip are written to CSVs than
# are actually used during HPO; for normal (non-HPO) runs this must not be
# done.
if (algorithm == "DDPG"):
    if key == "critic_lr":
        final_configs[i]["actor_lr"] = value
    if key == "critic_hiddens":
        final_configs[i]["actor_hiddens"] = value
if algorithm == "TD3":
    if key == "target_noise_clip_relative":
        final_configs[i]["target_noise_clip"] = (
            final_configs[i]["target_noise_clip_relative"]
            * final_configs[i]["target_noise"]
        )
        del final_configs[i][
            "target_noise_clip_relative"
        ]  # hack have to delete it, otherwise Ray will crash for an unknown config param.

elif key == "model":
if key == "model":
for key_2 in final_configs[i][key]:
if key_2 == "use_lstm":
if key_2 == "use_lstm" and final_configs[i][key][key_2]:
final_configs[i][key]["max_seq_len"] = (
final_configs[i]["env_config"]["delay"]
+ final_configs[i]["env_config"]["sequence_length"]