mdp_playground/envs/rl_toy_env.py (36 additions, 15 deletions)
@@ -53,7 +53,7 @@ class RLToyEnv(gym.Env):
     diameter : int > 0
         For discrete environments, if diameter = d, the set of states forms a d-partite graph (and NOT a complete d-partite graph), where, if we order the d sets as 1, 2, ..., d, states from set 1 have actions leading to states in set 2 and so on, with the final set d having actions leading to states in set 1. The number of actions for each state is thus (number of states) / d. Default value: 1 for discrete environments. For continuous environments, this dimension is set automatically based on the state_space_max value.
     terminal_state_density : float in range [0, 1]
-        For discrete environments, the fraction of states that are terminal; the terminal states are fixed to the "last" states when we consider them to be ordered by their numerical value. This is w.l.o.g. because discrete states are categorical. For continuous environments, please see terminal_states and term_state_edge for how to control terminal states. Default value: 0.25.
+        For discrete environments, the fraction of states that are terminal; the terminal states are fixed to the "last" states when we consider them to be ordered by their numerical value. This is w.l.o.g. because discrete states are categorical. For continuous environments, please see terminal_states and term_state_edge for how to control terminal states. For grid environments, please see terminal_states only. Default value: 0.25.
     term_state_reward : float
         Adds this to the reward if a terminal state was reached at the current time step. Default value: 0.
     image_representations : boolean
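
To make the diameter and terminal_state_density arithmetic above concrete, here is a minimal configuration sketch. The key names and the `RLToyEnv(**config)` call are assumptions drawn from this docstring and the project README, not a verified API listing:

```python
# A minimal sketch, assuming the config-dict interface from the
# mdp_playground README; key names here are assumptions based on
# the docstring above.
from mdp_playground.envs import RLToyEnv

config = {
    "seed": 0,
    "state_space_type": "discrete",
    "action_space_size": 4,          # actions available from each state
    "diameter": 2,                   # states form a 2-partite graph
    "terminal_state_density": 0.25,  # last 25% of states are terminal
    "generate_random_mdp": True,
}
env = RLToyEnv(**config)

# Per the docstring, (number of states) = (actions per state) * diameter,
# i.e. 4 * 2 = 8 states here, split into 2 sets of 4.
```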
@@ -134,7 +134,7 @@ class RLToyEnv(gym.Env):
     target_point : numpy.ndarray
         The target point in case move_to_a_point is the reward_function. If make_denser is false, reward is only handed out when the target point is reached.
     terminal_states : Python function(state) or 1-D numpy.ndarray
-        Same description as for terminal_states under discrete envs
+        Same description as for terminal_states under discrete envs, except that the state is a grid state, e.g., a list of [x, y] coordinates for a 2-D grid.
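
For illustration, the two accepted forms for terminal_states in a 2-D grid environment might look like the following sketch; the values and the function name are hypothetical, only the two types named in the docstring are taken from this diff:

```python
import numpy as np

# Form 1: an array of grid states, each an [x, y] coordinate:
terminal_states = np.array([[3, 3], [0, 4]])

# Form 2 (hypothetical example): a Python function(state) -> bool,
# here marking states on the grid diagonal as terminal:
def terminal_states_fn(state):
    return state[0] == state[1]
```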
@@ ... @@
-        # #test: 1. for checking 0 distance for same action being always applied; 2. similar to 1. but for different dynamics orders; 3. similar to 1 but for different action_space_dims; 4. for a known applied action case, check manually the results of the formulae and see that programmatic results match: should also have a unit version of 4. for dist_of_pt_from_line() and an integration version here for total_deviation calc.?.
+        # #test: 1. for checking 0 distance for same action being always applied;
+        # 2. similar to 1. but for different dynamics orders;
+        # 3. similar to 1 but for different action_space_dims;
+        # 4. for a known applied action case, check manually the results
+        # of the formulae and see that programmatic results match: should
+        # also have a unit version of 4. for dist_of_pt_from_line() and
+        # an integration version here for total_deviation calc.?.
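
A possible shape for the unit test mentioned in point 4, as a hedged sketch; the test values, test function, and import path are hypothetical, only dist_of_pt_from_line() comes from this diff:

```python
import numpy as np

from mdp_playground.envs.rl_toy_env import dist_of_pt_from_line

def test_dist_of_pt_from_line():
    ptA = np.array([0.0, 0.0])
    ptB = np.array([1.0, 0.0])
    # A point on the line itself should be at distance 0:
    assert np.isclose(dist_of_pt_from_line(np.array([0.5, 0.0]), ptA, ptB), 0.0)
    # A point 2 units off the x-axis should be at distance 2:
    assert np.isclose(dist_of_pt_from_line(np.array([0.5, 2.0]), ptA, ptB), 2.0)
```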
@@ ... @@
         # #random ###TODO Would be better to parameterise this in terms of state, action and time_step as well. Would need to change implementation to have a queue for the rewards achieved and then pick the reward that was generated delay timesteps ago.
         self.logger.info("Reward: " + str(reward) + " Noise in reward: " + str(noise_in_reward))
         reward += noise_in_reward
         reward *= self.reward_scale
         reward += self.reward_shift
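
Note the order of operations in these three lines: noise is added before scaling and shifting, so the noise itself is scaled by reward_scale. A quick arithmetic check with made-up values:

```python
# Made-up values; mirrors the three reward lines above.
reward, noise_in_reward = 1.0, 0.25
reward_scale, reward_shift = 2.0, -1.0

reward += noise_in_reward   # 1.25
reward *= reward_scale      # 2.5
reward += reward_shift      # 1.5
assert reward == 1.5        # i.e. (raw + noise) * scale + shift
```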
@@ -2266,7 +2283,8 @@ def seed(self, seed=None):
 
 
 def dist_of_pt_from_line(pt, ptA, ptB):
-    """Returns shortest distance of a point from a line defined by 2 points - ptA and ptB. Based on: https://softwareengineering.stackexchange.com/questions/168572/distance-from-point-to-n-dimensional-line"""
+    """Returns shortest distance of a point from a line defined by 2 points - ptA and ptB.
+    Based on: https://softwareengineering.stackexchange.com/questions/168572/distance-from-point-to-n-dimensional-line"""
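
The function body is outside this excerpt. For reference, a minimal sketch of the projection-based formula described at the linked answer; this is an assumption about the approach, not necessarily the repo's exact implementation:

```python
import numpy as np

def dist_of_pt_from_line_sketch(pt, ptA, ptB):
    """Shortest distance from pt to the infinite line through ptA and ptB,
    valid in any number of dimensions."""
    line = ptB - ptA
    length = np.linalg.norm(line)
    if length == 0.0:
        # Degenerate line (ptA == ptB): fall back to point-to-point distance.
        return np.linalg.norm(pt - ptA)
    unit = line / length
    rel = pt - ptA
    # Subtract the component of rel along the line; what remains is
    # perpendicular to the line, and its norm is the distance.
    return np.linalg.norm(rel - np.dot(rel, unit) * unit)
```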