Hunk around old lines 1227-1232 (new 1228-1234): the `self.P` lambda is removed and replaced by a note on why it must stay disabled.

```diff
         # fixed parameterisation for cont. envs. right now.
         pass
-        self.P = lambda s, a: self.transition_function(s, a)
+        # ####IMP Keep this commented out as it causes problems with deepcopy().
+        # self.P = lambda s, a: self.transition_function(s, a)
 
     def init_reward_function(self):
         """Initialises the reward function, R, by selecting random sequences to be rewardable for discrete environments. For continuous environments, we have fixed available options for the reward function."""
```
The second hunk makes the analogous change for the reward function R:

```diff
@@ -1550,7 +1552,7 @@ def get_rews(rng, r_dict):
         elif self.config["state_space_type"] == "grid":
             ...  # ###TODO Make sequences compatible with grid
 
-        self.R = lambda s, a: self.reward_function(s, a)
+        # self.R = lambda s, a: self.reward_function(s, a)
         # ### TODO Decide whether to give reward before or after transition ("after" would mean taking next state into account and seems more logical to me) - make it a dimension? - R(s) or R(s, a) or R(s, a, s')? I'd say give it after and store the old state in the augmented_state, so that R can take any of the above possible forms. That would also solve the problem of the implicit 1-step delay from giving it before. _And_ it would not give any reward for already being in a rewarding state at the 1st step, but _would_ give a reward if one moved to a rewardable state - even if called with R(s, a), because s' is stored in the augmented_state! #####IMP
```
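The TODO above argues for giving the reward after the transition and stashing the old state in `augmented_state`, so that R can effectively act as R(s), R(s, a) or R(s, a, s'). A minimal sketch of that idea follows; the names (`SketchEnv`, the integer dynamics, the reward for reaching state 3) are invented for illustration and are not the repository's implementation.

```python
class SketchEnv:
    """Toy illustration only: the reward is computed after the transition."""

    def __init__(self, init_state=0):
        self.curr_state = init_state
        # augmented_state keeps [previous_state, current_state].
        self.augmented_state = [None, init_state]

    def P(self, s, a):
        return s + a  # stand-in transition function

    def R(self, s, a):
        # Even with an R(s, a) signature, s' is recoverable from augmented_state.
        next_s = self.augmented_state[-1]
        return 1.0 if next_s == 3 else 0.0  # reward for ending up in state 3

    def step(self, a):
        s = self.curr_state
        next_s = self.P(s, a)
        self.augmented_state = [s, next_s]  # store the old state before rewarding
        self.curr_state = next_s
        # The reward is given after the transition, so it can depend on s, a and s'.
        return next_s, self.R(s, a)


env = SketchEnv(init_state=2)
print(env.step(1))  # (3, 1.0): moving into state 3 is rewarded
print(env.step(1))  # (4, 0.0)
```

Giving the reward before the transition instead would pay the agent merely for starting in a rewarding state and would lag every reward one step behind the action that earned it, which is the implicit 1-step delay the comment refers to.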
Finally, unchanged context around old lines 2010-2011 (new 2012-2013) records a related asymmetry between P and R:

```diff
         # ###TODO P uses last state while R uses augmented state; for cont. env, P does know underlying state_derivatives - we don't want this to be the case for the imaginary rollout scenario;
```
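The concern in that comment can be made concrete with a small, purely hypothetical contrast (none of these names are from the repository): a transition function that reads hidden quantities such as stored state derivatives off `self` gives imagined rollouts information they should not have, whereas one that depends only on its arguments can be replayed from any state the agent imagines.

```python
class ContEnvSketch:
    """Hypothetical continuous-env stand-in, for illustration only."""

    def __init__(self):
        self.state = 0.0
        self.state_derivatives = [0.0]  # internal bookkeeping updated by real steps

    def P_uses_internals(self, s, a):
        # Depends on self.state_derivatives, not only on (s, a).
        return s + a + self.state_derivatives[0]

    @staticmethod
    def P_pure(s, a):
        # Depends only on its arguments, so it is safe for imagined rollouts.
        return s + a


env = ContEnvSketch()
env.state_derivatives = [5.0]          # drifts during real interaction
print(env.P_uses_internals(1.0, 1.0))  # 7.0: leaks internal state into an "imaginary" step
print(ContEnvSketch.P_pure(1.0, 1.0))  # 2.0: reproducible from (s, a) alone
```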