Description
When running the notebook gpt_oss_(20B)_Reinforcement_Learning_2048_Game_DGX_Spark.ipynb from git commit b6b369 on a DGX Spark, training hangs indefinitely at a random step.
Version Info
NVIDIA DGX Spark Version 7.2.3 (GNU/Linux 6.11.0-1016-nvidia aarch64)
To Reproduce
- Build the DGX Spark Docker container as described in this tutorial
- Run the Docker container with this command:
docker run -it \
--gpus=all \
--net=host \
--ipc=host \
--ulimit memlock=-1 \
--ulimit stack=67108864 \
-v $(pwd):$(pwd) \
-v $HOME/.cache/huggingface:/root/.cache/huggingface \
-w $(pwd) \
unsloth-dgx-spark
Expected Behavior
Notebook runs 1000 steps.
Observed Behavior
The notebook runs for a random number of steps and then hangs. While it is hung, nvidia-smi reports zero GPU activity. I've observed it hanging on steps 3, 9, 20, 23, and 50. I've run it multiple times and have never completed a full run.
Debugging Info
I saved the notebook as a Python file and ran it, and still observed the hang. When it hung, I interrupted the process with Ctrl-C. Here is the output, ending in the stack trace.
for move in dirs:
new_board, moved = simulate(board, move)
if not moved:
continue
score = sum(sum(row) for row in new_board)
if score > best_score:
best_score, best_move = score, move
return best_move if best_move else "W"
┌───┬───┬───┬───┬───┬───┐
│ 4│ 2│ 16│ 4│ 2│ 4│
├───┼───┼───┼───┼───┼───┤
│ 2│ 4│ 32│128│ 8│ 32│
├───┼───┼───┼───┼───┼───┤
│ 8│128│ 4│512│ 64│ 16│
├───┼───┼───┼───┼───┼───┤
│ 16│ 2│256│ 8│ 32│ 8│
├───┼───┼───┼───┼───┼───┤
│ 8│ 32│ 16│ 4│ 16│ 4│
├───┼───┼───┼───┼───┼───┤
│ 2│ 16│ 4│ 2│ 8│ 2│
└───┴───┴───┴───┴───┴───┘
{'loss': 0.0, 'grad_norm': 3.6165287494659424, 'learning_rate': 4.000000000000001e-06, 'num_tokens': 10835.0, 'completions/mean_length': 535.5, 'completions/min_length': 485.0, 'completions/max_length': 586.0, 'completions/clipped_ratio': 0.5, 'completions/mean_terminated_length': 485.0, 'completions/min_terminated_length': 485.0, 'completions/max_terminated_length': 485.0, 'rewards/function_works/mean': -0.5, 'rewards/function_works/std': 2.1213202476501465, 'rewards/no_cheating/mean': 0.0, 'rewards/no_cheating/std': 1.4142135381698608, 'rewards/strategy_succeeds/mean': 1.0, 'rewards/strategy_succeeds/std': 1.4142135381698608, 'reward': 0.5, 'reward_std': 4.949747562408447, 'frac_reward_zero_std': 0.0, 'completion_length': 586.0, 'kl': 0.0032773646526038647, 'epoch': 0.01}
1%|▋ | 9/1000 [31:21<59:11:33, 215.03s/it]
^C
^C^CTraceback (most recent call last):
File "/home/cpadwick/code/unsloth/notebooks/nb/gpt_oss_(20B)_Reinforcement_Learning_2048_Game_DGX_Spark.py", line 717, in <module>
trainer.train()
File "/home/cpadwick/code/unsloth/notebooks/nb/unsloth_compiled_cache/UnslothGRPOTrainer.py", line 53, in wrapper
output = f(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/transformers/trainer.py", line 2328, in train
return inner_training_loop(
^^^^^^^^^^^^^^^^^^^^
File "<string>", line 323, in _fast_inner_training_loop
File "<string>", line 34, in _unsloth_training_step
File "/usr/local/lib/python3.12/dist-packages/trl/extras/profiling.py", line 98, in wrapper
return func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cpadwick/code/unsloth/notebooks/nb/unsloth_compiled_cache/UnslothGRPOTrainer.py", line 2015, in _prepare_inputs
generation_batch = self._generate_and_score_completions(generation_batch)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cpadwick/code/unsloth/notebooks/nb/unsloth_compiled_cache/UnslothGRPOTrainer.py", line 2427, in _generate_and_score_completions
rewards_per_func = self._calculate_rewards(inputs, original_prompts, completions, completion_ids_list)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/trl/extras/profiling.py", line 98, in wrapper
return func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cpadwick/code/unsloth/notebooks/nb/unsloth_compiled_cache/UnslothGRPOTrainer.py", line 2063, in _calculate_rewards
output_reward_func = reward_func(
^^^^^^^^^^^^
File "/home/cpadwick/code/unsloth/notebooks/nb/gpt_oss_(20B)_Reinforcement_Learning_2048_Game_DGX_Spark.py", line 613, in strategy_succeeds
steps, game_state = execute_strategy(new_strategy, game)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/unsloth_zoo/rl_environments.py", line 363, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/home/cpadwick/code/unsloth/notebooks/nb/gpt_oss_(20B)_Reinforcement_Learning_2048_Game_DGX_Spark.py", line 402, in execute_strategy
return _execute_strategy(strategy, game)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/cpadwick/code/unsloth/notebooks/nb/gpt_oss_(20B)_Reinforcement_Learning_2048_Game_DGX_Spark.py", line 374, in _execute_strategy
game.do_action(action)
File "/home/cpadwick/code/unsloth/notebooks/nb/gpt_oss_(20B)_Reinforcement_Learning_2048_Game_DGX_Spark.py", line 233, in do_action
self._update_state_after_change()
File "/home/cpadwick/code/unsloth/notebooks/nb/gpt_oss_(20B)_Reinforcement_Learning_2048_Game_DGX_Spark.py", line 245, in _update_state_after_change
if not _can_move(self._board):
File "/home/cpadwick/code/unsloth/notebooks/nb/gpt_oss_(20B)_Reinforcement_Learning_2048_Game_DGX_Spark.py", line 155, in _can_move
if _empty_cells(board):
^^^^^^^^^^^^^^^^^^^
File "/home/cpadwick/code/unsloth/notebooks/nb/gpt_oss_(20B)_Reinforcement_Learning_2048_Game_DGX_Spark.py", line 152, in _empty_cells
return [(r, c) for r in range(size) for c in range(size) if board[r][c] == 0]
^^^^^^^^^^^
KeyboardInterrupt
^C^CException ignored in atexit callback: <function shutdown_compile_workers at 0xe65599ecfd80>
Traceback (most recent call last):
File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/async_compile.py", line 145, in shutdown_compile_workers
pool.shutdown()
File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_worker/subproc_pool.py", line 264, in shutdown
self.process.wait(300)
File "/usr/lib/python3.12/subprocess.py", line 1277, in wait
self._wait(timeout=sigint_timeout)
File "/usr/lib/python3.12/subprocess.py", line 2047, in _wait
time.sleep(delay)
KeyboardInterrupt:
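A trace like the one above can also be captured without interrupting the run. This is a minimal sketch using the standard-library faulthandler module; start_hang_watchdog and stop_hang_watchdog are hypothetical helper names, not part of the notebook or of unsloth.

```python
import faulthandler
import sys


def start_hang_watchdog(interval_s=600, log_file=sys.stderr):
    """Dump the stack of every thread to log_file every interval_s seconds.

    Hypothetical helper: the dumps keep repeating until the watchdog is
    stopped, so a hang shows up as the same frames appearing again and again.
    """
    faulthandler.dump_traceback_later(interval_s, repeat=True, file=log_file)


def stop_hang_watchdog():
    """Cancel the periodic dumps started by start_hang_watchdog()."""
    faulthandler.cancel_dump_traceback_later()
```

Calling start_hang_watchdog() just before trainer.train() would print the stacks to stderr every 10 minutes while training runs.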
The run always seems to get stuck in the _update_state_after_change() method. I don't understand how the execute_with_time_limit() decorator is being defeated, since it wraps the execute_strategy method.
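For reference, here is a minimal sketch of one general way a time limit can fail to stop a loop like this. It assumes the limit is enforced with a one-shot SIGALRM that raises TimeoutError in the main thread (Unix only). Whether execute_with_time_limit() works this way is an assumption, not something confirmed from the unsloth_zoo source. The names time_limit, busy_wait, plain_loop, and loop_with_broad_except are hypothetical. If code running under the limit catches Exception broadly, the TimeoutError is swallowed, the timer never fires again, and the loop keeps going.

```python
import signal
import time
from functools import wraps


def time_limit(seconds):
    """Hypothetical SIGALRM-based time limit (NOT the unsloth_zoo code)."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            def on_alarm(signum, frame):
                # Raised in whatever main-thread frame is running when
                # the alarm fires.
                raise TimeoutError(f"timed out after {seconds}s")

            previous = signal.signal(signal.SIGALRM, on_alarm)
            signal.setitimer(signal.ITIMER_REAL, seconds)  # fires once
            try:
                return func(*args, **kwargs)
            finally:
                signal.setitimer(signal.ITIMER_REAL, 0)  # cancel the timer
                signal.signal(signal.SIGALRM, previous)
        return wrapper
    return decorator


def busy_wait(duration):
    """Spin in pure Python for roughly `duration` seconds."""
    end = time.monotonic() + duration
    while time.monotonic() < end:
        pass


@time_limit(0.2)
def plain_loop():
    busy_wait(1.0)  # interrupted by TimeoutError after about 0.2s
    return "finished"


@time_limit(0.2)
def loop_with_broad_except():
    steps = 0
    for _ in range(3):
        try:
            busy_wait(0.3)  # the alarm fires during the first call
        except Exception:
            pass  # TimeoutError is swallowed here
        steps += 1
    return steps  # all three iterations complete despite the limit
```

In this sketch plain_loop() raises TimeoutError as intended, while loop_with_broad_except() runs to completion well past the 0.2s limit. A limit enforced from a separate process would not depend on the exception propagating through the wrapped code.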