
notebook gpt_oss_(20B)_Reinforcement_Learning_2048_Game_DGX_Spark.ipynb hangs on DGX Spark #122

Description

@cgpadwick

When running the notebook gpt_oss_(20B)_Reinforcement_Learning_2048_Game_DGX_Spark.ipynb at git commit b6b369 on a DGX Spark, the notebook hangs indefinitely at a random point during training.

Version Info

NVIDIA DGX Spark Version 7.2.3 (GNU/Linux 6.11.0-1016-nvidia aarch64)

To Reproduce

  1. Build the DGX Spark docker container as described in this tutorial.
  2. Run the docker container with this command:

docker run -it \
    --gpus=all \
    --net=host \
    --ipc=host \
    --ulimit memlock=-1 \
    --ulimit stack=67108864 \
    -v $(pwd):$(pwd) \
    -v $HOME/.cache/huggingface:/root/.cache/huggingface \
    -w $(pwd) \
    unsloth-dgx-spark

  3. Download and run the notebook as described in the tutorial.

Expected Behavior

The notebook runs all 1000 training steps to completion.

Observed Behavior

The notebook runs for a random number of steps and then hangs. While hung, nvidia-smi reports zero GPU activity. I've observed hangs at steps 3, 9, 20, 23, and 50; across multiple runs I have never completed a full run.

Debugging Info

I exported the notebook as a Python file and ran it directly; the hang still occurred. When it hung I pressed Ctrl-C. Below is the console output at the moment of the hang (the tail of a generated strategy, the game board, and the training metrics), followed by the stack trace.

 for move in dirs:
        new_board, moved = simulate(board, move)
        if not moved:
            continue
        score = sum(sum(row) for row in new_board)
        if score > best_score:
            best_score, best_move = score, move
    return best_move if best_move else "W"
┌───┬───┬───┬───┬───┬───┐
│  4│  2│ 16│  4│  2│  4│
├───┼───┼───┼───┼───┼───┤
│  2│  4│ 32│128│  8│ 32│
├───┼───┼───┼───┼───┼───┤
│  8│128│  4│512│ 64│ 16│
├───┼───┼───┼───┼───┼───┤
│ 16│  2│256│  8│ 32│  8│
├───┼───┼───┼───┼───┼───┤
│  8│ 32│ 16│  4│ 16│  4│
├───┼───┼───┼───┼───┼───┤
│  2│ 16│  4│  2│  8│  2│
└───┴───┴───┴───┴───┴───┘
{'loss': 0.0, 'grad_norm': 3.6165287494659424, 'learning_rate': 4.000000000000001e-06, 'num_tokens': 10835.0, 'completions/mean_length': 535.5, 'completions/min_length': 485.0, 'completions/max_length': 586.0, 'completions/clipped_ratio': 0.5, 'completions/mean_terminated_length': 485.0, 'completions/min_terminated_length': 485.0, 'completions/max_terminated_length': 485.0, 'rewards/function_works/mean': -0.5, 'rewards/function_works/std': 2.1213202476501465, 'rewards/no_cheating/mean': 0.0, 'rewards/no_cheating/std': 1.4142135381698608, 'rewards/strategy_succeeds/mean': 1.0, 'rewards/strategy_succeeds/std': 1.4142135381698608, 'reward': 0.5, 'reward_std': 4.949747562408447, 'frac_reward_zero_std': 0.0, 'completion_length': 586.0, 'kl': 0.0032773646526038647, 'epoch': 0.01}
  1%|▋                                                                             | 9/1000 [31:21<59:11:33, 215.03s/it]
^C
^C^C
Traceback (most recent call last):
  File "/home/cpadwick/code/unsloth/notebooks/nb/gpt_oss_(20B)_Reinforcement_Learning_2048_Game_DGX_Spark.py", line 717, in <module>
    trainer.train()
  File "/home/cpadwick/code/unsloth/notebooks/nb/unsloth_compiled_cache/UnslothGRPOTrainer.py", line 53, in wrapper
    output = f(self, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/transformers/trainer.py", line 2328, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^
  File "<string>", line 323, in _fast_inner_training_loop
  File "<string>", line 34, in _unsloth_training_step
  File "/usr/local/lib/python3.12/dist-packages/trl/extras/profiling.py", line 98, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cpadwick/code/unsloth/notebooks/nb/unsloth_compiled_cache/UnslothGRPOTrainer.py", line 2015, in _prepare_inputs
    generation_batch = self._generate_and_score_completions(generation_batch)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cpadwick/code/unsloth/notebooks/nb/unsloth_compiled_cache/UnslothGRPOTrainer.py", line 2427, in _generate_and_score_completions
    rewards_per_func = self._calculate_rewards(inputs, original_prompts, completions, completion_ids_list)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/trl/extras/profiling.py", line 98, in wrapper
    return func(self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cpadwick/code/unsloth/notebooks/nb/unsloth_compiled_cache/UnslothGRPOTrainer.py", line 2063, in _calculate_rewards
    output_reward_func = reward_func(
                         ^^^^^^^^^^^^
  File "/home/cpadwick/code/unsloth/notebooks/nb/gpt_oss_(20B)_Reinforcement_Learning_2048_Game_DGX_Spark.py", line 613, in strategy_succeeds
    steps, game_state = execute_strategy(new_strategy, game)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/unsloth_zoo/rl_environments.py", line 363, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/cpadwick/code/unsloth/notebooks/nb/gpt_oss_(20B)_Reinforcement_Learning_2048_Game_DGX_Spark.py", line 402, in execute_strategy
    return _execute_strategy(strategy, game)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/cpadwick/code/unsloth/notebooks/nb/gpt_oss_(20B)_Reinforcement_Learning_2048_Game_DGX_Spark.py", line 374, in _execute_strategy
    game.do_action(action)
  File "/home/cpadwick/code/unsloth/notebooks/nb/gpt_oss_(20B)_Reinforcement_Learning_2048_Game_DGX_Spark.py", line 233, in do_action
    self._update_state_after_change()
  File "/home/cpadwick/code/unsloth/notebooks/nb/gpt_oss_(20B)_Reinforcement_Learning_2048_Game_DGX_Spark.py", line 245, in _update_state_after_change
    if not _can_move(self._board):
  File "/home/cpadwick/code/unsloth/notebooks/nb/gpt_oss_(20B)_Reinforcement_Learning_2048_Game_DGX_Spark.py", line 155, in _can_move
    if _empty_cells(board):
       ^^^^^^^^^^^^^^^^^^^
  File "/home/cpadwick/code/unsloth/notebooks/nb/gpt_oss_(20B)_Reinforcement_Learning_2048_Game_DGX_Spark.py", line 152, in _empty_cells
    return [(r, c) for r in range(size) for c in range(size) if board[r][c] == 0]
                                                 ^^^^^^^^^^^
KeyboardInterrupt
^C^CException ignored in atexit callback: <function shutdown_compile_workers at 0xe65599ecfd80>
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/async_compile.py", line 145, in shutdown_compile_workers
    pool.shutdown()
  File "/usr/local/lib/python3.12/dist-packages/torch/_inductor/compile_worker/subproc_pool.py", line 264, in shutdown
    self.process.wait(300)
  File "/usr/lib/python3.12/subprocess.py", line 1277, in wait
    self._wait(timeout=sigint_timeout)
  File "/usr/lib/python3.12/subprocess.py", line 2047, in _wait
    time.sleep(delay)
KeyboardInterrupt:
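As an aside, interrupting with Ctrl-C only shows one snapshot. A stdlib way to see where the process is stuck without killing it is to arm faulthandler before trainer.train(), so every thread's stack is dumped periodically once the process stops making progress (a sketch; the 600-second interval is my own arbitrary choice, not something from the notebook):

```python
# Sketch: periodically dump all thread stacks to stderr so a hang
# reveals its location without needing Ctrl-C.
# The 600-second interval is an arbitrary choice, not from the notebook.
import faulthandler
import sys

faulthandler.dump_traceback_later(600, repeat=True, file=sys.stderr)
try:
    pass  # trainer.train() would go here
finally:
    faulthandler.cancel_dump_traceback_later()
```

With repeat=True the dump fires every interval, so a true hang produces identical stacks on consecutive dumps, which distinguishes it from slow progress.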

It always seems to get stuck in the _update_state_after_change() method. I don't understand how the execute_with_time_limit() decorator is being defeated, since it wraps the execute_strategy() method.
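I don't know execute_with_time_limit's internals, but if it relies on signal.SIGALRM (a guess on my part), the alarm is only delivered to the main thread and only between Python bytecode instructions, so it can be defeated if the reward function runs off the main thread or blocks in native code. A process-based timeout cannot be defeated by a busy loop, at the cost of fork overhead per call. A minimal sketch (run_with_hard_timeout is my own hypothetical helper, not part of unsloth_zoo):

```python
import multiprocessing as mp

# Hypothetical helper (not part of unsloth_zoo): run func in a child
# process and kill the child if it exceeds the deadline. A busy loop in
# func cannot defeat this, unlike a SIGALRM- or thread-based limit.
# Assumes the "fork" start method (Linux), so func need not be picklable.
_ctx = mp.get_context("fork")

def run_with_hard_timeout(func, args=(), timeout=10.0):
    queue = _ctx.Queue()
    proc = _ctx.Process(target=lambda q: q.put(func(*args)), args=(queue,))
    proc.start()
    proc.join(timeout)
    if proc.is_alive():
        proc.terminate()
        proc.join()
        raise TimeoutError(f"call exceeded {timeout:.1f}s")
    return queue.get()
```

Of course this would only convert the hang into a recoverable timeout; the underlying question of why the game loop never terminates would remain.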
