
Conversation

@HIT-cwh HIT-cwh commented Apr 2, 2024

Modified configs: deepseek, llama2, internlm2, yi, and zephyr.

  1. Add sequence_parallel_size to the configs.
  2. Set accumulative_counts = accumulative_counts * sequence_parallel_size. Suppose we train with a per-device batch size of 1 and a maximum sequence length of max_length on N GPUs. With the sequence parallel size set to SP, each sequence of length max_length is split into SP segments, and each segment is processed by one of the SP GPUs in its sequence parallel group, so a single forward/backward pass now covers only N / SP distinct sequences instead of N. Multiplying accumulative_counts by SP restores the original number of sequences per optimizer step and keeps training equivalent to the non-parallel setup (see the sketch after this list).
  3. If sequence_parallel_size is greater than 1, use SequenceParallelSampler; otherwise use DefaultSampler.
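A minimal sketch of how these three changes fit together in an XTuner training config. The import paths (xtuner.parallel.sequence.SequenceParallelSampler, mmengine.dataset.DefaultSampler) and the placeholder values for batch size and accumulation are assumptions for illustration, not copied from the modified configs:

```python
# Sketch of the sequence-parallel pieces of an XTuner config
# (import paths and surrounding values are assumptions).
from mmengine.dataset import DefaultSampler
from xtuner.parallel.sequence import SequenceParallelSampler

# 1. Sequence parallel dimension: each group of SP GPUs shares one sequence,
#    which is split along the length dimension across the group.
sequence_parallel_size = 2  # SP

# 2. Scale gradient accumulation by SP so the number of sequences consumed
#    per optimizer step matches the SP = 1 setup.
batch_size = 1            # per-device batch size
accumulative_counts = 16  # original value for SP = 1
accumulative_counts *= sequence_parallel_size

# 3. Sampler selection: SequenceParallelSampler feeds the same sequences to
#    every rank in a sequence parallel group; DefaultSampler is used otherwise.
sampler = SequenceParallelSampler if sequence_parallel_size > 1 else DefaultSampler

train_dataloader = dict(
    batch_size=batch_size,
    sampler=dict(type=sampler, shuffle=True),
    # dataset / collate_fn omitted for brevity
)
```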
@HIT-cwh HIT-cwh merged commit ea33f46 into InternLM:main Apr 2, 2024
@HIT-cwh HIT-cwh deleted the fix_sp_configs branch April 2, 2024 05:20
llkn-2 pushed a commit to llkn-2/xtuner that referenced this pull request Jul 31, 2024
…onfigs (InternLM#538)

* add sp to configs

* add sp to configs

* fix comments
