@ggerganov
Member

cont #13706

The llama_kv_cells_unified now tracks 2 new min/max quantities:

  • The min/max indices of used cells
    Mainly needed for determining the range of KV cells [0, n_kv) which are considered during the attention computation with unified KV cache (a.k.a. the old cell_max()).

  • The min/max positions for each sequence currently present in the KV cache
    Will be used in #13746 (kv-cache : refactor + add llama_memory_state_i) to improve the find_slot() logic in SWA cases.

To amortize the cost of keeping these min/max values up to date as cells are added and removed, we utilize std::set.
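
A minimal sketch of the idea, not the actual llama_kv_cells_unified code: the type `cells_sketch`, its members, the `MAX_SEQ` limit and the helper `make_example()` are all hypothetical names chosen for illustration. It assumes each cell holds at most one sequence and that a given (sequence, position) pair occupies at most one cell. Because std::set keeps its elements ordered, insert and erase cost O(log n), and the min/max can be read from `begin()` / `rbegin()` without rescanning the cells.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical, simplified stand-in for a KV cell container (illustration only).
struct cells_sketch {
    static constexpr int MAX_SEQ = 4; // assumed small limit for the sketch

    explicit cells_sketch(uint32_t n) : pos(n, -1), seq(n, -1) {}

    // Store position p of sequence s in cell i.
    void set(uint32_t i, int32_t p, int s) {
        assert(i < pos.size() && p >= 0 && s >= 0 && s < MAX_SEQ);
        rm(i); // keep the sets consistent if the cell was already in use
        pos[i] = p;
        seq[i] = s;
        used.insert(i);
        seq_pos[s].insert(p);
    }

    // Free cell i (no-op if it is already empty).
    void rm(uint32_t i) {
        assert(i < pos.size());
        if (pos[i] < 0) {
            return;
        }
        used.erase(i);
        seq_pos[seq[i]].erase(pos[i]);
        pos[i] = -1;
        seq[i] = -1;
    }

    // Smallest used cell index, or 0 if no cell is used.
    uint32_t used_min() const { return used.empty() ? 0 : *used.begin(); }

    // Largest used cell index + 1, i.e. an upper bound n_kv for the range [0, n_kv).
    uint32_t used_max_p1() const { return used.empty() ? 0 : *used.rbegin() + 1; }

    // Min/max position stored for sequence s, or -1 if the sequence is absent.
    int32_t seq_pos_min(int s) const { return seq_pos[s].empty() ? -1 : *seq_pos[s].begin(); }
    int32_t seq_pos_max(int s) const { return seq_pos[s].empty() ? -1 : *seq_pos[s].rbegin(); }

    std::vector<int32_t> pos;      // position per cell, -1 if empty
    std::vector<int>     seq;      // sequence id per cell, -1 if empty
    std::set<uint32_t>   used;     // indices of used cells, ordered
    std::set<int32_t>    seq_pos[MAX_SEQ]; // positions present per sequence, ordered
};

// Example state: three cells filled, then one of them freed again.
inline cells_sketch make_example() {
    cells_sketch c(8);
    c.set(2, 10, 0);
    c.set(5, 11, 0);
    c.set(3, 7, 1);
    c.rm(5); // the max used index and the max position of sequence 0 both shrink
    return c;
}
```

In this sketch, removing cell 5 updates both quantities through the ordered sets, so no scan over all cells is needed to find the new maximum.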

@ggerganov ggerganov force-pushed the gg/kv-cells-track-min-max branch from dc6b26d to 9bf13fd Compare May 26, 2025 15:33
@ggerganov ggerganov merged commit 8171312 into master May 27, 2025
53 checks passed
@ggerganov ggerganov deleted the gg/kv-cells-track-min-max branch May 27, 2025 10:49