
vllm.model_executor.warmup.kernel_warmup

Warm up kernels used during model execution. This is useful specifically for JIT-compiled kernels, because JIT compilation should not happen while the model is serving requests.
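The motivation can be illustrated with a toy model of a JIT cache. This is a minimal sketch and not vLLM's mechanism. The names `compile_log`, `_compile_for`, `run_kernel`, and `warmup` are hypothetical, and `functools.lru_cache` stands in for a real per-shape JIT compile step. Calling the kernel once for every expected shape ahead of time means later calls with those shapes never trigger a compile.

```python
import functools

# Hypothetical record of which shapes triggered a (simulated) JIT compile.
compile_log: list[int] = []


@functools.lru_cache(maxsize=None)
def _compile_for(num_tokens: int):
    # Simulated JIT step: runs once per distinct num_tokens, then is cached.
    compile_log.append(num_tokens)
    return lambda xs: [2 * v for v in xs]  # the "compiled" kernel


def run_kernel(xs: list[int]) -> list[int]:
    # Dispatch on the input shape; compiles on first use of that shape.
    return _compile_for(len(xs))(xs)


def warmup(token_counts: list[int]) -> None:
    # Pay all compile costs up front, before any real request is served.
    for n in token_counts:
        run_kernel([0] * n)


warmup([1, 2, 4, 8])
print(compile_log)               # [1, 2, 4, 8]
print(run_kernel([1, 2, 3, 4]))  # [2, 4, 6, 8], no new compile
print(compile_log)               # [1, 2, 4, 8]
```

A shape that was not warmed up, such as a 3-token input here, would still compile on first use.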

kernel_warmup

kernel_warmup(model: Module, max_tokens: int)
Source code in vllm/model_executor/warmup/kernel_warmup.py
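import torch
import vllm.envs as envs
from vllm.model_executor.warmup.deep_gemm_warmup import deep_gemm_warmup
from vllm.utils.deep_gemm import is_deep_gemm_supported

def kernel_warmup(model: torch.nn.Module, max_tokens: int):
    # Warm up DeepGEMM only if enabled, supported, and not explicitly skipped.
    do_deep_gemm_warmup = (envs.VLLM_USE_DEEP_GEMM
                           and is_deep_gemm_supported()
                           and not envs.VLLM_SKIP_DEEP_GEMM_WARMUP)
    if do_deep_gemm_warmup:
        deep_gemm_warmup(model, max_tokens)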
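The gating in `kernel_warmup` is a three-way condition: DeepGEMM must be enabled via `VLLM_USE_DEEP_GEMM`, supported on the current platform, and the warmup must not be disabled via `VLLM_SKIP_DEEP_GEMM_WARMUP`. The sketch below reproduces only that decision so it can be exercised without vLLM or a GPU. It assumes the flags are `"0"`/`"1"` strings with unset meaning off; vLLM's own parsing in `vllm.envs` may differ. The helpers `_flag`, `should_warmup_deep_gemm`, and `kernel_warmup_sketch` are hypothetical, and `warmup_fn` stands in for `deep_gemm_warmup`.

```python
from typing import Callable, Mapping


def _flag(env: Mapping[str, str], name: str) -> bool:
    # Assumption: boolean flags are "0"/"1" integer strings; unset means off.
    return bool(int(env.get(name, "0")))


def should_warmup_deep_gemm(env: Mapping[str, str], supported: bool) -> bool:
    # Mirrors the condition in kernel_warmup: enabled AND supported AND not skipped.
    return (_flag(env, "VLLM_USE_DEEP_GEMM")
            and supported
            and not _flag(env, "VLLM_SKIP_DEEP_GEMM_WARMUP"))


def kernel_warmup_sketch(model: object, max_tokens: int,
                         env: Mapping[str, str], supported: bool,
                         warmup_fn: Callable[[object, int], None]) -> bool:
    # Runs the (hypothetical) warmup only when gated on; reports whether it ran.
    if should_warmup_deep_gemm(env, supported):
        warmup_fn(model, max_tokens)
        return True
    return False


calls: list[tuple[object, int]] = []
ran = kernel_warmup_sketch("model", 8192, {"VLLM_USE_DEEP_GEMM": "1"}, True,
                           lambda m, n: calls.append((m, n)))
print(ran, calls)  # True [('model', 8192)]
```

Passing the platform check and the warmup function in as arguments keeps the decision testable in isolation; setting `VLLM_SKIP_DEEP_GEMM_WARMUP` to `"1"` or passing `supported=False` makes the sketch return `False` without calling `warmup_fn`.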