vllm.v1.spec_decode
Modules:

| Name | Description |
| --- | --- |
| eagle | |
| medusa | |
| metadata | |
| metrics | |
| ngram_proposer | |
| utils | |
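
The submodule names listed above can also be enumerated at runtime. The following is a minimal sketch using only the standard library (`importlib` and `pkgutil`). The helper name `list_submodules` is hypothetical and is not part of vLLM. The example assumes vLLM is installed and importable in the current environment; if it is not, the final block simply reports that.

```python
import importlib
import importlib.util
import pkgutil


def list_submodules(package_name: str) -> list[str]:
    """Return the sorted names of the direct submodules of a package.

    Hypothetical helper for illustration; not part of vLLM.
    """
    package = importlib.import_module(package_name)
    # Only packages expose __path__; a plain module has no submodules.
    search_path = getattr(package, "__path__", None)
    if search_path is None:
        return []
    return sorted(info.name for info in pkgutil.iter_modules(search_path))


if __name__ == "__main__":
    # Check the top-level package first so a missing install does not raise.
    if importlib.util.find_spec("vllm") is None:
        print("vllm is not installed in this environment")
    else:
        try:
            for name in list_submodules("vllm.v1.spec_decode"):
                print(name)
        except ImportError as exc:
            # Importing vLLM can fail if an optional dependency is missing.
            print(f"could not import vllm.v1.spec_decode: {exc}")
```

The result depends on the installed vLLM version, so it may not match the list above exactly.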