vllm.distributed.eplb.policy.abstract
AbstractEplbPolicy
Bases: ABC
Source code in vllm/distributed/eplb/policy/abstract.py
rebalance_experts abstractmethod classmethod
rebalance_experts(
    weight: Tensor,
    num_replicas: int,
    num_groups: int,
    num_nodes: int,
    num_ranks: int,
    old_global_expert_indices: Tensor | None = None,
) -> Tensor
Entry point for the expert-parallelism load balancer.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| weight | Tensor | [layers, num_logical_experts], the load statistics for all logical experts | required |
| num_replicas | int | number of physical experts, must be a multiple of | required |
| num_groups | int | number of expert groups | required |
| num_nodes | int | number of server nodes | required |
| num_ranks | int | number of ranks, must be a multiple of | required |
| old_global_expert_indices | Tensor \| None | [layers, num_logical_experts], the old global expert indices. Used to avoid unnecessary weight copying for experts moving within one rank. | None |
Returns: physical_to_logical_map: [layers, num_replicas], the expert index of each replica
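To make the shape of this interface concrete, here is a minimal sketch of a policy with the same signature. It is not vLLM's implementation: `PolicySketch` is a local stand-in for `AbstractEplbPolicy`, `GreedyReplicationPolicy` is a hypothetical subclass, and nested Python lists stand in for `Tensor` so the example runs without dependencies. The greedy rule (hand each spare replica to the expert with the highest load per replica) is an illustrative assumption, and `num_groups`, `num_nodes`, `num_ranks`, and `old_global_expert_indices` are accepted but ignored.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class PolicySketch(ABC):
    """Local stand-in mirroring the documented signature (not the vLLM class)."""

    @classmethod
    @abstractmethod
    def rebalance_experts(
        cls,
        weight: list[list[float]],  # [layers, num_logical_experts]
        num_replicas: int,
        num_groups: int,
        num_nodes: int,
        num_ranks: int,
        old_global_expert_indices: list[list[int]] | None = None,
    ) -> list[list[int]]:
        raise NotImplementedError


class GreedyReplicationPolicy(PolicySketch):
    """Hypothetical policy: spare replicas go to the most loaded experts."""

    @classmethod
    def rebalance_experts(
        cls,
        weight,
        num_replicas,
        num_groups,
        num_nodes,
        num_ranks,
        old_global_expert_indices=None,
    ):
        physical_to_logical = []
        for layer_load in weight:
            num_logical = len(layer_load)
            if num_replicas < num_logical:
                raise ValueError("need at least one replica per logical expert")
            # Every logical expert starts with exactly one physical replica.
            replicas = [1] * num_logical
            mapping = list(range(num_logical))
            # Assign each remaining replica to the expert whose load per
            # replica is currently highest (ties go to the lowest index).
            for _ in range(num_replicas - num_logical):
                hottest = max(
                    range(num_logical),
                    key=lambda e: layer_load[e] / replicas[e],
                )
                replicas[hottest] += 1
                mapping.append(hottest)
            physical_to_logical.append(mapping)
        return physical_to_logical  # [layers, num_replicas]


weight = [[10.0, 1.0, 1.0], [1.0, 1.0, 8.0]]
mapping = GreedyReplicationPolicy.rebalance_experts(
    weight, num_replicas=5, num_groups=1, num_nodes=1, num_ranks=1
)
print(mapping)  # [[0, 1, 2, 0, 0], [0, 1, 2, 2, 2]]
```

Each row of the result has `num_replicas` entries, and entry `i` names the logical expert served by physical replica `i`, matching the `[layers, num_replicas]` shape described under Returns. A real policy would also take node and rank placement into account.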