Expert Parallelism: Token Routing & All-to-All (4 GPUs, 8 Experts, top-1)
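A minimal single-process sketch of the setup named in the title (4 GPUs, 8 experts, top-1 routing) may help make the data movement concrete. Everything beyond those three numbers is an assumption made for illustration: the contiguous expert placement (experts 2g and 2g+1 on GPU g), the toy expert function `x * (e + 1)`, and the helper names `owner`, `top1`, `all_to_all`, and `moe_layer`. The `all_to_all` function here is a plain list transpose standing in for the real collective, and the breakdown into steps is one plausible reading, not necessarily the one the walkthrough uses.

```python
NUM_GPUS = 4
NUM_EXPERTS = 8
EXPERTS_PER_GPU = NUM_EXPERTS // NUM_GPUS  # assumption: even, contiguous placement


def owner(expert):
    """GPU hosting an expert (assumed layout: experts 2g and 2g+1 live on GPU g)."""
    return expert // EXPERTS_PER_GPU


def top1(scores):
    """Index of the highest router score, i.e. top-1 routing."""
    return max(range(len(scores)), key=scores.__getitem__)


def all_to_all(send):
    """Single-process stand-in for the collective: recv[dst][src] = send[src][dst]."""
    return [[send[src][dst] for src in range(NUM_GPUS)] for dst in range(NUM_GPUS)]


def moe_layer(tokens, scores):
    """tokens[g][i] is a float; scores[g][i] is a list of NUM_EXPERTS router scores."""
    # Route: each GPU buckets its local tokens by the GPU that owns the chosen expert.
    send = [[[] for _ in range(NUM_GPUS)] for _ in range(NUM_GPUS)]
    for g in range(NUM_GPUS):
        for i, x in enumerate(tokens[g]):
            e = top1(scores[g][i])
            send[g][owner(e)].append((i, e, x))  # keep local index i for un-permuting

    # Dispatch: the first all-to-all delivers every token to its expert's GPU.
    recv = all_to_all(send)

    # Compute: toy expert e (hypothetical) just scales its input by (e + 1).
    processed = [
        [[(i, e, x * (e + 1)) for (i, e, x) in bucket] for bucket in recv[g]]
        for g in range(NUM_GPUS)
    ]

    # Combine: the second all-to-all sends results back to the originating GPU.
    back = all_to_all(processed)

    # Restore: place each result at the position its token originally occupied.
    out = [[None] * len(tokens[g]) for g in range(NUM_GPUS)]
    for g in range(NUM_GPUS):
        for bucket in back[g]:
            for i, _e, y in bucket:
                out[g][i] = y
    return out
```

Carrying the local index `i` alongside each token is a design choice of this sketch: it lets the originating GPU undo the permutation after the second all-to-all without any extra bookkeeping. Implementations that use a real collective typically exchange per-destination token counts first, since the buckets are unevenly sized.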