Sequence Parallelism: TP ↔ SP Region Switching (4 GPUs)