Interactive: Multi-turn RL Loop

[Figure: diagram with four labeled components: Agent (Policy), Environment (User/API), Tool Executor, Reward Signal]
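
One way the four named components might interact is sketched below. The source names only the components, so the wiring is an assumption: the policy picks an action from the latest observation, any action prefixed with `tool:` is routed to the tool executor, the environment returns the next observation, and the reward signal scores each turn. Every name here (`run_episode`, `Turn`, the `tool:` prefix, and the toy functions) is hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Turn:
    observation: str
    action: str
    tool_result: Optional[str]
    reward: float


def run_episode(
    policy: Callable[[str], str],
    environment: Callable[[str, Optional[str]], Tuple[str, bool]],
    tool_executor: Callable[[str], str],
    reward_signal: Callable[[str, str], float],
    first_observation: str,
    max_turns: int = 8,
) -> List[Turn]:
    """Roll out one multi-turn episode and record every turn."""
    trajectory: List[Turn] = []
    observation = first_observation
    for _ in range(max_turns):
        # Agent (Policy): choose an action from the latest observation.
        action = policy(observation)
        # Tool Executor: assumed convention, "tool:<payload>" requests a tool call.
        tool_result = None
        if action.startswith("tool:"):
            tool_result = tool_executor(action[len("tool:"):])
        # Environment (User/API): respond to the action and any tool output.
        next_observation, done = environment(action, tool_result)
        # Reward Signal: score this turn.
        reward = reward_signal(action, next_observation)
        trajectory.append(Turn(observation, action, tool_result, reward))
        observation = next_observation
        if done:
            break
    return trajectory


# Toy stand-ins so the loop can be run end to end.
def toy_policy(observation: str) -> str:
    # Call the tool first, then answer with whatever the tool returned.
    if observation.startswith("result:"):
        return "answer:" + observation[len("result:"):]
    return "tool:2+3"


def toy_tool_executor(expression: str) -> str:
    left, right = expression.split("+")
    return str(int(left) + int(right))


def toy_environment(action: str, tool_result: Optional[str]) -> Tuple[str, bool]:
    # Feed tool output back to the agent; any non-tool action ends the episode.
    if tool_result is not None:
        return "result:" + tool_result, False
    return "episode over", True


def toy_reward_signal(action: str, next_observation: str) -> float:
    return 1.0 if action == "answer:5" else 0.0


trajectory = run_episode(
    toy_policy, toy_environment, toy_tool_executor, toy_reward_signal,
    first_observation="What is 2+3?",
)
for turn in trajectory:
    print(turn)
```

In this toy run the first turn is a tool call and the second is the final answer. The recorded per-turn rewards are what a policy-gradient update would later consume; that training step is left out of the sketch.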