1. **Inference**
   - Your code uses the ART client to perform an agentic workflow (usually executing several rollouts in parallel to gather data faster).
   - Completion requests are routed to the ART backend, which runs the model's latest LoRA in vLLM.
   - As the agent executes, each `system`, `user`, and `assistant` message is stored in a Trajectory.
   - After your rollouts finish, your code assigns a `reward` to each Trajectory, with higher rewards indicating better performance (see the rollout sketch after this list).
2. **Training**
   - When all rollouts have finished, Trajectories are grouped and sent to the backend. Inference is blocked while training executes.
   - The backend trains your model using GRPO, initializing from the latest checkpoint (or an empty LoRA on the first iteration).
   - The backend saves the newly trained LoRA to a local directory and loads it into vLLM.
   - Inference is unblocked and the loop resumes at step 1 (the full loop is sketched below).
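Putting both phases together, the loop could be driven like the sketch below, which assumes the `LocalBackend`, `model.register`, `art.TrajectoryGroup`, `art.gather_trajectory_groups`, and `model.train` APIs shown in ART's examples (again, names may differ across versions); the model names and the rollout/group counts are arbitrary.

```python
import asyncio

import art
from art.local import LocalBackend


async def main() -> None:
    model = art.TrainableModel(
        name="agent-001",                       # arbitrary run name
        project="my-project",                   # arbitrary project name
        base_model="Qwen/Qwen2.5-7B-Instruct",  # any vLLM-servable base model
    )
    backend = LocalBackend()
    await model.register(backend)  # connect the client to the ART backend

    for _ in range(10):  # arbitrary number of iterations
        # Step 1: inference. Run rollouts in parallel, grouped so GRPO can
        # compare rewards across rollouts within the same group.
        groups = await art.gather_trajectory_groups(
            art.TrajectoryGroup(rollout(model) for _ in range(8))
            for _ in range(4)
        )
        # Step 2: training. Inference is blocked while the backend runs GRPO
        # from the latest checkpoint, saves the new LoRA, and reloads it into
        # vLLM; then the loop resumes at step 1.
        await model.train(groups)


asyncio.run(main())
```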