Initializing the client
The client that you’ll use to generate tokens and train your model is initialized through the art.TrainableModel class.
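As a minimal sketch of that initialization, the snippet below wraps the constructor call in a small helper. The name, project, and base model values are illustrative placeholders rather than values from this page, and the helper build_model is hypothetical. The sketch assumes the constructor accepts name, project, and base_model keyword arguments.

```python
try:
    import art  # assumption: the ART package is installed in this environment
except ImportError:
    art = None

# Illustrative placeholder values; substitute your own.
MODEL_KWARGS = {
    "name": "agent-001",                       # hypothetical run name
    "project": "my-agentic-task",              # hypothetical project name
    "base_model": "Qwen/Qwen2.5-14B-Instruct", # example Hugging Face model id
}


def build_model(name: str, project: str, base_model: str):
    """Create a TrainableModel, failing clearly if ART is not installed."""
    if art is None:
        raise RuntimeError("the 'art' package is required to create a TrainableModel")
    return art.TrainableModel(name=name, project=project, base_model=base_model)


# Usage (requires ART to be installed):
#   model = build_model(**MODEL_KWARGS)
```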
Initializing from an existing SFT LoRA
If you’ve already fine-tuned a model with SFT using a LoRA adapter (e.g., Unsloth/PEFT) and have a standard Hugging Face–style adapter directory, you can start RL training from those weights by passing the adapter directory path as base_model when creating your TrainableModel.
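Before pointing base_model at an adapter directory, it can help to confirm that the directory looks like a standard PEFT export. The sketch below is one way to do that. The helper check_adapter_dir and the example path are hypothetical, and the file names checked (adapter_config.json plus adapter_model.safetensors or adapter_model.bin) are the ones PEFT normally writes, not a requirement stated on this page.

```python
from pathlib import Path

# Weight file names that PEFT typically writes when saving a LoRA adapter.
WEIGHT_FILES = ("adapter_model.safetensors", "adapter_model.bin")


def check_adapter_dir(path: str) -> str:
    """Return the path as a string if it looks like a PEFT adapter directory."""
    adapter_dir = Path(path)
    if not adapter_dir.is_dir():
        raise FileNotFoundError(f"adapter directory not found: {adapter_dir}")
    if not (adapter_dir / "adapter_config.json").is_file():
        raise FileNotFoundError(f"missing adapter_config.json in {adapter_dir}")
    if not any((adapter_dir / name).is_file() for name in WEIGHT_FILES):
        raise FileNotFoundError(f"no adapter weights found in {adapter_dir}")
    return str(adapter_dir)


# Usage (hypothetical path; requires ART to be installed):
#   import art
#   model = art.TrainableModel(
#       name="agent-001",
#       project="my-agentic-task",
#       base_model=check_adapter_dir("./checkpoints/sft-lora"),
#   )
```

Running the check up front surfaces a wrong path or an incomplete export immediately, rather than partway into backend startup.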
Why this?
- Warm-start from task-aligned weights to reduce the number of RL steps and the GPU cost.
- Stabilize early training, especially for small models (1B–8B) that may get near-zero rewards at the start of RL.