Reading time: 45 min
Training time: 4 hours
Total cost: $22
1. Clone the starter repo and install dependencies
To get started, clone Summary-RL, which contains the following pieces of our RL pipeline:
- The agent’s environment
- The reward function
- Some training examples
If you haven’t already, install uv by following the instructions here.
Then install the project dependencies by running uv sync.
2. Install SkyPilot/RunPod
We’ll be using SkyPilotBackend to manage the GPU that your model will be trained on. You’ll need to install ART with the SkyPilot optional dependency:
3. Set up optional environment variables found in .env.example
In a new .env file at the root of the repository, set the following optional environment variables:
- WANDB_API_KEY: Enables metric logging to Weights & Biases.
- OPENPIPE_API_KEY: Enables chat completion logging to OpenPipe.
- OPENAI_API_KEY: Not used for training, but needed for the comparison benchmarks later on.
The following AWS variables are needed to run the scripts in the benchmarks directory, but not for training itself. If you don’t already have AWS credentials with create/read/write permissions for S3 buckets, follow the instructions here.
- AWS_ACCESS_KEY_ID: Your AWS access key ID, which should have create/read/write permissions for S3 buckets.
- AWS_SECRET_ACCESS_KEY: Your matching secret access key.
- AWS_REGION: The region of the S3 bucket.
- BACKUP_BUCKET: The name of the S3 bucket in which to store model checkpoints and logging data. This can be a new bucket or an existing one.
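Before launching anything, it can help to confirm which of these optional variables your .env actually sets. The sketch below is a hypothetical helper, not part of Summary-RL or ART. It uses only the Python standard library and assumes simple KEY=VALUE lines (no `export` prefixes, escapes, or multi-line values, which a library such as python-dotenv would handle). It then reports which of the features listed above would be enabled.

```python
from pathlib import Path

# Hypothetical helper: not part of Summary-RL or ART.
# Feature each optional variable enables, per the list above.
OPTIONAL_FEATURES = {
    "WANDB_API_KEY": "metric logging to Weights & Biases",
    "OPENPIPE_API_KEY": "chat completion logging to OpenPipe",
    "OPENAI_API_KEY": "comparison benchmarks",
}

# All four AWS variables must be set together for S3 access to work.
AWS_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION", "BACKUP_BUCKET")
S3_FEATURE = "S3 checkpoint and benchmark storage"


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def report(env: dict[str, str]) -> dict[str, bool]:
    """Map each feature to whether the variables it needs are set and non-empty."""
    status = {feature: bool(env.get(var)) for var, feature in OPTIONAL_FEATURES.items()}
    status[S3_FEATURE] = all(env.get(var) for var in AWS_VARS)
    return status


if __name__ == "__main__":
    env_path = Path(".env")
    env = parse_env_file(env_path.read_text()) if env_path.exists() else {}
    for feature, enabled in report(env).items():
        print(f"{'enabled ' if enabled else 'disabled'}  {feature}")
```

Because every variable here is optional, the helper only reports status and never fails. A missing key simply means the corresponding feature stays off.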
4. Run the training script
The training script performs the following steps:
- Spin up a cluster with 1 H100 GPU.
  - This usually takes about 10 minutes, but RunPod occasionally has network throughput issues that can stretch spin-up to as long as 30 minutes. Once the cluster is provisioned, it can be reused for subsequent training runs without repeating this process.
- Register the model with ART.
  - This usually takes less than 5 minutes, though it can take up to 30 minutes if RunPod experiences network issues.
- Download the model checkpoint from S3.
  - This usually takes a few seconds.
- Train the model for a specified number of steps.
  - Training itself is fairly quick (each step takes less than a minute), but the total training time depends on how many steps you run. During training, the model checkpoint is saved to S3 after each step.
- Upload the final model checkpoint to S3.
  - This usually takes a few seconds.
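Saving a checkpoint after every step means an interrupted run can, in principle, pick up from its most recent checkpoint instead of starting over. The sketch below illustrates that resume pattern under stated assumptions. A local directory stands in for the S3 bucket, checkpoints are named step_0001, step_0002, and so on, and train_one_step is a placeholder for the real training call. All of these names are hypothetical and do not come from Summary-RL or ART.

```python
import json
import re
import tempfile
from pathlib import Path

# Hypothetical checkpoint naming scheme; a local directory stands in for S3.
STEP_DIR = re.compile(r"^step_(\d+)$")


def latest_step(store: Path) -> int:
    """Return the highest saved step number, or 0 if no checkpoint exists."""
    steps = [int(m.group(1)) for p in store.iterdir() if (m := STEP_DIR.match(p.name))]
    return max(steps, default=0)


def train_one_step(state: dict) -> dict:
    """Placeholder for one RL training step; here it only counts updates."""
    return {"updates": state["updates"] + 1}


def save_checkpoint(store: Path, step: int, state: dict) -> None:
    """Write the state for this step into its own checkpoint directory."""
    ckpt = store / f"step_{step:04d}"
    ckpt.mkdir()
    (ckpt / "state.json").write_text(json.dumps(state))


def load_checkpoint(store: Path, step: int) -> dict:
    """Load a saved step, or return a fresh state when step is 0."""
    if step == 0:
        return {"updates": 0}  # no checkpoint yet: start from the base model
    return json.loads((store / f"step_{step:04d}" / "state.json").read_text())


def run(store: Path, total_steps: int) -> int:
    """Resume from the latest checkpoint and train until total_steps."""
    start = latest_step(store)
    state = load_checkpoint(store, start)
    for step in range(start + 1, total_steps + 1):
        state = train_one_step(state)
        save_checkpoint(store, step, state)  # checkpoint after every step
    return latest_step(store)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = Path(tmp)
        print(run(store, total_steps=3))  # first run trains steps 1-3
        print(run(store, total_steps=5))  # resumed run only trains steps 4-5
```

The per-step checkpoint cost is small compared with the training step itself, which is why saving after every step is a reasonable default here.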
5. Shut down the cluster
When you’re done training and running benchmarks, you can shut down the cluster in two ways:
Through the CLI:
Running Benchmarks
The benchmark_models.py script will compare the performance of the trained model to gpt-4o, gpt-4.1, o4-mini, and gemini-2.5-pro-preview.
Before running the benchmark script, make sure you’ve provided a valid OPENROUTER_API_KEY and the AWS credentials detailed in step 3. The script needs these credentials to upload the benchmark results to S3.
The benchmark script will:
- Run each benchmarked model through each document in the validation set.
- Record the percentage of questions that each model’s summary allowed Gemini 2.5 Flash to answer correctly.
- Upload the results to S3.
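The scoring step reduces to a per-model accuracy. For each model, count how many validation questions Gemini 2.5 Flash answered correctly from that model's summaries, then divide by the total number of questions. The sketch below assumes the per-question outcomes have already been collected as booleans. The data layout, the function name, and the toy data are hypothetical and do not reflect benchmark_models.py's actual format or any real results.

```python
from collections.abc import Mapping, Sequence

# Hypothetical layout: results[model][doc_id] -> one bool per question,
# True if Gemini 2.5 Flash answered that question correctly from the summary.
Results = Mapping[str, Mapping[str, Sequence[bool]]]


def percent_correct(results: Results) -> dict[str, float]:
    """Percentage of all validation questions answered correctly, per model."""
    scores: dict[str, float] = {}
    for model, per_doc in results.items():
        # Pool every question across documents (not a mean of per-doc scores).
        answers = [ok for doc_answers in per_doc.values() for ok in doc_answers]
        scores[model] = 100.0 * sum(answers) / len(answers) if answers else 0.0
    return scores


if __name__ == "__main__":
    # Toy data for illustration only, not real benchmark results.
    demo = {
        "model-a": {"doc1": [True, True, False], "doc2": [True]},
        "model-b": {"doc1": [True, False, False], "doc2": [True]},
    }
    for model, pct in sorted(percent_correct(demo).items(), key=lambda kv: -kv[1]):
        print(f"{model:>10}: {pct:.1f}%")
```

Pooling all questions weights documents with more questions more heavily. Averaging per-document percentages instead would give every document equal weight, so it is worth checking which convention the real script uses before comparing numbers.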
To view the results, open benchmarks/display_benchmarks.ipynb and run the cells. After running all the cells, you should see something like the following: