Multi-GPU and Multi-Node Runs#

DeepForest uses PyTorch Lightning distributed execution. For most multi-GPU and multi-node runs, these settings matter:

accelerator=gpu
devices=<gpus_per_node>
num_nodes=<nnodes>
strategy=ddp

On Slurm clusters, launch with srun inside your job allocation. Match #SBATCH --ntasks-per-node to devices and #SBATCH --nodes to num_nodes. See the cluster developer guide.

Single-GPU jobs can keep the default strategy=auto.

Train#

#SBATCH --nodes=<nnodes>
#SBATCH --ntasks-per-node=<gpus_per_node>
#SBATCH --gres=gpu:<gpus_per_node>

srun uv run deepforest train \
  --strategy ddp \
  accelerator=gpu \
  devices=<gpus_per_node> \
  num_nodes=<nnodes> \
  train.csv_file=/path/to/train.csv \
  train.root_dir=/path/to/train_images \
  validation.csv_file=/path/to/val.csv \
  validation.root_dir=/path/to/val_images

Evaluate#

srun uv run deepforest evaluate \
  /path/to/ground_truth.csv \
  --root-dir /path/to/images \
  --save-predictions eval_preds.csv \
  -o eval_metrics.csv \
  --strategy ddp \
  accelerator=gpu \
  devices=<gpus_per_node> \
  num_nodes=<nnodes>

Predict From CSV#

srun uv run deepforest predict \
  /path/to/images.csv \
  --mode csv \
  --root-dir /path/to/images \
  -o predictions.csv \
  --strategy ddp \
  accelerator=gpu \
  devices=<gpus_per_node> \
  num_nodes=<nnodes>

Predict A Large Tile#

For large geospatial rasters, use predict_tile(..., dataloader_strategy="window") instead of the simple CLI tile mode.

from deepforest.main import deepforest

m = deepforest()
m.load_model("weecology/everglades-bird-species-detector")
m.config.accelerator = "gpu"
m.config.devices = 2
m.config.num_nodes = 2
m.config.strategy = "ddp"
m.config.workers = 0
m.create_trainer()

results = m.predict_tile(
    path="/path/to/tile.tif",
    patch_size=1500,
    patch_overlap=0,
    dataloader_strategy="window",
)

Launch that script with the same srun Slurm pattern and trainer settings. For a complete cluster example, see the cluster developer guide.