Reading in Data#

The DeepForest data model#

The DeepForest data model has four components:

  1. Annotations are stored as dataframes. Each row is an annotation with a single geometry and label. Each annotation dataframe must contain an ‘image_path’ column, which holds the basename of the image rather than its full path, and a ‘label’ column.

  2. Annotation geometry is stored as a shapely object, which makes it easy to convert among Point, Polygon, and Box representations.

  3. Annotations are expressed in image coordinates, not geographic coordinates. There are utilities to convert geospatial data (.shp, .gpkg) to DeepForest data formats.

  4. A root_dir attribute that specifies where the images are stored. A Dee
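As a minimal sketch of how components 1 and 4 fit together, the full path to an image can be rebuilt by joining root_dir with the basename stored in image_path. The helper name resolve_image_path and the example directory are hypothetical and are not part of DeepForest.

```python
import os

def resolve_image_path(root_dir, image_path):
    """Join the image directory with the basename stored in an annotation row."""
    # The data model stores only the basename, so reject values with a directory part.
    if os.path.basename(image_path) != image_path:
        raise ValueError(f"image_path should be a basename, got: {image_path}")
    return os.path.join(root_dir, image_path)

# Hypothetical directory, used only for illustration
full_path = resolve_image_path("/data/images", "OSBS_029.tif")
print(full_path)
```

Keeping only the basename in each row means the same annotation file can be reused after the images are moved, as long as root_dir is updated.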

The read_file function#

DeepForest has collated many use cases into a single read_file function that will read many common data formats, both projected and unprojected, and create a dataframe ready for DeepForest functions that fits the DeepForest data model.

Example 1: A csv file containing box annotations.#

from deepforest import utilities

df = utilities.read_file("annotations.csv", root_dir="directory containing images", image_path="relative path to the image", label="Tree")

For files that lack an image_path or label column, pass the image_path or label argument. This applies the same image_path and label for the entire file, and is not appropriate for multi-image files.

from deepforest import utilities

gdf = utilities.read_file(
    input="/path/to/annotations.shp",
    image_path="OSBS_029.tif",   # required if no image_path column
    root_dir="path/to/images/",  # required if image_path argument is used
    label="Tree"                 # optional: used if no 'label' column in the shapefile
)

At a high level, read_file will:

  1. Check the file extension to determine the format.

  2. Read and convert the file into a GeoPandas dataframe.

  3. Append the location of the image directory as a ‘root_dir’ attribute.

  4. If input data is a geospatial object, such as a shapefile, convert geographic coordinates to image coordinates based on the coordinate reference system (CRS) and resolution of the image.
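Step 4 can be illustrated with a small worked example. This is a sketch of the usual conversion for a north-up raster with square pixels, not DeepForest's actual implementation; the function name geo_to_image and the raster origin and resolution below are hypothetical.

```python
def geo_to_image(x, y, left, top, resolution):
    """Convert a projected coordinate to (column, row) pixel coordinates.

    Assumes a north-up raster with square pixels: `left` and `top` are the
    projected coordinates of the image's upper-left corner and `resolution`
    is the pixel size in the same units.
    """
    col = (x - left) / resolution
    row = (top - y) / resolution  # image rows increase downward
    return col, row

# Hypothetical raster: upper-left corner at (1000.0, 2000.0), 0.5-unit pixels
col, row = geo_to_image(1010.0, 1995.0, left=1000.0, top=2000.0, resolution=0.5)
print(col, row)
```

The row calculation is flipped relative to the column calculation because projected y values grow northward while image rows grow downward.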

read_file accepts the following formats:

  • CSV (.csv)

  • Shapefile (.shp)

  • GeoPackage (.gpkg)

  • COCO (.json)

  • Pascal VOC (.xml)
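The extension check in step 1 can be sketched as a simple lookup. The names SUPPORTED_FORMATS and detect_format are hypothetical, and this only mirrors the list above rather than reproducing how read_file is written.

```python
import os

# Extensions from the list above, mapped to a human-readable format name.
SUPPORTED_FORMATS = {
    ".csv": "CSV",
    ".shp": "Shapefile",
    ".gpkg": "GeoPackage",
    ".json": "COCO",
    ".xml": "Pascal VOC",
}

def detect_format(path):
    """Return the format name for a file path, based only on its extension."""
    ext = os.path.splitext(path)[1].lower()
    try:
        return SUPPORTED_FORMATS[ext]
    except KeyError:
        raise ValueError(f"Unsupported file extension: {ext!r}") from None

print(detect_format("/path/to/annotations.shp"))
```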

Boxes#

CSV#

Here, the annotations are in plain CSV files, with coordinates relative to the image origin.

image_path,xmin,ymin,xmax,ymax,label
OSBS_029.tif,203,67,227,90,Tree
OSBS_029.tif,256,99,288,140,Tree
OSBS_029.tif,166,253,225,304,Tree
OSBS_029.tif,365,2,400,27,Tree
OSBS_029.tif,312,13,349,47,Tree
OSBS_029.tif,365,21,400,70,Tree
OSBS_029.tif,278,1,312,37,Tree
OSBS_029.tif,364,204,400,246,Tree

from deepforest import get_data
from deepforest.utilities import read_file

filename = get_data("OSBS_029.csv")
df = read_file(filename)

Example output:

      image_path  xmin  ymin  xmax  ymax label                                           geometry
0   OSBS_029.tif   203    67   227    90  Tree  POLYGON ((227.000 67.000, 227.000 90.000, 203....
1   OSBS_029.tif   256    99   288   140  Tree  POLYGON ((288.000 99.000, 288.000 140.000, 256...
2   OSBS_029.tif   166   253   225   304  Tree  POLYGON ((225.000 253.000, 225.000 304.000, 16...
3   OSBS_029.tif   365     2   400    27  Tree  POLYGON ((400.000 2.000, 400.000 27.000, 365.0...
4   OSBS_029.tif   312    13   349    47  Tree  POLYGON ((349.000 13.000, 349.000 47.000, 312....

Note: For backward compatibility with versions before 1.4.0, read_file still outputs xmin, ymin, xmax, and ymax as individual columns for box annotations, alongside the geometry column.
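To make the relationship between the four coordinate columns and the geometry column concrete, here is a standard-library sketch that builds a WKT polygon from each box. The helper box_to_wkt is hypothetical; its vertex order was chosen to agree with the truncated example output, and DeepForest itself builds shapely objects rather than strings.

```python
import csv
import io

# First two rows of the CSV shown above
CSV_TEXT = """image_path,xmin,ymin,xmax,ymax,label
OSBS_029.tif,203,67,227,90,Tree
OSBS_029.tif,256,99,288,140,Tree
"""

def box_to_wkt(xmin, ymin, xmax, ymax):
    """Return a closed rectangular ring as a WKT POLYGON string."""
    corners = [(xmax, ymin), (xmax, ymax), (xmin, ymax), (xmin, ymin), (xmax, ymin)]
    ring = ", ".join(f"{x:.3f} {y:.3f}" for x, y in corners)
    return f"POLYGON (({ring}))"

rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
polygons = [
    box_to_wkt(float(r["xmin"]), float(r["ymin"]), float(r["xmax"]), float(r["ymax"]))
    for r in rows
]
print(polygons[0])
```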

The location of these image files is saved in the root_dir attribute:

df.root_dir
'/Users/benweinstein/Documents/DeepForest/deepforest/data'

COCO#

COCO format is a popular format for object detection tasks. It is a JSON file that contains information about the images and annotations.

from deepforest import utilities

df = utilities.read_file(input="/path/to/coco_annotations.json")
df.head()
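For readers unfamiliar with the format, the sketch below flattens a tiny, made-up COCO document into rows with the box columns used elsewhere on this page. COCO stores each bbox as [x, y, width, height] measured from the top-left corner, so xmax and ymax have to be derived. This illustrates the file layout only; it does not show how read_file parses COCO files.

```python
import json

# Made-up COCO document with one image, one category, and one annotation
COCO_TEXT = """
{
  "images": [{"id": 1, "file_name": "OSBS_029.tif"}],
  "categories": [{"id": 1, "name": "Tree"}],
  "annotations": [
    {"id": 10, "image_id": 1, "category_id": 1, "bbox": [203, 67, 24, 23]}
  ]
}
"""

coco = json.loads(COCO_TEXT)
file_names = {img["id"]: img["file_name"] for img in coco["images"]}
labels = {cat["id"]: cat["name"] for cat in coco["categories"]}

coco_rows = []
for ann in coco["annotations"]:
    x, y, width, height = ann["bbox"]  # top-left corner plus size
    coco_rows.append({
        "image_path": file_names[ann["image_id"]],
        "xmin": x,
        "ymin": y,
        "xmax": x + width,
        "ymax": y + height,
        "label": labels[ann["category_id"]],
    })

print(coco_rows[0])
```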

Pascal VOC#

Pascal VOC format is a popular format for object detection tasks. It is an XML file that contains information about the images and annotations.

from deepforest import utilities

df = utilities.read_file(input="/path/to/pascal_voc_annotations.xml")
df.head()
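The sketch below shows the typical shape of a Pascal VOC file and how its bndbox elements map onto the box columns used on this page. The XML is a made-up, stripped-down example, and the parsing shown is an illustration with the standard library, not the code path read_file uses.

```python
import xml.etree.ElementTree as ET

# Made-up, minimal Pascal VOC annotation for a single image
VOC_TEXT = """
<annotation>
  <filename>OSBS_029.tif</filename>
  <object>
    <name>Tree</name>
    <bndbox><xmin>203</xmin><ymin>67</ymin><xmax>227</xmax><ymax>90</ymax></bndbox>
  </object>
</annotation>
"""

root = ET.fromstring(VOC_TEXT.strip())

voc_rows = []
for obj in root.findall("object"):
    box = obj.find("bndbox")
    voc_rows.append({
        "image_path": root.findtext("filename"),
        "xmin": int(box.findtext("xmin")),
        "ymin": int(box.findtext("ymin")),
        "xmax": int(box.findtext("xmax")),
        "ymax": int(box.findtext("ymax")),
        "label": obj.findtext("name"),
    })

print(voc_rows[0])
```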

Shapefiles#

Geographic data can also be saved as shapefiles with projected coordinates.

Example:

gdf.iloc[0]
geometry      POLYGON ((404222.4 3285121.5, 404222.4 3285122...
label                                                      Tree
image_path    /Users/benweinstein/Documents/DeepForest/deepf...
Name: 0, dtype: object

These coordinates are made relative to the image origin when the file is read.

from deepforest import utilities

shp = utilities.read_file(input="/path/to/boxes_shapefile.shp")
shp.head()

If your shapefile does not include an image_path column, you must provide the raster path via the image_path argument:

from deepforest import utilities

shp = utilities.read_file(
    input="/path/to/boxes_shapefile.shp",
    image_path="/path/to/OSBS_029.tif"
)

If your shapefile also lacks a label column, you can assign one for all rows:

from deepforest import utilities

shp = utilities.read_file(
    input="/path/to/boxes_shapefile.shp",
    image_path="/path/to/OSBS_029.tif",
    label="Tree"
)

Example output:

  label    image_path                                           geometry
0  Tree  OSBS_029.tif  POLYGON ((105.000 214.000, 95.000 214.000, 95....
1  Tree  OSBS_029.tif  POLYGON ((205.000 214.000, 195.000 214.000, 19...

Points#

CSV#

Example:

x,y,label
10,20,Tree
15,30,Tree

Shapefile#

from deepforest import utilities

shp = utilities.read_file(input="/path/to/points_shapefile.shp")
shp.head()

Example output:

  label    image_path                 geometry
0  Tree  OSBS_029.tif  POINT (100.000 209.000)
1  Tree  OSBS_029.tif  POINT (200.000 209.000)

Polygons#

CSV#

Polygons are expressed in well-known-text (WKT) format. Learn more about WKT.

"POLYGON ((0 0, 0 2, 1 1, 1 0, 0 0))",Tree,OSBS_029.png
"POLYGON ((2 2, 2 4, 3 3, 3 2, 2 2))",Tree,OSBS_029.png
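Because WKT strings contain commas, the geometry field must be quoted so that CSV parsers treat it as a single value. The sketch below reads the two rows above with the standard library and extracts the vertices. The helper wkt_vertices is hypothetical and handles only simple single-ring polygons, and the column order (geometry, label, image path) is read off the sample rows.

```python
import csv
import io
import re

# The two sample rows shown above
POLYGON_CSV = '''"POLYGON ((0 0, 0 2, 1 1, 1 0, 0 0))",Tree,OSBS_029.png
"POLYGON ((2 2, 2 4, 3 3, 3 2, 2 2))",Tree,OSBS_029.png
'''

def wkt_vertices(wkt):
    """Extract (x, y) pairs from a simple single-ring POLYGON string."""
    inner = re.search(r"\(\((.*)\)\)", wkt).group(1)
    return [tuple(float(v) for v in pair.split()) for pair in inner.split(",")]

parsed = []
# csv.reader keeps each quoted WKT string together as one field
for wkt, label, image_path in csv.reader(io.StringIO(POLYGON_CSV)):
    parsed.append((wkt_vertices(wkt), label, image_path))

print(parsed[0])
```

For anything beyond simple cases, a dedicated WKT parser such as the one in shapely is the safer choice.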

Shapefile#

from deepforest import utilities

shp = utilities.read_file(input="/path/to/polygons_shapefile.shp")
shp.head()

Example output:

  label    image_path                                           geometry
0  Tree  OSBS_029.png  POLYGON ((0.00000 0.00000, 0.00000 2.00000, 1....
1  Tree  OSBS_029.png  POLYGON ((2.00000 2.00000, 2.00000 4.00000, 3....