Annotation
==========
Annotations play a crucial role in machine learning projects. If you're unhappy with your model's performance, annotating new samples is the best first step to improving it.
How Should I Annotate Images?
-----------------------------
For quick annotation of a few images, we recommend QGIS or ArcGIS, working with either projected or unprojected data. You can create a shapefile for each image.
.. figure:: ../../www/QGIS_annotation.png
   :alt: QGIS annotation
   :align: center
   :width: 80%

   QGIS annotation example.
Label Studio
~~~~~~~~~~~~
For long-term projects, we recommend using Label Studio as an annotation platform. It offers many useful features and is easy to set up.
.. figure:: ../../www/label_studio.png
   :alt: Label Studio annotation
   :align: center
   :width: 80%

   Label Studio annotation platform.
Do I Need to Annotate All Objects in My Image?
----------------------------------------------
Yes! Object detection models treat non-annotated areas of an image as negative data. Annotating every object can be challenging, but any object left unlabeled is treated as background, which trains the model to *ignore* objects that should be positive samples and leads to poor performance.
How Can I Speed Up Annotation?
------------------------------
1. **Select Important Images**: Duplicate backgrounds or objects contribute little to model generalization. Focus on gathering a wide variety of object appearances.
2. **Avoid Over-splitting Labels**: Often, using a superclass for detection followed by a separate model for classification is more effective. See the ``CropModel`` for an example.
3. **Balance Accuracy and Practicality**: Depending on the goal (e.g., object counting or detection), keypoints can sometimes be used instead of precise boxes to simplify the process.
Converting Shapefile Annotations to DataFrame
---------------------------------------------
You can convert shapefile points into bounding box annotations using the following code:
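.. code-block:: python

   from deepforest.utilities import shapefile_to_annotations

   # Convert point annotations to boxes by buffering each point
   df = shapefile_to_annotations(
       shapefile="annotations.shp",
       rgb="image_path",
       convert_to_boxes=True,
       buffer_size=0.15
   )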
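To make the role of ``buffer_size`` concrete, the sketch below shows the general idea of turning a point into a square box by buffering it on each side. The helper ``point_to_box`` is hypothetical and is not part of DeepForest; the library's own conversion may differ in details such as units and rounding.

```python
def point_to_box(x, y, buffer_size):
    """Return (xmin, ymin, xmax, ymax) for a square centered on (x, y).

    Hypothetical helper for illustration only, not a DeepForest function.
    """
    return (x - buffer_size, y - buffer_size, x + buffer_size, y + buffer_size)


# Two example point annotations in arbitrary coordinates
points = [(10.0, 20.0), (35.5, 42.0)]
boxes = [point_to_box(x, y, buffer_size=0.5) for x, y in points]
print(boxes)
```

A larger buffer gives larger boxes around each point, so the value should roughly match half the typical object width in the units of the annotations.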
Cutting Large Tiles into Pieces
-------------------------------
Annotating large airborne imagery can be challenging. DeepForest has a utility to crop images into smaller, more manageable chunks.
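.. code-block:: python

   import tempfile

   from deepforest import get_data, preprocess

   # Sample image shipped with DeepForest
   raster = get_data("2019_YELL_2_528000_4978000_image_crop2.png")
   tmpdir = tempfile.mkdtemp()  # directory where the crops are written

   output_crops = preprocess.split_raster(
       path_to_raster=raster,
       annotations_file=None,
       save_dir=tmpdir,
       patch_size=500,
       patch_overlap=0
   )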
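As a rough illustration of how ``patch_size`` and ``patch_overlap`` interact, the sketch below computes pixel windows for a tiled image. The function ``tile_windows`` is hypothetical and assumes ``patch_overlap`` is a fraction of the patch size; it is not DeepForest's implementation, which may handle image edges differently.

```python
def tile_windows(width, height, patch_size, patch_overlap=0.0):
    """Return (xmin, ymin, xmax, ymax) pixel windows covering an image.

    Hypothetical helper: patch_overlap is assumed to be a fraction of
    patch_size, and windows at the right and bottom edges are clipped.
    """
    if not 0 <= patch_overlap < 1:
        raise ValueError("patch_overlap must be in [0, 1)")
    # Distance between the origins of neighboring windows
    step = max(1, int(patch_size * (1 - patch_overlap)))
    windows = []
    for ymin in range(0, height, step):
        for xmin in range(0, width, step):
            xmax = min(xmin + patch_size, width)
            ymax = min(ymin + patch_size, height)
            windows.append((xmin, ymin, xmax, ymax))
    return windows


windows = tile_windows(width=1200, height=1000, patch_size=500, patch_overlap=0)
print(len(windows))
print(windows[:3])
```

With no overlap, a 1200 x 1000 pixel image yields a 3 x 2 grid of windows, and the last column is narrower because it is clipped at the image edge.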
Starting Annotations from Pre-labeled Imagery
---------------------------------------------
You can speed up new annotations by starting with model predictions. Below is an example of predicting detections and saving them as shapefiles, which can then be edited in a tool like QGIS.
.. code-block:: python

   import os
   from glob import glob

   import matplotlib.pyplot as plt
   import numpy as np
   import rasterio as rio

   from deepforest import main
   from deepforest.utilities import boxes_to_shapefile
   from deepforest.visualize import plot_predictions

   PATH_TO_DIR = "/path/to/directory"
   files = glob(f"{PATH_TO_DIR}/*.JPG")

   # Load the pretrained bird detection model
   m = main.deepforest(label_dict={"Bird": 0})
   m.load_model(model_name="weecology/deepforest-bird", revision="main")

   for path in files:
       boxes = m.predict_image(path=path)
       if boxes is None:
           # No detections in this image, so there is nothing to export
           continue

       # Read the image as (channels, rows, cols), then move channels last for plotting
       with rio.open(path) as rio_src:
           image = rio_src.read()
       image = np.rollaxis(image, 0, 3)

       fig = plot_predictions(df=boxes, image=image)
       plt.imshow(fig)

       # Save the predictions as a shapefile named after the image
       basename = os.path.splitext(os.path.basename(path))[0]
       shp = boxes_to_shapefile(boxes, root_dir=PATH_TO_DIR, projected=False)
       shp.to_file(f"{PATH_TO_DIR}/{basename}.shp")
Reading XML Annotations in Pascal VOC Format
--------------------------------------------
DeepForest can read annotations in Pascal VOC format, a widely used dataset format for visual object detection. The ``read_pascal_voc`` function reads XML annotations and converts them into a format suitable for use with models like RetinaNet.
Example:
.. code-block:: python

   from deepforest import get_data
   from deepforest.utilities import read_pascal_voc

   xml_path = get_data("OSBS_029.xml")
   df = read_pascal_voc(xml_path)
   print(df)
This prints:
.. code-block:: text

        image_path  xmin  ymin  xmax  ymax label
   0  OSBS_029.tif   203    67   227    90  Tree
   1  OSBS_029.tif   256    99   288   140  Tree
   2  OSBS_029.tif   166   253   225   304  Tree
   3  OSBS_029.tif   365     2   400    27  Tree
   ...
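For readers unfamiliar with the format, the sketch below parses a tiny Pascal VOC document with Python's standard library and produces rows with the same columns as the output above. The XML string and the ``parse_voc`` helper are made up for illustration; ``read_pascal_voc`` may be implemented differently.

```python
import xml.etree.ElementTree as ET

# Made-up annotation following the Pascal VOC layout
VOC_XML = """
<annotation>
  <filename>example.tif</filename>
  <object>
    <name>Tree</name>
    <bndbox><xmin>203</xmin><ymin>67</ymin><xmax>227</xmax><ymax>90</ymax></bndbox>
  </object>
  <object>
    <name>Tree</name>
    <bndbox><xmin>256</xmin><ymin>99</ymin><xmax>288</xmax><ymax>140</ymax></bndbox>
  </object>
</annotation>
"""


def parse_voc(xml_text):
    """Return one dict per annotated object (hypothetical helper)."""
    root = ET.fromstring(xml_text.strip())
    image_path = root.findtext("filename")
    rows = []
    for obj in root.iter("object"):
        box = obj.find("bndbox")
        rows.append({
            "image_path": image_path,
            "xmin": int(box.findtext("xmin")),
            "ymin": int(box.findtext("ymin")),
            "xmax": int(box.findtext("xmax")),
            "ymax": int(box.findtext("ymax")),
            "label": obj.findtext("name"),
        })
    return rows


rows = parse_voc(VOC_XML)
for row in rows:
    print(row)
```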
Fast Iterations for Annotation Success
--------------------------------------
Avoid collecting all annotations before model testing. Start with a small number of annotations and let the model highlight which images are most needed. Fast iterations lead to quicker model improvement. For an example in wildlife sensing, see Kellenberger et al., 2019.
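One simple way to let the model point to the images worth annotating next is to rank them by how unsure its predictions are. The sketch below uses made-up prediction scores and a hypothetical ``rank_images_for_annotation`` helper; it is one possible heuristic, not a method prescribed by DeepForest or by the cited paper.

```python
from statistics import mean


def rank_images_for_annotation(scores_by_image):
    """Order image names from least to most confident (hypothetical heuristic).

    Images with no detections are ranked first, since the model may be
    missing objects in them entirely.
    """
    def mean_score(item):
        _, scores = item
        return mean(scores) if scores else 0.0

    return [name for name, _ in sorted(scores_by_image.items(), key=mean_score)]


# Made-up confidence scores for three images
predictions = {
    "plot_a.tif": [0.91, 0.88, 0.95],
    "plot_b.tif": [0.42, 0.55],
    "plot_c.tif": [],
}
queue = rank_images_for_annotation(predictions)
print(queue)
```

Annotating the images at the front of the queue, retraining, and re-ranking gives a short loop that can be repeated until performance is acceptable.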
Please Make Your Annotations Open-Source!
=========================================
DeepForest's models are not perfect. Please consider sharing your annotations with the community to make the models stronger. You can post your annotations on Zenodo or open an issue to share your data with the maintainers.
How Can I Get New Airborne Data?
================================
Many remote sensing assets, such as California NAIP data, are available via the ArcGIS REST protocol. DeepForest provides tools to work with these assets.
Specify a Lat-Long Box and Crop an ImageServer Asset
----------------------------------------------------
.. code-block:: python

   import asyncio
   import os

   import matplotlib.pyplot as plt
   import rasterio as rio
   from aiolimiter import AsyncLimiter

   from deepforest import utilities

   async def main():
       url = "https://map.dfg.ca.gov/arcgis/rest/services/Base_Remote_Sensing/NAIP_2020_CIR/ImageServer/"
       # Bounding box in longitude/latitude (EPSG:4326)
       xmin, ymin, xmax, ymax = -124.112622, 40.493891, -124.111536, 40.49457
       # Placeholder: set this to the directory where the crop should be saved
       tmpdir = "/path/to/download/directory"
       image_name = "example_crop.tif"

       # Limit concurrency to one download and the rate to 1 request per 0.05 seconds
       semaphore = asyncio.Semaphore(1)
       limiter = AsyncLimiter(1, 0.05)

       os.makedirs(tmpdir, exist_ok=True)
       filename = await utilities.download_ArcGIS_REST(
           semaphore, limiter, url, xmin, ymin, xmax, ymax, "EPSG:4326",
           savedir=tmpdir, image_name=image_name
       )

       # Confirm the crop was saved and is georeferenced, then display it
       assert os.path.exists(os.path.join(tmpdir, image_name))
       with rio.open(os.path.join(tmpdir, image_name)) as src:
           assert src.crs is not None
           plt.imshow(src.read().transpose(1, 2, 0))
           plt.show()

   asyncio.run(main())
Downloading a Batch of Images
-----------------------------
.. code-block:: python

   import asyncio
   import os

   import pandas as pd
   from aiolimiter import AsyncLimiter

   from deepforest import utilities

   async def download_crops(result_df, tmp_dir):
       url = 'https://map.dfg.ca.gov/arcgis/rest/services/Base_Remote_Sensing/NAIP_2022/ImageServer'
       # Semaphore and rate limiter (1 request per 0.05 seconds) passed to each download
       semaphore = asyncio.Semaphore(20)
       limiter = AsyncLimiter(1, 0.05)
       os.makedirs(tmp_dir, exist_ok=True)

       # Queue one download per row of bounding-box coordinates
       tasks = []
       for idx, row in result_df.iterrows():
           xmin, ymin, xmax, ymax = row['xmin'], row['ymin'], row['xmax'], row['ymax']
           image_name = f"image_{idx}.tif"
           task = utilities.download_ArcGIS_REST(
               semaphore, limiter, url, xmin, ymin, xmax, ymax, "EPSG:4326",
               savedir=tmp_dir, image_name=image_name
           )
           tasks.append(task)

       # Run all queued downloads concurrently
       await asyncio.gather(*tasks)
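The function above expects one bounding box per row of ``result_df``. A minimal sketch of building such rows by splitting a larger lat-long box into a regular grid is shown below; ``grid_boxes`` is a hypothetical helper, and the resulting list of dictionaries could be passed to ``pandas.DataFrame`` to create ``result_df``.

```python
def grid_boxes(xmin, ymin, xmax, ymax, n_cols, n_rows):
    """Split a bounding box into n_cols x n_rows equal cells (hypothetical helper)."""
    if n_cols < 1 or n_rows < 1:
        raise ValueError("n_cols and n_rows must be at least 1")
    cell_w = (xmax - xmin) / n_cols
    cell_h = (ymax - ymin) / n_rows
    rows = []
    for j in range(n_rows):
        for i in range(n_cols):
            rows.append({
                "xmin": xmin + i * cell_w,
                "ymin": ymin + j * cell_h,
                "xmax": xmin + (i + 1) * cell_w,
                "ymax": ymin + (j + 1) * cell_h,
            })
    return rows


# Made-up one-degree box split into four half-degree cells
boxes = grid_boxes(-124.0, 40.0, -123.0, 41.0, n_cols=2, n_rows=2)
for box in boxes:
    print(box)
```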