Happy Humpbacks: A machine learning-ready drone dataset for whale detection model development


GB/NERC/BAS/PDC/02202

Abstract:
We present the Happy Humpbacks dataset, a machine learning-ready collection of drone imagery and annotations for object detection. The dataset comprises 5,281 images containing 10,401 instances of humpback whales. Imagery was collected in the waters surrounding Palmer Station, Western Antarctic Peninsula, between January and March 2020. Data acquisition was conducted over 55 flights using a DJI Phantom 4 Pro multirotor drone, as part of the Palmer Long Term Ecological Research Program. Images were manually annotated by the British Antarctic Survey using LabelMe software, with bounding boxes delineating individual whales. Annotations are provided in both LabelMe and COCO formats, along with predefined training, validation, and test splits. This dataset captures substantial variability in whale behaviour, morphology, and environmental conditions, reflecting the challenges of real-world remote sensing imagery. It is intended to support the development and benchmarking of object detection models for automated whale monitoring.
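As a rough illustration of how the two annotation formats relate, the sketch below converts a LabelMe rectangle (two corner points) into a COCO-style bounding box (`[x_min, y_min, width, height]`) and counts annotated instances per image in a COCO-style dictionary. The file names, the `whale` label, and the sample records are hypothetical placeholders rather than values taken from the dataset; the actual file layout and category names should be checked against the released files.

```python
from collections import Counter


def labelme_rect_to_coco_bbox(points):
    """Convert two LabelMe rectangle corners to a COCO [x, y, width, height] box."""
    (x1, y1), (x2, y2) = points
    # LabelMe does not guarantee corner order, so take the minimum as the origin.
    x_min, y_min = min(x1, x2), min(y1, y2)
    return [x_min, y_min, abs(x2 - x1), abs(y2 - y1)]


def instances_per_image(coco):
    """Count annotations per image file name in a COCO-style dictionary."""
    names = {img["id"]: img["file_name"] for img in coco["images"]}
    counts = Counter(names[ann["image_id"]] for ann in coco["annotations"])
    # Include images with no annotations so they appear with a count of zero.
    return {name: counts.get(name, 0) for name in names.values()}


# Hypothetical LabelMe record for one image (structure follows the LabelMe JSON layout).
labelme_record = {
    "imagePath": "example_0001.jpg",
    "shapes": [
        {"label": "whale", "shape_type": "rectangle",
         "points": [[120.0, 80.0], [40.0, 200.0]]},
    ],
}

# Hypothetical COCO-style dictionary with two images and three annotations.
coco = {
    "images": [
        {"id": 1, "file_name": "example_0001.jpg"},
        {"id": 2, "file_name": "example_0002.jpg"},
    ],
    "annotations": [
        {"id": 1, "image_id": 1, "category_id": 1, "bbox": [40.0, 80.0, 80.0, 120.0]},
        {"id": 2, "image_id": 1, "category_id": 1, "bbox": [300.0, 310.0, 50.0, 90.0]},
        {"id": 3, "image_id": 2, "category_id": 1, "bbox": [10.0, 20.0, 30.0, 40.0]},
    ],
}

boxes = [labelme_rect_to_coco_bbox(s["points"])
         for s in labelme_record["shapes"] if s["shape_type"] == "rectangle"]
per_image = instances_per_image(coco)
```

In practice the dictionaries would be read with `json.load` from the LabelMe and COCO files in the chosen split, and the per-image counts could be summed to check them against the instance total reported for the dataset.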


This work was supported by a Natural Environment Research Council grant (Grant no. NE/S007164/1). Data were collected as part of the Palmer Long Term Ecological Research Program (Grant no. 1440435).


Citation:
Houliston, H.R., Cheng, Y., Johnston, D.W., Larsen, G.D., Friedlaender, A.S., Fretwell, P.T., Jackson, J.A., Cubaynes, H.C., Schonlieb, C., & Aviles-Rivero, A. (2026). Happy Humpbacks: A machine learning-ready drone dataset for whale detection model development (Version 1.0) [Data set]. NERC EDS UK Polar Data Centre. https://doi.org/10.5285/7a952870-9880-415d-a8ab-194fedf01a26