Point annotation dataset of stranded whale and dolphin species identified in very high-resolution optical and SAR satellite imagery along offshore islands of New Zealand and Tasmania between 2018-2023
GB/NERC/BAS/PDC/02083
Abstract:
This dataset presents point annotations of stranded whale (sperm whales, Physeter macrocephalus) and dolphin (pilot whales, Globicephala melas edwardii) species identified in very high-resolution (VHR) optical and SAR satellite imagery, along offshore islands of New Zealand and Tasmania, between 2018 and 2023. Cetacean strandings offer significant conservation value for the assessment of ecosystems and serve as an early warning of emerging concerns regarding animal, ocean, and human health. However, stranding monitoring programmes are infrequent or non-existent in sparsely populated areas, coastal areas with limited economic resources, geographically remote areas, complex coastlines and areas of geopolitical unrest. VHR satellite imagery offers the prospect of improving monitoring in these regions. While VHR satellite imagery can detect large baleen whale strandings (>12 m), mass strandings (strandings of two or more animals, excluding mother-calf pairs) are predominantly of smaller-sized odontocetes (~1-6 m; toothed whale and dolphin species). Detecting odontocetes is therefore crucial for VHR satellites to be useful for monitoring strandings globally. In addition, scaling up the use of VHR optical satellite imagery is limited by cloud cover, the primary environmental condition governing successful imagery collection. Synthetic aperture radar (SAR) satellites enable VHR imaging of Earth in cloudy regions and in darkness. This approach could facilitate stranding detection in cloudy regions and independently of daylight hours, which is critical for enabling timely emergency responses to unfolding stranding events.
Here, we present data from four smaller odontocete mass strandings of long-finned pilot whales (LFPW) on Chatham, Pitt and Stewart Islands, New Zealand, and one large odontocete (sperm whale) mass stranding on King Island, Tasmania, Australia, between 2018 and 2023. These data were used to detect and quantify large and small odontocete strandings in VHR optical and SAR satellite imagery.
This research has been supported by the Natural Environment Research Council (NERC) through a SENSE CDT studentship (grant no. NE/T00939X/1). The research was further supported by additional funding provided through the British Antarctic Survey (BAS) Innovation Voucher, Sentinel Hub and their #30MapChallenge competition, and BAS Ecosystems, and by the support and cooperation of Airbus (Deutsches Zentrum für Luft- und Raumfahrt, DLR) and Vantor (formerly Maxar Technologies Ltd), whose rapid response and efforts enabled successful collection of the imagery analysed here.
Citation:
Clarke, P.J., Cubaynes, H.C., Bowler, E., Jackson, J.A., Attard, M.R.G., Stockin, K.A., & Carlyon, K. (2025). Point annotation dataset of stranded whale and dolphin species identified in very high-resolution optical and SAR satellite imagery along offshore islands of New Zealand and Tasmania between 2018-2023 (Version 1.0) [Data set]. NERC EDS UK Polar Data Centre. https://doi.org/10.5285/b26c3a3d-73c8-4500-9a23-696011c20a45
Access Data
GET DATA
REFERENCE MATERIALS
SOFTWARE PACKAGES
License
| Access: | These data are under embargo until the publication of the associated manuscript. |
|---|---|
| Use: | Data supplied under Open Government Licence v3.0 http://www.nationalarchives.gov.uk/doc/open-government-licence/version/3/. |
Basic Information
| Creation Date: | 2025-08-13 |
|---|---|
| Dataset Progress: | Complete |
| Dataset Language: | English |
| ISO Topic Categories: |
|
| Parameters: |
|
Additional Information
| Reference: | Associated papers: Clarke, P.J., H.C. Cubaynes, K.A. Stockin, E. Bowler, K. Carlyon, M.R.G. Attard, P.F. Fretwell, A. Skachkova, A. de Vos, K.M. McConnell, E. Medina-Lopez, S. Lopez Dubon, and J.A. Jackson (In review) Odontocete strandings from space: Accurately counting individuals with very-high resolution optical and SAR satellite imagery. Ecological Informatics. Clarke, P.J., H.C. Cubaynes, K.A. Stockin, E. Bowler, K. Carlyon, M.R.G. Attard, P.F. Fretwell, A. Skachkova, A. de Vos, K.M. McConnell, E. Medina-Lopez, S. Lopez Dubon, and J.A. Jackson, Open-source satellite image pre-processing and annotation workflows: Stranded whale and dolphin case study. MethodsX, 2026. Available at: https://doi.org/10.1016/j.mex.2026.103949. References: Attard, M.R.G., R.A. Phillips, S. Oppel, E. Bowler, and P.T. Fretwell, Feasibility of using very high-resolution satellite imagery to monitor Tristan albatrosses Diomedea dabbenena on Gough Island. Endangered Species Research, 2025. 56: p. 187-199. Available at: https://doi.org/10.3354/esr01396. Betty, E.L., K.A. Stockin, B. Hinton, B. Bollard, M.B. Orams, and S. Murphy, Age- and sex-specific survivorship of the Southern Hemisphere long-finned pilot whale (Globicephala melas edwardii). Journal of Mammalogy, 2023. 104(1): p. 39-48. Available at: https://doi.org/10.1093/jmammal/gyac085. Betty, E.L., K.A. Stockin, B. Hinton, B.A. Bollard, A.N.H. Smith, M.B. Orams, and S. Murphy, Age, growth, and sexual dimorphism of the Southern Hemisphere long-finned pilot whale (Globicephala melas edwardii). Journal of Mammalogy, 2022. 103(3): p. 560-575. Available at: https://doi.org/10.1093/jmammal/gyab165. Cubaynes, H.C., P.J. Clarke, K.T. Goetz, T. Aldrich, P.T. Fretwell, K.E. Leonard, and C.B. Khan, Annotating very high-resolution satellite imagery: A whale case study. MethodsX, 2023. 10, article 102040. Available at: https://doi.org/10.1016/j.mex.2023.102040. Cubaynes, H.C., J. Forcada, K.M. Kovacs, C. Lydersen, R. Downie, and P.T. Fretwell, Walruses from space: walrus counts in simultaneous remotely piloted aircraft system versus very high-resolution satellite imagery. Remote Sensing in Ecology and Conservation, 2024. 10(5): p. 584-596. Available at: https://doi.org/10.1002/rse2.391. |
|
|---|---|---|
| Quality: | For accurate comparison between observers, all annotations were made in true colour (red, green, blue bands) under the same band rendering properties, e.g., contrast enhancement, brightness, contrast, saturation and gamma. The three observers selected were experienced in reviewing satellite imagery for wildlife. Each observer followed the standardised protocols for annotation. Observers reviewed imagery independently and sequentially from low to high spatial resolution to avoid introducing biases when reviewing lower resolution satellite imagery. For SAR image counts, the matching optical and SAR imagery was split in half, and the opposite halves of the corresponding imagery were provided to each observer in two separate experiments ('half_raster_1.shp' and 'half_raster_2.shp' are polygon shapefiles that provide the extent used to clip the matched SAR and optical images in half). This ensured the observer SAR counts were independent of visual aids from the optical imagery, while still allowing coastline alignment for reference. While measures were taken to ensure the highest level of accuracy, the annotation of stranded cetacean features in all imagery, particularly satellite imagery, may be impacted by: - observer experience with the target feature (the observers comprised an expert in live and stranded cetaceans from space, an expert in whales and walruses from space, and an expert in walruses and albatrosses from space); - prevailing environmental conditions: cloud cover and diverse environmental backgrounds; - sensor specification: spatial resolution and nadir angle (angle of image collection, which can distort an image); - target species: morphological characteristics, confounding features (features misidentified as cetaceans), and decomposition phase; - image quality: solar reflection/glare and image darkness. **Resolution** The annotation data presented here are associated with a specific image and spatial resolution.
GSD refers to the 'Ground Sampling Distance' (the distance on the ground represented by each pixel, in metres; the smaller the GSD, the higher the spatial resolution and the more detail is visible). GSD is provided per satellite image annotated. Where more than one value exists, multiple spatial resolutions of the same image were assessed. The sensor-specific native spatial resolution (0.5 or 0.3 m) is the spatial resolution at which the sensor collects imagery. Images were artificially down-sampled (reduced spatial resolution) and/or up-sampled (enhanced spatial resolution) by the satellite image provider using proprietary algorithms to achieve, where possible, imagery at three spatial resolutions: 0.5, 0.3 and 0.15 m. The first value for each image catalogue ID below indicates the native spatial resolution of the satellite sensor. 1. Stewart Island: DS_PHR1B_201811282258003_FR1_PX_E167S47_1106_03071, GSD (m): 0.5; 1030010089B22D00, GSD (m): 0.5 & 0.3. 2. King Island: 105001002F0CA400, GSD (m): 0.5 & 0.3. 3.a Chatham Island: 105001002F60AA00, GSD (m): 0.5; 10300100DC306300, GSD (m): 0.5 & 0.3; 104001007E5E7400, GSD (m): 0.3, 0.5, & 0.15. 3.b Chatham Island: 104001008D07E000, GSD (m): 0.3; C542_N85_D_ST_spot_029_R_2023-10-05T17:04:56.332868Z, GSD (m): 0.28. 4. Pitt Island: 10300100DB012A00, GSD (m): 0.5. For the aerial imagery annotated, the spatial extent and resolution were unknown.
To perform clustering, pixel-based measurements were generated by estimating real-world distances for whales visible in the image from average morphometrics calculated across several hundred male and female pilot whales sampled from New Zealand strandings (Betty et al., 2022; Betty et al., 2023), using ImageJ (version 1.54g; Java 1.8.0_345 [64-bit]; see supplemental S5 in the associated manuscript for methodology). The mean length of adult males is 5.5 m and the mean length of adult females is 4.32 m. As we were unable to distinguish sex within the aerial image, we used the average of the measurements for both sexes (4.91 m), measured from the tip of the rostrum to the notch of the fluke. 1. Stewart Island: - Resolution (pixels per m): 42.49 - Resolution (m per pixel): 0.02353 2. Chatham Island: - Resolution (pixels per m): 17.68 - Resolution (m per pixel): 0.05656 |
|
| Lineage/Methodology: | **Location** These data contain point annotations for five mass stranding events on remote islands in Tasmania and New Zealand between 2018 and 2023. The events comprise: (1) a LFPW mass stranding of 162 individual animals comprising two pods approximately 2 km apart, reported on 24 November 2018 in Mason Bay, Stewart Island/Rakiura, New Zealand; (2) a sperm whale mass stranding of 14 individuals, reported on 19 September 2022 on King Island, Tasmania; (3) (a) a LFPW mass stranding of 243 animals, reported on 7 October 2022 on Maunganui Beach, Chatham Island/Wharekauri, New Zealand, and (b) a LFPW mass stranding of 136 animals, reported on 3 October 2023 on Long Beach, Petre Bay, Chatham Island/Wharekauri, New Zealand; and (4) a LFPW mass stranding of 245 animals, reported on 10 October 2022 in Waihere Bay, Pitt Island/Rangiauria, New Zealand. **Satellite imagery and ground data** Stranding events were selected based on the opportunistic availability of satellite imagery within image archives, as well as by actively tasking (ordering) new imagery during stranding events. Optical satellite images were collected at their sensor-specific native spatial resolution (0.5 or 0.3 m) and were artificially down-sampled (reduced spatial resolution) and/or up-sampled (enhanced spatial resolution) by the satellite image provider using proprietary algorithms to achieve, where possible, imagery at three spatial resolutions: 0.5, 0.3 and 0.15 m. Due to licensing agreements, the satellite imagery is not shared here; however, the full metadata of the satellite imagery associated with the data, including information necessary to access and reproduce the image pre-processing steps, are provided in supplemental S2 of the first associated manuscript and S3 of https://doi.org/10.1016/j.mex.2026.103949, respectively. Ground data are essential to validate satellite data in these early stages of method development.
For the sperm whale event in Tasmania, ground counts were provided by the Department of Natural Resources and Environment, Tasmania. For all LFPW events, ground counts were provided by the New Zealand Department of Conservation and the non-governmental organisation Project Jonah. Aerial imagery for the Stewart and Chatham Island events, capturing part of the event extent, was provided by the Department of Conservation and a local photographer. **Open-source pre-processing and image annotation workflow** To facilitate reproducible analysis and annotation, an open-source workflow was developed using QGIS 3.28. The standardised protocol was adapted from Cubaynes et al. (2023). A template for recording annotation metadata (e.g., confidence scores, satellite metadata, feature information, and environmental conditions) and an accompanying training document were co-developed (S3_attribute_training_document.pdf, included here and available at https://doi.org/10.1016/j.mex.2026.103949). This approach ensures that future stranding datasets can be collected and formatted under consistent data standards. In addition, a replicable version of the QGIS workflow was developed using Python 3.10 (https://github.com/PennyJClarke/strandings_from_space). The pre-processing step applied to the optical satellite imagery was pansharpening, using a QGIS plugin, Orfeo Toolbox (OTB, version 8.1.1, Win64; BundleToPerfectSensor tool, Bayes algorithm and bi-cubic interpolation; see supplemental S2 in the associated manuscript). The SAR image was georeferenced to the matching optical image (aligning both images) by assigning ground control points in the SAR image to known geolocations in the optical image, using the QGIS 'Georeferencer' tool in QGIS 3.16 (transformation type: polynomial 1; resampling method: linear; see supplemental S3 and S4 from the associated manuscript for guidance).
**Manual annotation** The dataset contains annotations from three observers with a high level of expertise in reviewing satellite imagery for wildlife. Observers had the following target feature expertise: observer 1 - live and stranded whales, observer 2 - live whales and walruses, and observer 3 - albatross and walruses. Additionally, observers 1 and 2 had field experience (ground and aerial surveys) observing live and stranded cetaceans. Each observer independently reviewed the satellite imagery using the annotation workflow. Features were recorded with a georeferenced point and assigned a confidence score: 'definite_90-100', 'likely_70-89' or 'possible_50-69'. Observers were trained in how to score features of interest in the imagery. For details of how scores are derived, see supplemental S3 in https://doi.org/10.1016/j.mex.2026.103949. Additionally, observer 1 annotated all features with the required and desirable attributes outlined in the standardised protocols (S3_attribute_training_document.pdf, included as part of the dataset), except for annotations made in SAR imagery, due to the lack of visual cues. As a final stage, the observers annotated the aerial imagery using the Visual Geometry Group (VGG) Image Annotator, as per protocols in Cubaynes et al. (2024). Please note: Do not open the .csv directly with Excel, as it will automatically change the date format from the ISO 8601 format used in the standardised protocol. Instead, to view the .csv, open it with Notepad or Notepad++ (https://notepad-plus-plus.org/downloads/, free download). Alternatively, if loading the .csv into QGIS, select 'Layer' > 'Add Layer' > 'Add Delimited Text Layer...' from the main toolbar and, in the 'Data Source Manager' window that opens, select the 'Delimited Text' tab.
After selecting the .csv file to import, in the 'Sample Data' table at the bottom of the 'Data Source Manager' window, which previews how the data will import, set the type below the column headers 'img_date' and 'img_time' to 'abc Text (string)'. **Semi-automated clustering of multi-observer annotations** To compare observers' annotations, hierarchical clustering was performed using a Python script (v3.12) adapted from Attard et al. (2025) (available at: https://github.com/PennyJClarke/strandings_from_space). The clustering algorithm applied Ward's method to group points within a user-defined distance threshold. To improve clustering accuracy, the resulting clusters were further refined using a series of constraints. The median of all points within a cluster was calculated to represent the cluster location (predicted, .csv format). Clusters were reviewed to verify accuracy and manually corrected (ground_truth, .csv format) if needed. For georeferenced satellite image annotations, clustering performed best when georeferenced coordinates (latitude, longitude) were converted into pixel-based coordinates within the image (row, col), using an affine transform. For aerial imagery with an unknown spatial extent, clustering was performed using pixel-based measurements generated by estimating real-world distances for whales visible in the image from average morphometrics calculated across several hundred male and female pilot whales sampled from New Zealand strandings (Betty et al., 2022; Betty et al., 2023), using ImageJ (version 1.54g; Java 1.8.0_345 [64-bit]; see supplemental S5 for methodology in associated manuscript (doi link)).
The additional datasets necessary to reproduce these data can be accessed from: - Vantor: VHR optical satellite imagery for sensors GeoEye-1, WorldView-2 (WV2) and WorldView-3 (WV3); - Airbus/DLR: SAR satellite imagery for sensor TerraSAR-X; - Sentinel Hub or Airbus: VHR optical satellite imagery for sensor Pleiades; - Department of Conservation New Zealand: aerial image of the Stewart Island mass stranding and ground counts of the Stewart, Chatham, and Pitt Island stranding events; - The Photographic Wanderings of Tamzin S Henderson (https://www.facebook.com/profile/100063641619052/search/?q=stranding): aerial image of the Chatham Island mass stranding event; - Kris Carlyon, Marine Conservation Program, Department of Natural Resources and Environment Tasmania (NRE Tas), Tasmanian Government, Australia: ground counts of the King Island sperm whale mass stranding event. Guidance on how to pre-process and annotate satellite imagery is available in the supplemental material of the associated manuscript (Ecological Informatics doi ) and in S3 and S4 of https://doi.org/10.1016/j.mex.2026.103949. For the most up-to-date versions of any supplemental guidance, or to access the replicable Python pre-processing and image annotation pipeline, please visit https://github.com/PennyJClarke/strandings_from_space. For guidance on annotating aerial imagery using VGG, see Cubaynes et al. (2024). Additionally, for guidance on generating the pixel-based measurements necessary for clustering annotations made in aerial imagery of unknown spatial reference, please see supplemental S5 in the associated manuscript. The VHR satellite imagery associated with the annotations here was captured for mass stranding events along remote New Zealand offshore islands and Tasmanian coastlines between 2018 and 2023. Full details of the event dates and imagery can be found in supplemental S2 of the associated manuscript. |
|
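The conversion between georeferenced coordinates and pixel-based (row, col) coordinates described above can be sketched with a six-parameter affine geotransform. This is a minimal illustration only, not the script from the repository: the function names and the geotransform values are hypothetical, and in practice the transform would be read from the metadata of the image being annotated.

```python
# Hypothetical GDAL-style geotransform (x0, dx, rx, y0, ry, dy):
# (x0, y0) is the outer corner of the top-left pixel, dx and dy are the
# pixel sizes (dy is negative for a north-up image), rx and ry are rotation terms.
# For geographic coordinates, x is longitude and y is latitude.
EXAMPLE_GT = (1000.0, 0.5, 0.0, 5000.0, 0.0, -0.5)


def pixel_to_geo(row, col, gt):
    """Map a (possibly fractional) pixel position to map coordinates (x, y)."""
    x0, dx, rx, y0, ry, dy = gt
    x = x0 + col * dx + row * rx
    y = y0 + col * ry + row * dy
    return x, y


def geo_to_pixel(x, y, gt):
    """Invert the affine transform: map coordinates (x, y) to (row, col)."""
    x0, dx, rx, y0, ry, dy = gt
    det = dx * dy - rx * ry
    if det == 0:
        raise ValueError("geotransform is not invertible")
    off_x, off_y = x - x0, y - y0
    col = (dy * off_x - rx * off_y) / det
    row = (-ry * off_x + dx * off_y) / det
    return row, col


row, col = geo_to_pixel(1010.0, 4995.0, EXAMPLE_GT)
print(row, col)                               # pixel position of the point
print(pixel_to_geo(row, col, EXAMPLE_GT))     # round trip back to (x, y)
```

Working in pixel space keeps distances in a single unit (pixels) regardless of latitude, which is one plausible reason clustering behaves better there; the cluster medians can then be mapped back with `pixel_to_geo`.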
Extent
| Temporal Coverage: | |
|---|---|
| Start Date | 2018-11-24 |
| End Date | 2023-10-03 |
| Spatial Coverage: | |
| Latitude | |
| Southernmost | -47 |
| Northernmost | -46.9 |
| Longitude | |
| Westernmost | 167.6 |
| Easternmost | 167.8 |
| Altitude | |
| Min Altitude | N/A |
| Max Altitude | N/A |
| Depth | |
| Min Depth | N/A |
| Max Depth | N/A |
| Latitude | |
| Southernmost | -39.9 |
| Northernmost | -39.8 |
| Longitude | |
| Westernmost | 143.8 |
| Easternmost | 143.9 |
| Altitude | |
| Min Altitude | N/A |
| Max Altitude | N/A |
| Depth | |
| Min Depth | N/A |
| Max Depth | N/A |
| Latitude | |
| Southernmost | -43.9 |
| Northernmost | -43.7 |
| Longitude | |
| Westernmost | -176.9 |
| Easternmost | -176.6 |
| Altitude | |
| Min Altitude | N/A |
| Max Altitude | N/A |
| Depth | |
| Min Depth | N/A |
| Max Depth | N/A |
| Latitude | |
| Southernmost | -44.3 |
| Northernmost | -44.2 |
| Longitude | |
| Westernmost | -176.3 |
| Easternmost | -176.2 |
| Altitude | |
| Min Altitude | N/A |
| Max Altitude | N/A |
| Depth | |
| Min Depth | N/A |
| Max Depth | N/A |
| Location: | |
| Location | New Zealand |
| Detailed Location | Mason Bay, Stewart Island/Rakiura |
| Location | Australia |
| Detailed Location | King Island, Tasmania |
| Location | New Zealand |
| Detailed Location | Maunganui Beach, Chatham Island/Wharekauri |
| Location | New Zealand |
| Detailed Location | Long Beach, Petre Bay, Chatham Island/Wharekauri |
| Location | New Zealand |
| Detailed Location | Waihere Bay, Pitt Island/Rangiauria |
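For users who want to assign an annotation to a stranding site, the four bounding boxes listed above can be held in a small lookup. This is only an illustrative sketch: the pairing of each box with a site name is inferred from the coordinates and the order of the location list (the single Chatham Island box is assumed to cover both Chatham events), and the function name is hypothetical.

```python
# (south, north, west, east) in decimal degrees, copied from the Extent table.
# The site labels are an assumption based on the listed locations.
SITE_BOUNDS = {
    "Mason Bay, Stewart Island/Rakiura": (-47.0, -46.9, 167.6, 167.8),
    "King Island, Tasmania": (-39.9, -39.8, 143.8, 143.9),
    "Chatham Island/Wharekauri": (-43.9, -43.7, -176.9, -176.6),
    "Waihere Bay, Pitt Island/Rangiauria": (-44.3, -44.2, -176.3, -176.2),
}


def site_for_point(lat, lon):
    """Return the name of the bounding box containing (lat, lon), or None."""
    for name, (south, north, west, east) in SITE_BOUNDS.items():
        if south <= lat <= north and west <= lon <= east:
            return name
    return None


print(site_for_point(-43.8, -176.7))
print(site_for_point(0.0, 0.0))
```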
Instrumentation
| Data Collection: | Satellite imagery used to produce this dataset was acquired by the following sensors: optical: GeoEye-1, WorldView-2 (WV2), WorldView-3 (WV3) and Pleiades; SAR: TerraSAR-X. The technical specifications of the drones used to collect the drone imagery annotated here are unknown. The ground counts (total counts) of stranded cetaceans at each event were provided by: - the Department of Conservation New Zealand; - Project Jonah, New Zealand; and - the Marine Conservation Program, Department of Natural Resources and Environment Tasmania (NRE Tas), Tasmanian Government, Australia. All satellite data were annotated using open-source QGIS 3.28 on Windows 10 (see supplemental S2 of (Ecological Informatics doi ) and S3 and S4 in https://doi.org/10.1016/j.mex.2026.103949). In addition to the user-interface-centred pipeline in QGIS, a replicable version of the workflow is available using Python 3.10 (https://github.com/PennyJClarke/strandings_from_space). All satellite imagery was pansharpened using the QGIS plugin Orfeo Toolbox (OTB, version 8.1.1, Win64; BundleToPerfectSensor tool, Bayes algorithm and bi-cubic interpolation) (see supplemental S2 for guidance in the associated manuscript (doi link)). The SAR image was georeferenced to the matching optical image (aligning both images) using the QGIS 'Georeferencer' tool in QGIS 3.16 (transformation type: polynomial 1; resampling method: linear; see supplemental S3 and S4 for guidance in the associated manuscript). All aerial data were annotated using the VGG Image Annotator, as per protocols in Cubaynes et al. (2024). To compare observers' annotations, hierarchical clustering was performed using a Python script (v3.12) adapted from Attard et al. (2025) (available at: https://github.com/PennyJClarke/strandings_from_space).
For aerial imagery with an unknown spatial extent, clustering was performed using pixel-based measurements generated by estimating real-world distances for whales visible in the image from average morphometrics calculated across several hundred male and female pilot whales sampled from New Zealand strandings (Betty et al., 2022; Betty et al., 2023), using ImageJ (version 1.54g; Java 1.8.0_345 [64-bit]; see supplemental S5 for methodology in the associated manuscript). |
|---|
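As a worked illustration of the two steps above, the sketch below derives a pixel scale from the mean adult body length reported under Quality, then clusters pixel-coordinate annotations with Ward's method and takes the median of each cluster. It is a simplified stand-in for the script in the repository and omits the additional constraints used to refine clusters. The function names, the sample points and the 2 m threshold are hypothetical.

```python
import math
from statistics import median

# Mean adult lengths reported under Quality: males 5.5 m, females 4.32 m.
MEAN_ADULT_LENGTH_M = (5.5 + 4.32) / 2          # about 4.91 m
STEWART_PIXELS_PER_M = 42.49                    # value reported for Stewart Island
STEWART_M_PER_PIXEL = 1 / STEWART_PIXELS_PER_M  # about 0.02353 m per pixel


def ward_clusters(points, threshold):
    """Agglomerate points with Ward's criterion until the cheapest merge
    exceeds `threshold`. Returns lists of indices into `points`."""
    clusters = [[i] for i in range(len(points))]

    def centroid(members):
        n = len(members)
        return (sum(points[i][0] for i in members) / n,
                sum(points[i][1] for i in members) / n)

    def ward_distance(a, b):
        # Scaled so that two single points are separated by their
        # Euclidean distance.
        na, nb = len(a), len(b)
        scale = math.sqrt(2 * na * nb / (na + nb))
        return scale * math.dist(centroid(a), centroid(b))

    while len(clusters) > 1:
        d, i, j = min(
            (ward_distance(a, b), i, j)
            for i, a in enumerate(clusters)
            for j, b in enumerate(clusters)
            if i < j
        )
        if d > threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def cluster_medians(points, clusters):
    """Median (row, col) of the points in each cluster."""
    return [
        (median(points[i][0] for i in c), median(points[i][1] for i in c))
        for c in clusters
    ]


# Hypothetical (row, col) annotations from several observers for two animals.
annotations = [(100.0, 100.0), (110.0, 105.0), (95.0, 98.0),
               (400.0, 420.0), (410.0, 410.0)]
threshold_px = 2.0 * STEWART_PIXELS_PER_M  # hypothetical 2 m threshold in pixels
clusters = ward_clusters(annotations, threshold_px)
medians = cluster_medians(annotations, clusters)
print(clusters, medians)
```

The brute-force search is quadratic per merge, which is adequate for a few hundred annotations; a library implementation of Ward linkage would be the natural choice for larger inputs.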
Storage
| Distribution: | |
|---|---|
| Distribution Media | Online Internet (HTTP) |
| Distribution Size | 7.3 MB |
| Distribution Format | ASCII |
| Fees | N/A |
| Data Storage: | This dataset consists of 125 files in 12 folders (9.75 MB). The data have a specific structure that is required for running the code, as detailed in https://github.com/PennyJClarke/strandings_from_space. Additionally, the data adhere to the data standard available at: https://doi.org/10.1016/j.mex.2026.103949 (S3_attribute_training_document.pdf, included here). The structure contains: * input files in .csv format, containing multi-observer point annotations derived from satellite imagery, as well as multi-observer reference points derived from aerial data, and reference total counts from ground observations made by local stranding response teams. The file naming convention is 'location_satellite_image-catalog-id_image-date(YYYYMMDD)_image-time(HHMMSS)_spatial-resolution_observer-id', e.g., chatham-island_geoeye1_105001002F60AA00_20221014_213434_50_1.csv. * temp_input files in .csv format, containing multi-observer point annotations derived from satellite imagery, concatenated into a single .csv file with an assigned unique id. * output files in .csv format, containing cluster points, which indicate the median location of stranded cetaceans across all multi-observer point annotations derived from satellite imagery, calculated in pixel coordinates and converted to geographic coordinates. Please note: because the semi-automated clustering method refines clusters through a series of constraints to improve clustering accuracy, the cluster location may not always be precisely positioned on, or central to, the stranded cetacean feature. The file naming convention is the satellite or aerial count input .csv filename ending with '_predicted.csv' or '_ground_truth.csv'. '_predicted.csv' contains the semi-automated clustering outputs and '_ground_truth.csv' contains the clusters once they were manually reviewed to verify accuracy and corrected.
* summary files in .csv format, which summarise attributes from the 'clusters' output, e.g., the 'label_count' (number of observers who annotated a feature) and the 'majority_label' (majority certainty of observers for an annotated feature), alongside the required and desirable attributes of that feature recorded by observer one (in 'satellite' > 'counts'). Where observer one did not identify an annotation in a cluster, no desirable attributes are detailed. The latitude and longitude have been manually corrected to address erroneous point placement by the semi-automated clustering and are provided in both a projected coordinate system and a geographic coordinate system (EPSG:4326). This dataset is intended for use in machine learning models, where users can extract and use only observations made by two or more observers with 'definite_90-100' or 'likely_70-89' certainty. The file naming convention is the satellite input .csv filename ending with '_summary.csv'. For more details on the fields contained in the dataset attribute table, see 'readme_clusters.txt' and 'S3_attribute_training_document.pdf'. Please note, due to licensing agreements, the satellite imagery is not supplied as part of this dataset. The full metadata of the satellite imagery associated with the data, including information necessary to access and reproduce the image pre-processing steps, are provided in supplemental S2 of (Ecological Informatics doi ) and S3 in https://doi.org/10.1016/j.mex.2026.103949. |
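The file naming convention and the suggested machine-learning filter can be sketched as follows. This is an illustration under stated assumptions rather than part of the dataset's code: the function names and the sample rows are hypothetical, the 'label_count' and 'majority_label' field names are taken from the description above, and the parser assumes that the location and satellite fields contain no underscores (some image catalogue IDs do, so the ID is taken as everything between the first two and last four fields).

```python
import csv
import io
from datetime import datetime
from pathlib import Path


def parse_input_filename(filename):
    """Split an input .csv filename into its naming-convention fields."""
    parts = Path(filename).stem.split("_")
    if len(parts) < 7:
        raise ValueError(f"unexpected filename: {filename}")
    date_str, time_str, resolution, observer = parts[-4:]
    return {
        "location": parts[0],
        "satellite": parts[1],
        # Catalogue IDs may themselves contain underscores.
        "catalog_id": "_".join(parts[2:-4]),
        "acquired": datetime.strptime(date_str + time_str, "%Y%m%d%H%M%S"),
        "resolution": resolution,   # kept as text, as written in the filename
        "observer_id": observer,
    }


CONFIDENT_LABELS = {"definite_90-100", "likely_70-89"}


def confident_rows(rows, min_observers=2):
    """Keep features annotated by at least `min_observers` observers with a
    'definite' or 'likely' majority label."""
    return [
        r for r in rows
        if int(r["label_count"]) >= min_observers
        and r["majority_label"] in CONFIDENT_LABELS
    ]


info = parse_input_filename(
    "chatham-island_geoeye1_105001002F60AA00_20221014_213434_50_1.csv"
)
print(info["catalog_id"], info["acquired"].isoformat())

# Hypothetical summary rows; csv.DictReader reads every field as text,
# so date and time columns are not reformatted on load.
sample = io.StringIO(
    "cluster_id,label_count,majority_label\n"
    "1,3,definite_90-100\n"
    "2,1,definite_90-100\n"
    "3,2,possible_50-69\n"
    "4,2,likely_70-89\n"
)
kept = confident_rows(csv.DictReader(sample))
print([r["cluster_id"] for r in kept])
```

The parser targets the input files only; names ending in '_predicted', '_ground_truth' or '_summary' would need that suffix removed first.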