Scott Veirs edited this page Oct 5, 2021 · 22 revisions

There are many recordings of killer whales available, but relative to other marine mammal species, there is a paucity of labeled data. For example, many toothed whale (Odontocete) species are included in the Mobysound archive, but Southern Resident Killer Whales were not yet among them as of July 2020.

This page documents the growing array of labeled data specific to killer whale ecotypes, with a primary focus on Southern Resident Killer Whales, and a secondary focus on other ecotypes of the Northeast Pacific Ocean. Open data sources, including those provided by Orcasound member organizations, are listed first to promote collaboration. Closed data sources are listed in the hope that they become available to the open-source and open-data communities in the future, or are otherwise valuable as reference points.

Note: these data are for training models. For test data, please refer to the orca test data wiki page.

Open data sources

Orcasound data

This section contains Orcasound data sets aimed at training machine learning models that detect or classify the signals of killer whales. The primary focus is on binary classification (yes/no) of any Southern Resident Killer Whale (SRKW) calls, but labels may also indicate call type, whistles, or clicks, and there are also some resources related to Bigg's (transient) killer whales.

NOTE: These data cannot be accessed through a browser. Instead, note the URL and use the AWS Command Line Interface (CLI) in a terminal window to access the public files. See the Data access via AWS CLI page to learn more about the AWS CLI. Many of these data are aggregated within the Orcasound "Acoustic Sandbox" (a public S3 bucket).
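As a rough illustration of the access pattern described in the note, the following Python sketch builds the AWS CLI commands for listing and downloading files from a public S3 bucket. The bucket name, prefix, and helper function names are hypothetical placeholders, not actual Orcasound locations; substitute the values from the URL of the data set you want. It assumes the bucket permits anonymous reads, which is why the AWS CLI's `--no-sign-request` option is included.

```python
import shlex

# Placeholder values: replace them with the bucket and prefix from the data set's URL.
EXAMPLE_BUCKET = "example-public-bucket"  # hypothetical, not a real Orcasound bucket name
EXAMPLE_PREFIX = "labeled-data/"          # hypothetical prefix


def s3_url(bucket: str, key: str = "") -> str:
    """Return an s3:// URL for a bucket and an optional key or prefix."""
    return f"s3://{bucket}/{key.lstrip('/')}"


def list_command(bucket: str, prefix: str = "") -> list[str]:
    """Build an `aws s3 ls` command that lists a public prefix without credentials."""
    return ["aws", "s3", "ls", s3_url(bucket, prefix), "--no-sign-request"]


def download_command(bucket: str, prefix: str, dest: str) -> list[str]:
    """Build an `aws s3 sync` command that copies a public prefix to a local folder."""
    return ["aws", "s3", "sync", s3_url(bucket, prefix), dest, "--no-sign-request"]


if __name__ == "__main__":
    # Print the commands so they can be reviewed and pasted into a terminal.
    print(shlex.join(list_command(EXAMPLE_BUCKET, EXAMPLE_PREFIX)))
    print(shlex.join(download_command(EXAMPLE_BUCKET, EXAMPLE_PREFIX, "./data")))
```

The sketch only prints the commands rather than running them, so nothing is downloaded until you paste a command into a terminal with the AWS CLI installed.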

Other open labeled data sources

Closed or restricted data sources

Non-Orcasound labeled data sources (not yet open, or licensing unclear)

  • SRKWs
    • Orca Behavior Institute (Monika Wieland), historic data from cabled Lime Kiln State Park
    • NOAA (Brad Hanson, Marla Holt, Candice Emmons): autonomous recorders on outer coast WA and DTAG deployments on SRKWs
    • ONC (Kristen Kanes, Science open data set in 2020?), cabled arrays on outer BC shelf (Barkley Canyon; and Georgia Strait? Early versions were not specific to ecotype?)
    • DFO (James Pilkington), mostly autonomous recorders on outer coast BC (mostly clips? may be specific to ecotype)
    • SMRU/TWM (Jason Wood), some labeled by Alex Harris (30,000 general KWs; 30,000 non-KWs)
    • JASCO (David Hannay? Ruth Joy?), 5-second clips
  • NRKWs
    • OrcaLab (Paul Spong, Helena Symonds), cabled near-shore hydrophones in Johnstone Strait, B.C.
    • [Pacific Wild unlabeled archive](https://soundcloud.com/pacificwild0) (SoundCloud), cabled near-shore hydrophones in central B.C., near Bella Bella.
  • Alaska residents
    • OrcaCNN (Dan Olsen), many recordings from autonomous and boat-based hydrophone recording systems
  • Bigg's (transients)
    • No labeled data (to our knowledge)
    • Raw data sources:
      • Alaskan transients (via Dan Olsen and Hannah Myers)
        • Many recordings of AT1s (only 7 individuals left; unique-sounding calls relative to other transients)
        • Gulf of Alaska transients (need to be digitized)
  • California recordings (ecotype uncertain unless in a sub-section)
  • Antarctic ecotypes