Orca training data
There are many recordings of killer whales available, but relative to other marine mammal species, there is a paucity of labeled data. For example, many toothed whale (odontocete) species are included in the MobySound archive, but not yet killer whales (as of July 2020).
This page documents the growing array of labeled data specific to killer whale ecotypes, with a primary focus on Southern Resident Killer Whales, and a secondary focus on other ecotypes of the Northeast Pacific Ocean. Open data sources, including those provided by Orcasound member organizations, are listed first to promote collaboration. Closed data sources are listed in the hope that they become available to the open-source and open-data communities in the future, or are otherwise valuable as reference points.
Note: these data are for training models. For test data, please refer to the orca test data wiki page.
This section contains Orcasound data sets for training machine learning models that detect or classify killer whale signals. The primary focus is binary classification (yes/no) of any Southern Resident Killer Whale (SRKW) calls, but labels may also indicate call type, whistles, or clicks, and some resources relate to Bigg's (transient) killer whales.
NOTE: You cannot access these data with a browser. Instead, note the URL and use the AWS Command Line Interface (CLI) in a terminal window to access the public files. See the Data access via AWS CLI page to learn more about the AWS CLI. Many of these data are aggregated within the Orcasound "Acoustic Sandbox" (a public S3 bucket).
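As a rough illustration of the note above, the sketch below assembles AWS CLI commands for listing and copying objects in a public bucket without credentials. The bucket name and prefix are hypothetical placeholders, not values taken from this page; substitute the bucket and path from the URL you noted. `--no-sign-request` is the AWS CLI option for anonymous requests.

```python
import shlex

# Hypothetical placeholders: replace with the bucket and prefix from the URL you noted.
EXAMPLE_BUCKET = "example-public-bucket"
EXAMPLE_PREFIX = "some/prefix/"


def aws_s3_command(action, bucket, prefix="", dest=None, recursive=False):
    """Build an argument list for an anonymous `aws s3` call (ls, cp, or sync)."""
    if action not in ("ls", "cp", "sync"):
        raise ValueError(f"unsupported action: {action}")
    cmd = ["aws", "s3", action, f"s3://{bucket}/{prefix}"]
    if action in ("cp", "sync"):
        if dest is None:
            raise ValueError("cp and sync need a local destination")
        cmd.append(dest)
    if recursive and action in ("ls", "cp"):
        cmd.append("--recursive")
    # Public buckets can be read without AWS credentials.
    cmd.append("--no-sign-request")
    return cmd


# Print the commands so they can be pasted into a terminal.
print(shlex.join(aws_s3_command("ls", EXAMPLE_BUCKET, EXAMPLE_PREFIX)))
print(shlex.join(aws_s3_command("cp", EXAMPLE_BUCKET, EXAMPLE_PREFIX,
                                dest="./data", recursive=True)))
```

The function only builds the command; to run one from Python, the list could be passed to `subprocess.run(cmd, check=True)`, assuming the AWS CLI is installed.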
-
SRKW training data (in open-access Orcasound archives)
- Sept 27, 2017 (24 hrs of continuous 5-min WAV files, with 4 hours containing SRKW signals)
- July 05, 2019 (2 hrs of continuous 0.5-hr WAV files containing SRKW signals)
- July 05, 2019 (S3 objects containing ~1 hr of HLS data each with SRKW signals)
- To do: list them here...
- Southern Resident Killer Whale (SRKW) call catalog (John Ford, 1989; supplemented by Rich Osborne)
- Orcasound archive w/o human narration
- GitHub repository archive (FLAC, Ogg, MP3 formats)
-
Bigg's (transient) killer whale training data:
- Un-labeled
- Labeled
- 2018 Dec 02: Bigg's & humpback nighttime recording with labels; ~1 hr lossy recording with labels that may need to be simplified/standardized for training a binary classifier...
- 2018 Dec 07: blog post with metadata and link to labeled raw data in .wav format; raw data with labels (~1 hour, mp3 format), labeled in Audacity by Scott Veirs; labels may need to be simplified/standardized for training a binary classifier...
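The label simplification mentioned above could look something like the following minimal sketch. It assumes the labels were exported from Audacity as a label track text file (tab-separated start time, end time, and label text, in seconds). The sample labels and the keyword list are hypothetical and would need to be adapted to the actual annotation vocabulary.

```python
# Hypothetical keywords that mark a label as "killer whale present".
POSITIVE_KEYWORDS = ("kw", "bigg", "orca", "transient")


def parse_audacity_labels(text):
    """Parse an Audacity label export into (start_s, end_s, label) tuples."""
    rows = []
    for line in text.splitlines():
        # Skip blank lines and the backslash-prefixed frequency-range lines
        # that Audacity writes for spectral selections.
        if not line.strip() or line.startswith("\\"):
            continue
        parts = line.split("\t")
        if len(parts) < 2:
            continue
        label = parts[2].strip() if len(parts) > 2 else ""
        rows.append((float(parts[0]), float(parts[1]), label))
    return rows


def to_binary(rows, positive_keywords=POSITIVE_KEYWORDS):
    """Collapse free-text labels to 1 (killer whale) or 0 (anything else)."""
    result = []
    for start, end, label in rows:
        lowered = label.lower()
        is_kw = any(keyword in lowered for keyword in positive_keywords)
        result.append((start, end, 1 if is_kw else 0))
    return result


# Made-up sample in Audacity's export layout, for illustration only.
SAMPLE = "1.5\t3.25\tBigg's call\n10.0\t12.0\thumpback song\n20.0\t20.5\tKW clicks\n"

for start, end, flag in to_binary(parse_audacity_labels(SAMPLE)):
    print(f"{start:.2f}\t{end:.2f}\t{flag}")
```

Keyword matching is a deliberately crude choice here; an explicit mapping table from each original label to a class would be easier to audit once the full label set is known.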
-
Prospects for additional Orcasound training (& testing) data:
- Pod.Cast candidates for additional rounds of annotation
- Current listener log
-
General KW ecotypes
- Watkins Marine Mammal Library (WHOI), global killer whale ecotypes (1960-1993)
- Labeled data from the Watkins Marine Mammal Library's killer whale tapes (see Pod.Cast round 1)
- SRKWs
- Orca Behavior Institute (Monika Wieland), historic data from cabled Lime Kiln State Park and some
- NOAA (Marla Holt, Candice Emmons), mostly autonomous recorders on outer coast WA
- ONC (Kristen Kanes, Science open data set in 2020?), cabled arrays on outer BC shelf (Barkley Canyon; and Georgia Strait? Early versions were not specific to ecotype?)
- DFO (James Pilkington), mostly autonomous recorders on outer coast BC (mostly clips? may be specific to ecotype)
- SMRU/TWM (Jason Wood), some labeled by Alex Harris (30,000 general KWs; 30,000 non-KWs)
- JASCO (David Hannay? Ruth Joy?), 5-second clips
- NRKWs
- OrcaLab (Paul Spong, Helena Symonds), cabled near-shore hydrophones in Johnstone Strait, B.C.
- Orchive (data archive by Steve Ness at UVic)
- OrcaSPOT (Bergler et al. ML effort published in 2019)
- OrcaSPOT repo (Python code)
- OrcaSPOT publication (2019)
- [Pacific Wild unlabeled archive](https://soundcloud.com/pacificwild) (Soundcloud), cabled near-shore hydrophones in central B.C., near Bella Bella.
- Alaska residents
- OrcaCNN (Dan Olsen), autonomous recorders with signals from KWs (also Bigg's?)
- Bigg's (transients)
- No labeled data (to our knowledge)
- Raw data sources:
- U.S. Navy recording of transients in Dabob Bay (2005, ~42 minutes of vocalization, echolocation, percussives; mp3 format from original AIFF...)
- John Ford contribution to Orcasound open-access data project of transient call types (T1,3,7,8) (recorded by F. Thomsen on August 25, 1996 near Numas I., Queen Charlotte Strait, with many calls from T014, T015)
- Alaskan transients (via Dan Olsen and Hannah Myers)
- Many recordings of AT1s (only 7 individuals left; unique-sounding calls relative to other transients)
- Gulf of Alaska transients (need to be digitized)