NERACOOS: Continuous Plankton Recorder Data Support

Integrating Continuous Plankton Recorder data into NERACOOS & ERDDAP

This repository contains the necessary code and documentation to support hosting the Gulf of Maine Continuous Plankton Recorder Survey Data on ERDDAP.

Organization:

This repository documents the data provenance for Continuous Plankton Recorder (CPR) data obtained from two scientific research agencies (NOAA and the MBA), covering two sampling transects (the Gulf of Maine and the Mid-Atlantic Bight).

Raw data from all sources is contained in the data_raw/ directory. The code that prepares the raw data for ERDDAP, along with any necessary documentation, is organized by the source the data was received from. This information can be found in the following sub-folders:

Sub-Folder	Description
GulfOfMaine_NOAA	Gulf of Maine CPR Data obtained from NOAA
GulfOfMaine_MBA	Gulf of Maine CPR Data obtained from MBA
MidAtlantic_NOAA	Mid-Atlantic Bight CPR Data obtained from NOAA
MidAtlantic_MBA	Mid-Atlantic Bight CPR Data obtained from MBA

These resources have been processed independently due to differences in measurement units and organizational structure. Notes on how each dataset was received and treated prior to uploading to ERDDAP can be found within each of the corresponding sub-folders.

Reproducing the Data Transformations

The full processing pipeline, from the raw data to the final ERDDAP formats, has been implemented using the {targets} R package. It can be recreated in full by running the following code in an active R session (assuming all required R packages are installed).

library(targets)
tar_make()

This will recreate the processing steps outlined in _targets.R that transform the raw files into the format uploaded onto ERDDAP:

The DAG above shows a simplified representation of the steps for the NOAA Continuous Plankton Recorder Survey’s Zooplankton data, where the taxonomic information found in the header is separated from the abundance information and later joined back after it has been reshaped. Similar cleanup paths exist for the data obtained from NOAA as well as the data obtained from the MBA.
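Since tar_make() assumes that all required packages are installed, a quick pre-flight check can save a failed run. The following is a minimal sketch in base R. The package names in `required_pkgs` and the helper `missing_packages()` are hypothetical and only for illustration; the real list should be read from the library() calls in _targets.R.

```r
# Hypothetical list of packages the pipeline might need.
# ASSUMPTION: replace with the packages actually loaded in _targets.R.
required_pkgs <- c("targets", "rerddap")

# Return the subset of `pkgs` whose namespaces cannot be loaded
# in the current R installation.
missing_packages <- function(pkgs) {
  is_available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_available]
}

to_install <- missing_packages(required_pkgs)
if (length(to_install) > 0) {
  message("Install before running tar_make(): ",
          paste(to_install, collapse = ", "))
}
```

If nothing is reported as missing, the pipeline can be started with tar_make() as shown earlier.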

Abundance Unit Differences

Because of how the CPR data is stored and maintained within these two institutions, abundances must be converted to a standard unit of measurement when working with CPR data from both sources jointly.
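As a rough illustration of what such a conversion looks like, the sketch below rescales counts per sample to counts per 100 cubic meters. The units, the 3 cubic meter sample volume, the helper `to_per_100m3()`, and the toy values are all assumptions made for this example; the actual units and conversion factors for each source are documented in the corresponding sub-folders.

```r
# ASSUMPTION (illustration only): one source reports counts per CPR sample,
# and each sample filters about 3 cubic meters of water.
volume_per_sample_m3 <- 3

# Convert counts per sample to counts per 100 cubic meters
to_per_100m3 <- function(count_per_sample,
                         sample_volume_m3 = volume_per_sample_m3) {
  count_per_sample / sample_volume_m3 * 100
}

# Toy values, not real survey data
per_sample <- data.frame(
  taxon = c("taxon_a", "taxon_b"),
  abundance_per_sample = c(6, 1.5)
)

# Add a column in the shared unit so both sources can be compared
per_sample$abundance_per_100m3 <- to_per_100m3(per_sample$abundance_per_sample)
```

Keeping the sample volume as a function argument makes the assumed factor easy to replace once the documented value is known.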

Taxon Naming Differences

In addition to unit conversions, taxonomic names and development stages are recorded inconsistently across the two data sources, and inconsistently through time. Working across the data sources therefore requires additional data wrangling, which is accomplished with a key that maps the recorded stages to coarser development stage groupings.
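The key-based approach described above can be sketched in base R as a join followed by an aggregation. The stage labels, group names, and abundance values below are hypothetical and do not reproduce the actual key used in this repository.

```r
# Hypothetical key mapping recorded stages to coarser groups
stage_key <- data.frame(
  stage_recorded = c("c1-4", "c5-6", "adult"),
  stage_group    = c("early", "late", "late"),
  stringsAsFactors = FALSE
)

# Toy records, not real survey data
records <- data.frame(
  taxon          = c("taxon_a", "taxon_a", "taxon_a", "taxon_b"),
  stage_recorded = c("c1-4", "c5-6", "adult", "adult"),
  abundance      = c(10, 4, 6, 3),
  stringsAsFactors = FALSE
)

# Attach the coarse group; all.x = TRUE keeps records whose stage
# is missing from the key (their stage_group becomes NA)
keyed <- merge(records, stage_key, by = "stage_recorded", all.x = TRUE)

# Records the key does not cover; inspect these before aggregating,
# because the formula interface of aggregate() drops rows with NA groups
unmatched <- keyed[is.na(keyed$stage_group), ]

# Sum abundance within each taxon and coarse stage group
coarse <- aggregate(abundance ~ taxon + stage_group, data = keyed, FUN = sum)
```

Checking `unmatched` first guards against silently losing records when a stage label appears in only one source.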

Information on resolving the differences between these two data sources can be found in the Full_Timeseries_Workup/ sub-folder, which includes code examples that use ERDDAP as a starting point.

Full Gulf of Maine Timeseries

For those interested in working with a complete timeseries, we have made one available following minor data wrangling changes to the original datasets.

The complete timeseries can be accessed via ERDDAP here: NERACOOS ERDDAP

It can also be accessed with software packages such as {rerddap} for R or {erddapy} for Python:

# Package to interface with ERDDAP
library(rerddap)

# 1. Zooplankton
# Get the tabledap information from the server link and dataset_id
cpr_info <- info(url = "http://ismn.erddap.neracoos.org/erddap", 
                 datasetid = "gom_cpr_zooplankton_full")

# Use the tabledap function to import all the data (optionally add filters)
gom_zp <- tabledap(cpr_info)


# 2. Phytoplankton
# Get the tabledap information from the server link and dataset_id
cpr_info <- info(url = "http://ismn.erddap.neracoos.org/erddap", 
                 datasetid = "gom_cpr_phytoplankton_full")

# Use the tabledap function to import all the data (optionally add filters)
gom_phyto <- tabledap(cpr_info)
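The comments above mention optional filters. As a hedged sketch, {rerddap} accepts ERDDAP constraints as strings passed to tabledap(). The helper `fetch_cpr_since()` is hypothetical, and the variable name `time` is an assumption about these datasets; the available variable names can be checked in `cpr_info$variables` before filtering.

```r
# Hypothetical helper: download only records on or after `start_date`.
# ASSUMPTION: the dataset exposes a variable named "time".
fetch_cpr_since <- function(dataset_info, start_date) {
  rerddap::tabledap(dataset_info, paste0("time>=", start_date))
}

# Example usage (requires network access and the {rerddap} package):
# cpr_info <- rerddap::info(url = "http://ismn.erddap.neracoos.org/erddap",
#                           datasetid = "gom_cpr_zooplankton_full")
# gom_zp_recent <- fetch_cpr_since(cpr_info, "2000-01-01")
```

Filtering on the server keeps downloads smaller than importing the full table and subsetting afterwards.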

Details on how the complete timeseries was generated, including code and notes on data wrangling decisions, can be found here: www.github.com/gulfofmaine/neracoos_cpr_data/Full_Timeseries_Workup

Project Funding:

Funding for making these resources available was provided through grant awards from the National Science Foundation and the Lenfest Ocean Program, with communication and support from the Northeast Fisheries Science Center and the Marine Biological Association.

Additional Resources (Under Development):

Whenever working with different datasets, or differently managed versions of the same data, it is common to have to perform data reshaping steps in order to join across resources. CPR data made available via ERDDAP is no different. Below are a few common processing workflows that a user of this data may find helpful:

