
Commit

Merge pull request #5 from ARM-DOE/master
pull request
ajsockol authored May 11, 2021
2 parents 776c89b + a7b4c67 commit 976002b
Showing 96 changed files with 3,039 additions and 614 deletions.
2 changes: 2 additions & 0 deletions .coveragerc
@@ -0,0 +1,2 @@
[run]
omit =./act/tests/*, ./act/*version*py
11 changes: 6 additions & 5 deletions .travis.yml
@@ -8,22 +8,23 @@ env:

matrix:
include:
- python: 3.6
- python: 3.7
env:
- PYTHON_VERSION="3.6"
- PYTHON_VERSION="3.7"
- DOC_BUILD="true"
- python: 3.7
- python: 3.8
sudo: yes
dist: xenial
env:
- PYTHON_VERSION="3.7"
- PYTHON_VERSION="3.8"
- DOC_BUILD="true"
install:
- source continuous_integration/install.sh
- pip install pytest-cov
- pip install coveralls
- pip install metpy
script:
- eval xvfb-run pytest --cov=act/
- eval xvfb-run pytest --mpl --cov=act/ --cov-config=.coveragerc
- flake8 --max-line-length=115 --ignore=F401,E402,W504,W605
after_success:
- coveralls
2 changes: 1 addition & 1 deletion CREATING_ENVIRONMENTS.rst
@@ -74,7 +74,7 @@ do this step while the environment is activated::
Another way to create a conda environment is by doing it from scratch using
the conda create command. An example of this::

conda create -n act_env -c conda-forge python=3.7 numpy pandas astral
conda create -n act_env -c conda-forge python=3.7 numpy pandas
scipy matplotlib dask xarray

After activating the environment with::
5 changes: 4 additions & 1 deletion MANIFEST.in
@@ -10,10 +10,13 @@ recursive-exclude * *.py[co]
recursive-include act/plotting *.txt

recursive-include docs *.rst conf.py Makefile make.bat
recursive-include act/tests/data *.cdf *.nc *.data
recursive-include act/tests/data *


include versioneer.py
include act/_version.py

include act/utils/conf/de421.bsp

# If including data files in the package, add them like:
# include path/to/data_file
27 changes: 18 additions & 9 deletions README.rst
@@ -4,7 +4,7 @@ Atmospheric data Community Toolkit (ACT)

|AnacondaCloud| |Travis| |Coveralls|

|CondaDownloads| |Zenodo|
|CondaDownloads| |Zenodo| |ARM|

.. |AnacondaCloud| image:: https://anaconda.org/conda-forge/act-atmos/badges/version.svg
:target: https://anaconda.org/conda-forge/act-atmos
@@ -21,11 +21,15 @@ Atmospheric data Community Toolkit (ACT)
.. |Coveralls| image:: https://coveralls.io/repos/github/ARM-DOE/ACT/badge.svg
:target: https://coveralls.io/github/ARM-DOE/ACT

.. |ARM| image:: https://img.shields.io/badge/Sponsor-ARM-blue.svg?colorA=00c1de&colorB=00539c
:target: https://www.arm.gov/


Python toolkit for working with atmospheric time-series datasets of varying dimensions. The toolkit is meant to have functions for every part of the scientific process; discovery, IO, quality control, corrections, retrievals, visualization, and analysis. This toolkit is meant to be a community platform for sharing code with the goal of reducing duplication of effort and better connecting the science community with programs such as the `Atmospheric Radiation Measurement (ARM) User Facility <http://www.arm.gov>`_. Overarching development goals will be updated on a regular basis as part of the `Roadmap <https://github.com/AdamTheisen/ACT/blob/master/guides/ACT_Roadmap.pdf>`_.
The Atmospheric data Community Toolkit (ACT) is an open source Python toolkit for working with atmospheric time-series datasets of varying dimensions. The toolkit is meant to have functions for every part of the scientific process: discovery, IO, quality control, corrections, retrievals, visualization, and analysis. It is meant to be a community platform for sharing code with the goal of reducing duplication of effort and better connecting the science community with programs such as the `Atmospheric Radiation Measurement (ARM) User Facility <http://www.arm.gov>`_. Overarching development goals will be updated on a regular basis as part of the `Roadmap <https://github.com/AdamTheisen/ACT/blob/master/guides/ACT_Roadmap.pdf>`_.

* Free software: 3-clause BSD license
|act|

.. |act| image:: ./docs/source/act_plots.png

Important Links
~~~~~~~~~~~~~~~
@@ -34,19 +38,22 @@ Important Links
* Examples: https://arm-doe.github.io/ACT/source/auto_examples/index.html
* Issue Tracker: https://github.com/ARM-DOE/ACT/issues

Citing
~~~~~~

If you use ACT to prepare a publication, please cite the DOI listed in the badge above, which is updated with every version release to ensure that contributors get appropriate credit. The DOI is provided through Zenodo.

Dependencies
~~~~~~~~~~~~

* `xarray <https://xarray.pydata.org/en/stable/>`_
* `NumPy <https://www.numpy.org/>`_
* `SciPy <https://www.scipy.org/>`_
* `matplotlib <https://matplotlib.org/>`_
* `xarray <https://xarray.pydata.org/en/stable/>`_
* `astral <https://astral.readthedocs.io/en/latest/>`_
* `skyfield <https://rhodesmill.org/skyfield/>`_
* `pandas <https://pandas.pydata.org/>`_
* `dask <https://dask.org/>`_
* `Pint <https://pint.readthedocs.io/en/0.9/>`_
* `Cartopy <https://scitools.org.uk/cartopy/docs/latest/>`_
* `Boto3 <https://aws.amazon.com/sdk-for-python/>`_
* `PyProj <https://pyproj4.github.io/pyproj/stable/>`_
* `Proj <https://proj.org/>`_
* `Six <https://pypi.org/project/six/>`_
@@ -55,7 +62,9 @@ Dependencies
Optional Dependencies
~~~~~~~~~~~~~~~~~~~~~

* `MPL2NC <https://github.com/peterkuma/mpl2nc>`_ For reading binary MPL data.
* `MPL2NC <https://github.com/peterkuma/mpl2nc>`_ Reading binary MPL data.
* `Cartopy <https://scitools.org.uk/cartopy/docs/latest/>`_ Mapping and geoplots.
* `MetPy <https://unidata.github.io/MetPy/latest/index.html>`_ >= V1.0 Skew-T plotting and some stability index calculations.

Installation
~~~~~~~~~~~~
@@ -138,7 +147,7 @@ Testing
After installation, you can launch the test suite from outside the
source directory (you will need to have pytest installed)::

$ pytest --pyargs act
$ pytest --mpl --pyargs act

In-place installs can be tested using the `pytest` command from within
the source directory.
13 changes: 13 additions & 0 deletions act/discovery/get_armfiles.py
@@ -36,6 +36,11 @@ def download_data(username, token, datastream,
current working directory with the same name as *datastream* to place
the files in.
Returns
-------
files : list
Returns list of files retrieved
Notes
-----
This programmatic interface allows users to query and automate
@@ -107,7 +112,12 @@ def download_data(username, token, datastream,
output_dir = os.path.join(os.getcwd(), datastream)

# not testing, response is successful and files were returned
if response_body_json is None:
print("ARM Data Live Webservice does not appear to be functioning")
return []

num_files = len(response_body_json["files"])
file_names = []
if response_body_json["status"] == "success" and num_files > 0:
for fname in response_body_json['files']:
if time is not None:
@@ -125,6 +135,9 @@ def download_data(username, token, datastream,
# create file and write bytes to file
with open(output_file, 'wb') as open_bytes_file:
open_bytes_file.write(urlopen(save_data_url).read())
file_names.append(output_file)
else:
print("No files returned or url status error.\n"
"Check datastream name, start, and end date.")

return file_names
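
With this change download_data returns the list of files it saved (or an empty list when the web service reports an error or returns nothing), so the result can be passed straight to a reader. A minimal sketch of that workflow, with placeholder credentials, an illustrative sgpmetE13.b1 datastream, and the start/end date arguments the function already accepts::

    import act

    # Placeholder ARM Live credentials; replace with real values.
    username = 'userName'
    token = 'XXXXXXXXXXXXXXXX'

    # download_data now returns the list of files written to disk,
    # or [] if the web service reports an error or finds no files.
    files = act.discovery.get_armfiles.download_data(
        username, token, 'sgpmetE13.b1', '2020-01-01', '2020-01-07')

    if files:
        ds = act.io.armfiles.read_netcdf(files)
        print(ds)
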
1 change: 1 addition & 0 deletions act/io/__init__.py
@@ -20,3 +20,4 @@
from . import armfiles
from . import csvfiles
from . import mpl
from . import noaagml
139 changes: 128 additions & 11 deletions act/io/armfiles.py
@@ -14,8 +14,9 @@
import numpy as np
import urllib
import json
from enum import Flag, auto
import copy
import act.utils as utils
import warnings


def read_netcdf(filenames, concat_dim='time', return_None=False,
@@ -32,7 +33,7 @@ def read_netcdf(filenames, concat_dim='time', return_None=False,
Name of file(s) to read.
concat_dim : str
Dimension to concatenate files along. Default value is 'time.'
return_none : bool, optional
return_None : bool, optional
Catch IOError exception when file not found and return None.
Default is False.
combine : str
@@ -134,6 +135,7 @@ def read_netcdf(filenames, concat_dim='time', return_None=False,
arm_ds[var_name].astype(desired_time_precision),
arm_ds[var_name].attrs)})
arm_ds[var_name] = temp_ds[var_name]
temp_ds.close()

# If time_offset is in file try to convert base_time as well
if var_name == 'time_offset':
@@ -160,13 +162,17 @@ def read_netcdf(filenames, concat_dim='time', return_None=False,
not np.issubdtype(arm_ds['time'].values.dtype, np.datetime64) and
not type(arm_ds['time'].values[0]).__module__.startswith('cftime.')):
# Use microsecond precision to create time since epoch. Then convert to datetime64
time = (arm_ds['base_time'].values * 1000000 +
arm_ds['time'].values * 1000000.).astype('datetime64[us]')
if arm_ds['base_time'].values == arm_ds['time_offset'].values[0]:
time = arm_ds['time_offset'].values
else:
time = (arm_ds['base_time'].values +
arm_ds['time_offset'].values * 1000000.).astype('datetime64[us]')
# Need to use a new Dataset creation to correctly index time for use with
# .group and .resample methods in Xarray Datasets.
temp_ds = xr.Dataset({'time': (arm_ds['time'].dims, time, arm_ds['time'].attrs)})

arm_ds['time'] = temp_ds['time']
temp_ds.close()
for att_name in ['units', 'ancillary_variables']:
try:
del arm_ds['time'].attrs[att_name]
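
For context on the conversion above: by ARM convention base_time is seconds since 1970-01-01 00:00:00 UTC and time_offset is seconds from base_time, and the reader builds a microsecond-precision datetime64 time axis from the pair. A standalone sketch of that arithmetic with synthetic values (not taken from a real file, and assuming both variables are stored in seconds)::

    import numpy as np

    # Synthetic ARM-style values: base_time is seconds since the epoch,
    # time_offset is seconds relative to base_time.
    base_time = np.int64(1589155200)          # 2020-05-11 00:00:00 UTC
    time_offset = np.array([0., 60., 120.])   # one sample per minute

    # Scale both to microseconds before casting so sub-second offsets survive.
    time = (base_time * 1000000 +
            time_offset * 1000000.).astype('datetime64[us]')
    print(time)  # three datetime64 values on 2020-05-11, one minute apart
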
@@ -180,8 +186,17 @@ def read_netcdf(filenames, concat_dim='time', return_None=False,
# Get file dates and times that were read in to the object
filenames.sort()
for f in filenames:
file_dates.append(f.split('.')[-3])
file_times.append(f.split('.')[-2])
# If not in ARM file-naming format, use the first time value for the date/time info
if len(f.split('/')[-1].split('.')) == 5:
file_dates.append(f.split('.')[-3])
file_times.append(f.split('.')[-2])
else:
if arm_ds['time'].size > 1:
dummy = arm_ds['time'].values[0]
else:
dummy = arm_ds['time'].values
file_dates.append(utils.numpy_to_arm_date(dummy))
file_times.append(utils.numpy_to_arm_date(dummy, returnTime=True))

# Add attributes
arm_ds.attrs['_file_dates'] = file_dates
@@ -266,7 +281,7 @@ def create_obj_from_arm_dod(proc, set_dims, version='', fill_value=-9999.,
"""
# Set base url to get DOD information
base_url = 'https://pcm.arm.gov/pcmserver/dods/'
base_url = 'https://pcm.arm.gov/pcm/api/dods/'

# Get data from DOD api
with urllib.request.urlopen(base_url + proc) as url:
@@ -275,7 +290,9 @@ def create_obj_from_arm_dod(proc, set_dims, version='', fill_value=-9999.,
# Check version numbers and alert if requested version in not available
keys = list(data['versions'].keys())
if version not in keys:
print(' '.join(['Version:', version, 'not available or not specified. Using Version:', keys[-1]]))
warnings.warn(' '.join(['Version:', version,
'not available or not specified. Using Version:', keys[-1]]),
UserWarning)
version = keys[-1]

# Create empty xarray dataset
@@ -351,9 +368,9 @@ def __init__(self, xarray_obj):
self._obj = xarray_obj

def write_netcdf(self, cleanup_global_atts=True, cleanup_qc_atts=True,
join_char='__', make_copy=True,
join_char='__', make_copy=True, cf_compliant=False,
delete_global_attrs=['qc_standards_version', 'qc_method', 'qc_comment'],
FillValue=-9999, **kwargs):
FillValue=-9999, cf_convention='CF-1.8', **kwargs):
"""
This is a wrapper around Dataset.to_netcdf to clean up the Dataset before
writing to disk. Some things are added to global attributes during ACT reading
Expand All @@ -372,11 +389,17 @@ def write_netcdf(self, cleanup_global_atts=True, cleanup_qc_atts=True,
Will use a single space as a delimiter between values and join_char to replace
white space between words.
join_char : str
The character sting to use for replacing white spaces between words.
The character string to use for replacing white spaces between words when converting
a list of strings to single character string attributes.
make_copy : boolean
Make a copy before modifying Dataset to write. For large Datasets this
may add processing time and memory. If modifying the Dataset is OK
try setting to False.
cf_compliant : boolean
Option to output the file with additional attributes to make it Climate & Forecast
compliant. May require running the .clean.cleanup() method on the object to fix other
issues first. This does the best it can, but the result may not be truly compliant. You
should read the CF documents and try to make the Dataset compliant before writing to file.
delete_global_attrs : list
Optional global attributes to be deleted. Defaults to some standard
QC attributes that are not needed. Can add more or set to None to not
Expand All @@ -387,6 +410,8 @@ def write_netcdf(self, cleanup_global_atts=True, cleanup_qc_atts=True,
so not a perfect fix. Set to None to leave Xarray to do what it wants.
Set to a value to be the value used as _FillValue in the file and data
array. This should then remove missing_value attribute from the file as well.
cf_convention : str
The Climate and Forecast convention string to add to Conventions attribute.
**kwargs : keywords
Keywords to pass through to Dataset.to_netcdf()
@@ -447,4 +472,96 @@ def write_netcdf(self, cleanup_global_atts=True, cleanup_qc_atts=True,
except KeyError:
pass

# If requested update global attributes and variables attributes for required
# CF attributes.
if cf_compliant:
# Get variable names and standard name for each variable
var_names = list(write_obj.keys())
standard_names = []
for var_name in var_names:
try:
standard_names.append(write_obj[var_name].attrs['standard_name'])
except KeyError:
standard_names.append(None)

# Check if time variable has axis and standard_name attributes
coord_name = 'time'
try:
write_obj[coord_name].attrs['axis']
except KeyError:
try:
write_obj[coord_name].attrs['axis'] = 'T'
except KeyError:
pass

try:
write_obj[coord_name].attrs['standard_name']
except KeyError:
try:
write_obj[coord_name].attrs['standard_name'] = 'time'
except KeyError:
pass

# Try to determine the type of dataset by the coordinate dimension named time
# and other factors
try:
write_obj.attrs['FeatureType']
except KeyError:
dim_names = list(write_obj.dims)
FeatureType = None
if dim_names == ['time']:
FeatureType = "timeSeries"
elif len(dim_names) == 2 and 'time' in dim_names and 'bound' in dim_names:
FeatureType = "timeSeries"
elif len(dim_names) >= 2 and 'time' in dim_names:
for var_name in var_names:
dims = list(write_obj[var_name].dims)
if len(dims) == 2 and 'time' in dims:
prof_dim = list(set(dims) - set(['time']))[0]
if write_obj[prof_dim].values.size > 2:
FeatureType = "timeSeriesProfile"
break

if FeatureType is not None:
write_obj.attrs['FeatureType'] = FeatureType

# Add axis and positive attributes to variables with standard_name
# equal to 'altitude'
alt_variables = [var_names[ii] for ii, sn in enumerate(standard_names) if sn == 'altitude']
for var_name in alt_variables:
try:
write_obj[var_name].attrs['axis']
except KeyError:
write_obj[var_name].attrs['axis'] = 'Z'

try:
write_obj[var_name].attrs['positive']
except KeyError:
write_obj[var_name].attrs['positive'] = 'up'

# Check if the Conventions global attribute lists the CF convention
try:
Conventions = write_obj.attrs['Conventions']
Conventions = Conventions.split()
cf_listed = False
for ii in Conventions:
if ii.startswith('CF-'):
cf_listed = True
break
if not cf_listed:
Conventions.append(cf_convention)
write_obj.attrs['Conventions'] = ' '.join(Conventions)

except KeyError:
write_obj.attrs['Conventions'] = str(cf_convention)

# Reorder global attributes to ensure history is last
try:
global_attrs = write_obj.attrs
history = copy.copy(global_attrs['history'])
del global_attrs['history']
global_attrs['history'] = history
except KeyError:
pass

write_obj.to_netcdf(encoding=encoding, **kwargs)
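
A minimal sketch of exercising the new cf_compliant option, assuming the sample-file constant act.tests.EXAMPLE_MET1 and that the method is reached through the Dataset accessor ACT registers for this class (shown here as .write); the .clean.cleanup() call is the QC fix-up the docstring above recommends running first::

    import act

    # Read a sample ARM file (illustrative test-data constant).
    ds = act.io.armfiles.read_netcdf(act.tests.EXAMPLE_MET1)

    # Clean up ARM QC conventions, then write with the extra CF attributes
    # (axis, standard_name, FeatureType, Conventions) added on the way out.
    ds.clean.cleanup()
    ds.write.write_netcdf(path='sgpmetE13.b1.cf.nc', cf_compliant=True)
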