From cbe2cce4eecd39f7527c2e7af9527e4e058a1bea Mon Sep 17 00:00:00 2001 From: Gary Pavlis Date: Wed, 15 May 2024 14:36:48 -0400 Subject: [PATCH] Edit user manual (#530) * Fix hyperlink errors caused by picking missing space character * Fix a ton of improperly formatted py class/meth/func link lines --- .../user_manual/adapting_algorithms.rst | 6 +- docs/source/user_manual/continuous_data.rst | 42 +++++----- .../data_object_design_concepts.rst | 30 +++---- docs/source/user_manual/database_concepts.rst | 6 +- docs/source/user_manual/graphics.rst | 34 ++++---- docs/source/user_manual/handling_errors.rst | 65 +++++++-------- docs/source/user_manual/header_math.rst | 2 +- .../user_manual/importing_tabular_data.rst | 81 +++++++++++-------- .../source/user_manual/mongodb_and_mspass.rst | 4 +- docs/source/user_manual/obspy_interface.rst | 12 +-- .../user_manual/parallel_processing.rst | 28 +++---- 11 files changed, 164 insertions(+), 146 deletions(-) diff --git a/docs/source/user_manual/adapting_algorithms.rst b/docs/source/user_manual/adapting_algorithms.rst index 36d814f2b..c30f9a58b 100644 --- a/docs/source/user_manual/adapting_algorithms.rst +++ b/docs/source/user_manual/adapting_algorithms.rst @@ -153,7 +153,7 @@ objects from their ancestors (TimeSeries and ThreeComponentSeismogram). The current version of the implementations of these two algorithms can be found `here `__. -MsPASS uses the `pybind11 package` +MsPASS uses the `pybind11 package ` to bind C++ or C code for use by the python interpreter. For the present all C/C++ code is bound to a single module we call mspasspy.ccore. The details of the build system used in MsPASS are best discussed in a @@ -183,11 +183,11 @@ We note a few details about this block of code: 1. The :code:`m` symbol is defined earlier in this file as a tag for the module to which we aim to bind this function. It is defined earlier in the file with this construct: - + .. code-block:: c PYBIND11_MODULE(ccore,m) - + That is, this construct defines the symbol :code:`m` as an abstraction for the python module ccore. diff --git a/docs/source/user_manual/continuous_data.rst b/docs/source/user_manual/continuous_data.rst index 358b057ce..c5f67c24b 100644 --- a/docs/source/user_manual/continuous_data.rst +++ b/docs/source/user_manual/continuous_data.rst @@ -37,7 +37,7 @@ data are: two segments you need to merge have conflicting time stamps. To make this clear it is helpful to review two MsPASS concepts in the TimeSeries and Seismogram data objects. Let *d1* and *d2* be - two :code:`TimeSeries` objects that are successive segments we + two :py:class:`TimeSeries` objects that are successive segments we expect to merge with *d2* being the segment following *d1* in time. In MsPASS we use the attribute *t0* (alternatively the method *starttime*) for the time of sample 0. We also use the method @@ -64,10 +64,10 @@ covered by MsPASS. Gap Processing ~~~~~~~~~~~~~~~~~~ Internally MsPASS handles data gaps with a subclass of the -:code:`TimeSeries` called :code:`TimeSeriesWGaps`. That extension of -:code:`TimeSeries` is written in C++ and is documented +:py:class:`TimeSeries` called :py:class:`TimeSeriesWGaps``. That extension of +:py:class:`TimeSeries` is written in C++ and is documented `here `__. -Like :code:`TimeSeries` this class has python bindings created +Like :py:class:`TimeSeries` this class has python bindings created with pybind11. All the methods described in the C++ documentation page have python bindings. 
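For example, a datum with a known missing interval might be handled from
python along the following lines. This is only a rough sketch: the symbol
*d* is assumed to be a :py:class:`TimeSeriesWGaps` instance, *tstart*, *tend*,
and *t* are assumed epoch times, and the class and method names
(:code:`TimeWindow`, :code:`add_gap`, :code:`is_gap`, and :code:`zero_gaps`)
are inferred from the C++ API linked above, so verify them against the
python docstrings before use.

.. code-block:: python

    # Import path assumed; TimeWindow may live in a different ccore submodule.
    from mspasspy.ccore.algorithms.basic import TimeWindow

    gap = TimeWindow(tstart, tend)  # epoch time range known to be missing
    d.add_gap(gap)                  # mark that interval as a gap
    if d.is_gap(t):                 # test whether time t falls inside a marked gap
        print("time", t, "is inside a data gap")
    d.zero_gaps()                   # zero the sample values inside all marked gaps
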
There are methods for defining gaps, zeroing data in defined gaps, and deleting gaps. @@ -81,12 +81,13 @@ Merging Data Segments ~~~~~~~~~~~~~~~~~~~~~~~~ There are currently two different methods in MsPASS to handle merging continuous data segments: (1) a special, implicit option of the -:py:meth:`mspasspy.db.database.Database.read_data` method of the -:py:class:`mspasspy.db.database.Database` class, and (2) the -processing function :py:func:`mspasspy.algorithms.window.merge`. +:py:meth:`read_data` method of the +:py:class:`Database` class, and (2) the +processing function :py:func:`merge`. In addition, there is a special reader function called -:py:func:`mspasspy.db.ensembles.TimeIntervalReader` that can be used -to read fixed time windows of data. That function uses :code:`merge` +:py:func:`TimeIntervalReader` that can be used +to read fixed time windows of data. That function uses +:py:func:`merge` to do gap and overlap repair. read_data merge algorithm @@ -101,8 +102,8 @@ waveform. If the station codes ("net", "sta", "chan", and "loc" attributes in all MsPASS schemas) change in a sequence of packets readers universally assume that is the end of a given segment. How readers handle a second issue is, however, variable. Each miniseed packet has a time -tag that is comparable to the `t0` attribute of a :class:`TimeSeries` object -and end time field equivalent to the output of the :class:`TimeSeries` +tag that is comparable to the `t0` attribute of a :py:class:`TimeSeries` object +and end time field equivalent to the output of the :py:class:`TimeSeries` endtime method. If the `t0` value of a packet is greater than some fractional tolerance of 1 sample more than the endtime of the previous packet, a reader will invoke a gap handler. A reader's gap handler @@ -112,7 +113,7 @@ handles this problem with their :class:`Stream` merge method described `here `__. That particular algorithm is invoked when reading miniseed data if and only if a block of data defined running the mspass -function :py:meth:`mspasspy.db.database.Database.index_mseed_file` is +function :py:meth:`index_mseed_file` is run with the optional argument `segment_time_tears` is set False. (Note the default is True.). If you need to use this approach, you will need to also take care in defining the value of the following arguments @@ -132,7 +133,7 @@ the obspy merge function noted above. The MsPASS function add some additional features and, although not verified by formal testing, is likely much faster than the obpsy version due to fundamental differences in the implementation. -The docstring for :py:func:`mspasspy.algorithms.window.merge` describes more +The docstring for :py:func:`merge` describes more details but some key features of this function are: - Like obspy's function of the same name its purpose is to glue/merge @@ -148,7 +149,7 @@ details but some key features of this function are: options for gap handling that are inseparable from the function. Any detected gaps in the MsPASS merge function are posted to the Metadata component of the - :class:`TimeSeries` it returns accessible with the key "gaps". + :py:class:`TimeSeries` it returns accessible with the key "gaps". The content of the "gaps" attribute is a list of one or more python dictionaries with the keyworks "starttime" and "endtime" defining the epoch time range of all gaps in the returned datum. @@ -167,29 +168,30 @@ details but some key features of this function are: a data quality problem that invalidates the data when the samples do not match. 
If you need the obspy functionality use the - :py:func:`mspasspy.util.converter.TimeSeriesEnsemble2Stream` and the - inverse :py:func:`mspasspy.util.converter.Trace2TimeSeriesEnsemble` + :py:func:`TimeSeriesEnsemble2Stream` and the + inverse :py:func:`Trace2TimeSeriesEnsemble` to create the obspy input and then restore the returned data to the MsPASS internal data structures TimeIntervalReader ----------------------- A second MsPASS tool for working with continuous data is a function -with the descriptive name :py:func:`mspasspy.db.ensembles.TimeIntervalReader`. +with the descriptive name +:py:func:`TimeIntervalReader`. It is designed to do the high-level task of cutting a fixed time interval of data from one or more channels of a continuous data archive. This function is built on top of the lower-level -:py:func:`mspasspy.algorithms.window.merge` but is best thought of as +:py:func:`merge` but is best thought of as an alternative reader to create ensembles cut from a continuous data archive. For that reason the required arguments are a database handle and the time interval of data to be extracted from the archive. Gap and overlap -handling is handled by :code:`merge`. +handling is handled by :py:func:`merge`. Examples ------------ *Example 1: Create a single waveform in a defined time window from continuous data archive.* -This script will create a longer :class:`TimeSeries` object from a set day files +This script will create a longer :py:class:`TimeSeries` object from a set day files for the BHZ channel of GSN station AAK. Ranges are constant for a simple illustration: diff --git a/docs/source/user_manual/data_object_design_concepts.rst b/docs/source/user_manual/data_object_design_concepts.rst index 4c132b254..60839e785 100644 --- a/docs/source/user_manual/data_object_design_concepts.rst +++ b/docs/source/user_manual/data_object_design_concepts.rst @@ -10,7 +10,7 @@ Overview atomic objects in seismology waveform processing:  scalar (i.e. single channel) signals, and three-component signals.   The versions of these you as a user should normally interact with are two objects defined in - MsPASS as :code:`TimeSeries` and :code:`Seismogram` respectively.   + MsPASS as :py:class:`TimeSeries` and :py:class:`Seismogram` respectively.   | These data objects were designed to simply interactions with MongoDB.  MongoDB is completely flexible in attribute names handled by the @@ -34,8 +34,9 @@ Overview generic ensemble.   A limitation of the current capability to link C++ binary code with python is that templates do not translate directly.   Consequently, the python interface uses two different names to define - Ensembles of TimeSeries and Seismogram objects:  :code:`TimeSeriesEnsemble` - and :code:`SeismogramEnsemble` respectively. + Ensembles of TimeSeries and Seismogram objects:  + :py:class:`TimeSeriesEnsemble` + and :py:class:`SeismogramEnsemble` respectively. | The C++ objects have wrappers for python that hide implementation details from the user.   All MongoDB operations implemented with the pymongo @@ -52,8 +53,7 @@ History developed by one of the authors (Pavlis) over a period of more than 15 years.   The original implementation was developed as a component of Antelope.  It was distributed via the open source additions to - Antelope distributed through the `Antelope user's - group + Antelope distributed through the `Antelope user's group `__ and referred to as SEISPP.   
The bulk of the original code can be found `here `__ @@ -105,7 +105,7 @@ History :code:`Schema` object to reduce all Metadata to pure name:value pairs.  #. obspy does not handle three component data in a native way, but mixes - up the concepts we call :code:`Seismogram` and :code:`Ensemble` in to a common + up the concepts we call :py:class:`Seismogram` and :code:`Ensemble` in to a common python object they define as a `Stream `__.   We would argue our model is a more logical encapsulation of the @@ -113,7 +113,7 @@ History component data like a seismic reflection shot gather is a very different thing than a set of three component channels that define the output of three sensors at a common point in space. Hence, we carefully - separate :code:`TimeSeries` and :code:`Seismogram` (our name for Three-Component + separate :py:class:`TimeSeries` and :py:class:`Seismogram` (our name for Three-Component data). We further distinguish :code:`Ensembles` of each atomic type. Core Concepts @@ -141,7 +141,7 @@ Overview - Inheritance Relationships inheritance from three base classes:  :code:`BasicTimeSeries`, :code:`BasicMetadata`, and :code:`BasicProcessingHistory`.   Python supports multiple inheritance and the wrappers make dynamic casting within the hierarchy - (mostly) automatic.  e.g. a :code:`Seismogram` object can be passed directly to a + (mostly) automatic.  e.g. a :py:class:`Seismogram` object can be passed directly to a python function that does only Metadata operations and it will be handled seamlessly because python does not enforce type signatures on functions.  CoreTimeSeries and CoreSeismogram should be thought of a @@ -157,7 +157,7 @@ Overview - Inheritance Relationships completely different framework.  | We emphasize here that users should normally expect to only interact with - the :code:`TimeSeries` and :code:`Seismogram` objects. The lower levels sometimes + the :py:class:`TimeSeries` and :py:class:`Seismogram` objects. The lower levels sometimes but not always have python bindings. | The remainder of this section discusses the individual components in @@ -536,12 +536,12 @@ Scalar versus 3C data container.   | We handle three component data in MsPASS by using a matrix to store the data - for a given :code:`Seismogram`.   The data are directly accessible in C++ through a public + for a given :py:class:`Seismogram`.   The data are directly accessible in C++ through a public variable called u that is mnemonic for the standard symbol used in the old testament of seismology by Aki and Richards. In python we use the symbol :code:`data` for consistency with TimeSeries. There are two choices of the order of indices for this matrix.  - The MsPASS implementation makes this choice: a :code:`Seismogram` + The MsPASS implementation makes this choice: a :py:class:`Seismogram` defines index 0(1) as the channel number and index 1(2) as the time index.  The following python code section illustrates this more clearly than any words: @@ -645,7 +645,7 @@ Core versus Top-level Data Objects focus on the data structure they impose. Other sections expand on the details of both classes. -| Both :code:`TimeSeries` and :code:`Seismogram` objects extend their +| Both :py:class:`TimeSeries` and :py:class:`Seismogram` objects extend their "core" parents by adding two classes: #. 
:code:`ProcessingHistory`, as the name implies, can (optionally) store the @@ -678,9 +678,9 @@ Core versus Top-level Data Objects for Seismogram and TimeSeries, but it does not satisfy the basic rule of making a concept a base class if the child "is a" ErrorLogger. It does, however, perfectly satisfy the idea that the object "has an" - ErrorLogger. Both :code:`TimeSeries` and :code:`Seismogram` use the + ErrorLogger. Both :py:class:`TimeSeries` and :py:class:`Seismogram` use the symbol :code:`elog` as the name for the ErrorLogger object - (e.g. If *d* is a :code:`Seismogram` object, *d.elog*, would refer to + (e.g. If *d* is a :py:class:`Seismogram` object, *d.elog*, would refer to the error logger component of *d*.)'' Object Level History Design Concepts @@ -822,7 +822,7 @@ Error Logging Concepts   d.elog.log_error(d.job_id(),alg,err)   d.kill()   -| To understand the code above assume the symbol d is a :code:`Seismogram` +| To understand the code above assume the symbol d is a :py:class:`Seismogram` object with a singular transformation matrix created, for example, by incorrectly building the object with two redundant east-west components.   The rotate_to_standard method tries to compute a matrix diff --git a/docs/source/user_manual/database_concepts.rst b/docs/source/user_manual/database_concepts.rst index 40074b707..0942fdb07 100644 --- a/docs/source/user_manual/database_concepts.rst +++ b/docs/source/user_manual/database_concepts.rst @@ -279,7 +279,9 @@ With that background, there are two collections used to manage waveform data. They are called :code:`wf_TimeSeries` and :code:`wf_Seismogram`. These two collection are the primary work areas to assemble a working data set. We elected to keep data describing each of the two atomic data types in MsPASS, -:code:`TimeSeries` and :code:`Seismogram`, in two different collections. The +:py:class:`TimeSeries` +and :py:class:`Seismogram`, +in two different collections. The main reason we made the decision to create two collections instead of one is that there are some minor differences in the Metadata that would create inefficiencies if we mixed the two data types in one place. @@ -839,7 +841,7 @@ imports data through a two step procedure: function builds only an index of the given file and writes the index to a special collection called :code:`wf_miniseed`. -2. The same data can be loaded into memory as a MsPASS :code:`TimeSeriesEnsemble` +2. The same data can be loaded into memory as a MsPASS :py:class:`TimeSeriesEnsemble` object using the related function with this signature: .. code-block:: python diff --git a/docs/source/user_manual/graphics.rst b/docs/source/user_manual/graphics.rst index a34f79b8c..f7085d03c 100644 --- a/docs/source/user_manual/graphics.rst +++ b/docs/source/user_manual/graphics.rst @@ -22,12 +22,12 @@ with finite resources. The current support for graphics has three component. #. The lowest level support is to use the commonly used package - called `matplotlib`__. Because the + called `matplotlib `__. Because the sample arrays of all seismic objects act like numpy arrays that is often the simplest mechanism to make a quick plot. The basics of that approach are described below. -#. We have plotting classes called :code:`SeismicPlotter` - and :code:`Sectionplotter` to plot our native data types. +#. We have plotting classes called :py:class:`SeismicPlotter` + and :py:class:`SectionPlotter` to plot our native data types. #. 
As noted elsewhere a core component of MsPASS are fast conversions routines to and from obspy's native data types (:code:`Trace` and :code:`Stream`). That is relevant because obspy's native data types have integrated @@ -38,12 +38,12 @@ The current support for graphics has three component. Matplotlib graphics ~~~~~~~~~~~~~~~~~~~~ Because data vectors of -:py:func:`mspasspy.ccore.seismic.TimeSeries` and -:py:func:`mspasspy.ccore.seismic.Seismogram` objects +:py:class:`TimeSeries` and +:py:class:`Seismogram` objects act like numpy arrays the symbol defining the data vector can be passed directly to matplotlib low-level plotters. Here, for example, is a code fragment that would plot the -data in a :py:func:`mspasspy.ccore.seismic.TimeSeries` object +data in a :py:class:`TimeSeries` object with a simple wiggle plot and a time axis with 0 defined as start time. .. code-block:: python @@ -58,7 +58,7 @@ with a simple wiggle plot and a time axis with 0 defined as start time. plt.plot(t,d_shifted.data) plt.show() -Similarly, one way to plot a :py:func:`mspasspy.ccore.seismic.Seismogram` +Similarly, one way to plot a :py:class:`mspasspy.ccore.seismic.Seismogram` object is the following with subplots: .. code-block:: python @@ -81,8 +81,8 @@ A few comments about these examples: starttime. Without that step the time axis would be useless as it would be in epoch times, which are huge numbers. #. Both define the time axis manually using a loop and the - :code:`time` method common to both :code:`TimeSeries` and - :code:`Seismogram`. We have considered adding a + :code:`time` method common to both :py:class:`TimeSeries` and + :py:class:`Seismogram`. We have considered adding a :code:`time_axis` method to the API, but the example shows it would be so trivial we viewed it unnecessary baggage for the API. #. Note there are many options in matplotlib that could be used to @@ -97,19 +97,19 @@ Native Graphics The goal of the graphics module in MsPASS was to provide simple tools to plot native data types. We thousands first remind the user what is considered "native data" in MsPASS. They are: -(1) :code:`TimeSeries` objects are scalar, uniformly sampled seismic -seismic signals (a single channel), (2) :code:`Seismogram` objects are -bundled three-component seismic data, and (3) :code:`TimeSeriesEnsemble` and +(1) :py:class:`TimeSeries` objects are scalar, uniformly sampled seismic +seismic signals (a single channel), (2) :py:class:`Seismogram` objects are +bundled three-component seismic data, and (3) :py:class:`TimeSeriesEnsemble` and :code:`SeismogramEnsemble` objects are logical groupings of the two "atomic" objects in their names. The second issue is what types of plots are most essential? Our core graphics support two plot conventions: -1. :code:`SeismicPlotter` plots data in the standard convention used to plot - nearly all earthquake data. :code:`SeismicPlotter` plots data with +1. :py:class:`SeismicPlotter` plots data in the standard convention used to plot + nearly all earthquake data. :py:class:`SeismicPlotter` plots data with time as the x (horizontal axis). -2. :code:`SectionPlotter` plots data in the standard convention for seismic +2. :py:class:`SectionPlotter` plots data in the standard convention for seismic reflection data. 
Because with seismic reflection data normal moveout corrected time is a proxy for depth it is universal to plot time as the y axis (vertical) and running backward from the normal @@ -209,9 +209,9 @@ Finally, we would note that the plotters automatically handle switching to plot all the standard MsPASS data objects. Some implementation details we note are: -1. :code:`TimeSeries` data generate one plot frame with a time axis and +1. :py:class:`TimeSeries` data generate one plot frame with a time axis and a y axis of amplitude. -2. :code:`Seismogram` data are displayed on one plot frame. The three +2. :py:class:`Seismogram` data are displayed on one plot frame. The three components are plotted at equal y intervals in SeismicPlotter (equal x intervals in SectionPlotter) with the x1, x2, x3 components arranged from the bottom up (left to right for SectionPlotter). There is an option diff --git a/docs/source/user_manual/handling_errors.rst b/docs/source/user_manual/handling_errors.rst index f2aa963aa..e9ca99002 100644 --- a/docs/source/user_manual/handling_errors.rst +++ b/docs/source/user_manual/handling_errors.rst @@ -97,7 +97,7 @@ wrappers using a package called pybind11.) All have a C++ class called `ErrorLogger <../_static/html/classmspass_1_1utility_1_1_error_logger.html>`__ as a public attribute we define with the symbol :code:`elog`. (The python bindings are also defined in this link: -:py:class:`mspasspy.ccore.utility.ErrorLogger`. ) +:py:class:`ErrorLogger`. ) That mechanism allows processing functions to handle all exceptions without aborting through constructs similar to the following pythonic pseudocode: @@ -116,10 +116,10 @@ through constructs similar to the following pythonic pseudocode: The kill method is described further in the next section. The key point is that generic error handlers catch any exceptions and post message to -the :py:class:`mspasspy.ccore.utility.ErrorLogger` +the :py:class:`ErrorLogger` carried with the data in a container with the symbolic name elog. An error posted to the -:py:class:`mspasspy.ccore.utility.ErrorLogger` +:py:class:`ErrorLogger` always contains two components: (1) a (hopefully informative) string that describes the error, and (2) a severity tag. The class description @@ -214,45 +214,46 @@ Handling dead data has two important, practical constraints: How we address these two constraints is described in two sections below. The first is handled automatically by the -:py:meth:`mspasspy.db.database.Database.save_data` method of -:py:class:`mspasspy.db.database.Database`. The second has +:py:meth:`save_data` method of +:py:class:`Database`. The second has options that are implemented as methods of the class -:py:class:`mspasspy.db.util.Undertaker.Undertaker` that is the +:py:class:`Undertaker` that is the topic of the second subsection below. A final point is that if a job is expected to kill a large fraction of data there is a point where it becomes more efficient to clear the dataset of dead data. That needs to be done with some care if one wishes to preserve error log entries that document why a datum was killed. The -:code:`Undertaker` class, which described in the next section was designed +:py:class:`Undertaker` +class, which described in the next section was designed to handle such issues. Database handling of dead data --------------------------------- The standard way in MsPASS to preserve a record of killed data is implicit when the data are saved via the Database method -:py:meth:`mspasspy.db.database.Database.save_data`. 
-The :py:class:`mspasspy.db.database.Database` class internally +:py:meth:`save_data`. +The :py:class:`Database` class internally creates an instance of -:py:class:`mspasspy.util.Undertaker.Undertaker` +:py:class:`Undertaker` (Described in more detail the next section and the docstring viewable via the above link.) that handles the dead data during the save operation. The -:py:meth:`mspasspy.db.database.Database.save_data` +:py:meth:`save_data` method has these features: #. If an atomic datum is marked dead, - :py:meth:`mspasspy.db.database.Database.save_data` - calls the :py:meth:`mspasspy.util.Undertaker.Undertaker.bury` - method of :py:class:`mspasspy.util.Undertaker.Undertaker` on the + :py:meth:`save_data` + calls the :py:meth:`bury` + method of :py:class:`Undertaker` on the contents. The default behavior of - :py:meth:`mspasspy.util.Undertaker.Undertaker.bury` + :py:meth:`bury` is to create a document in the :code:`cemetery` collection with two primary key-value pairs: (a) The :code:`logdata` key is associated with a readable dump of the - :code:`ErrorLogger` content. (b) The :code:`tombstone` key is + :py:class:`ErrorLogger` content. (b) The :code:`tombstone` key is associated with a python dictionary (subdocument in MongoDB jargon) - that is an image of the datum's :py:class:`mspasspy.ccore.utility.Metadata` + that is an image of the datum's :py:class:`Metadata` container. If the :code:`return_data` boolean is set True (default is False), the sample vector/array will be cleared and set to zero length on the returned object. @@ -269,16 +270,16 @@ method has these features: very different definintions of dead: (a) the entire ensemble can be marked dead or (b) only some members are marked dead. If the entire ensemble is marked dead, a common message is posted to - all members and the :py:meth:`mspasspy.util.Undertaker.Undertaker.bury` + all members and the :py:meth:`bury` method is called on all members. If :code:`return_data` is set True, the member data vector is cleared. In the more common situation where only some of the ensemble members are marked dead, - :py:meth:`mspasspy.db.database.Database.save_data` - calls a special member of :py:class:`mspasspy.util.Undertaker.Undertaker` + :py:meth:`save_data` + calls a special member of :py:class:`Undertaker` with a name that is the best python joke ever: - :py:meth:`mspasspy.util.Undertaker.Undertaker.bring_out_your_dead`. + :py:meth:`bring_out_your_dead`. The dead members are separated from those marked live and - passed in a serial loop to :py:meth:`mspasspy.util.Undertaker.Undertaker.bury`. + passed in a serial loop to :py:meth:`bury`. If :code:`return_data` is set True, the member vector is replaced with a smaller version with the dead removed. #. Saves of both atomic an ensemble data have a :code:`cremate` option. @@ -286,9 +287,9 @@ method has these features: Since V2 of MsPASS the recommended way to terminate a parallel processing sequence is to use the mspass -:py:func:`mspasspy.io.distributed.write_distributed_data` function. +:py:func:`write_distributed_data` function. It handles dead data the same way as -:py:meth:`mspasspy.db.database.Database.save_data` described above. +:py:meth:`save_data` described above. Finally, users are warned that data that are passed through a reduce operator will normally discard dead data with no trace. 
If your workflow has a @@ -302,12 +303,12 @@ If your workflow has edit procedures that kill a significant fraction of your dataset, you should consider using the MsPASS facility for handling dead data within a processing sequence. The main tool for doing so are methods of the -:py:class:`mspasspy.util.Undertaker.Undertaker` class. +:py:class:`Undertaker` class. The class name is a programming joke, but the name is descriptive; its job is to deal with dead data. The class interacts with a Database and has three methods that are most useful for any MsPASS workflow. -1. The :py:meth:`mspasspy.util.Undertaker.Undertaker.bury` method +1. The :py:meth:`bury` method is the normal tool of choice to handle dead data. It has the behavior noted above creating a document in the :code:`cemetery` collection for every dead datum. For atomic data the return @@ -316,19 +317,19 @@ three methods that are most useful for any MsPASS workflow. this method returns a copy of the ensemble with the dead members completely removed. A :code:`cemetery` document is saved for each datum removed from the ensemble. -2. The :py:meth:`mspasspy.util.Undertaker.Undertaker.cremate` method +2. The :py:meth:`cremate` method can be used if you do not want to preserve the error messages that caused kills. With atomic data it returns the smallest ashes possible; a default constructed instance of the parent data object. For ensembles dead data are completely removed. -3. The :py:meth:`mspasspy.util.Undertaker.Undertaker.bring_out_your_dead` method, +3. The :py:meth:`bring_out_your_dead` method, will raise an exception if it receives anything but an ensemble. It returns two ensembles: one with all the live and one with all the dead data. It is actually used internally by - both the :py:meth:`mspasspy.util.Undertaker.Undertaker.bury` - and :py:meth:`mspasspy.util.Undertaker.Undertaker.cremate` methods + both the :py:meth:`bury` + and :py:meth:`cremate` methods when the input is an ensemble. -4. :py:meth:`mspasspy.util.Undertaker.Undertaker.mummify` is useful for +4. :py:meth:`mummify` is useful for reducing the memory footprint of a dataset while preserving the data that is normally saved in :code:`cemetery` at the end of a workflow. It does so by only clearing the sample data arrays and setting the @@ -337,7 +338,7 @@ three methods that are most useful for any MsPASS workflow. members marked dead. The following is a sketch of a typical use of an instance of -:py:class:`mspasspy.util.Undertaker.Undertaker` within a workflow. +:py:class:`Undertaker` within a workflow. A key point is an instance of the class has to be instantiated prior to the data processing workflow steps. diff --git a/docs/source/user_manual/header_math.rst b/docs/source/user_manual/header_math.rst index 6335a71b6..b711071c7 100644 --- a/docs/source/user_manual/header_math.rst +++ b/docs/source/user_manual/header_math.rst @@ -185,7 +185,7 @@ name defined by ``key`` to the constant value set with ``const``. Combining operators ------------------------ We define a final operator class with the name -:py:class:`mspasspy.algorithms.edit.MetadataOperatorChain`. +:py:class:`MetadataOperatorChain`. As the name suggests it provides a mechanism to implement a (potentially complicated) formula from the lower level operators. 
The class constructor has this usage: diff --git a/docs/source/user_manual/importing_tabular_data.rst b/docs/source/user_manual/importing_tabular_data.rst index 1d4e3d2c3..99d8dc2b7 100644 --- a/docs/source/user_manual/importing_tabular_data.rst +++ b/docs/source/user_manual/importing_tabular_data.rst @@ -35,15 +35,15 @@ types of tabular data: as a number of regional seismic networks use Antelope as their database engine. Furthermore, the Array Network Facility of the Earthscope project used Antelope and the database tables are still accessible at - the AUG web site `here`__. + the AUG web site `here `__. In any case, the key concept to remember is that Antelope/Datascope is just a different implementation of a relational database. Below we describe a special interface in MsPASS to import data from Datascope tables. In MsPASS the central concept used to unify these diverse sources is the -concept of a :code:`DataFrame` that is THE data structure that is the +concept of a :code:`DataFrame`. :code:`DataFrame`s are THE data structure that is the focus of the commonly used -`pandas`__ python package. +`pandas `__ python package. The name does not come from a fuzzy animal but is apparently a acronymn derived from "PANel DAta". "panel" is, in fact, an odd synonym for "table". Pandas and their extension in @@ -56,15 +56,15 @@ stored some other way to a DataFrame. Import/Export Tables Stored in Files ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -Importing data from any common tabular data format I know if is -essentially a solved problem via the :code:`DataFrame` API. +Importing data from any common tabular data format I know is +essentially a solved problem through the pandas, :code:`DataFrame` API. - The documentation on reading tabular data files can be found - `here`__. + `here `__. There are also writers for most of the same formats documented on that same page. - Dask has a more limited set of readers described - `here`__. + `here `__. The reason is that the large data model of DataFrame for a dask workflow is most applicable when the table is large compared to the memory space of a single node. Hence, something like an @@ -73,14 +73,17 @@ essentially a solved problem via the :code:`DataFrame` API. pieces using the standard unix "cat" command. - Pyspark has similar functionallity, but a very different API than pandas and dask. The documentation for the read/write interface can be found - `here`__. + `here `__. The list of formats pyspark can read or write is similar to pandas. -The most common application for reading tabular data is importing -some nonstandard data from a research application stored in one of the +The most common requirement for reading tabular data is needing to import +some nonstandard data from a research application. There are +some common examples in seismology that won't work with standard reader like +the CMT catalog. Most tabular data downloadable on the internet today, +however, is stored in one of the standard formats. For example, here an example extracted from our jupyter notebook tutorial on MongoDB. It shows how one can import the output of -`PhaseNet`__ +`PhaseNets `__ with it's output structured as a csv file. 
It also shows how the results can be written to MongoDB in a collection it creates called "arrival": @@ -108,13 +111,13 @@ Import/Export Data from an SQL server ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Importing data from an SQL server or writing a DataFrame to an -SQL server are best thought of different methods of the +SQL server are best thought of as different methods of the DataFrame implementation. e.g. if the data in the example above had been stored on an SQL server you would change the line :code:`df = pd.read_csv('./data/picks.csv')` to use the variant for interacting with an SQL server. See the links above the to -io sections for pandas, dask, and pyspark for details on what the -work out the correct incantation for your needs. +IO sections for pandas, dask, and pyspark for details on the +correct incantation for your needs. Import/Export of Data from Datascope ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ @@ -129,11 +132,11 @@ there are a lot of known ways that the Antelope software can be used in combination with MsPASS to handle novel research problems. As a result, we created a special API to interact with data managed by an Antelope database. The MsPASS API aims to -loosely mimic the SQL functionality using an pandas -DataFrame as the intermediary. +loosely mimic core SQL functionality using pandas +:code:`DataFrame`s as the intermediary. A python class called -:py:class:`mspasspy.preprocessing.css30.datascope.DatascopeDatabase` +:py:class:`DatascopeDatabase` is used as the agent for interacting with an Antelope/Datascope database. Unlike all common SQL database of today, Datascope does not use a client-server model @@ -145,11 +148,12 @@ Datascope the file names are special and take the form `dbname`.`table`. `dbname` is the name of the collection of tables that is the "database": the "database name". As the word implies `table` is the schema name for a particular table -that that file contains. For example, if one sees an +that that file contains. For example, if one sees a Datascope table with the file name "usarray.arrival" that file is the "arrival" table in a database someone chose to call "usarray". + With that background, an instance of a -:py:class:`mspasspy.preprocessing.css30.datascope.DatascopeDatabase` +:py:class:`DatascopeDatabase` can be created with a variant of the following code fragment: .. code-block:: python @@ -162,7 +166,7 @@ albeit confusing, feature that allows the collection of files that define the database to be spread through multiple directories. That features is nearly always exploited, in practice, by placing more static tables in a separate directory. For that reason -:py:class:`mspasspy.preprocessing.css30.datascope.DatascopeDatabase` +:py:class:`DatascopeDatabase` has an optional `dir` argument to point the constructor to read data files from a different directory. e.g. a variant of the above example to access files in a "dbmaster" (common practice) @@ -173,21 +177,24 @@ directory is the following: dsdbm = DatascopeDatabase("usarray",dir="~/dbmaster") Once, an instance of -:py:class:`mspasspy.preprocessing.css30.datascope.DatascopeDatabase` +:py:class:`DatascopeDatabase` is created that points to the directory from which you want to import one or more tables, the usage to fetch the data that table contains is similar to that for the pandas SQL readers. 
Use the -:py:meth:`mspasspy.preprocessing.css30.datascope.DatascopeDatabase.get_table` +:py:meth:`get_table` method of -:py:class:`mspasspy.preprocessing.css30.datascope.DatascopeDatabase` +:py:class:`DatascopeDatabase` to retrieve individual tables from the Datascope database -as a pandas DataFrame. An important option descibed in the +as a pandas DataFrame. An important option described in the docstrng is a python list passed via the optional argument with key `attributes_to_load`. The default loads the entire css3.0 schema table. Use a list to limit what attributes are retrieved. -As an example of a typical use of the -:py:meth:`mspasspy.preprocessing.css30.datascope.DatascopeDatabase.get_table` -method the following would retrieve the coordinate data from +That is frequently desirable as all CSS3.0 tables have attributes that +are often or nearly always null. + +The following eexample shows a typical use of the +:py:meth:`get_table` +method. This example retrieves the coordinate data from the usarray "site" tables using the handle `dsdbm` created with the code line above: @@ -199,12 +206,12 @@ the code line above: The result could be used for normalization to load coordinates by station name. (In reality there are some additional complications related to the time fields and seed station codes. Those, however are a side issue -that would only confuse the topic of discussion so ignore it here.) +that would only confuse the topic of discussion so I ignore it here.) The inverse of the -:py:meth:`mspasspy.preprocessing.css30.datascope.DatascopeDatabase.get_table` +:py:meth:`get_table` method is the -:py:meth:`mspasspy.preprocessing.css30.datascope.DatascopeDatabase.df2table` +:py:meth:`df2table` method. As it's name implies it attempts to write a pandas DataFrame to a particular Datascope table, which means it will attempt to write a properly formatted text file for the table name passed to the @@ -214,7 +221,7 @@ Finally, the :code:`datascope.py` module also contains two convenience methods that simply two common operations with Datascope database tables: -#. :py:meth:`mspasspy.preprocessing.css30.datascope.DatascopeDatabase.CSS30Catalog2df` +#. :py:meth:`CSS30Catalog2df` creates the standard "catalog-view" of CSS3.0. In seismology a "catalog" is a image of what in ancient times was distributed as book tabulating earthquake hypocenter estimates and arrival time data used @@ -223,12 +230,12 @@ Datascope database tables: event->origin->assoc->arrival where "->" symbolizes a right database join operator. It returns a pandas DataFrame that is the "catalog". Usage details can be gleaned from the docstring. -#. :py:meth:`mspasspy.preprocessing.css30.datascope.DatascopeDatabase.wfdisc2doclist` +#. :py:meth:`wfdisc2doc` can be thought of as an alternative to the MsPASS - :py:meth:`mspasspy.db.database.Database.index_mseed_file` method. + :py:meth:`index_mseed_file` method. It returns a list of python dictionaries (documents) that are roughly equivalent to documents created by - :py:meth:`mspasspy.db.database.Database.index_mseed_file`. + :py:meth:`index_mseed_file`. The main application is to use the alternative miniseed indexer of Antelope. There are many ways that raw miniseed files from experimental data (i.e. data not sanitized for storage in the archives) @@ -244,3 +251,9 @@ Datascope database tables: dsdb = DatascopeDatabase("usarray") doclist = dsdb.wfdisc2doclist() db.wf_miniseed.insert_many(doclist) + +.. 
note:: + + Be warned that py:meth:`wfdisc2doc` + only work with a wfdisc that is an index to miniseed data. It does not currently + support other formats defined by CSS3.0. diff --git a/docs/source/user_manual/mongodb_and_mspass.rst b/docs/source/user_manual/mongodb_and_mspass.rst index 452aaa016..b5ffaa37a 100644 --- a/docs/source/user_manual/mongodb_and_mspass.rst +++ b/docs/source/user_manual/mongodb_and_mspass.rst @@ -883,7 +883,7 @@ formatted table display. As noted above pandas are your friend in creating such a report. Here is an example that creates a report of all stations listed in the site collection with coordinates and the time range of recording. It is a variant of a code block in our -`mongodb_tutorial`__ +`mongodb_tutorial `__ .. code-block:: python @@ -912,7 +912,7 @@ in our notebook tutorials, is downloading and loading the current CMT catalog and loading it into a nonstandard collection we all "CMT". In this manual we focus on the fundamentals of the pymongo API for saving documents. See the -`mongodb_tutorial`__ +`mongodb_tutorial `__ for the examples. There are two methods of `Database.collection` that you can use to diff --git a/docs/source/user_manual/obspy_interface.rst b/docs/source/user_manual/obspy_interface.rst index 9be6519b8..3762340ef 100644 --- a/docs/source/user_manual/obspy_interface.rst +++ b/docs/source/user_manual/obspy_interface.rst @@ -25,7 +25,7 @@ They have similarities but some major differences that a software engineer might ObsPy defines two core data objects: #. ObsPy :py:class:`Trace ` containers hold a single channel of seismic data. - An ObsPy :code:`Trace` maps almost directly into the MsPASS atomic object we call a :code:`TimeSeries`. + An ObsPy :code:`Trace` maps almost directly into the MsPASS atomic object we call a :py:class:`TimeSeries`. Both containers store sample data in a contiguous block of memory that implement the linear algebra concept of an N-component vector. Both store the sample data as Python float values that always map to 64 bit floating point numbers (double in C/C++). Both containers also put auxiliary data used to expand on the definition of what the data are in an indexed container that behaves like a Python dict. @@ -36,10 +36,10 @@ ObsPy defines two core data objects: In ObsPy the header parameters are stored in an attribute with a different name (Stats) while in MsPASS the dict behavior is part of the TimeSeries object. (For those familiar with Object Oriented Programming generic concepts ObsPy views metadata using the concept that a Trace object "has a Stats" container while in MsPASS we say a TimeSeries "is a Metadata".) #. ObsPy :py:class:`Stream ` containers are little more than a Python list of :code:`Trace` objects. - A :code:`Stream` is very similar in concept to the MsPASS data object we call a :code:`TimeSeriesEnsemble`. + A :code:`Stream` is very similar in concept to the MsPASS data object we call a :py:class:`TimeSeriesEnsemble`. Both are containers holding a collection of single channel seismic data. In terms of the data they contain there is only one fundamental difference; - a :code:`TimeSeriesEnsemble` is not just a list of data but it also contains a :code:`Metadata` container that contains attributes common to all members of the ensemble. + a :py:class:`TimeSeriesEnsemble` is not just a list of data but it also contains a :code:`Metadata` container that contains attributes common to all members of the ensemble. 
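To make the "has a Stats" versus "is a Metadata" distinction above concrete,
the fragment below shows the two access styles side by side. It is a minimal
sketch, not a tested recipe: *tr* is assumed to be an ObsPy :code:`Trace` and
*d* a MsPASS :py:class:`TimeSeries` that has already been loaded (e.g. by a
Database read), and the key names assume the standard MsPASS schema.

.. code-block:: python

    # ObsPy: header values live in the separate Stats container attribute
    dt = tr.stats.delta        # sample interval
    net = tr.stats.network     # SEED network code

    # MsPASS: the TimeSeries object itself acts like a python dict
    # because a TimeSeries "is a" Metadata
    dt = d["delta"]            # sample interval (MsPASS schema key)
    net = d["net"]             # SEED network code
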
There are some major collisions in concept between ObsPy's approach and that we use in MsPASS that impose some limitations on switching between the packages. @@ -51,8 +51,8 @@ There are some major collisions in concept between ObsPy's approach and that we They only add overhead in constructing ObsPy data objects. #. ObsPy's support for three-component data is mixed in the concept of a :code:`Stream`. A novel feature of MsPASS is support for an atomic data class we call a Seismogram that is a container designed for consistent handling of three component data. -#. In MsPASS we define a second type of "Ensemble" we call a :code:`SeismogramEnsemble` that is a collection of :code:`Seismogram` objects. - It is conceptually identical to a :code:`TimeSeriesEnsemble` except the members are :code:`Seismogram` objects instead of :code:`TimeSeries` objects. +#. In MsPASS we define a second type of "Ensemble" we call a :py:class:`TimeSeriesEnsemble` that is a collection of :py:class:`Seismogram` objects. + It is conceptually identical to a :py:class:`TimeSeriesEnsemble` except the members are :py:class:`Seismogram` objects instead of :py:class:`TimeSeries` objects. The concept collision between Seismograms objects and any ObsPy data creates some limitations in conversions. A good starting point is this axiom: converting from MsPASS Seismogram objects or SeismogramEnsemble to ObsPy Stream objects is simple and robust; @@ -69,7 +69,7 @@ With that background the set of converters are: Currently the only retained channel properties are orientation information (:code:`hang` and :code:`vang` attributes). For most users the critical information lost in the opposite conversion (:code:`Stream2Seismogram`) is any system response data. A corollary that follows logically is that if you need to do response corrections for your workflow you need to do so equally on all three components before converting the data to Seismogram objects. - Because of complexities in converting from :code:`Stream` to :code:`Seismogram` objects we, in fact, do not recommend using :code:`Stream2Seismogram` for that purpose. + Because of complexities in converting from :code:`Stream` to :py:class:`Seismogram` objects we, in fact, do not recommend using :code:`Stream2Seismogram` for that purpose. If the parent data originated as miniSEED from an FDSN data center, a more reliable and flexible algorithm is the :code:`BundleSEEDGroup` function. - The MsPASS ensemble data converters can be used to convert to and from ObsPy :code:`Stream` objects, although with side effects in some situations. As with the other converters the (verbose) names are mnemonic for their purpose. diff --git a/docs/source/user_manual/parallel_processing.rst b/docs/source/user_manual/parallel_processing.rst index 3618615c1..8801a2fbb 100644 --- a/docs/source/user_manual/parallel_processing.rst +++ b/docs/source/user_manual/parallel_processing.rst @@ -110,9 +110,9 @@ Overview ----------- A second key concept we utilize for processing algorithms in MsPASS is the abstraction of -`functional programming`__, +`functional programming `__, which is a branch of programming founded on -`lambda calculus`__. +`lambda calculus `__. For most seismologists that abstraction is likely best treated only as a foundational concept that may or may not be helpful depending on your background. It is important, however, @@ -231,7 +231,7 @@ single result stored with the symbol :code:`stack`. 
We will get to the rules that constrain :code:`Reduce` operators in a moment, but it might be more helpful to you as a user to see how that algorithm translates into dask/spark. MsPASS has a parallel stack algorithm found -`here`__ +`here `__ It is used in a parallel context as follows for dask: .. code-block:: python @@ -245,7 +245,7 @@ For spark the syntax is identical but the name of the method changes to reduce: res = rdd.reduce(lambda a, b: stack(a, b)) The :code:`stack` symbol refers to a python function that is actually quite simple. You can view -the source code `here`__. +the source code `here `__. It is simple because most of the complexity is hidden behind the += symbol that invokes that operation in C++ (`TimeSeries::operator+=` for anyone familiar with C++) to add the right hand side to the left hand side of @@ -261,7 +261,7 @@ which is a generic wrapper to adapt any suitable reduce function to MsPASS. The final issue we need to cover in this section is what exactly is meant by the phrase "any suitable reduce function" at the end of the previous paragraph? To mesh with the reduce framework used by spark and dask a function has -to satisfy `the following rules`__ : +to satisfy `the following rules `__ : 1. The first two arguments (a and b symbols in the example above) must define two instances of the same type @@ -328,7 +328,7 @@ overhead is relatively small unless the execution time for processing is trivial. For more information, the dask documentation found -`here`__ is a good +`here `__ is a good starting point. Examples: @@ -425,11 +425,11 @@ To read this page we recommend you open a second winodw or tab on your web browser to the current file in the mspass source code directory called :code:`scripts/tacc_examples/distributed_node.sh`. The link to the that file you can view on your web browser is -`here`__. +`here `__. We note there is an additional example there for running MsPASS on a single node at TACC called :code:`scripts/tacc_examples/single_node.sh` you can access directly -`here`__, +`here `__, The single node setup is useful for testing and may help your understanding of what is needed by being much simpler. We do not discuss that example further here, however, because a primary purpose for using @@ -476,7 +476,7 @@ script on a head node without having to submit the full job. MsPASS was designed to be run in a container. For a workstation environment we assume the container system being used is docker. Running MsPASS with docker is described on -`this wiki page`__. +`this wiki page `__. All HPC systems we know have a docker compatible system called :code:`singularity`. Singularity can be thought of as docker for a large HPC cluster. The most important feature of singularity for you as a user @@ -490,15 +490,15 @@ the way it used by docker as follows: For more about running MsPASS with singularity consult our wiki page found -`here`__. +`here `__. Since our examples here were constructed on TACC' Stampede2 you may also find it useful to read their page on using singularity found -`here`__ +`here `__ There is a single node mode you may want to run for testing. You can find an example of how to configure Stampede2 to run on a single node in the MsPASS scripts/tacc_examples found on github -`here`__. +`here `__. We focus is manual on configuration for a production run using multiple nodes, that is a primary purpose of using MsPASS for data processing. 
The example we give here is the
@@ -523,7 +523,7 @@ And it doesn't matter if the directory doesn't exist, the job script would creat
Then we define the SING_COM variable to simplify the workflow in our job script.
On Stampede2 and most HPC systems, we use Singularity to manage and run the docker images.
There are many options for starting a container with Singularity, which you can find in the Singularity documentation.
And for those who are not familiar with Singularity,
here is a good `source `__
to get started with.

.. code-block:: bash

@@ -564,7 +564,7 @@ before:

    SINGULARITYENV_MSPASS_WORK_DIR=$WORK_DIR $SING_COM

Here we set environment variables inside the container using the SINGULARITYENV_XXX syntactic sugar.
For more information, you can view the usage `here `_.
We define and set different variables in the different containers we start because
start-mspass.sh defines different behavior for each *MSPASS_ROLE*,
so that each role executes the bash script section we define in start-mspass.sh.
Though it looks complicated and hard to extend, this is probably