Export refactor changes (#335)

Resolves #333 Resolves #329 Resolves #327 Resolves #324
MontrealCorpusTools · Oct 1, 2021 · 85f75c0
1 parent 0196c42
commit 85f75c0

Showing 113 changed files with 38,426 additions and 4,091 deletions.
diff --git a/.coveragerc b/.coveragerc
@@ -18,6 +18,12 @@ exclude_lines =
 
     except ImportError:
 
+    except KeyboardInterrupt:
+
+    except Exception as e:
+
+    except Exception:
+
     if call_back
     if stop_check
 

diff --git a/.travis.yml b/.travis.yml
@@ -36,7 +36,7 @@ install:
   - which python
   - which sox
   - conda list
-  - python -m montreal_forced_aligner.command_line.thirdparty download
+  - python -m montreal_forced_aligner.command_line.mfa thirdparty download
   - ls $HOME/Documents/MFA/thirdparty/bin -al
   - $HOME/Documents/MFA/thirdparty/bin/compute-mfcc-feats --help
   - $HOME/Documents/MFA/thirdparty/bin/ivector-extractor-est --help

diff --git a/docs/source/_static/dictionary_annotation.png b/docs/source/_static/dictionary_annotation.png
diff --git a/docs/source/_static/speaker_annotation.png b/docs/source/_static/speaker_annotation.png
diff --git a/docs/source/annotator.rst b/docs/source/annotator.rst
@@ -1,134 +1,30 @@
 
-.. _`LAV filters`: https://github.com/Nevcairiel/LAVFilters/releases
+.. _`Anchor Annotator documentation`: https://anchor-annotator.readthedocs.io/en/latest/
 
 .. _annotator:
 
-*********
-Annotator
-*********
+****************
+Anchor annotator
+****************
+
+The Anchor Annotator is a GUI utility for MFA that allows users to modify transcripts and add/change entries in the pronunciation dictionary to interactively fix out-of-vocabulary issues.
 
 .. attention::
 
-   The GUI annotator is under development and is currently pre-alpha. Use at your own risk and please use version control
+   Anchor is under development and is currently pre-alpha. Use at your own risk and please use version control
    or back up any critical data.
 
-Currently the functionality of the Annotator GUI allows for users to modify transcripts and add/change
-entries in the pronunciation dictionary to interactively fix out of vocabulary issues.
-
-.. warning::
-
-   If you are trying to use the annotator from Windows, note that some issues will be present as native Windows use is not
-   fully supported. Specifically if you need G2P functionality, that does not function on Windows due to its dependencies
-   not being available (Pynini, Opengrm-ngram, OpenFst).
-
-To use the annotator, first follow the instructions in :ref:`installation`.  Once MFA is installed and thirdparty binaries
-have been downloaded, run the following command:
-
-.. code-block:: bash
-
-    mfa annotator
-
-Initial setup
-=============
-
-To load a corpus for inspection, go to the Corpus drop down menu and select "Load a corpus".  Navigate
-to the desired corpus directory.  Please note that it should follow one of the data formats outlined in :ref:`data_format`.
-
-.. note::
-
-   Some set up of system codecs may be necessary to playback those types of files.  For Windows, `LAV filters` has been
-   tested to work with :code:`.flac` files.
-
-Next, dictionary files and G2P models should be loaded via their respective menus.  If any pretrained
-models have been installed via :ref:`pretrained_models`, these can be selected directly.
-
-Fixing out of vocabulary issues
-===============================
-
-Once the corpus is loaded with a dictionary, utterances in the corpus will be parsed for whether they contain
-an out of vocabulary (OOV) word.  If they do, they will be marked in that column on the left with a red cell
-(see number :code:`2` below).
-
-To fix a transcript, click on the utterance in the table.  This will bring up a detail view of the utterance,
-with a waveform window above and the transcript in the text field.  Clicking the ``Play`` button (or ``Tab`` by default)
-will allow you to listen to the audio.   Pressing the ``Save current file`` button (see number :code:`10` below) will save the
-utterance text to the .lab/.txt file or update the interval in the TextGrid.
 
-.. warning::
+To use the annotator, first install the anchor subpackage:
 
-   Clicking ``Save`` will overwrite the source file loaded, so use this software with caution.
-   Backing up your data and/or using version control is recommended to ensure that any data loss
-   during corpus creation is minimized.
+.. code-block:: bash
 
-If the word causing the OOV warning is in fact a word you would like aligned, you can right click on
-the word and select ``Add pronunciation for 'X'`` if a G2P model is loaded (see number :code:`7` below).  This will run the G2P
-model to generate a pronunciation in the dictionary which can then be modified if necessary and the dictionary
-can be saved via the ``Save dictionary`` button.  You can also look up any word in the pronunciation
-dictionary by right clicking and selecting ``Look up 'X' in dictionary``.  Any pronunciation can be modified
-and saved.  The ``Reset dictionary`` button wil discard any changes made to the dictionary.
+   pip install montreal-forced-aligner[anchor]
 
-Fixing segments
-===============
-
-.. figure:: _static/dictionary_annotation.png
-    :align: center
-    :alt: Image cannot be displayed in your browser
-
-The file you want to fix up can be selected via the dropdown in the top left (number :code:`1` above).
-
-For fixing up intervals, you can select segments in the left table (number :code:`2` above), or by clicking on
-intervals in the plot window (i.e., number :code:`5` above).
-You can edit the text in the center bottom box (number :code:`6` above), change the speaker via the dropdown next to the
-text box (number :code:`12` below), and adjust
-boundaries as necessary (green lines associated with number :code:`4` below).  If you would like to add a new speaker,
-then it can be accessed via the :code:`Speaker` tab
-on the right pane, which will also list counts of utterances (see :code:`13` below). Entering a speaker name and clicking
-"Add speaker" (:code:`14` below), will make that speaker available in the dropdown.
-
-Single segments can be split via a keyboard shortcut (by default :code:`Ctrl+S`, but this can be changed, see
-:ref:`configure_annotator` for more details).  This will create two segments from one, split at the midpoint, but with all
-the text in the first segment.
-
-Multiple segments can be selected by holding :code:`Ctrl` (with selections shown in the left pane, though not in the waveform panel),
-and can be merged into single
-segments via a keyboard shortcut (by default :code:`Ctrl+M`, but this can be changed, see :ref:`configure_annotator`
-for more details).  Any number of segments can be selected this way, and the resulting merged segment will concatenate
-the transcriptions for them all.  In general, be cautious about creating too long of utterances, as in general there
-is better performance in alignment for shorter utterances, and often breath pauses make for good segment boundaries if
-they're visible on the waveform.
-
-.. figure:: _static/speaker_annotation.png
-    :align: center
-    :alt: Image cannot be displayed in your browser
-
-Segments can be added via double clicking on a speaker's tier (i.e., number :code:`11`), however, it is disabled if a
-segment exists at that point. Any segments can also be deleted via a shortcut (by default :code:`Delete`).  There is limited
-restore functionality for deleted utterances, via a button on the bottom left.
-
-
-.. _configure_annotator:
-
-Configuring the annotator
-=========================
-
-By going to :code:`Preferences` in the :code:`Edit` menu, many aspects of the interface can be changed.  The two primary
-customizations currently implemented are for the appearance of the waveform/segment window and for  keyboard shortcuts.
-
-The current available shortcuts are:
-
-.. csv-table::
-   :header: "Function", "Default keybind"
-
-   "Play audio", "Tab"
-   "Zoom in", "Ctrl+I"
-   "Zoom out", "Ctrl+O"
-   "Pan left", "Left arrow"
-   "Pan right", "Right arrow"
-   "Merge utterances", "Ctrl+M"
-   "Split utterances", "Ctrl+S"
-   "Delete utterances", "Del"
-   "Save current file", "By default not bound, but can be set"
-   "Create new segment", "Double click (currently not rebindable)"
+This will install MFA, if it hasn't been installed already, along with all the packages that Anchor requires.  Once installed, Anchor can be started with the following MFA subcommand:
 
+.. code-block:: bash
 
+    mfa anchor
 
+See the `Anchor Annotator documentation`_ for more information.
diff --git a/docs/source/apireference.rst b/docs/source/apireference.rst
@@ -111,9 +111,6 @@ Feature processing API
    :template: function.rst
 
    mfcc
-   apply_cmvn
-   add_deltas
-   apply_lda
 
 .. _multiprocessing_api:
 

diff --git a/docs/source/changelog.rst b/docs/source/changelog.rst
@@ -7,6 +7,20 @@
 Changelog
 =========
 
+2.0.0b0
+-------
+
+Beta release!
+
+- Fixed an issue in transcription when using a .ARPA language model rather than one built in MFA
+- Fixed an issue in parsing filenames containing spaces
+- Added a ``mfa configure`` command to set global options.  Users can now specify a new default for arguments like ``--num_jobs``, ``--clean`` or ``--temp_directory``; see :ref:`configuration` for more details.
+- Added a new flag for overwriting output files. By default, MFA will no longer overwrite output files if the path already exists, and will instead write them to a directory in the temporary directory.  You can revert this change by running ``mfa configure --always_overwrite``
+- Added a ``--disable_textgrid_cleanup`` flag to disable the post-processing that MFA has implemented recently (not outputting silence labels and recombining subwords that got split up as part of dictionary lookup). You can set this to be the default by running ``mfa configure --disable_textgrid_cleanup``
+- Refactored and optimized the TextGrid export process to use multiple processes by default, so you should see significant speedups.
+- Removed shorthand flags for ``-c`` and ``-d`` since they could represent multiple different flags/arguments.
+
+
 2.0.0a24
 --------
 

diff --git a/docs/source/commands.rst b/docs/source/commands.rst
@@ -1,3 +1,5 @@
+
+
 .. _commands:
 
 ********
@@ -40,7 +42,7 @@ Corpus creation
    "create_segments", "Use voice activity detection to create segments", :ref:`create_segments`
    "train_ivector", "Train an ivector extractor for speaker classification", :ref:`train_ivector`
    "classify_speakers", "Use ivector extractor to classify files or cluster them", :ref:`classify_speakers`
-   "annotator", "Run a GUI annotator program for editing and managing corpora", :ref:`annotator`
+   "anchor", "Run the Anchor annotator utility (if installed) for editing and managing corpora", :ref:`annotator`
 
 
 Other utilities
@@ -52,6 +54,7 @@ Other utilities
 
    "download", "Download a model trained by MFA developers", :ref:`pretrained_models`
    "thirdparty", "Download and validate new third party binaries", :ref:`installation`
+   "configure", "Configure MFA to use customized defaults for command line arguments", :ref:`configuration`
 
 
 Grapheme-to-phoneme

diff --git a/docs/source/conf.py b/docs/source/conf.py
@@ -23,7 +23,8 @@
 import mock
 
 MOCK_MODULES = ['textgrid', 'textgrid.textgrid',
-                'praatio', 'praatio.tgio',
+                'praatio', 'praatio.tgio', 'praatio.utilities',
+                'praatio.utilities.constants',
                 'tqdm', 'yaml',
                 'numpy', 'resampy', 'audioread',
                 'scipy', 'scipy.signal', 'scipy.io',

diff --git a/docs/source/configuration.rst b/docs/source/configuration.rst
@@ -5,10 +5,82 @@
 Configuration
 *************
 
-Contents:
+Global configuration for MFA can be updated via the ``mfa configure`` subcommand. Once the command is called with a flag, it will set a default value for any future runs (though you can override most settings when you call other commands).
+
+Options available:
+
+.. option:: -t
+               --temp_directory
+
+   Set the default temporary directory
+
+.. option:: -j
+               --num_jobs
+
+   Set the number of processes to use by default
+
+.. option:: --always_clean
+
+   Always remove files from previous runs by default
+
+.. option:: --never_clean
+
+   Don't remove files from previous runs by default
+
+.. option:: --always_verbose
+
+   Default to verbose output (outputs debug messages)
+
+.. option:: --never_verbose
+
+   Default to non-verbose output
+
+   Default to verbose output (outputs debug messages)
+
+.. option:: --always_debug
+
+   Default to running debugging steps
+
+.. option:: --never_debug
+
+   Default to not running debugging steps
+
+.. option:: --always_overwrite
+
+   Always overwrite output files
+
+.. option:: --never_overwrite
+
+   Never overwrite output files (if file already exists, the output will be saved in the temp directory)
+
+.. option:: --disable_mp
+
+   Disable all multiprocessing (not recommended as it will usually increase processing times)
+
+.. option:: --enable_mp
+
+   Enable multiprocessing (recommended and enabled by default)
+
+.. option:: --disable_textgrid_cleanup
+
+   Disable postprocessing of TextGrids that cleans up silences and recombines compound words and clitics
+
+.. option:: --enable_textgrid_cleanup
+
+   Enable postprocessing of TextGrids that cleans up silences and recombines compound words and clitics
+
+.. option:: -h
+               --help
+
+   Display help message for the command
+
+
+
+Configuration of commands
+=========================
 
 .. toctree::
-   :maxdepth: 3
+   :maxdepth: 1
 
    configuration_align.rst
    configuration_transcription.rst

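Note on the `docs/source/configuration.rst` hunk above: the paired `--always_*` / `--never_*` options it documents could be modeled roughly as follows. This is a minimal sketch built on Python's standard `argparse`, not MFA's actual implementation. The helper names `build_configure_parser` and `update_defaults` and the plain-dict settings store are assumptions made for illustration, and only a subset of the documented options is covered.

```python
import argparse


def build_configure_parser():
    """Hypothetical parser mirroring a subset of the documented options."""
    parser = argparse.ArgumentParser(prog="mfa configure")
    parser.add_argument("-t", "--temp_directory", default=None)
    parser.add_argument("-j", "--num_jobs", type=int, default=None)
    # Each always/never pair writes True or False to one shared destination;
    # None means "flag not given", so the stored default is left alone.
    for name in ("clean", "verbose", "debug", "overwrite"):
        group = parser.add_mutually_exclusive_group()
        group.add_argument(f"--always_{name}", dest=name,
                           action="store_const", const=True, default=None)
        group.add_argument(f"--never_{name}", dest=name,
                           action="store_const", const=False, default=None)
    return parser


def update_defaults(current, argv):
    """Return a copy of `current` updated with only the flags that were passed."""
    args = vars(build_configure_parser().parse_args(argv))
    updated = dict(current)
    updated.update({key: value for key, value in args.items() if value is not None})
    return updated


if __name__ == "__main__":
    # Illustrative starting values, not MFA's real defaults.
    defaults = {"num_jobs": 3, "overwrite": False, "clean": False}
    print(update_defaults(defaults, ["--always_overwrite", "-j", "4"]))
```

With this shape, a flag that is not passed leaves the stored default untouched, which matches the documented behaviour of setting a value for future runs only when the command is called with that flag.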
diff --git a/docs/source/configuration_align.rst b/docs/source/configuration_align.rst
@@ -196,7 +196,7 @@ Default training config file
      - sat:
          num_leaves: 2500
          max_gaussians: 15000
-         fmllr_power: 0.2
+         power: 0.2
          silence_weight: 0.0
          fmllr_update_type: "diag"
          subset: 10000
@@ -206,7 +206,7 @@ Default training config file
      - sat:
          num_leaves: 4200
          max_gaussians: 40000
-         fmllr_power: 0.2
+         power: 0.2
          silence_weight: 0.0
          fmllr_update_type: "diag"
          subset: 30000
@@ -246,7 +246,7 @@ Training configuration for 1.0
      - sat:
          num_leaves: 3100
          max_gaussians: 50000
-         fmllr_power: 0.2
+         power: 0.2
          silence_weight: 0.0
          cluster_threshold: 100
          fmllr_update_type: "full"

diff --git a/montreal_forced_aligner/__init__.py b/montreal_forced_aligner/__init__.py
@@ -1,6 +1,6 @@
 __ver_major__ = 2
 __ver_minor__ = 0
-__ver_patch__ = '0a24'
+__ver_patch__ = '0b0'
 __version__ = "{}.{}.{}".format(__ver_major__, __ver_minor__, __ver_patch__)
 
 __all__ = ['aligner', 'command_line', 'models', 'corpus', 'config', 'dictionary', 'exceptions',