Reworded parts of documentation for clarity.

john-hen · May 15, 2020 · 292a431
1 parent 3e5a288
commit 292a431

Showing 4 changed files with 42 additions and 39 deletions.
diff --git a/docs/conf.py b/docs/conf.py
@@ -4,16 +4,16 @@
 The files in this folder are used to render the documentation of this
 package, from its source files, as a static web site. The renderer is
 the documentation generator Sphinx. It is configured by this very
-script and would be invoked on the command line via, on any operating
-system, via `sphinx-build . rendered`. The static HTML then ends up in
-the sub-folder `rendered`, where `index.html` is the start page.
+script and would be invoked on the command line, of any operating
+system, via `sphinx-build . rendered`. The static HTML then ends up
+in the sub-folder `rendered`, where `index.html` is the start page.
 
 The source files are the `.md` files here, where `index.md` maps to
-the start page, as well as the documentation string in the package's
-source code for the API documentation.
+the start page, as well as the documentation strings in the package's
+source code that provide the API documentation.
 
 All text may use mark-up according to the CommonMark specification of
-the Markdown syntax. The Sphinx extension `recommonmark` is used to
+the Markdown syntax. The Sphinx extension `reCommonMark` is used to
 convert Markdown to reStructuredText, Sphinx's native input format.
 """
 __license__ = 'MIT'

diff --git a/docs/implementation.md b/docs/implementation.md
@@ -1,7 +1,7 @@
 Implementation
 --------------
 
-This Python implementation was developed based on the existing Matlab
+This Python library was developed based on the existing Matlab
 implementation for [1d][1] and [2d][2], which was used as the
 primary reference (albeit possibly in earlier versions previously
 stored at the same locations), and the [original paper][3] as a
@@ -20,18 +20,19 @@ or n dimensions, [`dct`][5]/[`dctn`][6] and [`idct`][7]/[`idctn`][8],
 as well as NumPy's [`histogram`][9] and [`histogram2d`][10], instead
 of the custom versions the Matlab reference employs.
 
-The reference uses a cosine transformation with a different weight for
-the very first component, one which appears to not be supported by
-SciPy. There is an easy work-around for that, which is used in the
-current code. It should however be possible to rewrite the algorithm
-in a more elegant way, one that avoids the work-around altogether.
+The reference uses a cosine transformation with a weight for the very
+first component that is different from the one in any of the four types
+of the transformation supported by SciPy. There is an easy work-around
+for that, which is used in the current code. It should however be
+possible to rewrite the algorithm in a more elegant way, one that avoids
+the work-around altogether.
 
 The Matlab implementation also bins the data somewhat differently in
 1d vs. the 2d case. This minor inconsistency was removed. The change
 is arguably insignificant as far the final results are concerned,
 but is a deviation nonetheless.
 
-In practical use, based on a handful of test cases, both implementations
+In practical use, based on a handful of tests, both implementations
 yield indiscernible results.
 
 The 2d density is returned in matrix index order, also known as
@@ -50,7 +51,7 @@ In very broad strokes, the method is this:
 * This leaves Gaussian kernels intact.
 * Gaussians are also elementary solutions to the diffusion equation.
 * Leverage this to define condition for optimal smoothing.
-* Solve optimum condition by iterating in Fourier space.
+* Find optimum by iteration in Fourier space.
 * Smooth transformed data with optimized Gaussian kernel.
 * Reverse transformation to obtain density estimation.
 

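The broad-stroke steps listed in `implementation.md` can be sketched in one dimension as follows. This is only an illustration under stated assumptions, not the library's code: the function name `smooth_histogram` is hypothetical, the bandwidth is passed in by hand instead of being found by the optimization the documentation describes, and the plain orthonormal type-2 transform from `scipy.fft` is used, without the first-component weight work-around mentioned above.

```python
import numpy as np
from scipy.fft import dct, idct


def smooth_histogram(samples, bandwidth, bins=256):
    """Fixed-bandwidth sketch: bin, transform, damp, transform back."""
    samples = np.asarray(samples, dtype=float)
    lower, upper = samples.min(), samples.max()
    span = upper - lower

    # Bin the observations and normalize to relative frequencies.
    counts, edges = np.histogram(samples, bins=bins, range=(lower, upper))
    weights = counts / counts.sum()

    # Discrete cosine transform of the binned data.
    coefficients = dct(weights, type=2, norm='ortho')

    # Gaussian smoothing is a per-component damping in cosine space.
    # The bandwidth is rescaled to the unit interval the data is mapped to.
    k = np.arange(bins)
    t = (bandwidth / span)**2
    damped = coefficients * np.exp(-0.5 * (np.pi * k)**2 * t)

    # Reverse the transformation and convert bin weights to a density.
    smoothed = idct(damped, type=2, norm='ortho')
    density = smoothed / (span / bins)
    centers = (edges[:-1] + edges[1:]) / 2
    return centers, density


# Hypothetical test data: 1000 draws from a standard normal distribution.
rng = np.random.default_rng(seed=0)
samples = rng.normal(size=1000)
centers, density = smooth_histogram(samples, bandwidth=0.3)
```

Because the zeroth component is left undamped, the smoothed result keeps the normalization of the histogram. The hand-picked `bandwidth=0.3` is exactly the kind of manual tuning that automatic bandwidth selection is meant to replace.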
diff --git a/docs/index.md b/docs/index.md
@@ -1,18 +1,9 @@
 KDE-diffusion
 =============
 
-Kernel density estimation via diffusion in 1d and 2d.
-
-Provides the fast, adaptive kernel density estimator based on linear
-diffusion processes for one-dimensional and two-dimensional input data
-as outlined in the [2010 paper by Botev et al.][1] The reference
-implementation for [1d][2] and [2d][3], in Matlab, was provided by the
-paper's first author, Zdravko Botev. This is a re-implementation in
-Python, with added test coverage.
-
 Kernel density estimation is a statistical method to infer the
 *true* probability density function that governs the distribution of
-a random variable from discrete observations of that same variable.
+a random variable from discrete observations of that same entity.
 The variable may have more than one component, i.e. be described by
 several coordinates.
 
@@ -23,26 +14,35 @@ use case is the determination of a spatially-resolved particle flux
 as measured by a detector array that is sensitive to rare, individual
 impacts.
 
-Kernel density estimation basically works like so: Bin the discrete
+Kernel density estimation basically works like this: Bin the discrete
 observations in a histogram. This is straightforward and takes little
 computation time. Then smooth the data over the bins/grid with an
 image filter that adds *adequate* blur. The shape of the filter
 function is referred to as the "kernel" and its spatial extent as the
-"bandwidth". The trick is to find the optimal filter size (bandwidth)
-that does not smear out the data too much, but also averages out the
+"bandwidth". The trick is to find the optimal filter size, one that
+does not smear out the data too much, but also averages over the
 artifacts that are due to the discrete nature of the input.
 
-This implementation here is particularly fast. Orders of magnitude
-faster, for instance, than [SciPy's Gaussian kernel estimator][4].
-Or those provided by [Scikit-Learn][5]. And most of [KDEpy's][6]
-— except for `FFTKDE`, which uses a very similar algorithm, but has
-no automatic bandwidth selection in dimensions higher than 1.
-
-Automatic bandwidth selection is however key. Otherwise one may just
-apply a [Gaussian filter][7] and manually tune it until the results
-look pleasing to the human eye. The bandwidth selection is what makes
-kernel density estimation non-parametric, so that we avoid making
-possibly misguided assumptions about the nature of the data.
+This library provides the adaptive kernel density estimator based on
+linear diffusion processes for one-dimensional and two-dimensional
+input data as outlined in the [2010 paper by Botev et al.][1] The
+reference implementation for [1d][2] and [2d][3], in Matlab, was
+provided by the paper's first author, Zdravko Botev. This is a
+re-implementation in Python, with added test coverage.
+
+The diffusion-inspired method is particularly fast. Orders of
+magnitude faster, for instance, than [SciPy's Gaussian kernel
+estimator][4]. Or those provided by [Scikit-Learn][5]. And most of
+[KDEpy's][6] — except for `FFTKDE`, which uses a very similar
+algorithm, but has no automatic bandwidth selection in dimensions
+higher than one.
+
+Automatic bandwidth selection is however key. Otherwise one may as
+well just apply a [Gaussian filter][7] and manually tune its size,
+i.e. the bandwidth, until the results look pleasing to the human eye.
+The bandwidth selection is what makes kernel density estimation a
+non-parametric method, so that we avoid making — possibly misguided —
+assumptions about the nature of the data.
 
 
 [1]: https://dx.doi.org/10.1214/10-AOS799

diff --git a/docs/usage.md b/docs/usage.md
@@ -49,7 +49,9 @@ pyplot.show()
 Note that the density is returned in matrix index order, also known as
 Cartesian indexing, i.e. with the first index referring to the x-axis
 and the second to the y-axis. This is the common convention for 2d
-histograms and kernel density estimations, or science in general.
+histograms and kernel density estimations, or [science in general][1].
 Images, however, are universally indexed the other way around: y before
 x. This is why the density in the example is transposed before being
 displayed.
+
+[1]: https://stackoverflow.com/a/56917343
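The index order described in `usage.md` can be checked with NumPy's own 2d histogram, which follows the same convention. The snippet below is a small sketch with made-up sample data. It does not call this package, it only shows why a transpose is needed before handing such an array to an image routine.

```python
import numpy as np

# Hypothetical sample data: 1000 points, wider spread in x than in y.
rng = np.random.default_rng(seed=0)
x = rng.normal(scale=2.0, size=1000)
y = rng.normal(scale=0.5, size=1000)

# NumPy's 2d histogram puts x on the first axis and y on the second.
counts, x_edges, y_edges = np.histogram2d(x, y, bins=(40, 20))
print(counts.shape)   # (40, 20)

# Image routines expect rows to run along y, so transpose first.
image = counts.T
print(image.shape)    # (20, 40)
```

When plotting with Matplotlib, the transposed array would typically be passed to `imshow` together with `origin='lower'`, so that y increases upward as in a regular plot.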