diff --git a/docs/tutorial/force_calibration/robust_fitting.rst b/docs/tutorial/force_calibration/robust_fitting.rst index 49c9a2786..ab0d49781 100644 --- a/docs/tutorial/force_calibration/robust_fitting.rst +++ b/docs/tutorial/force_calibration/robust_fitting.rst @@ -9,17 +9,25 @@ Robust fitting So far, we have been using least-squares fitting routines for force calibration. In that case, we assume that the error in the power at each frequency is distributed according to a Gaussian distribution. -:ref:`Blocking or windowing` the power spectrum ensures that this assumption is close enough to the truth such that the fit provides accurate estimates of the unknown parameters. +:ref:`Blocking or windowing` the power spectrum ensures that this assumption is +close enough to the truth such that the fit provides accurate estimates of the unknown parameters. Occasionally, the power spectrum might show a spurious noise peak. -Such a peak is an outlier in the expected behavior of the spectrum and therefore interferes with the assumption of having a Gaussian error distribution. +Such a peak is an outlier in the expected behavior of the spectrum and therefore interferes with the +assumption of having a Gaussian error distribution. As a result, the fit is skewed. In those cases, it can be beneficial to do a robust fit. + When a robust fit is performed, one assumes that the probability of encountering one or multiple outliers is non-negligible. By taking this into account during fitting, the fit can be made more robust to outliers in the data. -One downside of this approach is that the current implementation does not readily provide standard errors on the parameter estimates and that it leads to a small bias in the fit results for which Pylake has no correction. -Robust fitting can be used in combination with a least-squares fit, to identify outliers automatically, in order to exclude these from a second regular least-squares fit. + +One downside of this approach is that the current implementation does not readily provide standard errors +on the parameter estimates and that it leads to a small bias in the fit results for which Pylake has no correction. + +Robust fitting can be used in combination with a least-squares fit to identify outliers automatically +in order to exclude these from a second regular least-squares fit. The following example illustrates the method. -To see this effect, let's load a dataset of uncalibrated force sensor data of a 4.4 μm bead showing Brownian motion while being trapped. In particular, look at the `Force 2y` sensor signal:: +To see this effect, let's load a dataset of uncalibrated force sensor data of a 4.4 μm bead showing +Brownian motion while being trapped. In particular, look at the `Force 2y` sensor signal:: filenames = lk.download_from_doi("10.5281/zenodo.7729823", "test_data") f = lk.File("test_data/robust_fit_data.h5") @@ -50,8 +58,9 @@ Notice how the tail of the model is skewed towards the peak, in order to reduce In this case, the free parameters to fit the diode filter contribution are 'abused' to reduce the error between the model and the outlier. This results in biased parameter estimates. -Now do a robust fit. We do this by specifying a loss function in the function :func:`~lumicks.pylake.fit_power_spectrum()`. -For least-squares fitting, the loss function is `'gaussian'`, which is the default if nothing is specified. However, if we specify `'lorentzian'`, a robust fitting routine will be used instead. +Now do a robust fit. We do this by specifying a loss function in :func:`~lumicks.pylake.fit_power_spectrum()`. +For least-squares fitting, the loss function is `'gaussian'`, which is the default if nothing is specified. +However, if we specify `'lorentzian'`, a robust fitting routine will be used instead. Because `bias_correction` and robust fitting are mutually exclusive, we need to explicitly turn it off:: fit = lk.fit_power_spectrum(ps_blocked, model, bias_correction=False, loss_function="lorentzian") @@ -69,25 +78,31 @@ Now plot the robust fit:: .. image:: figures/power_spectrum_noise_peak_robust.png -Notice how the model now follows the power spectrum nearly perfectly. The value for `f_diode` has increased significantly, now that it is not abused to reduce the error induced by the outlier. +Notice how the model now follows the power spectrum nearly perfectly. The value for `f_diode` has increased +significantly, now that it is not abused to reduce the error induced by the outlier. + +This example shows that a robust fitting method is less likely to fail on outliers in the power spectrum data. + +It is therefore a fair question why one would not use it all the time? -This example shows that a robust fitting method is less likely to fail on outliers in the power spectrum data. It is therefore a fair question why one would not use it all the time? Robust fitting leads to a small bias in the fit results for which Pylake has no correction. Least-squares fitting also leads to a bias, but this bias is known (:cite:`norrelykke2010power`) and can be corrected with `bias_correction=True`. -Secondly, for least-squares fitting, methods exist to estimate the expected standard errors in the estimates of the free parameters, which are implemented in the least-squares fitting routines that Pylake uses :cite:`press1990numerical`. -These error estimates are not implemented for robust fitting, and as such, the fit results will show `nan` for the error estimates after a robust fit. -However, as will be shown below, the robust fitting results may be used as a start to identify outliers automatically, in order to exclude these from a second, regular least-squares, fit. +Secondly, for least-squares fitting, methods exist to estimate the expected standard errors in the +estimates of the free parameters, which are implemented in the least-squares fitting routines that Pylake uses :cite:`press1990numerical`. +These error estimates are not implemented for robust fitting, and as such, the fit results will show +`nan` for the error estimates after a robust fit. +However, as will be shown below, the robust fitting results may be used as a start to identify outliers automatically, +in order to exclude these from a second, regular least-squares, fit. .. _find_fer: Automated spurious peak detection ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -We will continue the tutorial with the results of the previous section. If you did not yet do that part of the tutorial, please go back and execute the code examples in that section. - -We still have the power spectrum `ps` that was created without blocking or windowing. Here we will use it to identify the peak and automatically obtain frequency exclusion ranges. -We will use the method :meth:`~lumicks.pylake.force_calibration.power_spectrum.PowerSpectrum.identify_peaks()` in order to do so. -This method takes a function that accurately models the power spectrum as a function of frequency, in order to normalize it. +We still have the power spectrum `ps` that was created without blocking or windowing which +we will use to identify the peak and automatically obtain frequency exclusion ranges. +using the method :meth:`~lumicks.pylake.force_calibration.power_spectrum.PowerSpectrum.identify_peaks()`. +This method takes a function that accurately models the power spectrum as a function of frequency in order to normalize it. It then identifies peaks based on the likelihood of encountering a peak of a certain magnitude in the resulting data set. If we have a "good fit", then the easiest way to get that function is to use our fitted model:: @@ -99,15 +114,18 @@ If we have a "good fit", then the easiest way to get that function is to use our plt.xscale("log") plt.yscale("log") -If there are no spurious peaks, then normalizing the unblocked power spectrum results in random numbers with an exponential distribution with a mean value of 1. +If there are no spurious peaks, then normalizing the unblocked power spectrum results in random +numbers with an exponential distribution with a mean value of 1. The chance of encountering increasingly larger numbers decays exponentially, and this fact is used by `identify_peaks()`:: frequency_exclusions = ps.identify_peaks(fit, peak_cutoff=20, baseline=1) -The parameter `peak_cutoff` is taken as the minimum magnitude of any value in the normalized power spectrum in order to be concidered a peak. +The parameter `peak_cutoff` is taken as the minimum magnitude of any value in the normalized power spectrum in order to be considered a peak. The default value is 20, and it corresponds to a chance of about 2 in a billion of a peak of magnitude 20 or larger occuring naturally in a data set. -If a peak is found with this or a higher magnitude, the algorithm then expands the range to the left and right until the first time the power spectrum drops below the value `baseline`. -The frequencies at which this occurs end up as the lower and higher frequency of a frequency exclusion range. As such, the value of `baseline` controls the width of the frequency exclusion range. +If a peak is found with this or a higher magnitude, the algorithm then expands the range to the left and right +until the first point at which the power spectrum drops below the value `baseline`. +The frequencies at which this occurs end up as the lower and upper frequency of an exclusion range. +As such, the value of `baseline` controls the width of the frequency exclusion range. We can visualize the excluded peaks as follows:: fig, ax = plt.subplots(1, 2, sharey=True) @@ -129,7 +147,8 @@ We can visualize the excluded peaks as follows:: Finally, we can do a least-squares fit, but in this case we will filter out the frequency ranges that contain peaks. Because we use a least-squares method, we get error estimates on the fit parameters, and bias in the fit result can be corrected. -The default values of `loss_function='gaussian'` and `bias_correction=True` ensure least-squares fitting and bias correction, so we do not need to specify them:: +The default values of `loss_function='gaussian'` and `bias_correction=True` ensure least-squares fitting +and bias correction, so we do not need to specify them:: ps_no_peak = lk.calculate_power_spectrum( f2y.data, sample_rate=f2y.sample_rate, num_points_per_block=200, fit_range=(10, 23e3), excluded_ranges=frequency_exclusions, @@ -147,4 +166,5 @@ The default values of `loss_function='gaussian'` and `bias_correction=True` ensu .. image:: figures/power_spectrum_no_noise_peak.png -Notice that no skewing occurs, and that the values of `fc`, `D` and `f_diode` are now closer to values found via robust fitting in the section above. +Notice that no skewing occurs, and that the values of `fc`, `D` and `f_diode` are now closer to +values found via robust fitting in the section above.