Currently, preprocessing.subsampling.statistical_inefficiency rejects a data set if it is not sorted or contains duplicate time frames.
I'm aware that @dotsdl is refactoring the subsampling module #98.
I think adding automatic sorting and duplicate removal would make the ABFE workflow more tolerant of corrupted datasets.
I'm thinking of adding the sorting and duplicate removal to preprocessing.subsampling.statistical_inefficiency:
```python
def statistical_inefficiency(df, series=None, lower=None, upper=None, step=None,
                             conservative=True, drop_duplicates=True, sort=True):
    """Subsample a DataFrame based on the calculated statistical inefficiency
    of a timeseries.

    If `series` is ``None``, then this function will behave the same as
    :func:`slicing`.

    Parameters
    ----------
    df : DataFrame
        DataFrame to subsample according to the statistical inefficiency of `series`.
    series : Series
        Series to use for calculating statistical inefficiency. If ``None``,
        no statistical inefficiency-based subsampling will be performed.
    lower : float
        Lower bound to pre-slice `series` data from.
    upper : float
        Upper bound to pre-slice `series` to (inclusive).
    step : int
        Step between `series` items to pre-slice by.
    conservative : bool
        ``True`` uses ``ceil(statistical_inefficiency)`` to slice the data in uniform
        intervals (the default). ``False`` will sample at non-uniform intervals to
        closely match the (fractional) statistical_inefficiency, as implemented
        in :func:`pymbar.timeseries.subsampleCorrelatedData`.
    drop_duplicates : bool
        Drop duplicated rows based on time.
    sort : bool
        Sort the DataFrame based on the time column.
    """
```
Or is it better to make this a functionality of the ABFE workflow?