Release Notes
-------------

The CHANGELOG for the current development version is available at
https://github.com/rasbt/mlxtend/blob/master/docs/sources/CHANGELOG.md.

Version 0.9.1 (2017-11-19)
--------------------------

New Features

- Added `mlxtend.evaluate.bootstrap_point632_score` to evaluate the performance of estimators using the .632 bootstrap; see the example below. (#283)
- New `max_len` parameter for the frequent itemset generation via the `apriori` function to allow for early stopping. ([#270](https://github.com/rasbt/mlxtend/pull/270))

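A minimal usage sketch of the new scoring utility; the `n_splits` and `random_seed` keyword names follow the mlxtend documentation and should be treated as assumptions here:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier
from mlxtend.evaluate import bootstrap_point632_score

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(random_state=123)

# Run 200 bootstrap rounds and average the .632 accuracy estimates
scores = bootstrap_point632_score(tree, X, y, n_splits=200, random_seed=123)
print('Mean .632 accuracy: %.2f' % np.mean(scores))
```
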
Changes

- All feature index tuples in `SequentialFeatureSelector` are now in sorted order. (#262)
- The `SequentialFeatureSelector` now runs the continuation of the floating inclusion/exclusion as described in Novovicova & Kittler (1994). Note that this didn't cause any difference in performance on any of the test scenarios but could lead to better performance in certain edge cases. (#262)
- `utils.Counter` now accepts a `name` variable to help distinguish between multiple counters, time precision can be set via the `precision` kwarg, and the new attribute `end_time` holds the time the last iteration completed. (#278)

Bug Fixes

- Fixed a deprecation error that occurred with the McNemar test when using SciPy 1.0. (#283)

Version 0.9.0 (2017-10-21)
--------------------------

New Features

- Added `evaluate.permutation_test`, a permutation test for hypothesis testing (or A/B testing) to test if two samples come from the same distribution; in other words, a procedure to test the null hypothesis that two groups are not significantly different (e.g., a treatment and a control group). See the example after this list. (#250)
- Added `'leverage'` and `'conviction'` as evaluation metrics to the `frequent_patterns.association_rules` function. (#246 & #247)
- Added a `loadings_` attribute to `PrincipalComponentAnalysis` to compute the factor loadings of the features on the principal components. (#251)
- Allow grid search over classifiers/regressors in ensemble and stacking estimators. (#259)
- New `make_multiplexer_dataset` function that creates a dataset generated by an n-bit Boolean multiplexer for evaluating supervised learning algorithms. (#263)
- Added a new `BootstrapOutOfBag` class, an implementation of the out-of-bag bootstrap to evaluate supervised learning algorithms. (#265)
- The parameters for `StackingClassifier`, `StackingCVClassifier`, `StackingRegressor`, `StackingCVRegressor`, and `EnsembleVoteClassifier` can now be tuned using scikit-learn's `GridSearchCV`. (#254 via James Bourbeau)

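As an illustration of the new permutation test, a two-sample comparison could look like the following sketch; the `method`, `num_rounds`, and `seed` keyword names are taken from the mlxtend documentation and are assumptions here:

```python
from mlxtend.evaluate import permutation_test

treatment = [28.44, 29.32, 31.22, 29.58, 30.34, 28.76, 29.21]
control = [33.51, 30.63, 32.38, 32.52, 29.41, 30.93, 49.78]

# Approximate (Monte Carlo) permutation test of the null hypothesis
# that both samples come from the same distribution
p_value = permutation_test(treatment, control,
                           method='approximate',
                           num_rounds=10000,
                           seed=0)
print(p_value)
```
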
Changes

- The `'support'` column returned by `frequent_patterns.association_rules` was changed to compute the support of "antecedant union consequent", and new `'antecedant support'` and `'consequent support'` columns were added to avoid ambiguity; see the example below. (#245)
- Allow the `OnehotTransactions` to be cloned via scikit-learn's `clone` function, which is required by, e.g., scikit-learn's `FeatureUnion` or `GridSearchCV` (via Iaroslav Shcherbatyi). (#249)

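A rough sketch of how the reworked support columns and the new rule metrics surface in the `frequent_patterns` API; the `OnehotTransactions` usage and its `columns_` attribute follow the mlxtend documentation and should be treated as assumptions:

```python
import pandas as pd
from mlxtend.preprocessing import OnehotTransactions
from mlxtend.frequent_patterns import apriori, association_rules

transactions = [['bread', 'butter'],
                ['bread', 'butter'],
                ['bread', 'butter', 'milk'],
                ['bread', 'milk'],
                ['butter', 'milk'],
                ['milk']]

# One-hot encode the transactions into a DataFrame
oht = OnehotTransactions()
oht_ary = oht.fit(transactions).transform(transactions)
df = pd.DataFrame(oht_ary, columns=oht.columns_)

# Mine frequent itemsets, then derive rules via the 'conviction' metric;
# the result also includes the antecedant/consequent support columns
frequent_itemsets = apriori(df, min_support=0.5, use_colnames=True)
rules = association_rules(frequent_itemsets, metric='conviction',
                          min_threshold=1.0)
print(rules.head())
```
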
Bug Fixes

- Fix issues with the `self._init_time` parameter in `_IterativeModel` subclasses. (#256)
- Fix imprecision bug that occurred in `plot_ecdf` when run on Python 2.7. (#264)
- The vectors from SVD in `PrincipalComponentAnalysis` are now scaled so that `solver='eigen'` and `solver='svd'` store eigenvalues of the same magnitude. (#251)

Version 0.8.0 (2017-09-09)
--------------------------

New Features

- Added `mlxtend.evaluate.bootstrap`, which implements the ordinary nonparametric bootstrap to bootstrap a single statistic (for example, the mean, median, R^2 of a regression fit, and so forth). (#232)
- `SequentialFeatureSelector`'s `k_features` parameter now accepts the string arguments "best" and "parsimonious" for more "automated" feature selection. For instance, if "best" is provided, the feature selector will return the feature subset with the best cross-validation performance. If "parsimonious" is provided as an argument, the smallest feature subset that is within one standard error of the cross-validation performance will be selected. See the example after this list. (#238)

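A minimal sketch of the new string options; the surrounding parameters mirror common `SequentialFeatureSelector` usage and are not specific to this change:

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from mlxtend.feature_selection import SequentialFeatureSelector as SFS

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=3)

# 'parsimonious' returns the smallest subset within one standard error
# of the best cross-validation score; 'best' returns the best-scoring subset
sfs = SFS(knn,
          k_features='parsimonious',
          forward=True,
          floating=False,
          scoring='accuracy',
          cv=5)
sfs = sfs.fit(X, y)
print(sfs.k_feature_idx_)
```
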
Changes

- `SequentialFeatureSelector` now uses `np.nanmean` over the normal mean to support scorers that may return `np.nan`. (#211, via mrkaiser)
- The `skip_if_stuck` parameter was removed from `SequentialFeatureSelector` in favor of a more efficient implementation that compares the conditional inclusion/exclusion results (in the floating versions) to the performances of previously sampled feature sets that were cached. (#237)
- `ExhaustiveFeatureSelector` was modified to consume substantially less memory. (#195, via Adam Erickson)

Bug Fixes

- Fixed a bug where the `SequentialFeatureSelector` selected a feature subset larger than specified via the `k_features` tuple max-value. (#213)

Version 0.7.0 (2017-06-22)
--------------------------

New Features

- New `mlxtend.plotting.ecdf` function for plotting empirical cumulative distribution functions. (#196)
- New `StackingCVRegressor` for stacking regressors with out-of-fold predictions to prevent overfitting; see the sketch below. (#201, via Eike Dehling)

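A minimal sketch of stacking regressors with out-of-fold predictions; the `regressors`, `meta_regressor`, and `cv` parameter names follow the mlxtend documentation and should be treated as assumptions:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, Ridge
from mlxtend.regressor import StackingCVRegressor

X, y = make_regression(n_samples=100, n_features=5, random_state=1)

# The meta-regressor is trained on out-of-fold predictions of the
# level-1 regressors, which reduces the risk of overfitting
stack = StackingCVRegressor(regressors=(Lasso(), Ridge()),
                            meta_regressor=RandomForestRegressor(
                                n_estimators=10, random_state=1),
                            cv=5)
stack.fit(X, y)
print(stack.predict(X[:3]))
```
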
Changes

- The TensorFlow estimators have been removed from mlxtend, since TensorFlow now offers very convenient ways to build estimators, which renders those implementations obsolete.
- `plot_decision_regions` now supports plotting decision regions for more than 2 training features. (#189, via James Bourbeau)
- Parallel execution in `mlxtend.feature_selection.SequentialFeatureSelector` and `mlxtend.feature_selection.ExhaustiveFeatureSelector` is now performed over different feature subsets instead of the different cross-validation folds to better utilize machines with multiple processors if the number of features is large. (#193, via @whalebot-helmsman)
- Raise meaningful error messages if pandas `DataFrame`s or Python lists of lists are fed into the `StackingCVClassifier` as `fit` arguments. (#198)
- The `n_folds` parameter of the `StackingCVClassifier` was changed to `cv` and can now accept any kind of cross-validation technique that is available from scikit-learn, for example, `StackingCVClassifier(..., cv=StratifiedKFold(n_splits=3))` or `StackingCVClassifier(..., cv=GroupKFold(n_splits=3))`; see the example after this list. (#203, via Konstantinos Paliouras)

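For instance, passing a scikit-learn splitter directly might look like this sketch (the `classifiers` and `meta_classifier` parameter names follow the mlxtend documentation):

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from mlxtend.classifier import StackingCVClassifier

X, y = load_iris(return_X_y=True)

# Any scikit-learn cross-validation technique can now be passed via `cv`
sclf = StackingCVClassifier(classifiers=[KNeighborsClassifier(n_neighbors=3),
                                         LogisticRegression()],
                            meta_classifier=LogisticRegression(),
                            cv=StratifiedKFold(n_splits=3))
sclf.fit(X, y)
```
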
Bug Fixes

- `SequentialFeatureSelector` now correctly accepts a `None` argument for the `scoring` parameter to infer the default scoring metric from scikit-learn classifiers and regressors. (#171)
- The `plot_decision_regions` function now supports pre-existing axes objects generated via matplotlib's `plt.subplots`. (#184, see example)
- Made `math.num_combinations` and `math.num_permutations` numerically stable for large numbers of combinations and permutations. (#200)

Version 0.6.0 (2017-03-18)
--------------------------

New Features

- An `association_rules` function is implemented that allows generating rules based on a list of frequent itemsets (via Joshua Goerner).

Changes

- Adds a black `edgecolor` to plots via `plotting.plot_decision_regions` to make markers more distinguishable from the background in `matplotlib>=2.0`.
- The `association` submodule was renamed to `frequent_patterns`.

Bug Fixes

- The `DataFrame` index of `apriori` results is now unique and ordered.
- Fixed typos in the autompg and wine datasets (via James Bourbeau).

Version 0.5.1 (2017-02-14)
--------------------------

New Features

- The `EnsembleVoteClassifier` has a new `refit` attribute that prevents refitting classifiers if `refit=False` to save computational time.
- Added a new `lift_score` function in `evaluate` to compute lift score (via Batuhan Bardak); see the example after this list.
- `StackingClassifier` and `StackingRegressor` support multivariate targets if the underlying models do (via kernc).
- `StackingClassifier` has a new `use_features_in_secondary` attribute like `StackingCVClassifier`.

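A minimal sketch of the new lift score utility; the label arrays below are made up for illustration:

```python
import numpy as np
from mlxtend.evaluate import lift_score

y_true = np.array([0, 0, 1, 0, 0, 1, 1, 1, 1, 1])
y_pred = np.array([1, 0, 1, 0, 0, 0, 0, 1, 0, 1])

# Lift = precision of the positive predictions divided by the
# overall rate of positives in the true labels
print(lift_score(y_true, y_pred))
```
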
Changes

- Changed the default verbosity level in `SequentialFeatureSelector` to 0.
- The `EnsembleVoteClassifier` now raises a `NotFittedError` if the estimator wasn't `fit` before calling `predict` (via Anton Loss).
- Added new TensorFlow variable initialization syntax to guarantee compatibility with TensorFlow 1.0.

Bug Fixes

- Fixed wrong default value for `k_features` in `SequentialFeatureSelector`.
- Cast selected feature subsets in the `SequentialFeatureSelector` as sets to prevent the iterator from getting stuck if the `k_idx` are different permutations of the same combination (via Zac Wellmer).
- Fixed an issue with learning curves that caused the performance metrics to be reversed (via ipashchenko).
- Fixed a bug that could occur in the `SequentialFeatureSelector` if there are similarly well-performing subsets in the floating variants (via Zac Wellmer).

Version 0.5.0 (2016-11-09)
--------------------------

New Features

- New `ExhaustiveFeatureSelector` estimator in `mlxtend.feature_selection` for evaluating all feature combinations in a specified range.
- The `StackingClassifier` has a new parameter `average_probas` that is set to `True` by default to maintain the current behavior. A deprecation warning was added, though, and it will default to `False` in future releases (0.6.0); `average_probas=False` will result in stacking of the level-1 predicted probabilities rather than averaging these.
- New `StackingCVClassifier` estimator in `mlxtend.classifier` for implementing a stacking ensemble that uses cross-validation techniques for training the meta-estimator to avoid overfitting (Reiichiro Nakano).
- New `OnehotTransactions` encoder class added to the `preprocessing` submodule for transforming transaction data into a one-hot encoded array.
- The `SequentialFeatureSelector` estimator in `mlxtend.feature_selection` is now safely stoppable mid-process via Ctrl+C, and `print_progress` was deprecated in favor of a more tunable `verbose` parameter (Will McGinnis).
- New `apriori` function in `association` to extract frequent itemsets from transaction data for association rule mining.
- New `checkerboard_plot` function in `plotting` to plot checkerboard tables / heat maps.
- New `mcnemar_table` and `mcnemar` functions in `evaluate` to compute 2x2 contingency tables and McNemar's test; see the example below.

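A minimal sketch of the new McNemar utilities; the prediction arrays below are made up for illustration:

```python
import numpy as np
from mlxtend.evaluate import mcnemar_table, mcnemar

y_true   = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
y_model1 = np.array([0, 1, 0, 0, 0, 1, 1, 1, 1, 1])
y_model2 = np.array([0, 0, 1, 1, 0, 1, 1, 1, 0, 1])

# 2x2 contingency table of correct/incorrect predictions of the two models
tb = mcnemar_table(y_target=y_true, y_model1=y_model1, y_model2=y_model2)

# McNemar's test (continuity-corrected chi-squared version)
chi2, p = mcnemar(ary=tb, corrected=True)
print(chi2, p)
```
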
Changes

- All plotting functions have been moved to `mlxtend.plotting` for compatibility reasons with continuous integration services and to make the installation of `matplotlib` optional for users of mlxtend's core functionality.
- Added a compatibility layer for scikit-learn 0.18 using the new `model_selection` module while maintaining backwards compatibility to scikit-learn 0.17.

Bug Fixes

- `mlxtend.plotting.plot_decision_regions` now draws decision regions correctly if more than 4 class labels are present.
- Raise `AttributeError` in `plot_decision_regions` when the `X_highlight` argument is a 1D array (chkoar).

Version 0.4.2 (2016-08-24)
--------------------------

New Features

- Added `preprocessing.CopyTransformer`, a mock class that returns copies of input arrays via `transform` and `fit_transform`.

Changes

- Added AppVeyor to CI to ensure MS Windows compatibility.
- Datasets are now saved as compressed .txt or .csv files rather than being imported as Python objects.
- `feature_selection.SequentialFeatureSelector` now supports the selection of `k_features` using a tuple to specify a "min-max" `k_features` range.
- Added an "SVD solver" option to the `PrincipalComponentAnalysis`; see the sketch after this list.
- Raise an `AttributeError` with a "not fitted" message in `SequentialFeatureSelector` if `transform` or `get_metric_dict` are called prior to `fit`.
- Use small, positive bias units in `TfMultiLayerPerceptron`'s hidden layer(s) if the activations are ReLUs in order to avoid dead neurons.
- Added an optional `clone_estimator` parameter to the `SequentialFeatureSelector` that defaults to `True`, avoiding the modification of the original estimator objects.
- More rigorous type and shape checks in the `evaluate.plot_decision_regions` function.
- `DenseTransformer` now doesn't raise an error if the input array is not sparse.
- API clean-up, using scikit-learn's `BaseEstimator` as the parent class for `feature_selection.ColumnSelector`.

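A minimal sketch of selecting the SVD solver; the `n_components` and `solver` parameter names follow the mlxtend documentation and should be treated as assumptions:

```python
from sklearn.datasets import load_iris
from mlxtend.feature_extraction import PrincipalComponentAnalysis

X, y = load_iris(return_X_y=True)

# Use the new 'svd' solver option for the decomposition
pca = PrincipalComponentAnalysis(n_components=2, solver='svd')
pca.fit(X)
X_pca = pca.transform(X)
print(X_pca[:3])
```
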
Bug Fixes

- Fixed a problem when a tuple range was provided as an argument to the `SequentialFeatureSelector`'s `k_features` parameter and the scoring metric was more negative than -1 (e.g., as in scikit-learn's MSE scoring function) ([wahutch](https://github.com/wahutch)).
- Fixed an `AttributeError` issue when `verbose` > 1 in `StackingClassifier`.
- Fixed a bug in `classifier.SoftmaxRegression` where the mean values of the offsets were used to update the bias units rather than their sum.
- Fixed a rare bug in the MLP `_layer_mapping` functions that caused a swap between the random number generation seed when initializing weights and biases.

Version 0.4.1 (2016-05-01)
--------------------------

New Features

- New TensorFlow estimator for Linear Regression (`tf_regressor.TfLinearRegression`).
- New k-means clustering estimator (`cluster.Kmeans`).
- New TensorFlow k-means clustering estimator (`tf_cluster.Kmeans`).

Changes

- Due to refactoring of the estimator classes, the `init_weights` parameter of the `fit` methods was globally renamed to `init_params`.
- Overall performance improvements of estimators due to code clean-up and refactoring.
- Added several additional checks for correct array types and more meaningful exception messages.
- Added optional `dropout` to the `tf_classifier.TfMultiLayerPerceptron` classifier for regularization.
- Added an optional `decay` parameter to the `tf_classifier.TfMultiLayerPerceptron` classifier for adaptive learning via an exponential decay of the learning rate eta.
- Replaced the old `NeuralNetMLP` by the more streamlined `MultiLayerPerceptron` (`classifier.MultiLayerPerceptron`); now also with softmax in the output layer and categorical cross-entropy loss.
- Unified `init_params` parameter for fit functions to continue training where the algorithm left off (if supported).

Version 0.4.0 (2016-04-09)
--------------------------

New Features

- New `TfSoftmaxRegression` classifier using TensorFlow (`tf_classifier.TfSoftmaxRegression`).
- New `SoftmaxRegression` classifier (`classifier.SoftmaxRegression`).
- New `TfMultiLayerPerceptron` classifier using TensorFlow (`tf_classifier.TfMultiLayerPerceptron`).
- New `StackingRegressor` (`regressor.StackingRegressor`).
- New `StackingClassifier` (`classifier.StackingClassifier`).
- New function for one-hot encoding of class labels (`preprocessing.one_hot`); see the example after this list.
- Added `GridSearch` support to the `SequentialFeatureSelector` (`feature_selection.SequentialFeatureSelector`).
- `evaluate.plot_decision_regions` improvements:
    - The function now handles class labels (y) correctly if the array is of type `float`.
    - Correct handling of the input arguments `markers` and `colors`.
    - Accept an existing `Axes` via the `ax` argument.
- New `print_progress` parameter for all generalized models and multi-layer neural networks for printing the time elapsed, ETA, and the current cost of the current epoch.
- Minibatch learning for `classifier.LogisticRegression`, `classifier.Adaline`, and `regressor.LinearRegression` plus a streamlined API.
- New Principal Component Analysis class via `mlxtend.feature_extraction.PrincipalComponentAnalysis`.
- New RBF Kernel Principal Component Analysis class via `mlxtend.feature_extraction.RBFKernelPCA`.
- New Linear Discriminant Analysis class via `mlxtend.feature_extraction.LinearDiscriminantAnalysis`.

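A minimal sketch of the one-hot encoding helper:

```python
import numpy as np
from mlxtend.preprocessing import one_hot

# Convert integer class labels into a one-hot encoded array
y = np.array([0, 1, 2, 1])
print(one_hot(y))
```
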
Changes

- The `column` parameter in `mlxtend.preprocessing.standardize` now defaults to `None` to standardize all columns more conveniently.

Version 0.3.0 (2016-01-31)
--------------------------

New Features

- Added a progress bar tracker to `classifier.NeuralNetMLP`.
- Added a function to score predicted vs. target class labels: `evaluate.scoring`.
- Added confusion matrix functions to create (`evaluate.confusion_matrix`) and plot (`evaluate.plot_confusion_matrix`) confusion matrices; see the example after this list.
- New style parameter and improved axis scaling in `mlxtend.evaluate.plot_learning_curves`.
- Added `loadlocal_mnist` to `mlxtend.data` for streaming MNIST from local byte files into NumPy arrays.
- New `NeuralNetMLP` parameters: `random_weights`, `shuffle_init`, `shuffle_epoch`.
- New `SFS` features such as the generation of pandas `DataFrame` results tables and plotting functions (with confidence intervals, standard deviation, and standard error bars).
- Added support for regression estimators in `SFS`.
- Added the Boston housing dataset.
- New `shuffle` parameter for `classifier.NeuralNetMLP`.

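A minimal sketch of the confusion matrix utilities (the `evaluate` import path reflects this release; as noted under version 0.5.0 above, the plotting functions were later moved to `mlxtend.plotting`):

```python
import numpy as np
import matplotlib.pyplot as plt
from mlxtend.evaluate import confusion_matrix, plot_confusion_matrix

y_true = np.array([1, 0, 0, 1, 0, 1, 1])
y_pred = np.array([1, 0, 1, 1, 0, 0, 1])

# Build the 2x2 confusion matrix from target and predicted labels, then plot it
cm = confusion_matrix(y_target=y_true, y_predicted=y_pred)
plot_confusion_matrix(conf_mat=cm)
plt.show()
```
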
Changes

- The `mlxtend.preprocessing.standardize` function now optionally returns the parameters, which are estimated from the array, for re-use. A further improvement makes the `standardize` function smarter in order to avoid zero-division errors.
- Cosmetic improvements to the `evaluate.plot_decision_regions` function, such as hiding plot axes.
- Renaming of `classifier.EnsembleClassifier` to `classifier.EnsembleVoteClassifier`.
- Improved random weight initialization in `Perceptron`, `Adaline`, `LinearRegression`, and `LogisticRegression`.
- Changed the `learning` parameter of `mlxtend.classifier.Adaline` to `solver` and added "normal equation" as a closed-form solution solver.
- Hide y-axis labels in `mlxtend.evaluate.plot_decision_regions` in 1-dimensional evaluations.
- Sequential Feature Selection algorithms were unified into a single `SequentialFeatureSelector` class with parameters to enable floating selection and to toggle between forward and backward selection.
- Stratified sampling of MNIST (now 500x random samples from each of the 10 digit categories).
- Renamed `mlxtend.plotting` to `mlxtend.general_plotting` in order to distinguish general plotting functions from specialized utility functions such as `evaluate.plot_decision_regions`.

Version 0.2.9 (2015-07-14)
--------------------------

New Features

- Sequential Feature Selection algorithms: SFS, SFFS, SBS, and SFBS.

Changes

- Changed the `regularization` & `lambda` parameters in `LogisticRegression` to a single parameter `l2_lambda`.

Version 0.2.8 (2015-06-27)
--------------------------

- API changes:
    - `mlxtend.sklearn.EnsembleClassifier` -> `mlxtend.classifier.EnsembleClassifier`
    - `mlxtend.sklearn.ColumnSelector` -> `mlxtend.feature_selection.ColumnSelector`
    - `mlxtend.sklearn.DenseTransformer` -> `mlxtend.preprocessing.DenseTransformer`
    - `mlxtend.pandas.standardizing` -> `mlxtend.preprocessing.standardizing`
    - `mlxtend.pandas.minmax_scaling` -> `mlxtend.preprocessing.minmax_scaling`
    - `mlxtend.matplotlib` -> `mlxtend.plotting`
- Added a momentum learning parameter (alpha coefficient) to `mlxtend.classifier.NeuralNetMLP`.
- Added an adaptive learning rate (decrease constant) to `mlxtend.classifier.NeuralNetMLP`.
- `mlxtend.pandas.minmax_scaling` became `mlxtend.preprocessing.minmax_scaling` and now also supports NumPy arrays.
- `mlxtend.pandas.standardizing` became `mlxtend.preprocessing.standardizing` and now supports both NumPy arrays and pandas DataFrames; in addition, a new `ddof` parameter sets the degrees of freedom when calculating the standard deviation.

Version 0.2.7 (2015-06-20)
--------------------------

- Added a multilayer perceptron (feedforward artificial neural network) classifier as `mlxtend.classifier.NeuralNetMLP`.
- Added 5000 labeled training samples from the MNIST handwritten digits dataset to `mlxtend.data`.

Version 0.2.6 (2015-05-08)
--------------------------

- Added ordinary least squares regression using different solvers (gradient descent, stochastic gradient descent, and the closed-form solution (normal equation)).
- Added an option for random weight initialization to the logistic regression classifier and updated the l2 regularization.
- Added the `wine` dataset to `mlxtend.data`.
- Added an `invert_axes` parameter to `mlxtend.matplotlib.enrichment_plot` to optionally plot the "Count" on the x-axis.
- New `verbose` parameter for `mlxtend.sklearn.EnsembleClassifier` by Alejandro C. Bahnsen.
- Added `mlxtend.pandas.standardizing` to standardize columns in a pandas DataFrame.
- Added parameters `linestyles` and `markers` to `mlxtend.matplotlib.enrichment_plot`.
- `mlxtend.regression.lin_regplot` automatically adds `np.newaxis` and works with Python lists.
- Added tokenizers: `mlxtend.text.extract_emoticons` and `mlxtend.text.extract_words_and_emoticons`.

Version 0.2.5 (2015-04-17)
--------------------------

- Added Sequential Backward Selection (`mlxtend.sklearn.SBS`).
- Added an `X_highlight` parameter to `mlxtend.evaluate.plot_decision_regions` for highlighting test data points.
- Added `mlxtend.regression.lin_regplot` to plot the fitted line from linear regression.
- Added `mlxtend.matplotlib.stacked_barplot` to conveniently produce stacked barplots using pandas `DataFrame`s.
- Added `mlxtend.matplotlib.enrichment_plot`.

Version 0.2.4 (2015-03-15)
--------------------------

- Added `scoring` to `mlxtend.evaluate.learning_curves` (by user pfsq).
- Fixed a setup.py bug caused by the missing README.html file.
- Added `matplotlib.category_scatter` for pandas DataFrames and NumPy arrays.

Version 0.2.3 (2015-03-11)
--------------------------

- Added logistic regression.
- The gradient descent and stochastic gradient descent perceptrons were changed to Adaline (Adaptive Linear Neuron).
- Perceptron and Adaline for {0, 1} classes.
- Added the `mlxtend.preprocessing.shuffle_arrays_unison` function to shuffle one or more NumPy arrays; see the example after this list.
- Added shuffle and random seed parameters to the stochastic gradient descent classifier.
- Added an `rstrip` parameter to `mlxtend.file_io.find_filegroups` to allow trimming of base names.
- Added an `ignore_substring` parameter to `mlxtend.file_io.find_filegroups` and `find_files`.
- Replaced `.rstrip` in `mlxtend.file_io.find_filegroups` with a more robust regex.
- Grid search support for `mlxtend.sklearn.EnsembleClassifier`.

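A minimal sketch of shuffling feature and label arrays in unison; the `arrays` and `random_seed` keyword names follow the mlxtend documentation:

```python
import numpy as np
from mlxtend.preprocessing import shuffle_arrays_unison

X = np.array([[1, 2], [3, 4], [5, 6]])
y = np.array([1, 2, 3])

# Apply the same random permutation to both arrays
X_shuffled, y_shuffled = shuffle_arrays_unison(arrays=[X, y], random_seed=3)
print(X_shuffled)
print(y_shuffled)
```
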
Version 0.2.2 (2015-03-01)
--------------------------

- Improved robustness of `EnsembleClassifier`.
- Extended `plot_decision_regions()` functionality for plotting 1D decision boundaries.
- The `matplotlib.plot_decision_regions` function was reorganized to `evaluate.plot_decision_regions`.
- `evaluate.plot_learning_curves()` function added.
- Added Rosenblatt, gradient descent, and stochastic gradient descent perceptrons.

Version 0.2.1 (2015-01-20)
--------------------------

- Added `mlxtend.pandas.minmax_scaling`, a function to rescale pandas DataFrame columns.
- Slight update to the `EnsembleClassifier` interface (additional `voting` parameter).
- Fixed `EnsembleClassifier` to return correct class labels if the class labels are not integers from 0 to n.
- Added a new matplotlib function to plot decision regions of classifiers.

Version 0.2.0 (2015-01-13)
--------------------------

- Improved `mlxtend.text.generalize_duplcheck` to remove duplicates and prevent an endless looping issue.
- Added a `recursive` search parameter to `mlxtend.file_io.find_files`.
- Added a `check_ext` parameter to `mlxtend.file_io.find_files` to search based on file extensions.
- Default parameter to ignore invisible files for `mlxtend.file_io.find`.
- Added `transform` and `fit_transform` to the `EnsembleClassifier`.
- Added the `mlxtend.file_io.find_filegroups` function.

Version 0.1.9 (2015-01-10)
--------------------------

- Implemented the scikit-learn `EnsembleClassifier` (majority voting rule) class.

Version 0.1.8 (2015-01-07)
--------------------------

- Improvements to `mlxtend.text.generalize_names` to handle certain Dutch last name prefixes (van, van der, de, etc.).
- Added the `mlxtend.text.generalize_name_duplcheck` function to apply `mlxtend.text.generalize_names` to a pandas DataFrame without creating duplicates.

Version 0.1.7 (2015-01-07)
--------------------------

- Added text utilities with a name generalization function.
- Added file_io utilities.

Version 0.1.6 (2015-01-04)
--------------------------

- Added combinations and permutations estimators.

Version 0.1.5 (2014-12-11)
--------------------------

- Added `DenseTransformer` for pipelines and grid search.

Version 0.1.4 (2014-08-20)
--------------------------

- The `mean_centering` function is now a class that creates `MeanCenterer` objects, which can be used to fit data via the `fit` method and center data at the column means via the `transform` and `fit_transform` methods; see the sketch below.

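A minimal sketch of the `MeanCenterer` usage described above:

```python
import numpy as np
from mlxtend.preprocessing import MeanCenterer

X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 9.0]])

# Estimate the column means and subtract them in one step
X_centered = MeanCenterer().fit_transform(X)
print(X_centered)
```
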
Version 0.1.3 (2014-08-19)
--------------------------

- Added the `preprocessing` module and the `mean_centering` function.

Version 0.1.2 (2014-08-19)
--------------------------

- Added `matplotlib` utilities and the `remove_borders` function.

Version 0.1.1 (2014-08-13)
--------------------------

- Simplified code for `ColumnSelector`.