Releases: uxlfoundation/scikit-learn-intelex
Intel(R) Extension for Scikit-learn 2021.3
The release Intel(R) Extension for Scikit-learn 2021.3 introduces the following changes:
📚 Support Materials
- Medium blogs:
- Kaggle kernels:
  - [Tabular Playground Series - Apr 2021] RF with Intel Extension for Scikit-learn
  - [Tabular Playground Series - Apr 2021] SVM with Intel Extension for Scikit-learn
  - [Tabular Playground Series - Apr 2021] SVM with Intel(R) Extension for Scikit-learn
  - [Tabular Playground Series - Jun 2021] AutoGluon with Intel(R) Extension for Scikit-learn
  - [Tabular Playground Series - Jun 2021] Fast LogReg with Intel(R) Extension for Scikit-learn
  - [Tabular Playground Series - Jun 2021] Fast ML stack with Intel(R) Extension for Scikit-learn
  - [Tabular Playground Series - Jun 2021] Fast Stacking with Intel(R) Extension for Scikit-learn
- Samples that illustrate the usage of Intel Extension for Scikit-learn
🛠️ Library Engineering
- Introduced an optional dependency on the DPC++ runtime for Intel(R) Extension for Scikit-learn and daal4py. To enable the DPC++ backend, install the dpcpp_cpp_rt package. Making the runtime optional reduces the default package size with all dependencies from 1.2 GB to 400 MB.
🚨 New Features
- Introduced support for scikit-learn 1.0 in Intel(R) Extension for Scikit-learn. The 2021.3 release of Intel(R) Extension for Scikit-learn supports the latest scikit-learn releases: 0.22.X, 0.23.X, 0.24.X and 1.0.X.
- Support of patch_sklearn for several algorithms: patch_sklearn(["SVC", "DBSCAN"])
- [CPU] Acceleration of SVR estimator
- [CPU] Acceleration of NuSVC and NuSVR estimators
- [CPU] Polynomial kernel support in SVM algorithms
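The selective patching mentioned above can be sketched as follows. This is a minimal illustration, not the project's reference usage: the helper name patch_selected is hypothetical, and the code assumes the sklearnex package may or may not be installed, falling back to stock scikit-learn when it is missing.

```python
def patch_selected(names):
    """Patch only the named estimators (hypothetical helper).

    Returns True if patching was applied, False if sklearnex is not installed.
    """
    try:
        from sklearnex import patch_sklearn
    except ImportError:
        # Assumption: without the extension, stock scikit-learn is used unchanged.
        return False
    patch_sklearn(names)  # same call form as in the release note
    return True


# Patch before importing the estimators so the accelerated classes are picked up.
patched = patch_selected(["SVC", "DBSCAN"])
print("patched:", patched)
```

After this call, estimators such as SVC and DBSCAN are imported from scikit-learn as usual (for example, from sklearn.svm import SVC); estimators that were not named keep their stock implementation.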
🚀 Improved performance
- [CPU] SVM algorithms training and prediction
- [CPU] Linear, Ridge, ElasticNet, and Lasso regressions prediction
🐛 Bug Fixes
- Fixed binary incompatibility for the versions of numpy earlier than 1.19.4
- Fixed an issue with a very large number of trees (> 7000) for Random Forest algorithm
- Fixed patch_sklearn to patch both fit and predict methods of Logistic Regression when the algorithm is given as a single parameter to patch_sklearn
- [CPU] Reduced the memory consumption of SVM prediction
- [GPU] Fixed an issue with kernel compilation on the platforms without hardware FP64 support
❗ Known Issues
- The Intel(R) Extension for Scikit-learn package installed from the PyPI repository can’t be found on Debian systems (including Google Colab). Mitigation: add the “site-packages” folder to the Python module search path before importing the package:
import sys
import os
import site
sys.path.append(os.path.join(os.path.dirname(site.getsitepackages()[0]), "site-packages"))
Intel(R) Extension for Scikit-learn 2021.2.3
🚨 New Features
- Added support of patching scikit-learn version 1.0. scikit-learn version 0.21.X is no longer supported.
Intel(R) Extension for Scikit-learn 2021.2
⚡️ New package - Intel(R) Extension for Scikit-learn*
- Intel(R) Extension for Scikit-learn* contains scikit-learn patching functionality originally available in daal4py package. All future updates for the patching will be available in Intel(R) Extension for Scikit-learn only. Please use the package instead of daal4py.
⚠️ Deprecations
- Scikit-learn patching functionality in daal4py was deprecated and moved to a separate package - Intel(R) Extension for Scikit-learn*. All future updates for the patching will be available in Intel(R) Extension for Scikit-learn only. Please use the package instead of daal4py for the Scikit-learn acceleration.
📚 Support Materials
- Medium blogs:
- Kaggle kernels:
🛠️ Library Engineering
- Enabled new PyPI distribution channel for Intel(R) Extension for Scikit-learn and daal4py:
  - Four latest Python versions (3.6, 3.7, 3.8) are supported on Linux, Windows and MacOS.
  - Support of both CPU and GPU is included in the package.
  - You can install daal4py using the following command:
    pip install daal4py
  - You can install Intel(R) Extension for Scikit-learn using the following command:
    pip install scikit-learn-intelex
🚨 New Features
- Patches for four latest scikit-learn releases: 0.21.X, 0.22.X, 0.23.X and 0.24.X
- [CPU] Acceleration of roc_auc_score function
- [CPU] Bit-to-bit results reproducibility for: LinearRegression, Ridge, SVC, KMeans, PCA, Lasso, ElasticNet, tSNE, KNeighborsClassifier, KNeighborsRegressor, NearestNeighbors, RandomForestClassifier, RandomForestRegressor
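As an illustration of the accelerated roc_auc_score function, the sketch below calls it through the regular scikit-learn API. It assumes scikit-learn and NumPy are installed; patching is attempted only if sklearnex is available, and the input values are made up for the example.

```python
import numpy as np

try:
    # Assumption: if the extension is installed, patch before importing from sklearn.
    from sklearnex import patch_sklearn
    patch_sklearn()
except ImportError:
    pass  # fall back to stock scikit-learn

from sklearn.metrics import roc_auc_score

# Illustrative data: two negative and two positive samples with predicted scores.
y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# 3 of the 4 (positive, negative) pairs are ranked correctly, so AUC = 0.75.
auc = roc_auc_score(y_true, y_score)
print(auc)
```

The calling code is the same with or without the extension; only the implementation behind roc_auc_score changes when patching is active.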
🚀 Improved performance
- [CPU] RandomForestClassifier and RandomForestRegressor scikit-learn estimators: training and prediction
- [CPU] Principal Component Analysis (PCA) scikit-learn estimator: training
- [CPU] Support Vector Classification (SVC) scikit-learn estimators: training and prediction
- [CPU] Support Vector Classification (SVC) scikit-learn estimator with the probability=True parameter: training and prediction
🐛 Bug Fixes
- [CPU] Improved accuracy of RandomForestClassifier and RandomForestRegressor scikit-learn estimators
- [CPU] Fixed patching issues with pairwise_distances
- [CPU] Fixed the behavior of the patch_sklearn and unpatch_sklearn functions
- [CPU] Fixed unexpected behavior that made accelerated functionality unavailable through scikit-learn patching if the input was not of float32 or float64 data types. Scikit-learn patching now works with all numpy data types.
- [CPU] Fixed a memory leak that appeared when DataFrame from pandas was used as an input type
- [CPU] Fixed performance issue for interoperability with Modin
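The patch_sklearn / unpatch_sklearn pair and the non-float input case mentioned above can be exercised with a small sketch like the one below. It assumes scikit-learn and NumPy are installed and treats sklearnex as optional; the data (y = 2x + 1 on integer inputs) is invented for the example.

```python
import numpy as np

try:
    from sklearnex import patch_sklearn, unpatch_sklearn
    patch_sklearn()  # enable the accelerated implementations
    have_sklearnex = True
except ImportError:
    have_sklearnex = False  # assumption: stock scikit-learn is used instead

from sklearn.linear_model import LinearRegression

# Integer (int64) input rather than float32/float64: y = 2 * x + 1.
X = np.arange(10, dtype=np.int64).reshape(-1, 1)
y = 2 * X.ravel() + 1

model = LinearRegression().fit(X, y)
slope = float(model.coef_[0])
intercept = float(model.intercept_)
print(slope, intercept)

if have_sklearnex:
    unpatch_sklearn()  # restore the stock scikit-learn implementations
```

Calling unpatch_sklearn at the end is optional; it is shown here only to illustrate how the original scikit-learn behavior can be restored within the same session.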
Intel® daal4py 2020 Update 3 Patch 1
What's New
- Added support of patching scikit-learn version 0.24.
Intel® daal4py 2021.1
What's New
Introduced new daal4py functionality:
- GPU:
  - Batch algorithms: K-means, Covariance, PCA, Logistic Regression, Linear Regression, Random Forest Classification and Regression, Gradient Boosting Classification and Regression, kNN, SVM, DBSCAN and Low-order moments
  - Online algorithms: Covariance, PCA, Linear Regression and Low-order moments
Improved daal4py performance for the following algorithms:
- CPU:
  - Logistic Regression training and prediction
  - k-Nearest Neighbors prediction with Brute Force method
  - Logistic Loss and Cross Entropy objective functions
Introduced new functionality for scikit-learn patching through daal4py:
- CPU:
  - Acceleration of NearestNeighbors and KNeighborsRegressor scikit-learn estimators with Brute Force and K-D tree methods
  - Acceleration of TSNE scikit-learn estimator
- GPU:
  - Intel GPU support in scikit-learn for DBSCAN, K-means, Linear and Logistic Regression
Improved performance of the following scikit-learn estimators via scikit-learn patching:
- CPU:
  - LogisticRegression fit, predict and predict_proba methods
  - KNeighborsClassifier predict, predict_proba and kneighbors methods with “brute” method
Known Issues
- train_test_split in daal4py patches for Scikit-learn can produce incorrect shuffling on Windows*
Installation
To install this package with conda run the following:
conda install -c intel daal4py
Intel® daal4py 2020 Update 3
What's New in Intel® daal4py 2020 Update 3:
Introduced new daal4py functionality:
- Conversion of trained XGBoost* and LightGBM* models into a daal4py Gradient Boosted Trees model for fast prediction
- Support of Modin* DataFrame as an input
- Brute Force method for k-Nearest Neighbors classification algorithm, which for datasets with more than 13 features demonstrates a better performance than the existing K-D tree method
- k-Nearest Neighbors search for K-D tree and Brute Force methods with computation of distances to nearest neighbors and their indices
Extended existing daal4py functionality:
- Voting methods for prediction in k-Nearest Neighbors classification and search: based on inverse-distance and uniform weighting
- New parameters in Decision Forest classification and regression: minObservationsInSplitNode, minWeightFractionInLeafNode, minImpurityDecreaseInSplitNode, maxLeafNodes with best-first strategy and sample weights
- Support of Support Vector Machine (SVM) decision function for Multi-class Classifier
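To make the difference between the two k-Nearest Neighbors voting methods concrete, here is a small pure-Python sketch. It is a conceptual illustration only, not the daal4py implementation: the function knn_vote, its arguments, and the sample neighbors are all hypothetical.

```python
from collections import defaultdict


def knn_vote(neighbors, weighting="uniform"):
    """Return the winning label among (distance, label) neighbor pairs.

    weighting="uniform": every neighbor contributes one vote.
    weighting="distance": every neighbor contributes 1 / distance.
    """
    scores = defaultdict(float)
    for distance, label in neighbors:
        if weighting == "uniform":
            scores[label] += 1.0
        elif weighting == "distance":
            # An exact match (distance 0) gets infinite weight in this sketch.
            scores[label] += 1.0 / distance if distance > 0 else float("inf")
        else:
            raise ValueError(f"unknown weighting: {weighting!r}")
    return max(scores, key=scores.get)


# One very close neighbor of class "a", two farther neighbors of class "b".
neighbors = [(0.1, "a"), (1.0, "b"), (1.2, "b")]

uniform_winner = knn_vote(neighbors, "uniform")    # "b": 2 votes against 1
distance_winner = knn_vote(neighbors, "distance")  # "a": 10.0 against about 1.83
print(uniform_winner, distance_winner)
```

With uniform weighting the majority class wins regardless of distance, while inverse-distance weighting lets a single very close neighbor outweigh several distant ones.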
Improved daal4py performance for the following algorithms:
- SVM training and prediction
- Decision Forest classification training
- RBF and Linear kernel functions
Introduced new functionality for scikit-learn patching through daal4py:
- Acceleration of KNeighborsClassifier scikit-learn estimator with Brute Force and K-D tree methods
- Acceleration of RandomForestClassifier and RandomForestRegressor scikit-learn estimators
- Sparse input support for KMeans and Support Vector Classification (SVC) scikit-learn estimators
- Prediction of probabilities for SVC scikit-learn estimator
- Support of ‘normalize’ parameter for Lasso and ElasticNet scikit-learn estimators
Improved performance of the following functionality for scikit-learn patching through daal4py:
- train_test_split()
- Support Vector Classification (SVC) fit and prediction
To install this package with conda run the following:
conda install -c intel daal4py
daal4py 2020.2
Introduced new functionality:
- Thunder method for Support Vector Machine (SVM) training algorithm, which demonstrates better training time than the existing sequential minimal optimization method
Extended existing functionality:
- Training with the number of features greater than the number of observations for Linear Regression, Ridge Regression, and Principal Component Analysis
- New sample_weights parameter for SVM algorithm
- New parameter in K-Means algorithm, resultsToEvaluate, which controls computation of centroids, assignments, and exact objective function
Improved performance for the following:
- Support Vector Machine training and prediction, Elastic Net and LASSO training, Principal Component Analysis training and transform, K-D tree based k-Nearest Neighbors prediction
- K-Means algorithm in batch computation mode
- RBF kernel function
Deprecated 32-bit support:
- 2020 product line will be the last one to support 32-bit
Introduced improvements to daal4py library:
- Performance optimizations for pandas input format
- Scikit-learn compatible API for AdaBoost classifier, Decision Tree classifier, and Gradient Boosted Trees classifier and regressor
Improved performance of the following Intel Scikit-learn algorithms and functions:
- fit and prediction in K-Means and Support Vector Classification (SVC), fit in Elastic Net and LASSO, fit and transform in PCA
- Support Vector Classification (SVC) with non-default weights of samples and classes
- train_test_split() and assert_all_finite()
To install this package with conda run the following:
conda install -c intel daal4py
daal4py 2020.1
Introduced new functionality:
- Elastic Net algorithm with L1 and L2 regularization in batch computation mode. The algorithm supports various optimization solvers that handle non-smooth functions.
- Probabilistic classification for Decision Forest Classification algorithm with a choice of voting method to calculate probabilities.
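For readers unfamiliar with Elastic Net, the combined L1 and L2 penalty can be written out as a short pure-Python sketch. It uses the parameterization documented for scikit-learn's ElasticNet (alpha and l1_ratio); this is an assumption for illustration, and the daal4py algorithm may scale or name its penalty parameters differently. The function name and sample data are hypothetical.

```python
def elastic_net_objective(X, y, w, alpha=1.0, l1_ratio=0.5):
    """Elastic Net objective in the scikit-learn parameterization:

    1 / (2 * n) * ||y - Xw||^2
      + alpha * l1_ratio * ||w||_1
      + 0.5 * alpha * (1 - l1_ratio) * ||w||_2^2
    """
    n = len(y)
    # Residual for each sample: y_i minus the dot product of row x_i with w.
    residuals = [
        yi - sum(xij * wj for xij, wj in zip(xi, w))
        for xi, yi in zip(X, y)
    ]
    data_term = sum(r * r for r in residuals) / (2 * n)
    l1_norm = sum(abs(wj) for wj in w)   # non-smooth part of the penalty
    l2_sq = sum(wj * wj for wj in w)     # smooth part of the penalty
    return data_term + alpha * l1_ratio * l1_norm + 0.5 * alpha * (1 - l1_ratio) * l2_sq


# Tiny worked example: 0.25 (data term) + 1.0 (L1 part) + 0.5 (L2 part) = 1.75.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [1.0, 2.0]
w = [1.0, 1.0]
value = elastic_net_objective(X, y, w, alpha=1.0, l1_ratio=0.5)
print(value)
```

The L1 term is not differentiable at zero, which is why the release note mentions optimization solvers that handle non-smooth functions.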
Extended existing functionality:
- Performance optimizations for distributed Spark samples, K-means algorithm for some input dimensions, Gradient Boosted Trees training stage for large datasets on multi-core platforms and Decision Forest prediction stage for datasets with a small number of observations on processors that support Intel® Advanced Vector Extensions 2 (Intel® AVX2) and Intel® Advanced Vector Extensions 512 (Intel® AVX-512)
- Performance optimizations across algorithms that use SOA (Structure Of Arrays) NumericTable as an input on processors that support Intel® Advanced Vector Extensions 512 (Intel® AVX-512)
daal4py 2020.0
Added support for BrownBoost, LogitBoost as well as Stump regression and Stump classification algorithms to daal4py.
Added support for AdaBoost classification algorithm, including support for method="SAMME" or "SAMMER" for multi-class data.
"Variable Importance" feature has been added in Gradient Boosting Trees.
Ability to compute class prediction probabilities has been added to appropriate classifiers, including logistic regression, tree-based classifiers, etc.
2019.5
Single node support for DBSCAN, LASSO, Coordinate Descent (CD) solver algorithms
Distributed model support for SVD, QR, K-means init++ and parallel++ algorithms