
Releases: uxlfoundation/scikit-learn-intelex

Intel(R) Extension for Scikit-learn 2021.3

05 Jul 10:16
fb0972c

The release Intel(R) Extension for Scikit-learn 2021.3 introduces the following changes:

📚 Support Materials

🛠️ Library Engineering

  • Made the DPC++ runtime an optional dependency of Intel(R) Extension for Scikit-learn and daal4py. To enable the DPC++ backend, install the dpcpp_cpp_rt package. This change reduces the default package size with all dependencies from 1.2 GB to 400 MB.

🚨 New Features

  • Introduced support for scikit-learn 1.0 in Intel(R) Extension for Scikit-learn. The 2021.3 release of Intel(R) Extension for Scikit-learn supports the latest scikit-learn releases: 0.22.X, 0.23.X, 0.24.X and 1.0.X.
  • Support for patching several algorithms in a single patch_sklearn call: patch_sklearn(["SVC", "DBSCAN"])
  • [CPU] Acceleration of SVR estimator
  • [CPU] Acceleration of NuSVC and NuSVR estimators
  • [CPU] Polynomial kernel support in SVM algorithms
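The list form of patch_sklearn mentioned above can be wrapped so that a script still runs on machines where the extension is not installed. This is a minimal sketch, assuming the package is importable as sklearnex; the helper name try_patch is hypothetical and not part of the library.

```python
def try_patch(estimators):
    """Patch only the named scikit-learn estimators when the extension is available.

    Returns True if patching was applied and False if sklearnex is not installed.
    try_patch is a hypothetical helper, not a library function.
    """
    try:
        from sklearnex import patch_sklearn
    except ImportError:
        return False
    patch_sklearn(estimators)  # for example ["SVC", "DBSCAN"]
    return True


# Call this before importing the estimators from sklearn.
patched = try_patch(["SVC", "DBSCAN"])
```

Estimators imported from sklearn after the call use the accelerated versions when patched is True; otherwise stock scikit-learn is used unchanged.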

🚀 Improved performance

  • [CPU] SVM algorithms training and prediction
  • [CPU] Linear, Ridge, ElasticNet, and Lasso regressions prediction

🐛 Bug Fixes

  • Fixed binary incompatibility for the versions of numpy earlier than 1.19.4
  • Fixed an issue with a very large number of trees (> 7000) for Random Forest algorithm
  • Fixed patch_sklearn to patch both fit and predict methods of Logistic Regression when the algorithm is given as a single parameter to patch_sklearn
  • [CPU] Reduced the memory consumption of SVM prediction
  • [GPU] Fixed an issue with kernel compilation on the platforms without hardware FP64 support

❗ Known Issues

  • The Intel(R) Extension for Scikit-learn package installed from the PyPI repository can’t be found on Debian systems (including Google Colab). Mitigation: add the “site-packages” folder to the Python package search path before importing the package:
import sys 
import os 
import site 
sys.path.append(os.path.join(os.path.dirname(site.getsitepackages()[0]), "site-packages")) 
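If the workaround may run more than once (for example, in a notebook cell that is re-executed), a guarded variant avoids adding the same entry repeatedly. This is a sketch of the same mitigation; the helper name add_site_packages is hypothetical.

```python
import os
import site
import sys


def add_site_packages():
    """Append the sibling "site-packages" directory to sys.path at most once.

    Hypothetical helper wrapping the mitigation from the release note.
    """
    base = os.path.dirname(site.getsitepackages()[0])
    target = os.path.join(base, "site-packages")
    if target not in sys.path:
        sys.path.append(target)
    return target


path = add_site_packages()
```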

Intel(R) Extension for Scikit-learn 2021.2.3

27 May 21:31
a78649b

🚨 New Features

  • Added support for patching scikit-learn version 1.0. scikit-learn version 0.21.X is no longer supported.

Intel(R) Extension for Scikit-learn 2021.2

30 Mar 20:46
24cd6bf

⚡️ New package - Intel(R) Extension for Scikit-learn*

  • Intel(R) Extension for Scikit-learn* contains scikit-learn patching functionality originally available in daal4py package. All future updates for the patching will be available in Intel(R) Extension for Scikit-learn only. Please use the package instead of daal4py.

⚠️ Deprecations

  • Scikit-learn patching functionality in daal4py was deprecated and moved to a separate package - Intel(R) Extension for Scikit-learn*. All future updates for the patching will be available in Intel(R) Extension for Scikit-learn only. Please use the package instead of daal4py for the Scikit-learn acceleration.

📚 Support Materials

🛠️ Library Engineering

  • Enabled new PyPI distribution channel for Intel(R) Extension for Scikit-learn and daal4py:
    • Python versions 3.6, 3.7, and 3.8 are supported on Linux, Windows, and macOS.
    • Support of both CPU and GPU is included in the package.
    • You can install daal4py using the following command: pip install daal4py
    • You can install Intel(R) Extension for Scikit-learn using the following command: pip install scikit-learn-intelex
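After running the pip commands above, one way to confirm which of the two distributions is present is to query the installed package metadata. This is a minimal sketch using the standard library (Python 3.8 or later); the helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Report the state of both PyPI distributions named in the release note.
for name in ("scikit-learn-intelex", "daal4py"):
    print(name, installed_version(name) or "not installed")
```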

🚨 New Features

  • Patches for four latest scikit-learn releases: 0.21.X, 0.22.X, 0.23.X and 0.24.X
  • [CPU] Acceleration of roc_auc_score function
  • [CPU] Bit-to-bit results reproducibility for: LinearRegression, Ridge, SVC, KMeans, PCA, Lasso, ElasticNet, tSNE, KNeighborsClassifier, KNeighborsRegressor, NearestNeighbors, RandomForestClassifier, RandomForestRegressor

🚀 Improved performance

  • [CPU] RandomForestClassifier and RandomForestRegressor scikit-learn estimators: training and prediction
  • [CPU] Principal Component Analysis (PCA) scikit-learn estimator: training
  • [CPU] Support Vector Classification (SVC) scikit-learn estimator: training and prediction
  • [CPU] Support Vector Classification (SVC) scikit-learn estimator with the probability=True parameter: training and prediction

🐛 Bug Fixes

  • [CPU] Improved accuracy of RandomForestClassifier and RandomForestRegressor scikit-learn estimators
  • [CPU] Fixed patching issues with pairwise_distances
  • [CPU] Fixed the behavior of the patch_sklearn and unpatch_sklearn functions
  • [CPU] Fixed unexpected behavior that made accelerated functionality unavailable through scikit-learn patching if the input was not of float32 or float64 data types. Scikit-learn patching now works with all numpy data types.
  • [CPU] Fixed a memory leak that appeared when DataFrame from pandas was used as an input type
  • [CPU] Fixed performance issue for interoperability with Modin

Intel® daal4py 2020 Update 3 Patch 1

25 Dec 12:19
df1d2a7
Pre-release

What's New

  • Added support of patching scikit-learn version 0.24.

Intel® daal4py 2021.1

14 Dec 12:00
ed41c1a

What's New

Introduced new daal4py functionality:

  • GPU:
    • Batch algorithms: K-means, Covariance, PCA, Logistic Regression, Linear Regression, Random Forest Classification and Regression, Gradient Boosting Classification and Regression, kNN, SVM, DBSCAN and Low-order moments
    • Online algorithms: Covariance, PCA, Linear Regression and Low-order moments

Improved daal4py performance for the following algorithms:

  • CPU:
    • Logistic Regression training and prediction
    • k-Nearest Neighbors prediction with Brute Force method
    • Logistic Loss and Cross Entropy objective functions

Introduced new functionality for scikit-learn patching through daal4py:

  • CPU:
    • Acceleration of NearestNeighbors and KNeighborsRegressor scikit-learn estimators with Brute Force and K-D tree methods
    • Acceleration of TSNE scikit-learn estimator
  • GPU:
    • Intel GPU support in scikit-learn for DBSCAN, K-means, Linear and Logistic Regression

Improved performance of the following scikit-learn estimators via scikit-learn patching:

  • CPU:
    • LogisticRegression fit, predict and predict_proba methods
    • KNeighborsClassifier predict, predict_proba and kneighbors methods with “brute” method

Known Issues

  • train_test_split in daal4py patches for Scikit-learn can produce incorrect shuffling on Windows*

Installation

To install this package with conda run the following:

conda install -c intel daal4py

Intel® daal4py 2020 Update 3

06 Nov 11:25
ed3ccf8

What's New in Intel® daal4py 2020 Update 3:

Introduced new daal4py functionality:

  • Conversion of trained XGBoost* and LightGBM* models into a daal4py Gradient Boosted Trees model for fast prediction
  • Support of Modin* DataFrame as an input
  • Brute Force method for k-Nearest Neighbors classification algorithm, which for datasets with more than 13 features demonstrates a better performance than the existing K-D tree method
  • k-Nearest Neighbors search for K-D tree and Brute Force methods with computation of distances to nearest neighbors and their indices
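The 13-feature guideline above can be expressed as a simple rule of thumb when choosing a neighbor search method. This is only a sketch: the helper pick_knn_method is hypothetical, the method names follow scikit-learn's algorithm values ("brute", "kd_tree"), and the real crossover point depends on the data and hardware.

```python
def pick_knn_method(n_features, threshold=13):
    """Return "brute" above the feature threshold, otherwise "kd_tree".

    The default threshold of 13 comes from the release note; treat it as a
    starting point to benchmark against, not a guarantee.
    """
    return "brute" if n_features > threshold else "kd_tree"


narrow = pick_knn_method(8)    # few features: K-D tree
wide = pick_knn_method(64)     # many features: Brute Force
```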

Extended existing daal4py functionality:

  • Voting methods for prediction in k-Nearest Neighbors classification and search: based on inverse-distance and uniform weighting
  • New parameters in Decision Forest classification and regression: minObservationsInSplitNode, minWeightFractionInLeafNode, minImpurityDecreaseInSplitNode, maxLeafNodes with best-first strategy and sample weights
  • Support of Support Vector Machine (SVM) decision function for Multi-class Classifier
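The difference between the two k-Nearest Neighbors voting methods listed above can be illustrated in plain Python. This is a conceptual sketch, not daal4py's implementation; the function knn_vote, its weights argument, and the eps guard against division by zero are all assumptions made for illustration.

```python
from collections import defaultdict


def knn_vote(labels, distances, weights="uniform", eps=1e-12):
    """Pick a class from the labels and distances of the k nearest neighbors.

    "uniform": every neighbor counts as one vote.
    "distance": each vote is weighted by the inverse of the neighbor's distance.
    """
    scores = defaultdict(float)
    for label, dist in zip(labels, distances):
        if weights == "uniform":
            scores[label] += 1.0
        elif weights == "distance":
            scores[label] += 1.0 / (dist + eps)
        else:
            raise ValueError("weights must be 'uniform' or 'distance'")
    return max(scores, key=scores.get)


labels = ["a", "b", "b"]
distances = [0.1, 1.0, 2.0]
uniform_choice = knn_vote(labels, distances, "uniform")    # two votes for "b"
distance_choice = knn_vote(labels, distances, "distance")  # the close "a" dominates
```

With these inputs the two methods disagree: the majority class wins under uniform weighting, while the single very close neighbor wins under inverse-distance weighting.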

Improved daal4py performance for the following algorithms:

  • SVM training and prediction
  • Decision Forest classification training
  • RBF and Linear kernel functions

Introduced new functionality for scikit-learn patching through daal4py:

  • Acceleration of KNeighborsClassifier scikit-learn estimator with Brute Force and K-D tree methods
  • Acceleration of RandomForestClassifier and RandomForestRegressor scikit-learn estimators
  • Sparse input support for KMeans and Support Vector Classification (SVC) scikit-learn estimators
  • Prediction of probabilities for SVC scikit-learn estimator
  • Support of ‘normalize’ parameter for Lasso and ElasticNet scikit-learn estimators

Improved performance of the following functionality for scikit-learn patching through daal4py:

  • train_test_split()
  • Support Vector Classification (SVC) fit and prediction

To install this package with conda run the following:
conda install -c intel daal4py

daal4py 2020.2

17 Aug 08:55
77ff4f6

Introduced new functionality:

  • Thunder method for Support Vector Machine (SVM) training algorithm, which demonstrates better training time than the existing sequential minimal optimization method

Extended existing functionality:

  • Training with the number of features greater than the number of observations for Linear Regression, Ridge Regression, and Principal Component Analysis
  • New sample_weights parameter for SVM algorithm
  • New parameter in K-Means algorithm, resultsToEvaluate, which controls computation of centroids, assignments, and exact objective function

Improved performance for the following:

  • Support Vector Machine training and prediction, Elastic Net and LASSO training, Principal Component Analysis training and transform, K-D tree based k-Nearest Neighbors prediction
  • K-Means algorithm in batch computation mode
  • RBF kernel function

Deprecated 32-bit support:

  • 2020 product line will be the last one to support 32-bit

Introduced improvements to daal4py library:

  • Performance optimizations for pandas input format
  • Scikit-learn compatible API for AdaBoost classifier, Decision Tree classifier, and Gradient Boosted Trees classifier and regressor

Improved performance of the following Intel Scikit-learn algorithms and functions:

  • fit and prediction in K-Means and Support Vector Classification (SVC), fit in Elastic Net and LASSO, fit and transform in PCA
  • Support Vector Classification (SVC) with non-default weights of samples and classes
  • train_test_split() and assert_all_finite()

To install this package with conda run the following:
conda install -c intel daal4py

daal4py 2020.1

17 Aug 08:51
77ff4f6

Introduced new functionality:

  • Elastic Net algorithm with L1 and L2 regularization in batch computation mode. The algorithm supports various optimization solvers that handle non-smooth functions.
  • Probabilistic classification for Decision Forest Classification algorithm with a choice of voting method to calculate probabilities.
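To illustrate how a voting method can change the probabilities a forest reports, the sketch below compares two common schemes for a single sample. It is a conceptual example, not the library's implementation; the function forest_proba and the scheme names "weighted" and "unweighted" are assumptions chosen for illustration.

```python
def forest_proba(tree_probas, voting="weighted"):
    """Combine per-tree class probabilities for one sample.

    tree_probas: one list of class probabilities per tree.
    "weighted": average the per-tree probabilities.
    "unweighted": each tree casts one vote for its most probable class.
    """
    n_trees = len(tree_probas)
    n_classes = len(tree_probas[0])
    if voting == "weighted":
        return [sum(t[c] for t in tree_probas) / n_trees for c in range(n_classes)]
    if voting == "unweighted":
        votes = [0] * n_classes
        for t in tree_probas:
            votes[t.index(max(t))] += 1
        return [v / n_trees for v in votes]
    raise ValueError("voting must be 'weighted' or 'unweighted'")


trees = [[0.6, 0.4], [0.9, 0.1], [0.4, 0.6]]
weighted = forest_proba(trees, "weighted")      # mean of the tree probabilities
unweighted = forest_proba(trees, "unweighted")  # share of trees voting per class
```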

Extended existing functionality:

  • Performance optimizations for distributed Spark samples, K-means algorithm for some input dimensions, Gradient Boosted Trees training stage for large datasets on multi-core platforms and Decision Forest prediction stage for datasets with a small number of observations on processors that support Intel® Advanced Vector Extensions 2 (Intel® AVX2) and Intel® Advanced Vector Extensions 512 (Intel® AVX-512)
  • Performance optimizations across algorithms that use SOA (Structure Of Arrays) NumericTable as an input on processors that support Intel® Advanced Vector Extensions 512 (Intel® AVX-512)

daal4py 2020.0

19 Dec 13:27
de475aa

Added support for BrownBoost and LogitBoost, as well as Stump regression and Stump classification algorithms, to daal4py.
Added support for AdaBoost classification algorithm, including support for method="SAMME" or "SAMMER" for multi-class data.
"Variable Importance" feature has been added in Gradient Boosting Trees.
Ability to compute class prediction probabilities has been added to appropriate classifiers, including logistic regression, tree-based classifiers, etc.

2019.5

05 Oct 20:02
f66d308

Single node support for DBSCAN, LASSO, Coordinate Descent (CD) solver algorithms
Distributed model support for SVD, QR, K-means init++ and parallel++ algorithms