A suite of visual analysis and diagnostic tools to facilitate feature selection, model selection, and parameter tuning for machine learning.
Image by Quatro Cinco, used with permission, Flickr Creative Commons.
Yellowbrick is a suite of visual analysis and diagnostic tools to facilitate feature selection, model selection, and parameter tuning for machine learning. All visualizations are generated in Matplotlib. Custom Yellowbrick visualization tools include:
- Boxplots (box-and-whisker plots)
- Violin plots
- Histograms
- Scatter plot matrices (sploms)
- Radial visualizations (radviz)
- Parallel coordinates
- Jointplots
- Rank 1D
- Rank 2D
- ROC-AUC curves
- Classification heatmaps
- Class balance charts
- Prediction error plots
- Residual plots
- Most informative features
- Silhouettes
- Density measures
- Validation curves
- Grid search heatmaps
The Yellowbrick API is specifically designed to play nicely with Scikit-Learn. Here is an example of a typical workflow sequence with Scikit-Learn and Yellowbrick:
In this example, Rank2D performs a pairwise comparison of every feature in the data set using a specific metric or algorithm (here, covariance), then displays the ranked results as a lower-left triangle diagram.
# Assumes X (feature matrix), y (target vector), and features (list of column names) are already defined
from yellowbrick.features import Rank2D

visualizer = Rank2D(features=features, algorithm='covariance')
visualizer.fit(X, y)      # Fit the data to the visualizer
visualizer.transform(X)   # Transform the data
visualizer.poof()         # Draw/show/poof the data
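To make the idea of a pairwise ranking concrete, the following is a minimal, dependency-free sketch of the kind of computation described above: the covariance of every feature pair, keeping only the lower-left triangle. It is not Yellowbrick's implementation; the helper names `covariance` and `rank2d_lower_triangle` and the toy data are hypothetical and exist only for illustration.

```python
from itertools import combinations_with_replacement


def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (n - 1)


def rank2d_lower_triangle(X, features):
    """Map (row_feature, column_feature) to covariance, lower triangle only."""
    columns = list(zip(*X))  # turn the row-wise matrix into columns
    ranks = {}
    # Each pair satisfies col <= row, so every key lies on or below the diagonal.
    for col, row in combinations_with_replacement(range(len(features)), 2):
        ranks[(features[row], features[col])] = covariance(columns[row], columns[col])
    return ranks


# Hypothetical toy data: "width" is exactly twice "height".
features = ["height", "width", "depth"]
X = [
    [1.0, 2.0, 9.0],
    [2.0, 4.0, 7.0],
    [3.0, 6.0, 8.0],
    [4.0, 8.0, 4.0],
]

ranks = rank2d_lower_triangle(X, features)
for (row_name, col_name), value in ranks.items():
    print(f"{row_name:>6} vs {col_name:<6} {value:8.3f}")
```

Because covariance is symmetric, the upper triangle would repeat the same values, which is why a triangular diagram is enough to show every pair.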
In this example, we instantiate a Scikit-Learn classifier, and then we use Yellowbrick's ROCAUC class to visualize the tradeoff between the classifier's sensitivity and specificity.
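# Assumes X (feature matrix) and y (target vector) are already defined
from sklearn.svm import LinearSVC
from yellowbrick import ROCAUC

model = LinearSVC()
model.fit(X, y)              # Train the Scikit-Learn classifier
y_pred = model.predict(X)    # Predict on the same data

visualizer = ROCAUC(model)   # Wrap the fitted model in the visualizer
visualizer.score(y, y_pred)  # Compare the true labels to the predictions
visualizer.poof()            # Draw/show/poof the ROC curve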
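For readers who want to see what an ROC curve measures, here is a small standalone sketch that computes the curve's points and the area under it by hand. It is an illustration under stated assumptions rather than Yellowbrick's or Scikit-Learn's code: the functions `roc_points` and `auc` and the four-sample data set are hypothetical. Sensitivity is the true positive rate, and specificity is one minus the false positive rate.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(1 for label in y_true if label == 1)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for label, s in zip(y_true, scores) if s >= threshold and label == 1)
        fp = sum(1 for label, s in zip(y_true, scores) if s >= threshold and label == 0)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical labels and classifier scores
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

points = roc_points(y_true, scores)
area = auc(points)
print(points)
print(f"AUC = {area:.2f}")
```

Lowering the threshold moves along the curve from the origin toward (1, 1): sensitivity rises while specificity falls, which is the tradeoff the visualizer plots.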
For additional information on getting started with Yellowbrick, check out our examples notebook.
We also have a quick start guide.
Yellowbrick is an open source tool designed to enable more informed machine learning through visualizations. If you would like to contribute, you can do so in the following ways:
- Add issues or bugs to the bug tracker: https://github.com/DistrictDataLabs/yellowbrick/issues
- Work on a card on the dev board: https://waffle.io/DistrictDataLabs/yellowbrick
- Create a pull request in GitHub: https://github.com/DistrictDataLabs/yellowbrick/pulls
This repository is set up in a typical production/release/development cycle as described in A Successful Git Branching Model. A typical workflow is as follows:
- Select a card from the dev board, preferably one that is "ready", then move it to "in-progress".
- Create a branch off of develop called "feature-[feature name]", then work and commit into that branch.
    ~$ git checkout -b feature-myfeature develop
- Once you are done working (and everything is tested), merge your feature into develop.
    ~$ git checkout develop
    ~$ git merge --no-ff feature-myfeature
    ~$ git branch -d feature-myfeature
    ~$ git push origin develop
- Repeat. Releases will be routinely pushed into master via release branches, then deployed to the server.