Merge branch 'release-0.4'
bbengfort committed May 4, 2017
2 parents 84e6ad1 + 097700e commit ea0729f
Showing 119 changed files with 59,072 additions and 1,774 deletions.
2 changes: 2 additions & 0 deletions .gitignore
@@ -121,3 +121,5 @@ fabric.properties
# modules.xml
# .idea/misc.xml
# *.ipr

.idea
275 changes: 275 additions & 0 deletions CONTRIBUTING.md

Large diffs are not rendered by default.

15 changes: 10 additions & 5 deletions DESCRIPTION.txt
@@ -1,15 +1,20 @@
Yellowbrick is a suite of visual analysis and diagnostic tools designed to facilitate machine learning with Scikit-Learn. The package includes visualizations that can help users navigate the feature selection process, build intuition around model selection, diagnose common problems like bias, heteroscedasticity, underfit, and overtraining, and support hyperparameter tuning to steer predictive models toward more successful results.
Yellowbrick is a suite of visual analysis and diagnostic tools designed to facilitate machine learning with Scikit-Learn. The library implements a new core API object, the "Visualizer", which is a Scikit-Learn estimator: an object that learns from data. Like transformers or models, visualizers learn from data by creating a visual representation of the model selection workflow.

Visualizers allow users to steer the model selection process, building intuition around feature engineering, algorithm selection, and hyperparameter tuning. For example, visualizers can help diagnose common problems surrounding model complexity and bias, heteroscedasticity, underfit and overtraining, or class balance issues. By applying visualizers to the model selection workflow, Yellowbrick allows you to steer predictive models to more successful results, faster.

Some of the available tools include:

- histograms
- scatter plot matrices
- pairwise feature ranking
- parallel coordinates
- jointplots
- radial visualization
- ROC curves
- classification heatmaps
- residual plots
- prediction error plots
- alpha selection plots
- validation curves
- gridsearch heatmaps
- text frequency distributions
- tsne corpus visualization

For more, please see the full documentation at: http://yellowbrick.readthedocs.org/en/latest/
And much more! Please see the full documentation at: http://scikit-yb.org/
27 changes: 27 additions & 0 deletions MAINTAINERS.md
@@ -0,0 +1,27 @@
# Maintainers and Contributors

This file describes how the Yellowbrick project is maintained and provides contact information for key folks in the project.

When you create a pull request, your contribution will be reviewed by one or, more likely, two maintainers who will give you the :+1: when your extension is ready to be merged. Maintainers work hard to ensure that Yellowbrick is a high quality project and that contributors are successful.

For more about how to develop visualizers and contribute features to Yellowbrick, see our [contributor's guide](CONTRIBUTING.md) and the documentation.

## Maintainers

This is a list of the primary project maintainers. Feel free to @ message them in issues and converse with them directly.

- [bbengfort](https://github.com/bbengfort)
- [NealHumphrey](https://github.com/NealHumphrey)
- [jkeung](https://github.com/jkeung)

## Core Contributors

This is a list of the core contributors of the project. Core contributors set the road map and vision of the project. Keep an eye out for them in issues and check out their work to use as inspiration! Most likely they would also be happy to chat and answer questions.

- [rebeccabilbro](https://github.com/rebeccabilbro)
- [mattandahalfew](https://github.com/mattandahalfew)
- [pdamodaran](https://github.com/pdamodaran)
- [ndanielsen](https://github.com/ndanielsen)
- [tuulihill](https://github.com/tuulihill)
- [balavenkatesan](https://github.com/balavenkatesan)
- [morganmendis](https://github.com/morganmendis)
114 changes: 62 additions & 52 deletions README.md
@@ -7,50 +7,62 @@
[![Stories in Ready](https://badge.waffle.io/DistrictDataLabs/yellowbrick.png?label=ready&title=Ready)](https://waffle.io/DistrictDataLabs/yellowbrick)


A suite of visual analysis and diagnostic tools to facilitate feature selection, model selection, and parameter tuning for machine learning.

**Visual analysis and diagnostic tools to facilitate machine learning model selection.**

![Follow the yellow brick road](docs/images/yellowbrickroad.jpg)
Image by [Quatro Cinco](https://flic.kr/p/2Yj9mj), used with permission, Flickr Creative Commons.

# What is Yellowbrick?
Yellowbrick is a suite of visual analysis and diagnostic tools to facilitate feature selection, model selection, and parameter tuning for machine learning. All visualizations are generated in Matplotlib. Custom `yellowbrick` visualization tools include:

## Tools for feature analysis and selection
- Boxplots (box-and-whisker plots)
- Violinplots
- Histograms
- Scatter plot matrices (sploms)
- Radial visualizations (radviz)
- Parallel coordinates
- Jointplots
- Rank 1D
- Rank 2D

## Tools for model evaluation
### Classification
- ROC-AUC curves
- Classification heatmaps
- Class balance chart

### Regression
- Prediction error plots
- Residual plots
- Most informative features

### Clustering
- Silhouettes
- Density measures

## Tools for parameter tuning
- Validation curves
- Gridsearch heatmaps
This README is a guide for developers. If you're new to Yellowbrick, get started with our [documentation](http://www.scikit-yb.org/).

## What is Yellowbrick?

Yellowbrick is a suite of visual diagnostic tools called "Visualizers" that extend the Scikit-Learn API to allow human steering of the model selection process. In a nutshell, Yellowbrick combines Scikit-Learn with Matplotlib in the best tradition of the Scikit-Learn documentation, but to produce visualizations for _your_ models!

![Visualizers](docs/images/visualizers.png)

### Visualizers

Visualizers are estimators (objects that learn from data) whose primary objective is to create visualizations that allow insight into the model selection process. In Scikit-Learn terms, they can be similar to transformers when visualizing the data space, or they can wrap a model estimator, similar to how the "ModelCV" (e.g. RidgeCV, LassoCV) methods work. The primary goal of Yellowbrick is to create a sensical API similar to Scikit-Learn. Some of our most popular visualizers include:

#### Feature Visualization

- Rank2D: pairwise ranking of features to detect relationships
- Parallel Coordinates: horizontal visualization of instances
- Radial Visualization: separation of instances around a circular plot

#### Classification Visualization

- Class Balance: see how the distribution of classes affects the model
- Classification Report: visual representation of precision, recall, and F1
- ROC/AUC Curves: receiver operating characteristic and area under the curve
- Confusion Matrices: visual description of class decision making

#### Regression Visualization

- Prediction Error Plots: find model breakdowns along the domain of the target
- Residuals Plot: show the difference in residuals of training and test data
- Alpha Selection: show how the choice of alpha influences regularization

#### Clustering Visualization

- K-Elbow Plot: select k using the elbow method and various metrics
- Silhouette Plot: select k by visualizing silhouette coefficient values

#### Text Visualization

- Term Frequency: visualize the frequency distribution of terms in the corpus
- TSNE: use t-distributed stochastic neighbor embedding to project documents

And more! Visualizers are being added all the time. Be sure to check the examples (or even the develop branch), and feel free to contribute your ideas for Visualizers!
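
To make the estimator analogy concrete, here is a minimal sketch, using only the Python standard library, of the learn-then-render pattern described above. `TermFrequencySketch`, its `draw` method, and the `top_n` parameter are hypothetical names chosen for illustration; this is not Yellowbrick code, and it prints text bars instead of drawing with Matplotlib. Only the method name `poof` is taken from the examples in this README, and `fit` follows the usual Scikit-Learn convention.

```python
class TermFrequencySketch:
    """Hypothetical stand-in for a visualizer: learns in fit(), renders in poof()."""

    def __init__(self, top_n=3):
        self.top_n = top_n
        self.counts_ = None  # learned state, populated by fit()

    def fit(self, documents, y=None):
        # "Learn from data": count how often each token occurs in the corpus.
        counts = {}
        for doc in documents:
            for token in doc.lower().split():
                counts[token] = counts.get(token, 0) + 1
        self.counts_ = counts
        return self  # returning self allows chaining, as Scikit-Learn estimators do

    def draw(self):
        # Sort by descending count, breaking ties alphabetically.
        ranked = sorted(self.counts_.items(), key=lambda kv: (-kv[1], kv[0]))
        return ["{:<8} {}".format(term, "#" * n) for term, n in ranked[: self.top_n]]

    def poof(self):
        # Render the learned state; a real visualizer would show a figure here.
        lines = self.draw()
        print("\n".join(lines))
        return lines


viz = TermFrequencySketch(top_n=2).fit(["the yellow brick road", "follow the road"])
lines = viz.poof()
```

The point of the sketch is the shape of the interface: configuration in the constructor, learning in `fit`, and rendering as a separate final step.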

## Using Yellowbrick

The Yellowbrick API is specifically designed to play nicely with Scikit-Learn. Here is an example of a typical workflow sequence with Scikit-Learn and Yellowbrick:

### Feature Visualization

In this example, we see how Rank2D performs pairwise comparisons of each feature in the data set with a specific metric or algorithm, then returns them ranked as a lower left triangle diagram.

```python
from yellowbrick.features import Rank2D

@@ -61,7 +73,9 @@ visualizer.poof() # Draw/show/poof the data
```
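
The diff view elides most of the Rank2D snippet, so as a rough illustration of the idea only, the sketch below scores every unordered pair of features once, which is what the lower left triangle contains. It assumes Pearson correlation as the metric and uses made-up column names and values; `pearson` and `rank_pairs` are hypothetical helpers, not Yellowbrick functions.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (assumes non-zero variance)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rank_pairs(columns):
    """Score each unordered feature pair once: the lower triangle of the matrix."""
    names = list(columns)
    scores = {}
    for i, j in combinations(range(len(names)), 2):
        # Key is (row, column) with the row feature listed after the column feature.
        scores[(names[j], names[i])] = pearson(columns[names[j]], columns[names[i]])
    return scores


# Made-up data for illustration only.
data = {
    "temperature": [1.0, 2.0, 3.0, 4.0],
    "humidity": [2.0, 4.0, 6.0, 8.0],
    "occupancy": [4.0, 3.0, 2.0, 1.0],
}
scores = rank_pairs(data)
for (row, col), r in scores.items():
    print("{:<12} vs {:<12} {:+.2f}".format(row, col, r))
```

Scoring only the lower triangle avoids repeating symmetric pairs and the trivial diagonal, which is why three features yield three scores.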

### Model Visualization

In this example, we instantiate a Scikit-Learn classifier, and then we use Yellowbrick's ROCAUC class to visualize the tradeoff between the classifier's sensitivity and specificity.
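
For readers unfamiliar with the tradeoff mentioned above, the following standard-library sketch shows how ROC points and the area under the curve can be computed by hand. Sensitivity is the true positive rate, and the false positive rate equals one minus specificity. The labels and scores are made up, `roc_points` and `auc` are hypothetical helper names, and the sketch assumes binary labels with no tied scores; it is not how Yellowbrick's ROCAUC is implemented.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Lower the threshold one example at a time, from highest score to lowest.
    for _, label in sorted(zip(scores, y_true), key=lambda pair: -pair[0]):
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoidal rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Made-up labels and classifier scores for illustration only.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(y_true, scores)
area = auc(points)
print("AUC = {:.3f}".format(area))
```

A curve that hugs the top left corner means high sensitivity at a low false positive rate; an area of 0.5 corresponds to random guessing.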

```python
from sklearn.svm import LinearSVC
from yellowbrick.classifier import ROCAUC
@@ -79,26 +93,22 @@ We also have a [quick start guide](https://github.com/DistrictDataLabs/yellowbri

## Contributing to Yellowbrick

Yellowbrick is an open source tool designed to enable more informed machine learning through visualizations. If you would like to contribute, you can do so in the following ways:
Yellowbrick is an open source project that is supported by a community who will gratefully and humbly accept any contributions you might make to the project. Large or small, any contribution makes a big difference; and if you've never contributed to an open source project before, we hope you will start with Yellowbrick!

- Add issues or bugs to the bug tracker: https://github.com/DistrictDataLabs/yellowbrick/issues
- Work on a card on the dev board: https://waffle.io/DistrictDataLabs/yellowbrick
- Create a pull request in Github: https://github.com/DistrictDataLabs/yellowbrick/pulls
Principally, Yellowbrick development is about the addition and creation of *visualizers* --- objects that learn from data and create a visual representation of the data or model. Visualizers integrate with Scikit-Learn estimators, transformers, and pipelines for specific purposes and as a result, can be simple to build and deploy. The most common contribution is therefore a new visualizer for a specific model or model family. We'll discuss in detail how to build visualizers later.

This repository is set up in a typical production/release/development cycle as described in [A Successful Git Branching Model](http://nvie.com/posts/a-successful-git-branching-model/). A typical workflow is as follows:
Beyond creating visualizers, there are many ways to contribute:

1. Select a card from the [dev board](https://waffle.io/districtdatalabs/yellowbrick) - preferably one that is "ready" then move it to "in-progress".
2. Create a branch off of develop called "feature-[feature name]", work and commit into that branch.
```
~$ git checkout -b feature-myfeature develop
```
- Submit a bug report or feature request on [GitHub Issues](https://github.com/DistrictDataLabs/yellowbrick/issues).
- Contribute a Jupyter notebook to our examples [gallery](https://github.com/DistrictDataLabs/yellowbrick/tree/develop/examples).
- Assist us with [user testing](http://www.scikit-yb.org/en/latest/evaluation.html).
- Add to the documentation or help with our website, [scikit-yb.org](http://www.scikit-yb.org).
- Write unit or integration tests for our project.
- Answer questions on our issues, mailing list, Stack Overflow, and elsewhere.
- Translate our documentation into another language.
- Write a blog post, tweet, or share our project with others.
- Teach someone how to use Yellowbrick.

3. Once you are done working (and everything is tested) merge your feature into develop.
```
~$ git checkout develop
~$ git merge --no-ff feature-myfeature
~$ git branch -d feature-myfeature
~$ git push origin develop
```
As you can see, there are lots of ways to get involved and we would be very happy for you to join us! The only thing we ask is that you abide by the principles of openness, respect, and consideration of others as described in the [Python Software Foundation Code of Conduct](https://www.python.org/psf/codeofconduct/).

4. Repeat. Releases will be routinely pushed into master via release branches, then deployed to the server.
For more information, check out [CONTRIBUTING.md](https://github.com/DistrictDataLabs/yellowbrick/blob/develop/CONTRIBUTING.md).
13 changes: 13 additions & 0 deletions docs/_static/theme_overrides.css
@@ -0,0 +1,13 @@
/* override table width restrictions */
@media screen and (min-width: 767px) {

.wy-table-responsive table td {
/* !important prevents the common CSS stylesheets from overriding
this as on RTD they are loaded after this stylesheet */
white-space: normal !important;
}

.wy-table-responsive {
overflow: visible !important;
}
}