Merge branch 'release-0.5'
bbengfort committed Aug 9, 2017
2 parents 09c8aea + 38edcca commit 81594c1
Showing 460 changed files with 12,554 additions and 3,178 deletions.
6 changes: 6 additions & 0 deletions .gitignore
@@ -123,3 +123,9 @@ fabric.properties
# *.ipr

.idea

# VisualTestCase Outputs
/tests/actual_images/*

# Data downloaded from Yellowbrick
data/
23 changes: 14 additions & 9 deletions CONTRIBUTING.md
@@ -95,14 +95,13 @@ The Yellowbrick repository is set up in a typical production/release/development

You can work directly in your fork and create a pull request from your fork's develop branch into ours. We also recommend setting up an `upstream` remote so that you can easily pull the latest development changes from the main Yellowbrick repository (see [configuring a remote for a fork](https://help.github.com/articles/configuring-a-remote-for-a-fork/)). You can do that as follows:

``
$ git remote add upstream https://github.com/DistrictDataLabs/yellowbrick.git
$ git remote -v
origin https://github.com/YOUR_USERNAME/YOUR_FORK.git (fetch)
origin https://github.com/YOUR_USERNAME/YOUR_FORK.git (push)
upstream https://github.com/DistrictDataLabs/yellowbrick.git (fetch)
upstream https://github.com/DistrictDataLabs/yellowbrick.git (push)
``
`$ git remote add upstream https://github.com/DistrictDataLabs/yellowbrick.git`
`$ git remote -v`
> origin https://github.com/YOUR_USERNAME/YOUR_FORK.git (fetch)
> origin https://github.com/YOUR_USERNAME/YOUR_FORK.git (push)
> upstream https://github.com/DistrictDataLabs/yellowbrick.git (fetch)
> upstream https://github.com/DistrictDataLabs/yellowbrick.git (push)

When you're ready, request a code review for your pull request. Then, when reviewed and approved, you can merge your fork into our main branch. Make sure to use the "Squash and Merge" option in order to create a Git history that is understandable.

@@ -216,12 +215,18 @@ class MyVisualizerTests(VisualTestCase, DatasetMixin):
self.fail("my visualizer didn't work")
```

Tests can be run as follows::
The entire test suite can be run as follows::

```
$ make test
```

You can also run your own test file as follows::

```
$ nosetests tests/test_your_visualizer.py
```

The Makefile uses the nosetest runner and testing suite as well as the coverage library, so make sure you have those dependencies installed! The `DatasetMixin` also requires requests.py to fetch data from our Amazon S3 account.

### Documentation
2 changes: 1 addition & 1 deletion DESCRIPTION.rst
@@ -4,7 +4,7 @@

.. |Visualizers| image:: http://www.scikit-yb.org/en/latest/_images/visualizers.png
    :width: 800 px
.. _Visualizers: http://scikit-yb.org/
.. _Visualizers: http://www.scikit-yb.org/

Yellowbrick
===========
29 changes: 27 additions & 2 deletions docs/about.rst
@@ -1,8 +1,15 @@
=====
About
=====

Yellowbrick is an open source, pure Python project that extends Scikit-Learn with visual analysis and diagnostic tools. The Yellowbrick API also wraps Matplotlib to create publication-ready figures and interactive data explorations while still allowing developers fine-grain control of figures. For users, Yellowbrick can help evaluate the performance, stability, and predictive value of machine learning models, and assist in diagnosing problems throughout the machine learning workflow.
.. image:: images/yellowbrickroad.jpg

Image by QuatroCinco_, used with permission, Flickr Creative Commons.

Yellowbrick is an open source, pure Python project that extends the Scikit-Learn API_ with visual analysis and diagnostic tools. The Yellowbrick API also wraps Matplotlib to create publication-ready figures and interactive data explorations while still allowing developers fine-grain control of figures. For users, Yellowbrick can help evaluate the performance, stability, and predictive value of machine learning models, and assist in diagnosing problems throughout the machine learning workflow.

Recently, much of this workflow has been automated through grid search methods, standardized APIs, and GUI-based applications. In practice, however, human intuition and guidance can more effectively hone in on quality models than exhaustive search. By visualizing the model selection process, data scientists can steer towards final, explainable models and avoid pitfalls and traps.

The Yellowbrick library is a diagnostic visualization platform for machine learning that allows data scientists to steer the model selection process. Yellowbrick extends the Scikit-Learn API with a new core object: the Visualizer. Visualizers allow visual models to be fit and transformed as part of the Scikit-Learn Pipeline process, providing visual diagnostics throughout the transformation of high dimensional data.

The Model Selection Triple
--------------------------
@@ -49,3 +56,21 @@ We think that's a pretty fair deal, and we're big believers in open source. If y
.. _`@rebeccabilbro`: https://github.com/rebeccabilbro
.. _`@bbengfort`: https://github.com/bbengfort
.. _`District Data Labs`: http://www.districtdatalabs.com/

Presentations
-------------

Yellowbrick has enjoyed the spotlight at a few conferences and in several presentations. We hope that these videos, talks, and slides will help you understand Yellowbrick a bit better.

Videos:
- `Visual Diagnostics for More Informed Machine Learning: Within and Beyond Scikit-Learn (PyCon 2016) <https://youtu.be/c5DaaGZWQqY>`_
- `Visual Diagnostics for More Informed Machine Learning (PyData Carolinas 2016) <https://youtu.be/cgtNPx7fJUM>`_
- `Yellowbrick: Steering Machine Learning with Visual Transformers (PyData London 2017) <https://youtu.be/2ZKng7pCB5k>`_

Slides:
- `Visualizing the Model Selection Process <https://www.slideshare.net/BenjaminBengfort/visualizing-the-model-selection-process>`_
- `Visualizing Model Selection with Scikit-Yellowbrick <https://www.slideshare.net/BenjaminBengfort/visualizing-model-selection-with-scikityellowbrick-an-introduction-to-developing-visualizers>`_
- `Visual Pipelines for Text Analysis (Data Intelligence 2017) <https://speakerdeck.com/dataintelligence/visual-pipelines-for-text-analysis>`_

.. _QuatroCinco: https://flic.kr/p/2Yj9mj
.. _API: http://scikit-learn.org/stable/modules/classes.html
7 changes: 7 additions & 0 deletions docs/api/anscombe.py
@@ -0,0 +1,7 @@
# Creates the anscombe visualization.

import yellowbrick as yb
import matplotlib.pyplot as plt

g = yb.anscombe()
plt.savefig("images/anscombe.png")
24 changes: 24 additions & 0 deletions docs/api/anscombe.rst
@@ -0,0 +1,24 @@
Anscombe's Quartet
==================

Yellowbrick has learned Anscombe's lesson - which is why we believe that
visual diagnostics are vital to machine learning.
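
Anscombe's lesson can be checked numerically. The sketch below is a standard-library illustration and not part of Yellowbrick: it assumes the commonly published values of Anscombe's quartet, typed in by hand, and shows that the four data sets share nearly identical summary statistics even though their plots look nothing alike.

```python
from statistics import mean, variance

# Commonly published values of Anscombe's quartet (typed in here as an
# assumption; they are not loaded from Yellowbrick).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": (
        [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
        [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89],
    ),
}

# For each data set: mean of x, variance of x, mean of y, variance of y.
summary = {
    name: (mean(x), variance(x), round(mean(y), 2), round(variance(y), 2))
    for name, (x, y) in quartet.items()
}

for name, stats in summary.items():
    print(name, stats)
```

Because the summary statistics barely differ, only a plot reveals how different the four data sets are.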

.. code:: python

    import yellowbrick as yb
    import matplotlib.pyplot as plt

    g = yb.anscombe()
    plt.show()

.. image:: images/anscombe.png

API Reference
-------------

.. automodule:: yellowbrick.anscombe
    :members:
    :undoc-members:
    :show-inheritance:
30 changes: 30 additions & 0 deletions docs/api/classifier/class_balance.py
@@ -0,0 +1,30 @@
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

from yellowbrick.classifier import ClassBalance


if __name__ == '__main__':
    # Load the classification data set
    data = pd.read_csv("../../../examples/data/occupancy/occupancy.csv")

    features = ["temperature", "relative humidity", "light", "C02", "humidity"]
    classes = ['unoccupied', 'occupied']

    # Extract the numpy arrays from the data frame
    X = data[features].as_matrix()
    y = data.occupancy.as_matrix()

    # Create the train and test data
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

    # Instantiate the classification model and visualizer
    forest = RandomForestClassifier()
    visualizer = ClassBalance(forest, classes=classes)

    visualizer.fit(X_train, y_train)  # Fit the training data to the visualizer
    visualizer.score(X_test, y_test)  # Evaluate the model on the test data
    g = visualizer.poof(outpath="images/class_balance.png")  # Draw/show/poof the data
43 changes: 43 additions & 0 deletions docs/api/classifier/class_balance.rst
@@ -0,0 +1,43 @@
Class Balance
=============

Oftentimes classifiers perform badly because of a class imbalance. A class balance chart can help prepare the user for such a case by showing the support for each class in the fitted
classification model.
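
The "support" plotted by a class balance chart is the number of samples per class. The sketch below is a plain-Python illustration using made-up labels (an assumption, not the occupancy data), and it does not call the Yellowbrick API; it only shows the counts such a chart would draw as bars.

```python
from collections import Counter

# Hypothetical target labels: a deliberately imbalanced toy example.
y = ["unoccupied"] * 8 + ["occupied"] * 2

# Support: how many samples belong to each class.
support = Counter(y)

# Share of the data set taken up by each class.
total = sum(support.values())
shares = {label: count / total for label, count in support.items()}

for label, count in support.most_common():
    print(f"{label}: {count} samples ({shares[label]:.0%})")
```

A classifier that always predicts the majority class would reach 80% accuracy on these labels, which is why the imbalance is worth seeing before trusting a score.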

.. code:: python

    # Load the classification data set
    data = load_data('occupancy')

    # Specify the features of interest and the classes of the target
    features = ["temperature", "relative humidity", "light", "C02", "humidity"]
    classes = ['unoccupied', 'occupied']

    # Extract the numpy arrays from the data frame
    X = data[features].as_matrix()
    y = data.occupancy.as_matrix()

    # Create the train and test data
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

.. code:: python

    # Instantiate the classification model and visualizer
    forest = RandomForestClassifier()
    visualizer = ClassBalance(forest, classes=classes)

    visualizer.fit(X_train, y_train)  # Fit the training data to the visualizer
    visualizer.score(X_test, y_test)  # Evaluate the model on the test data
    g = visualizer.poof()             # Draw/show/poof the data

.. image:: images/class_balance.png


API Reference
-------------

.. automodule:: yellowbrick.classifier.class_balance
    :members: ClassBalance
    :undoc-members:
    :show-inheritance:
30 changes: 30 additions & 0 deletions docs/api/classifier/classification_report.py
@@ -0,0 +1,30 @@
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

from yellowbrick.classifier import ClassificationReport


if __name__ == '__main__':
    # Load the classification data set
    data = pd.read_csv("../../../examples/data/occupancy/occupancy.csv")

    features = ["temperature", "relative humidity", "light", "C02", "humidity"]
    classes = ['unoccupied', 'occupied']

    # Extract the numpy arrays from the data frame
    X = data[features].as_matrix()
    y = data.occupancy.as_matrix()

    # Create the train and test data
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

    # Instantiate the classification model and visualizer
    bayes = GaussianNB()
    visualizer = ClassificationReport(bayes, classes=classes)

    visualizer.fit(X_train, y_train)  # Fit the training data to the visualizer
    visualizer.score(X_test, y_test)  # Evaluate the model on the test data
    g = visualizer.poof(outpath="images/classification_report.png")  # Draw/show/poof the data
45 changes: 45 additions & 0 deletions docs/api/classifier/classification_report.rst
@@ -0,0 +1,45 @@
Classification Report
~~~~~~~~~~~~~~~~~~~~~

The classification report visualizer displays the precision, recall, and
F1 scores for the model. In order to support easier interpretation and problem detection, the report integrates numerical scores with a color-coded
heatmap.
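
For readers who want to see where the three numbers come from, here is a minimal plain-Python sketch. The helper ``per_class_scores`` and the toy labels are hypothetical and exist only for illustration; this is not how Yellowbrick or Scikit-Learn compute the report internally.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} from paired label lists."""
    pairs = Counter(zip(y_true, y_pred))  # (actual, predicted) -> count
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        true_pos = pairs[(label, label)]
        predicted = sum(n for (_, p), n in pairs.items() if p == label)
        actual = sum(n for (t, _), n in pairs.items() if t == label)
        # Guard against division by zero for classes never predicted or seen.
        precision = true_pos / predicted if predicted else 0.0
        recall = true_pos / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Hypothetical labels, not taken from the occupancy data set.
y_true = ["occupied", "occupied", "unoccupied", "unoccupied", "unoccupied", "occupied"]
y_pred = ["occupied", "unoccupied", "unoccupied", "unoccupied", "occupied", "occupied"]

scores = per_class_scores(y_true, y_pred)
for label, (precision, recall, f1) in scores.items():
    print(f"{label}: precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

Each row of the heatmap corresponds to one entry of ``scores``, with one cell per metric.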

.. code:: python

    # Load the classification data set
    data = load_data('occupancy')

    # Specify the features of interest and the classes of the target
    features = ["temperature", "relative humidity", "light", "C02", "humidity"]
    classes = ['unoccupied', 'occupied']

    # Extract the numpy arrays from the data frame
    X = data[features].as_matrix()
    y = data.occupancy.as_matrix()

    # Create the train and test data
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

.. code:: python

    # Instantiate the classification model and visualizer
    bayes = GaussianNB()
    visualizer = ClassificationReport(bayes, classes=classes)

    visualizer.fit(X_train, y_train)  # Fit the training data to the visualizer
    visualizer.score(X_test, y_test)  # Evaluate the model on the test data
    g = visualizer.poof()             # Draw/show/poof the data

.. image:: images/classification_report.png


API Reference
-------------

.. automodule:: yellowbrick.classifier.classification_report
    :members: ClassificationReport
    :undoc-members:
    :show-inheritance:
26 changes: 26 additions & 0 deletions docs/api/classifier/confusion_matrix.py
@@ -0,0 +1,26 @@
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

from yellowbrick.classifier import ConfusionMatrix


if __name__ == '__main__':
    # Load the classification data set
    digits = load_digits()
    X = digits.data
    y = digits.target

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=11)

    model = LogisticRegression()

    # The ConfusionMatrix visualizer takes a model
    cm = ConfusionMatrix(model, classes=[0,1,2,3,4,5,6,7,8,9])

    cm.fit(X_train, y_train)  # Fit the training data to the visualizer
    cm.score(X_test, y_test)  # Evaluate the model on the test data
    g = cm.poof(outpath="images/confusion_matrix.png")  # Draw/show/poof the data
65 changes: 65 additions & 0 deletions docs/api/classifier/confusion_matrix.rst
@@ -0,0 +1,65 @@
Confusion Matrix
================

The ``ConfusionMatrix`` visualizer is a ScoreVisualizer that takes a
fitted Scikit-Learn classifier and a set of test X and y values and
returns a report showing how the predicted class of each test value
compares to its actual class. Data scientists use confusion matrices
to understand which classes are most easily confused. These provide
information similar to that in a ClassificationReport, but
rather than top-level scores they provide deeper insight into the
classification of individual data points.

Below are a few examples of using the ConfusionMatrix visualizer; more
information can be found by looking at the
Scikit-Learn documentation on `confusion matrices <http://scikit-learn.org/stable/modules/generated/sklearn.metrics.confusion_matrix.html>`_.
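
To make the layout of a confusion matrix concrete, here is a small plain-Python sketch. The helper ``confusion_counts`` and the toy labels are hypothetical; the sketch follows the Scikit-Learn convention that rows are actual classes and columns are predicted classes, but it is not the library's implementation.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, classes):
    """Build a matrix where entry [i][j] counts samples whose actual
    class is classes[i] and whose predicted class is classes[j]."""
    pairs = Counter(zip(y_true, y_pred))
    return [[pairs[(actual, predicted)] for predicted in classes]
            for actual in classes]


# Hypothetical labels for a three-class problem.
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0, 2]

matrix = confusion_counts(y_true, y_pred, classes=[0, 1, 2])
for row in matrix:
    print(row)
```

Correct predictions fall on the diagonal; any off-diagonal count shows which actual class was mistaken for which predicted class.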

.. code:: python

    #First do our imports
    import yellowbrick

    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression

    from yellowbrick.classifier import ConfusionMatrix

.. code:: python

    # We'll use the handwritten digits data set from scikit-learn.
    # Each feature of this dataset is an 8x8 pixel image of a handwritten number.
    # Digits.data converts these 64 pixels into a single array of features
    digits = load_digits()
    X = digits.data
    y = digits.target

    X_train, X_test, y_train, y_test = train_test_split(X,y, test_size =0.2, random_state=11)

    model = LogisticRegression()

    #The ConfusionMatrix visualizer takes a model
    cm = ConfusionMatrix(model, classes=[0,1,2,3,4,5,6,7,8,9])

    #Fit fits the passed model. This is unnecessary if you pass the visualizer a pre-fitted model
    cm.fit(X_train, y_train)

    #To create the ConfusionMatrix, we need some test data. Score runs predict() on the data
    #and then creates the confusion_matrix from scikit learn.
    cm.score(X_test, y_test)

    #How did we do?
    cm.poof()

.. image:: images/confusion_matrix.png


API Reference
-------------

.. automodule:: yellowbrick.classifier.confusion_matrix
    :members: ConfusionMatrix
    :undoc-members:
    :show-inheritance:
Binary file added docs/api/classifier/images/class_balance.png
Binary file added docs/api/classifier/images/confusion_matrix.png
Binary file added docs/api/classifier/images/rocauc.png