Merge branch 'release-1.0.1'

bbengfort committed Oct 6, 2019
2 parents d9e1218 + 753e4ba commit dd795b4
Showing 179 changed files with 1,685 additions and 14,009 deletions.
4 changes: 2 additions & 2 deletions .travis.yml
@@ -42,12 +42,12 @@ install:
conda install coveralls;
fi

script:
- python -m nltk.downloader popular
- make test

after_success: coveralls

notifications:
slack:
-    secure: mWKVHmEc22FJSp6Rrnd1j4QYCgZY4NJSrA8kZ5wj2/lf1iHI/CfWGTf7+Qihqe+rt0FOU0+UA9SzvSHRD1bV76q/zINayQ0EyJAfQzvIWIRGGnnMSO/79WoEYF56wwjpc5pLUTh6QV5qqfy+8nNGQ1/uJ0h6FtsUaSa/g61a5ZJEVBIjIpH8PgMxM64dRgJCmAdQuXkBP5Uf3yHlCtYk+Jr+gyXU2oqwMZ1VWgZkEo1Tqo7W9WY8dkOaAkzXDT61OqtcyyTuVSYbmK4i3c84681NBpb7wT6BfiCCAd3tn5AIKCkJVJ0ga0XeF6MdDpnicpku4FaN+fQjwkPiU47o/aFp8RNp27JQ9AhvH7wMuu5O8HDhszjRkfGOlUbuPOTavc22o4j0ShsrLiTQRJRhQQzJoquPuPj5wHqCCN+ice7IVUHj3ZC2jpJKDEYUNnr1fATtOwocimc6PhJM/IoeHgEEHpi37b+AxnhgOFoBlgsq2f4nsRD9JsLHqIpJCHgMjKxc6p3FtcFcXZDlDXQIcCzSRiPhG207dahspA3aPLj4Z+tOLJwh7/PSEfp02kcgPMM/MLYTWcaBv14aYi69kvQoZTfqVY8tIohg3ygda5siOCTTgqGriJYzkmdY5/Dp51kabhl+cEVIxPyY0miqyl3hZjqkqCnnOtg06qqxLLM=
+    secure: YvJ/aF5Ev2wgqoSc+QG4LA8XCovdfW7w7FiOMiRA6zrLjywEC12KzVDBTotIRFJVncCmh/WuyTCJUYfYA1Q0MrySpAF8cDr4fdGnO3skopU9Nx7pVuXOrHQ2LcVTEE0sGAeYH+hGrT+7TsbGR9iwki5xkkT0g1QEgJqvLhph6Y6gQMAtPceXU7wnIJf9Fn4IdTrDbeAawxhYsuVLTptGSS9UHYsV0P3lwPg1FItduE1UzNhyicBXzj/8f56/xBxNeYEGwFMhE1oad3lm9BRLzpqGwsIHWR5JLIYcX+y1YceFvB+vz4Xsf6H+XaCCb7uzBfC2BAc9+gr0zjUbiLcTyA1LyuR9kOlFCUx/nSGkJyhXcMb+NbA0vK9JY7ss2kempoxCDCkzpjFNasqGJMyPagI3na8YRu1RTTmBJUip9U+oN80Kr4lSMzbLDCDA2LTQBeL3zSSW51foiQPIDowK/CYQSMo/0IVp2x9ronWhDBbszHkXoWCv6/AMzjGhASDDg4AJD40zLo/pcEevcJdTraO915Sp8PtltbLnuuklJSi1xci5O6ja/ldyC7lKPm77z9nlx805349dLTkNpD27xXpALWPUJBNNrVpD3H6SvYB3b2IVgVjENdHZGLcCjlbwgdZ30zPik4Sj/w+8GoGxh5l/V6wHUhwOMm7ZKr7lcXk=
10 changes: 5 additions & 5 deletions CONTRIBUTING.md
@@ -165,11 +165,11 @@ These two basic types of visualizers map well to the two basic estimator objects
The scikit-learn API is object oriented, and estimators are initialized with parameters by instantiating their class. Hyperparameters can also be set using the `set_params()` method and retrieved with the corresponding `get_params()` method. All scikit-learn estimators have a `fit(X, y=None)` method that accepts a two-dimensional data array, `X`, and optionally a vector `y` of target values. The `fit()` method trains the estimator, making it ready to transform data or make predictions. Transformers have an associated `transform(X)` method that returns a new dataset, `Xprime`, and models have a `predict(X)` method that returns a vector of predictions, `yhat`. Models may also have a `score(X, y)` method that evaluates the performance of the model.
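In code, this contract looks roughly like the following (a minimal sketch using stock scikit-learn estimators and synthetic data, not Yellowbrick code):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

X = np.random.rand(100, 3)             # two-dimensional data array
y = X @ np.array([1.0, 2.0, 3.0])      # vector of target values

# A transformer: fit() learns column statistics, transform() returns Xprime
scaler = StandardScaler()
scaler.fit(X)
Xprime = scaler.transform(X)

# A model: fit() trains the estimator, predict() returns yhat
model = LinearRegression()
model.fit(X, y)
yhat = model.predict(X)
print(model.score(X, y))               # e.g. R^2 for regressors

# Hyperparameters are managed via get_params()/set_params()
model.set_params(fit_intercept=False)
```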
-Visualizers interact with scikit-learn objects by intersecting with them at the methods defined above. Specifically, visualizers perform actions related to `fit()`, `transform()`, `predict()`, and `score()` then call a `draw()` method which initializes the underlying figure associated with the visualizer. The user calls the visualizer's `poof()` method, which in turn calls a `finalize()` method on the visualizer to draw legends, titles, etc. and then `poof()` renders the figure. The Visualizer API is therefore:
+Visualizers interact with scikit-learn objects by intersecting with them at the methods defined above. Specifically, visualizers perform actions related to `fit()`, `transform()`, `predict()`, and `score()` then call a `draw()` method which initializes the underlying figure associated with the visualizer. The user calls the visualizer's `show()` method, which in turn calls a `finalize()` method on the visualizer to draw legends, titles, etc. and then `show()` renders the figure. The Visualizer API is therefore:
- `draw()`: add visual elements to the underlying axes object
- `finalize()`: prepare the figure for rendering, adding final touches such as legends, titles, axis labels, etc.
-- `poof()`: render the figure for the user.
+- `show()`: render the figure for the user.
Creating a visualizer means defining a class that extends `Visualizer` or one of its subclasses, then implementing several of the methods described above. A barebones implementation is as follows::
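A minimal sketch of that shape (illustrative only, assuming the `Visualizer` base class supplies the matplotlib axes as `self.ax`, as Yellowbrick's does) might be:

```python
from yellowbrick.base import Visualizer

class MyVisualizer(Visualizer):

    def fit(self, X, y=None):
        # learn anything required from the data, then draw
        self.draw(X)
        return self

    def draw(self, X):
        # add visual elements to the underlying axes object
        self.ax.plot(X)

    def finalize(self):
        # final touches: titles, labels, legends
        self.set_title("My Visualizer")
```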
@@ -201,7 +201,7 @@ This simple visualizer simply draws a line graph for some input dataset X, inter
```python
visualizer = MyVisualizer()
visualizer.fit(X)
-visualizer.poof()
+visualizer.show()
```

Score visualizers work on the same principle but accept an additional required `model` argument. Score visualizers wrap the model (which can be either instantiated or uninstantiated) and then pass all attributes and methods through to the underlying model, drawing where necessary.
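For illustration, a hypothetical score visualizer along those lines (a simplified sketch, not Yellowbrick's actual base class):

```python
from yellowbrick.base import Visualizer

class MyScoreVisualizer(Visualizer):

    def __init__(self, model, ax=None, **kwargs):
        super(MyScoreVisualizer, self).__init__(ax=ax, **kwargs)
        self.estimator = model  # wrapped estimator (assumed instantiated here)

    def fit(self, X, y=None):
        self.estimator.fit(X, y)            # delegate training
        return self

    def score(self, X, y):
        score = self.estimator.score(X, y)  # delegate scoring
        self.draw(score)                    # then draw where necessary
        return score

    def draw(self, score):
        self.ax.bar(["score"], [score])
```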
@@ -231,7 +231,7 @@ class MyVisualizerTests(VisualTestCase):
try:
visualizer = MyVisualizer()
visualizer.fit(X)
-visualizer.poof()
+visualizer.show()
except Exception as e:
pytest.fail("my visualizer didn't work")
```
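In practice, rather than only asserting that no exception is raised, visual tests usually also compare the rendered figure to a stored baseline image. A sketch, assuming the `assert_images_similar` helper that `VisualTestCase` provides:

```python
def test_my_visualizer_output(self):
    """
    Compare the rendered figure against a stored baseline image
    """
    visualizer = MyVisualizer()
    visualizer.fit(X)
    visualizer.finalize()
    # tol is the allowed image difference between actual and baseline
    self.assert_images_similar(visualizer, tol=1.0)
```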
@@ -287,7 +287,7 @@ class MyVisualizer(Visualizer):
>>> model = MyVisualizer()
>>> model.fit(X)
->>> model.poof()
+>>> model.show()
Notes
-----
4 changes: 2 additions & 2 deletions README.md
@@ -54,7 +54,7 @@ visualizer = Rank2D(
)
visualizer.fit(X, y) # Fit the data to the visualizer
visualizer.transform(X) # Transform the data
-visualizer.poof()           # Show the data
+visualizer.show()           # Finalize and render the figure
```

### Model Visualization
@@ -69,7 +69,7 @@ model = LinearSVC()
model.fit(X,y)
visualizer = ROCAUC(model)
visualizer.score(X,y)
-visualizer.poof()
+visualizer.show()
```

For additional information on getting started with Yellowbrick, view the quickstart guide in the [documentation](https://www.scikit-yb.org/en/latest/) and check out our [examples notebook](https://github.com/DistrictDataLabs/yellowbrick/blob/develop/examples/examples.ipynb).
12 changes: 6 additions & 6 deletions docs/README.md
@@ -2,7 +2,7 @@

*Welcome to the Yellowbrick docs!*

If you're looking for information about how to use Yellowbrick, for our contributor's guide, for examples and teaching resources, for answers to frequently asked questions, and more, please visit the latest version of our documentation at [www.scikit-yb.org](https://www.scikit-yb.org/).

## Building the Docs

@@ -16,9 +16,9 @@ You will then be able to build the documentation from inside the `docs` director

## reStructuredText

Yellowbrick uses [Sphinx](http://www.sphinx-doc.org/en/master/index.html) to build our documentation. The advantages of using Sphinx are many; we can more directly link to the documentation and source code of other projects like Matplotlib and scikit-learn using [intersphinx](http://www.sphinx-doc.org/en/master/usage/extensions/intersphinx.html). In addition, docstrings used to describe Yellowbrick visualizers can be automatically included when the documentation is built via [autodoc](http://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html#sphinx.ext.autodoc).

To take advantage of these features, our documentation must be written in reStructuredText (or "rst"). reStructuredText is similar to markdown, but not identical, and does take some getting used to. For instance, styling for things like codeblocks, external hyperlinks, internal cross references, notes, and fixed-width text are all unique in rst.

If you would like to contribute to our documentation and do not have prior experience with rst, we recommend you make use of these resources:

@@ -28,7 +28,7 @@ If you would like to contribute to our documentation and do not have prior exper

## Adding New Visualizers to the Docs

If you are adding a new visualizer to the docs, there are quite a few examples of similar types in the documentation on which you can base your files.

The primary format for the API section is as follows:

@@ -48,7 +48,7 @@ A brief introduction to my visualizer and how it is useful in the machine learni
visualizer = MyVisualizer(LinearRegression())
visualizer.fit(X, y)
-g = visualizer.poof()
+g = visualizer.show()
Discussion about my visualizer and some interpretation of the above plot.
@@ -62,7 +62,7 @@ API Reference
:show-inheritance:
```

This is a pretty good structure for a documentation page; a brief introduction followed by a code example with a visualization included using [the plot directive](https://matplotlib.org/devel/plot_directive.html). This will render the `MyVisualizer` image in the document along with links for the complete source code, the png, and the pdf versions of the image. It will also have the "alt-text" (for screen-readers) and will not display the source because the `:include-source:` option is omitted; if `:include-source:` is specified, the source will be displayed as well.

The primary section is wrapped up with a discussion about how to interpret the visualizer and use it in practice. Finally, the `API Reference` section will use `automodule` to include the documentation from your docstring.

12 changes: 6 additions & 6 deletions docs/api/classifier/class_prediction_error.rst
@@ -40,7 +40,7 @@ The class prediction error chart provides a way to quickly understand how good y
visualizer.score(X_test, y_test)

# Draw visualization
-visualizer.poof()
+visualizer.show()

In the above example, while the ``RandomForestClassifier`` appears to be fairly good at correctly predicting apples based on the features of the fruit, it often incorrectly labels pears as kiwis and mistakes kiwis for bananas.

@@ -56,13 +56,13 @@ By contrast, in the following example, the ``RandomForestClassifier`` does a gre
from yellowbrick.datasets import load_credit

X, y = load_credit()

classes = ['account in default', 'current with bills']

# Perform 80/20 training/test split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20,
random_state=42)

# Instantiate the classification model and visualizer
visualizer = ClassPredictionError(
RandomForestClassifier(n_estimators=10), classes=classes
@@ -75,10 +75,10 @@ By contrast, in the following example, the ``RandomForestClassifier`` does a gre
visualizer.score(X_test, y_test)

# Draw visualization
-visualizer.poof()
+visualizer.show()

API Reference
-------------

2 changes: 1 addition & 1 deletion docs/api/classifier/classification_report.rst
@@ -33,7 +33,7 @@ The classification report visualizer displays the precision, recall, F1, and sup

visualizer.fit(X_train, y_train) # Fit the visualizer and the model
visualizer.score(X_test, y_test) # Evaluate the model on the test data
-visualizer.poof()                # Draw/show/poof the data
+visualizer.show()                # Finalize and show the figure


The classification report shows a representation of the main classification metrics on a per-class basis. This gives a deeper intuition of the classifier's behavior than global accuracy alone, which can mask functional weaknesses in one class of a multiclass problem. Visual classification reports are used to compare classification models, selecting those that are "redder", i.e. that have stronger classification metrics or are more balanced.
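The per-class numbers behind such a report can be reproduced directly with scikit-learn; a small illustration (the labels and values here are made up for the example):

.. code-block:: python

    import numpy as np
    from sklearn.metrics import precision_recall_fscore_support

    y_true = np.array([0, 0, 1, 1, 2, 2, 2])
    y_pred = np.array([0, 1, 1, 1, 2, 2, 0])

    # average=None yields one precision/recall/F1/support value per class,
    # the same per-class breakdown the visualizer colors as a heatmap
    precision, recall, f1, support = precision_recall_fscore_support(
        y_true, y_pred, average=None
    )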
8 changes: 4 additions & 4 deletions docs/api/classifier/confusion_matrix.rst
@@ -47,23 +47,23 @@ scikit-learn documentation on `confusion matrices <http://scikit-learn.org/stabl
cm.score(X_test, y_test)

# How did we do?
-cm.poof()
+cm.show()


Plotting with Class Names
-------------------------

Class names can be added to a ``ConfusionMatrix`` plot using the ``label_encoder`` argument. The ``label_encoder`` can be a `sklearn.preprocessing.LabelEncoder <http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html>`_ (or anything with an ``inverse_transform`` method that performs the mapping), or a ``dict`` with the encoding-to-string mapping as in the example below:

.. plot::
:context: close-figs
:alt: ConfusionMatrix plot with class names

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split as tts
from sklearn.linear_model import LogisticRegression
from yellowbrick.classifier import ConfusionMatrix

iris = load_iris()
X = iris.data
y = iris.target
@@ -81,7 +81,7 @@ Class names can be added to a ``ConfusionMatrix`` plot using the ``label_encoder
iris_cm.fit(X_train, y_train)
iris_cm.score(X_test, y_test)

-iris_cm.poof()
+iris_cm.show()


API Reference
12 changes: 6 additions & 6 deletions docs/api/classifier/prcurve.rst
@@ -28,11 +28,11 @@ Binary Classification

X_train, X_test, y_train, y_test = tts(X, y, test_size=0.2, shuffle=True)

-# Create the visualizer, fit, score, and poof it
+# Create the visualizer, fit, score, and show it
viz = PrecisionRecallCurve(RidgeClassifier())
viz.fit(X_train, y_train)
viz.score(X_test, y_test)
-viz.poof()
+viz.show()


The base case for precision-recall curves is the binary classification case, and this case is also the most visually interpretable. In the figure above we can see the precision plotted on the y-axis against the recall on the x-axis. The larger the filled-in area, the stronger the classifier. The red line annotates the *average precision*, a summary of the entire plot computed as the weighted average of precision achieved at each threshold such that the weight is the difference in recall from the previous threshold.
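That weighted average can be written out explicitly; a sketch of the computation (the same quantity ``sklearn.metrics.average_precision_score`` reports), assuming the precision and recall arrays are ordered by increasing recall:

.. code-block:: python

    import numpy as np

    def average_precision(precision, recall):
        # AP = sum_n (R_n - R_{n-1}) * P_n: each precision value is
        # weighted by the gain in recall over the previous threshold
        precision = np.asarray(precision, dtype=float)
        recall = np.asarray(recall, dtype=float)
        return float(np.sum(np.diff(recall, prepend=0.0) * precision))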
@@ -59,11 +59,11 @@ To support multi-label classification, the estimator is wrapped in a `OneVsRestC

X_train, X_test, y_train, y_test = tts(X, y, test_size=0.2, shuffle=True)

-# Create the visualizer, fit, score, and poof it
+# Create the visualizer, fit, score, and show it
viz = PrecisionRecallCurve(RandomForestClassifier(n_estimators=10))
viz.fit(X_train, y_train)
viz.score(X_test, y_test)
-viz.poof()
+viz.show()


A more complex precision-recall curve can be computed, however, displaying each curve individually, along with F1-score iso curves (i.e. curves that show the relationship between precision and recall for various F1 scores).
@@ -86,14 +86,14 @@ A more complex Precision-Recall curve can be computed, however, displaying the e

X_train, X_test, y_train, y_test = tts(X, y, test_size=0.2, shuffle=True)

-# Create the visualizer, fit, score, and poof it
+# Create the visualizer, fit, score, and show it
viz = PrecisionRecallCurve(
MultinomialNB(), per_class=True, iso_f1_curves=True,
fill_area=False, micro=False, classes=encoder.classes_
)
viz.fit(X_train, y_train)
viz.score(X_test, y_test)
-viz.poof()
+viz.show()


.. seealso:: `Scikit-Learn: Model Selection with Precision Recall Curves <http://scikit-learn.org/stable/auto_examples/model_selection/plot_precision_recall.html>`_
10 changes: 5 additions & 5 deletions docs/api/classifier/rocauc.rst
@@ -31,14 +31,14 @@ This leads to another metric, area under the curve (AUC), which is a computation

visualizer.fit(X_train, y_train) # Fit the training data to the visualizer
visualizer.score(X_test, y_test) # Evaluate the model on the test data
-visualizer.poof()                # Draw/show/poof the data
+visualizer.show()                # Finalize and show the figure


.. warning::
Versions of Yellowbrick <= v0.8 had a `bug <https://github.com/DistrictDataLabs/yellowbrick/blob/develop/examples/rebeccabilbro/rocauc_bug_research.ipynb>`_
that triggered an ``IndexError`` when attempting binary classification using
a Scikit-learn-style estimator with only a ``decision_function``. This has been
fixed as of v0.9, where the ``micro``, ``macro``, and ``per-class`` parameters of
``ROCAUC`` are set to ``False`` for such classifiers.


@@ -75,7 +75,7 @@ ROC curves are typically used in binary classification, and in fact the Scikit-L

visualizer.fit(X_train, y_train) # Fit the training data to the visualizer
visualizer.score(X_test, y_test) # Evaluate the model on the test data
-visualizer.poof()                # Draw/show/poof the data
+visualizer.show()                # Finalize and render the figure

.. warning::
The target ``y`` must be numeric for this figure to work, or update to the latest version of sklearn.
4 changes: 2 additions & 2 deletions docs/api/classifier/threshold.rst
@@ -24,7 +24,7 @@ A visualization of precision, recall, f1 score, and queue rate with respect to t
visualizer = DiscriminationThreshold(model)

visualizer.fit(X, y) # Fit the data to the visualizer
-visualizer.poof()        # Draw/show/poof the data
+visualizer.show()        # Finalize and render the figure

One common use of binary classification algorithms is to use the score or probability they produce to determine cases that require special treatment. For example, a fraud prevention application might use a classification algorithm to determine if a transaction is likely fraudulent and needs to be investigated in detail. In the figure above, we present an example where a binary classifier determines if an email is "spam" (the positive case) or "not spam" (the negative case). Emails that are detected as spam are moved to a hidden folder and eventually deleted.

@@ -40,7 +40,7 @@ Generally speaking, the threshold is balanced between cases and set to 0.5 or 50

- **Queue Rate**: The "queue" is the spam folder or the inbox of the fraud investigation desk. This metric describes the percentage of instances that must be reviewed. If review has a high cost (e.g. fraud prevention) then this must be minimized with respect to business requirements; if it doesn't (e.g. spam filter), this could be optimized to ensure the inbox stays clean.

In the figure above we see the visualizer tuned to look for the optimal F1 score, which is annotated as a threshold of 0.43. The model is run multiple times over multiple train/test splits in order to account for the variability of the model with respect to the metrics (shown as the fill area around the median curve).
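A single-split sketch of that threshold search (simplified from the visualizer's repeated trials; it assumes a fitted binary ``model`` with ``predict_proba`` and held-out ``X_test``/``y_test`` with 0/1 labels, all hypothetical names):

.. code-block:: python

    import numpy as np
    from sklearn.metrics import f1_score

    # Sweep candidate thresholds on held-out data, keep the F1 maximizer
    probas = model.predict_proba(X_test)[:, 1]   # P(positive class)
    thresholds = np.linspace(0.0, 1.0, 101)
    f1_scores = [f1_score(y_test, probas >= t) for t in thresholds]
    best_threshold = thresholds[int(np.argmax(f1_scores))]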


API Reference