
Commit

Merge branch 'release-1.3.post1'
rebeccabilbro committed Feb 13, 2021
2 parents 682b352 + 722706d commit 53ea6d3
Showing 262 changed files with 4,021 additions and 1,822 deletions.
93 changes: 0 additions & 93 deletions .appveyor.yml

This file was deleted.

12 changes: 6 additions & 6 deletions .travis.yml
@@ -2,18 +2,18 @@ dist: xenial
language: python
matrix:
include:
- name: "Python 3.6 on Xenial Linux"
python: '3.6'

- name: "Python 3.7 on Xenial Linux"
python: '3.7'

- name: "Miniconda 3.6 on Xenial Linux"
env: ANACONDA="3.6"
- name: "Python 3.8 on Xenial Linux"
python: '3.8'

- name: "Miniconda 3.7 on Xenial Linux"
env: ANACONDA="3.7"

- name: "Miniconda 3.8 on Xenial Linux"
env: ANACONDA="3.8"

before_install:
- sudo apt-get update;
- if [[ "$TRAVIS_OS_NAME" == "linux" ]]; then
@@ -23,8 +23,8 @@ before_install:

install:
- if [[ -z ${ANACONDA} ]]; then
pip install -r requirements.txt;
pip install -r tests/requirements.txt;
pip install -r requirements.txt;
pip install coveralls;
else
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-$MINICONDA_OS-x86_64.sh -O miniconda.sh;
4 changes: 3 additions & 1 deletion MANIFEST.in
@@ -9,6 +9,8 @@ include MANIFEST.in
include examples/*.ipynb
include examples/*.md

include LICENSE.txt

graft docs
prune docs/_build

@@ -24,4 +26,4 @@ global-exclude *.py[co]
global-exclude .ipynb_checkpoints
global-exclude .DS_Store
global-exclude .env
global-exclude .coverage.*
global-exclude .coverage.*
2 changes: 1 addition & 1 deletion README.md
@@ -66,8 +66,8 @@ from sklearn.svm import LinearSVC
from yellowbrick.classifier import ROCAUC

model = LinearSVC()
model.fit(X,y)
visualizer = ROCAUC(model)
visualizer.fit(X,y)
visualizer.score(X,y)
visualizer.show()
```
10 changes: 5 additions & 5 deletions docs/api/classifier/classification_report.rst
@@ -45,13 +45,13 @@ Workflow Model evaluation

The classification report shows a representation of the main classification metrics on a per-class basis. This gives a deeper intuition of the classifier behavior over global accuracy which can mask functional weaknesses in one class of a multiclass problem. Visual classification reports are used to compare classification models to select models that are "redder", e.g. have stronger classification metrics or that are more balanced.

The metrics are defined in terms of true and false positives, and true and false negatives. Positive and negative in this case are generic names for the classes of a binary classification problem. In the example above, we would consider true and false occupied and true and false unoccupied. Therefore a true positive is when the actual class is positive as is the estimated class. A false positive is when the actual class is negative but the estimated class is positive. Using this terminology the meterics are defined as follows:
The metrics are defined in terms of true and false positives, and true and false negatives. Positive and negative in this case are generic names for the classes of a binary classification problem. In the example above, we would consider true and false occupied and true and false unoccupied. Therefore a true positive is when the actual class is positive as is the estimated class. A false positive is when the actual class is negative but the estimated class is positive. Using this terminology the metrics are defined as follows:

**precision**
Precision is the ability of a classiifer not to label an instance positive that is actually negative. For each class it is defined as as the ratio of true positives to the sum of true and false positives. Said another way, "for all instances classified positive, what percent was correct?"
Precision can be seen as a measure of a classifier's exactness. For each class, it is defined as the ratio of true positives to the sum of true and false positives. Said another way, "for all instances classified positive, what percent was correct?"

**recall**
Recall is the ability of a classifier to find all positive instances. For each class it is defined as the ratio of true positives to the sum of true positives and false negatives. Said another way, "for all instances that were actually positive, what percent was classified correctly?"
Recall is a measure of the classifier's completeness; the ability of a classifier to correctly find all positive instances. For each class, it is defined as the ratio of true positives to the sum of true positives and false negatives. Said another way, "for all instances that were actually positive, what percent was classified correctly?"

**f1 score**
The F\ :sub:`1` score is a weighted harmonic mean of precision and recall such that the best score is 1.0 and the worst is 0.0. Generally speaking, F\ :sub:`1` scores are lower than accuracy measures as they embed precision and recall into their computation. As a rule of thumb, the weighted average of F\ :sub:`1` should be used to compare classifier models, not global accuracy.
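
To make the definitions above concrete, here is a small editorial sketch (not part of this commit) that computes the same per-class metrics with scikit-learn's ``precision_recall_fscore_support``; the label arrays are made up purely for illustration.

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# Hypothetical ground-truth and predicted labels for a binary problem
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0])
y_pred = np.array([1, 0, 0, 1, 0, 1, 1, 0])

# Per-class precision, recall, F1, and support (the count of true instances per class)
precision, recall, f1, support = precision_recall_fscore_support(y_true, y_pred)
print(precision)  # array([0.75, 0.75]) -- precision for class 0 and class 1
print(recall)     # array([0.75, 0.75])
print(f1)         # harmonic mean of the two, per class
```
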
@@ -74,13 +74,13 @@ show it.

from sklearn.model_selection import TimeSeriesSplit
from sklearn.naive_bayes import GaussianNB

from yellowbrick.datasets import load_occupancy
from yellowbrick.classifier import classification_report

# Load the classification data set
X, y = load_occupancy()

# Specify the target classes
classes = ["unoccupied", "occupied"]

4 changes: 2 additions & 2 deletions docs/api/classifier/confusion_matrix.rst
@@ -68,8 +68,9 @@ Class names can be added to a ``ConfusionMatrix`` plot using the ``label_encoder
:alt: ConfusionMatrix plot with class names

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split as tts
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split as tts

from yellowbrick.classifier import ConfusionMatrix

iris = load_iris()
@@ -88,7 +89,6 @@ Class names can be added to a ``ConfusionMatrix`` plot using the ``label_encoder

iris_cm.fit(X_train, y_train)
iris_cm.score(X_test, y_test)

iris_cm.show()

Quick Method
114 changes: 98 additions & 16 deletions docs/api/classifier/prcurve.rst
@@ -3,13 +3,7 @@
Precision-Recall Curves
=======================

Precision-Recall curves are a metric used to evaluate a classifier's quality,
particularly when classes are very imbalanced. The precision-recall curve
shows the tradeoff between precision, a measure of result relevancy, and
recall, a measure of how many relevant results are returned. A large area
under the curve represents both high recall and precision, the best case
scenario for a classifier, showing a model that returns accurate results
for the majority of classes it selects.
The ``PrecisionRecallCurve`` shows the tradeoff between a classifier's precision, a measure of result relevancy, and recall, a measure of completeness. For each class, precision is defined as the ratio of true positives to the sum of true and false positives, and recall is the ratio of true positives to the sum of true positives and false negatives.

================= ==============================
Visualizer :class:`~yellowbrick.classifier.prcurve.PrecisionRecallCurve`
@@ -18,37 +12,115 @@ Models Classification
Workflow Model evaluation
================= ==============================

**precision**
Precision can be seen as a measure of a classifier's exactness. For each class, it is defined as the ratio of true positives to the sum of true and false positives. Said another way, "for all instances classified positive, what percent was correct?"

**recall**
Recall is a measure of the classifier's completeness; the ability of a classifier to correctly find all positive instances. For each class, it is defined as the ratio of true positives to the sum of true positives and false negatives. Said another way, "for all instances that were actually positive, what percent was classified correctly?"

**average precision**
Average precision expresses the precision-recall curve in a single number, which
represents the area under the curve. It is computed as the weighted average of precision achieved at each threshold, where the weights are the differences in recall from the previous thresholds.

Both precision and recall vary between 0 and 1, and in our efforts to select and tune machine learning models, our goal is often to try to maximize both precision and recall, i.e. a model that returns accurate results for the majority of classes it selects. This would result in a ``PrecisionRecallCurve`` visualization with a high area under the curve.
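
As a small editorial aside (not part of the diff), the relationship between the curve and its single-number summary can be sketched with scikit-learn; the labels and scores below are made up for illustration, and ``average_precision_score`` should agree with the manual weighted sum.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve, average_precision_score

# Hypothetical labels and decision scores, purely for illustration
y_true  = np.array([0, 0, 1, 1, 1, 0, 1, 0])
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.55])

precision, recall, thresholds = precision_recall_curve(y_true, y_score)

# Average precision: precision at each threshold, weighted by the change in recall
ap_manual = -np.sum(np.diff(recall) * precision[:-1])
ap = average_precision_score(y_true, y_score)
print(ap, ap_manual)  # the two values should match
```
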

Binary Classification
---------------------

The base case for precision-recall curves is the binary classification case, and this case is also the most visually interpretable. In the figure below we can see the precision plotted on the y-axis against the recall on the x-axis. The larger the filled in area, the stronger the classifier. The red line annotates the average precision.

.. plot::
:context: close-figs
:alt: PrecisionRecallCurve with Binary Classification

import matplotlib.pyplot as plt

from yellowbrick.datasets import load_spam
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import train_test_split as tts
from yellowbrick.classifier import PrecisionRecallCurve
from yellowbrick.datasets import load_spam
from sklearn.model_selection import train_test_split as tts

# Load the dataset and split into train/test splits
X, y = load_spam()

X_train, X_test, y_train, y_test = tts(X, y, test_size=0.2, shuffle=True)
X_train, X_test, y_train, y_test = tts(
X, y, test_size=0.2, shuffle=True, random_state=0
)

# Create the visualizer, fit, score, and show it
viz = PrecisionRecallCurve(RidgeClassifier())
viz = PrecisionRecallCurve(RidgeClassifier(random_state=0))
viz.fit(X_train, y_train)
viz.score(X_test, y_test)
viz.show()

One way to use ``PrecisionRecallCurves`` is for model comparison, by examining which models have the highest average precision. For instance, the visualization below suggests that a ``LogisticRegression`` model might be better than a ``RidgeClassifier`` for this particular dataset:

.. plot::
:context: close-figs
:include-source: False
:alt: Comparing PrecisionRecallCurves with Binary Classification

import matplotlib.pyplot as plt

from yellowbrick.datasets import load_spam
from yellowbrick.classifier import PrecisionRecallCurve
from sklearn.model_selection import train_test_split as tts
from sklearn.linear_model import RidgeClassifier, LogisticRegression

# Load the dataset and split into train/test splits
X, y = load_spam()

X_train, X_test, y_train, y_test = tts(
X, y, test_size=0.2, shuffle=True, random_state=0
)

# Create the visualizers, fit, score, and show them
models = [
RidgeClassifier(random_state=0), LogisticRegression(random_state=0)
]
_, axes = plt.subplots(ncols=2, figsize=(8,4))

The base case for precision-recall curves is the binary classification case, and this case is also the most visually interpretable. In the figure above we can see the precision plotted on the y-axis against the recall on the x-axis. The larger the filled in area, the stronger the classifier is. The red line annotates the *average precision*, a summary of the entire plot computed as the weighted average of precision achieved at each threshold such that the weight is the difference in recall from the previous threshold.
for idx, ax in enumerate(axes.flatten()):
viz = PrecisionRecallCurve(models[idx], ax=ax, show=False)
viz.fit(X_train, y_train)
viz.score(X_test, y_test)
viz.finalize()

plt.show()

Precision-recall curves are one of the methods used to evaluate a classifier's quality, particularly when classes are very imbalanced. The below plot suggests that our classifier improves when we increase the weight of the "spam" case (which is 1), and decrease the weight for the "not spam" case (which is 0).

.. plot::
:context: close-figs
:alt: Optimizing PrecisionRecallCurve with Binary Classification

from yellowbrick.datasets import load_spam
from sklearn.linear_model import LogisticRegression
from yellowbrick.classifier import PrecisionRecallCurve
from sklearn.model_selection import train_test_split as tts

# Load the dataset and split into train/test splits
X, y = load_spam()

X_train, X_test, y_train, y_test = tts(
X, y, test_size=0.2, shuffle=True, random_state=0
)

# Specify class weights to shift the threshold towards spam classification
weights = {0:0.2, 1:0.8}

# Create the visualizer, fit, score, and show it
viz = PrecisionRecallCurve(
LogisticRegression(class_weight=weights, random_state=0)
)
viz.fit(X_train, y_train)
viz.score(X_test, y_test)
viz.show()

Multi-Label Classification
--------------------------

To support multi-label classification, the estimator is wrapped in a `OneVsRestClassifier <http://scikit-learn.org/stable/modules/generated/sklearn.multiclass.OneVsRestClassifier.html>`_ to produce binary comparisons for each class (e.g. the positive case is the class and the negative case is any other class). The Precision-Recall curve is then computed as the micro-average of the precision and recall for all classes:
To support multi-label classification, the estimator is wrapped in a `OneVsRestClassifier <http://scikit-learn.org/stable/modules/generated/sklearn.multiclass.OneVsRestClassifier.html>`_ to produce binary comparisons for each class (e.g. the positive case is the class and the negative case is any other class). The precision-recall curve can then be computed as the micro-average of the precision and recall for all classes (by setting ``micro=True``), or individual curves can be plotted for each class (by setting ``per_class=True``):

.. plot::
:context: close-figs
@@ -68,7 +140,11 @@ To support multi-label classification, the estimator is wrapped in a `OneVsRestC
X_train, X_test, y_train, y_test = tts(X, y, test_size=0.2, shuffle=True)

# Create the visualizer, fit, score, and show it
viz = PrecisionRecallCurve(RandomForestClassifier(n_estimators=10))
viz = PrecisionRecallCurve(
RandomForestClassifier(n_estimators=10),
per_class=True,
cmap="Set1"
)
viz.fit(X_train, y_train)
viz.score(X_test, y_test)
viz.show()
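
For reference, here is an editorial sketch (not part of this commit, and not Yellowbrick's internal code) of roughly what the micro-averaging strategy described above amounts to; it uses a small synthetic dataset in place of the real data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_recall_curve
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import label_binarize

# Synthetic three-class problem standing in for the real dataset
X, y = make_classification(n_samples=500, n_classes=3, n_informative=6, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Wrap the estimator so each class is scored as its own binary problem
clf = OneVsRestClassifier(RandomForestClassifier(n_estimators=10, random_state=0))
clf.fit(X_train, y_train)
y_score = clf.predict_proba(X_test)

# Micro-average: flatten the binarized targets and the scores so every
# (sample, class) pair counts as a single binary decision
Y_test = label_binarize(y_test, classes=np.unique(y))
precision, recall, _ = precision_recall_curve(Y_test.ravel(), y_score.ravel())
```
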
@@ -89,15 +165,21 @@ A more complex Precision-Recall curve can be computed, however, displaying the e
# Load dataset and encode categorical variables
X, y = load_game()
X = OrdinalEncoder().fit_transform(X)

# Encode the target (we'll use the encoder to retrieve the class labels)
encoder = LabelEncoder()
y = encoder.fit_transform(y)

X_train, X_test, y_train, y_test = tts(X, y, test_size=0.2, shuffle=True)

# Create the visualizer, fit, score, and show it
viz = PrecisionRecallCurve(
MultinomialNB(), per_class=True, iso_f1_curves=True,
fill_area=False, micro=False, classes=encoder.classes_
MultinomialNB(),
classes=encoder.classes_,
colors=["purple", "cyan", "blue"],
iso_f1_curves=True,
per_class=True,
micro=False
)
viz.fit(X_train, y_train)
viz.score(X_test, y_test)
3 changes: 1 addition & 2 deletions docs/api/classifier/rocauc.rst
@@ -52,8 +52,7 @@ Workflow Model evaluation
Multi-class ROCAUC Curves
-------------------------

Yellowbrick's ``ROCAUC`` Visualizer does allow for plotting multiclass classification curves.
ROC curves are typically used in binary classification, and in fact the Scikit-Learn ``roc_curve`` metric is only able to perform metrics for binary classifiers. Yellowbrick addresses this by binarizing the output (per-class) or to use one-vs-rest (micro score) or one-vs-all (macro score) strategies of classification.
Yellowbrick's ``ROCAUC`` Visualizer does allow for plotting multiclass classification curves. ROC curves are typically used in binary classification, and in fact the Scikit-Learn ``roc_curve`` metric is only able to compute metrics for binary classifiers. Yellowbrick addresses this by binarizing the output (per-class) or by using one-vs-rest (micro score) or one-vs-all (macro score) classification strategies.

.. plot::
:context: close-figs
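
As a final editorial aside (not part of the diff), the binarize-and-score strategy described above can be sketched directly with scikit-learn; this illustrates the idea on the iris data and is not Yellowbrick's actual implementation.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import auc, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import label_binarize

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
y_score = model.predict_proba(X_test)

# Binarize the targets so each class can be treated as its own binary problem
Y_test = label_binarize(y_test, classes=np.unique(y))

# One ROC curve (and AUC) per class
for i in range(Y_test.shape[1]):
    fpr, tpr, _ = roc_curve(Y_test[:, i], y_score[:, i])
    print(f"class {i} AUC: {auc(fpr, tpr):.3f}")

# Micro-average: pool every (sample, class) decision into one binary problem
fpr, tpr, _ = roc_curve(Y_test.ravel(), y_score.ravel())
print(f"micro-average AUC: {auc(fpr, tpr):.3f}")
```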