From b072b27d0abfff114af8407b74954441b32002ac Mon Sep 17 00:00:00 2001
From: Rebecca Bilbro
Date: Thu, 17 May 2018 17:15:27 -0400
Subject: [PATCH 1/4] starting to work on freqdist tests

---
 tests/test_text/test_freqdist.py | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/tests/test_text/test_freqdist.py b/tests/test_text/test_freqdist.py
index d7ecc84c1..a7364f670 100644
--- a/tests/test_text/test_freqdist.py
+++ b/tests/test_text/test_freqdist.py
@@ -1,13 +1,14 @@
 # tests.test_text.test_freqdist
 # Tests for the frequency distribution visualization
 #
-# Author: Rebecca Bilbro
+# Author: Rebecca Bilbro
+# Github: @rebeccabilbro
 # Created: 2017-03-22 15:27
 #
-# Copyright (C) 2017 District Data Labs
+# Copyright (C) 2018
 # For license information, see LICENSE.txt
 #
-# ID: test_freqdist.py [bd9cbb9] rebecca.bilbro@bytecubed.com $
+# ID: test_freqdist.py [bd9cbb9] rbilbro@districtdatalabs.com $

 """
 Tests for the frequency distribution text visualization

From 94ffbd9a6689e76a75e09da2e44e5ee5833afbcf Mon Sep 17 00:00:00 2001
From: Rebecca Bilbro
Date: Thu, 17 May 2018 21:42:09 -0400
Subject: [PATCH 2/4] updating readme ahead of 0.7.0 release

---
 README.md | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/README.md b/README.md
index 006d1e23e..07f062ae1 100644
--- a/README.md
+++ b/README.md
@@ -28,10 +28,10 @@ Visualizers are estimators (objects that learn from data) whose primary objectiv

 #### Feature Visualization

+- **Rank Features**: single or pairwise ranking of features to detect relationships
 - **Parallel Coordinates**: horizontal visualization of instances
+- **Radial Visualization**: separation of instances around a circular plot
 - **PCA Projection**: projection of instances based on principal components
-- **RadViz**: separation of instances around a circular plot
-- **Rank Features**: single or pairwise ranking of features to detect relationships
 - **Feature Importances**: rank features based on their in-model performance
 - **Recursive Feature Elimination**: find the best subset of features by importance
 - **Scatter and Joint Plots**: direct data visualization with feature selection
@@ -41,15 +41,15 @@ Visualizers are estimators (objects that learn from data) whose primary objectiv
 - **Class Balance**: see how the distribution of classes affects the model
 - **Class Prediction Error**: shows error and support in classification
 - **Classification Report**: visual representation of precision, recall, and F1
-- **Confusion Matrices**: visual description of class decision making
 - **ROC/AUC Curves**: receiver operator characteristics and area under the curve
+- **Confusion Matrices**: visual description of class decision making
 - **Discrimination Threshold**: find a threshold that best separates binary classes

 #### Regression Visualization

-- **Alpha Selection**: show how the choice of alpha influences regularization
 - **Prediction Error Plots**: find model breakdowns along the domain of the target
 - **Residuals Plot**: show the difference in residuals of training and test data
+- **Alpha Selection**: show how the choice of alpha influences regularization

 #### Clustering Visualization

@@ -58,13 +58,13 @@ Visualizers are estimators (objects that learn from data) whose primary objectiv

 #### Model Selection Visualization

-- **Validation Curve**: tune a model with respect to a single hyperparameter
-- **Learning Curve**: show if a model might benefit from more data or less complexity
+- **Validation Curve**: tune a model with respect to a single hyperparameter
+- **Learning Curve**: show if a model might benefit from more data or less complexity

 #### Text Visualization

 - **Term Frequency**: visualize the frequency distribution of terms in the corpus
-- **TSNE**: use stochastic neighbor embedding to project documents.
+- **t-SNE Corpus Visualization**: use stochastic neighbor embedding to project documents.

 And more! Visualizers are being added all the time, so be sure to check the examples (or even the develop branch) and feel free to contribute your ideas for Visualizers!


From fabdfca94f58be56e537a03b0915df6ffa99bb43 Mon Sep 17 00:00:00 2001
From: Rebecca Bilbro
Date: Thu, 17 May 2018 21:44:13 -0400
Subject: [PATCH 3/4] adds manifold visualizer to README

---
 README.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/README.md b/README.md
index 4dfd060f2..bcc298c41 100644
--- a/README.md
+++ b/README.md
@@ -33,6 +33,7 @@ Visualizers are estimators (objects that learn from data) whose primary objectiv
 - **Parallel Coordinates**: horizontal visualization of instances
 - **Radial Visualization**: separation of instances around a circular plot
 - **PCA Projection**: projection of instances based on principal components
+- **Manifold Visualization**: high dimensional visualization with manifold learning
 - **Feature Importances**: rank features based on their in-model performance
 - **Recursive Feature Elimination**: find the best subset of features by importance
 - **Scatter and Joint Plots**: direct data visualization with feature selection

From 39402b2a0046e7ff1e6702a55b4fcc7d4e190716 Mon Sep 17 00:00:00 2001
From: Rebecca Bilbro
Date: Thu, 17 May 2018 21:58:05 -0400
Subject: [PATCH 4/4] updates DESCRIPTION.rst to sync with README.md

---
 DESCRIPTION.rst | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/DESCRIPTION.rst b/DESCRIPTION.rst
index 34767b1b2..1daa6f410 100644
--- a/DESCRIPTION.rst
+++ b/DESCRIPTION.rst
@@ -30,6 +30,7 @@ Feature Visualization
 - **Parallel Coordinates**: horizontal visualization of instances
 - **Radial Visualization**: separation of instances around a circular plot
 - **PCA Projection**: projection of instances based on principal components
+- **Manifold Visualization**: high dimensional visualization with manifold learning
 - **Feature Importances**: rank features based on their in-model performance
 - **Recursive Feature Elimination**: find the best subset of features by importance
 - **Scatter and Joint Plots**: direct data visualization with feature selection
@@ -67,7 +68,7 @@ Text Visualization
 ~~~~~~~~~~~~~~~~~~

 - **Term Frequency**: visualize the frequency distribution of terms in the corpus
-- **TSNE**: use stochastic neighbor embedding to project documents.
+- **t-SNE Corpus Visualization**: use stochastic neighbor embedding to project documents.

 ... and more! Visualizers are being added all the time; be sure to check the examples_ (or even the develop_ branch) and feel free to contribute your ideas for new Visualizers!
