classgraphic

Interactive classification diagnostic plots for scikit-learn.

We classify things for the purpose of doing something to them. Any classification which does not assist manipulation is worse than useless. - Randolph S. Bourne, "Education and Living", The Century Co (April 1917)

Major features:

Plotly-based tables:

  • class_imbalance_table
  • classification_table
  • confusion_matrix_table
  • describe (dataframe stats)
  • prediction_table
  • table

And the following charts:

  • class_imbalance
  • class_error
  • det
  • feature_importance
  • missing
  • precision_recall
  • roc
  • prediction_histogram
  • threshold

For clustering:

  • Delaunay triangulations
  • Voronoi tessellations
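For readers unfamiliar with these two constructions, here is a minimal sketch that computes them with SciPy (installed as a dependency of scikit-learn). It does not use classgraphic's own plotting functions, and the random points and seed are assumptions for illustration only.

```python
import numpy as np
from scipy.spatial import Delaunay, Voronoi

# 20 random 2-D points; the count and seed are illustrative assumptions.
rng = np.random.default_rng(0)
points = rng.random((20, 2))

# Delaunay: triangles whose circumcircles contain no other input point.
tri = Delaunay(points)

# Voronoi: one cell per input point, containing everything closest to it.
vor = Voronoi(points)

print("triangles (rows of 3 point indices):", tri.simplices.shape)
print("Voronoi vertices:", vor.vertices.shape)
print("cells (one per input point):", len(vor.point_region))
```

The two structures are duals of each other: connecting the points of neighbouring Voronoi cells yields the Delaunay triangulation.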

Try it

Trying it on Binder shows all the details and interactivity. The quickstart below has static images, but if you run these commands in a Jupyter notebook, IPython or an IDE, you will be able to interact with the plots.

Quickstart

from classgraphic.essential import *

# loading the data
df = px.data.iris()

# let's see what kind of data we have
describe(df, transpose=True).show()

# any missing?
missing(df)

# features
X = df.drop(columns=["species", "species_id"])

# target
y = df["species"]

# let's check the classes we will be training on and predicting
class_imbalance_table(y, condition="all")

# train / test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=random_state
)

# we want to see the total count for each class; bars are stacked by default, so that works
# we could also pass barmode="overlay" to class_imbalance if we prefer
class_imbalance(y_train, y_test, condition="train,test")

# model
model = LogisticRegression(max_iter=max_iter, random_state=random_state)
model.fit(X_train, y_train)

# predictions
y_score = model.predict_proba(X_test)
y_pred = model.predict(X_test)

confusion_matrix_table(model, y_test, y_pred).show()
classification_table(model, y_test, y_pred)

feature_importance(model, y, transpose=True)

This concludes the quickstart. There are many more visualizations and tables to explore.

See the notebooks and docs folders on GitHub and the documentation website for more information.
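The quickstart uses names such as train_test_split, LogisticRegression, random_state and max_iter without importing or defining them, so they presumably arrive through the star import. For readers who prefer explicit imports, the sketch below reproduces only the modelling steps with scikit-learn. The values chosen for random_state and max_iter are hypothetical, the iris data is loaded from scikit-learn rather than Plotly, and confusion_matrix and classification_report stand in for the classgraphic tables.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Hypothetical values: the quickstart does not show what these are set to.
random_state = 42
max_iter = 1000

# Same iris data as the quickstart, loaded here as plain arrays.
X, y = load_iris(return_X_y=True)

# train / test split, half and half as in the quickstart
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=random_state
)

model = LogisticRegression(max_iter=max_iter, random_state=random_state)
model.fit(X_train, y_train)

y_score = model.predict_proba(X_test)  # one probability column per class
y_pred = model.predict(X_test)

# Plain-text counterparts of the confusion matrix and classification tables.
cm = confusion_matrix(y_test, y_pred)
print(cm)
print(classification_report(y_test, y_pred))
```

The y_test, y_pred and y_score arrays produced here have the same roles as the variables of the same names in the quickstart.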

Requirements

  • Python 3.8 or later
  • numpy
  • pandas
  • plotly>=5.0
  • scikit-learn
  • nbformat
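To see which of these requirements are already present in an environment, something like the following standard-library sketch can help. The helper name installed_versions is hypothetical and not part of classgraphic; the package names are the ones listed above.

```python
import sys
from importlib import metadata

# Distribution names taken from the requirements list above.
REQUIRED = ["numpy", "pandas", "plotly", "scikit-learn", "nbformat"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(REQUIRED)

python_ok = sys.version_info >= (3, 8)
print("Python", sys.version.split()[0], "OK" if python_ok else "too old (3.8+ needed)")
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Note that this only reports versions; checking the plotly>=5.0 constraint would still require comparing the reported version yourself.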

Install

If you use conda, create an environment named classgraphic, then activate it:

  • On Linux: source activate classgraphic

  • On Windows: conda activate classgraphic

If you use another environment manager, create and activate your environment using its normal steps.

Then execute:

python setup.py install

or for installing in development mode:

python -m pip install -e . --no-build-isolation

or alternatively

python setup.py develop

To install from github instead:

pip install git+https://github.com/dionresearch/classgraphic

See also

  • stemgraphic: Python package for visualization of data and text
  • Hotelling: one- and two-sample Hotelling T² tests, T² and F statistics, univariate and multivariate control charts, and anomaly detection