Interactive classification diagnostic plots for scikit-learn.
We classify things for the purpose of doing something to them. Any classification which does not assist manipulation is worse than useless. - Randolph S. Bourne, "Education and Living", The Century Co (April 1917)
Plotly-based tables for:
- class_imbalance_table
- classification_table
- confusion_matrix_table
- describe (dataframe stats)
- prediction_table
- table
And the following charts:
- class_imbalance
- class_error
- det
- feature_importance
- missing
- precision_recall
- roc
- prediction_histogram
- threshold
For clustering:
- Delaunay triangulations
- Voronoi tessellations
Try it on Binder to see all the details and interactivity. The quickstart below uses static images, but if you run these commands in a Jupyter notebook, IPython, or an IDE, you will be able to interact with the output.
from classgraphic.essential import *
# loading the data
df = px.data.iris()
# let's see what kind of data we have
describe(df, transpose=True).show()
# any missing?
missing(df)
# features
X = df.drop(columns=["species", "species_id"])
# target
y = df["species"]
# Let's check the classes we will be training on and predicting
class_imbalance_table(y, condition="all")
# train / test split
X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.5, random_state=random_state
)
# we want to see total count for each, default for bars is to be stacked, so that works
# we could also pass to class_imbalance barmode="overlay" if we prefer
class_imbalance(y_train, y_test, condition="train,test")
# model
model = LogisticRegression(max_iter=max_iter, random_state=random_state)
model.fit(X_train, y_train)
# predictions
y_score = model.predict_proba(X_test)
y_pred = model.predict(X_test)
confusion_matrix_table(model, y_test, y_pred).show()
classification_table(model, y_test, y_pred)
feature_importance(model, y, transpose=True)
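To make the numbers behind the confusion matrix and classification tables concrete, here is a minimal pure-Python sketch of how such counts and the per-class precision and recall can be derived from true and predicted labels. It is an illustration only, not the implementation used by classgraphic or scikit-learn; the helper names `confusion_counts` and `precision_recall` and the toy labels are hypothetical.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Count how often each (true label, predicted label) pair occurs."""
    return Counter(zip(y_true, y_pred))


def precision_recall(y_true, y_pred):
    """Return {label: (precision, recall)} for every label seen."""
    counts = confusion_counts(y_true, y_pred)
    labels = sorted(set(y_true) | set(y_pred))
    result = {}
    for label in labels:
        tp = counts[(label, label)]
        # column total: everything predicted as this label
        predicted = sum(n for (_, p), n in counts.items() if p == label)
        # row total: everything that truly is this label
        actual = sum(n for (t, _), n in counts.items() if t == label)
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        result[label] = (precision, recall)
    return result


# Toy labels (hypothetical, not taken from the iris run above)
y_true_toy = ["setosa", "setosa", "versicolor", "versicolor", "virginica", "virginica"]
y_pred_toy = ["setosa", "setosa", "versicolor", "virginica", "virginica", "virginica"]

for label, (precision, recall) in precision_recall(y_true_toy, y_pred_toy).items():
    print(f"{label}: precision={precision:.2f} recall={recall:.2f}")
```

In this toy data one versicolor sample is predicted as virginica, which lowers the recall of versicolor and the precision of virginica while leaving setosa untouched.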
This concludes the quickstart. There are many more visualizations and tables to explore.
See the notebooks and docs folders on GitHub and the documentation website for more information.
- Python 3.8 or later
- numpy
- pandas
- plotly>=5.0
- scikit-learn
- nbformat
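One quick way to confirm these requirements is to check the interpreter version and whether each package can be found. The snippet below is a small sketch using only the standard library; the helper name `find_missing` is hypothetical. Note that scikit-learn is imported as `sklearn`, and that this check does not verify the `plotly>=5.0` version constraint.

```python
import importlib.util
import sys

# Import names for the requirements listed above (scikit-learn imports as sklearn)
REQUIRED_MODULES = ["numpy", "pandas", "plotly", "sklearn", "nbformat"]


def find_missing(modules):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


if sys.version_info < (3, 8):
    print("Python 3.8 or later is required")

missing_modules = find_missing(REQUIRED_MODULES)
if missing_modules:
    print("Missing packages:", ", ".join(missing_modules))
else:
    print("All required packages are importable")
```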
If you use conda, create an environment named classgraphic, then activate it:
- In Linux:
source activate classgraphic
- In Windows:
conda activate classgraphic
If you use another environment manager, create and activate your environment using its normal steps.
Then execute:
python setup.py install
or for installing in development mode:
python -m pip install -e . --no-build-isolation
or alternatively
python setup.py develop
To install from github instead:
pip install git+https://github.com/dionresearch/classgraphic
- stemgraphic: Python package for visualization of data and text
- Hotelling: one- and two-sample Hotelling T2 tests, T2 and f statistics, univariate and multivariate control charts, and anomaly detection