Learning Interpretable Characteristic Kernels via Decision Forests

By: Sambit Panda, Cencheng Shen, and Joshua T. Vogelstein

This repo (neurodata/kmerf) contains the code to replicate the figures in our paper.

Abstract

Decision forests are widely used for classification and regression tasks. A lesser-known property of tree-based methods is that one can construct a proximity matrix from the tree(s), and these proximity matrices are induced kernels. While there has been extensive research on the applications and properties of kernels, there is relatively little research on kernels induced by decision forests. We construct Kernel Mean Embedding Random Forests (KMERF), which induce kernels from random trees and/or forests using leaf-node proximity. We introduce the notion of an asymptotically characteristic kernel, and prove that KMERF kernels are asymptotically characteristic for both discrete and continuous data. Because KMERF is data-adaptive, we suspected it would outperform kernels selected a priori on finite-sample data. We illustrate that KMERF nearly dominates current state-of-the-art kernel-based tests across a diverse range of high-dimensional two-sample and independence testing settings. Furthermore, our forest-based approach is interpretable, and provides feature importance metrics that readily distinguish important dimensions, unlike other high-dimensional non-parametric testing procedures. Hence, this work demonstrates that a decision forest-based kernel can be both more powerful and more interpretable than existing methods, contrary to the conventional wisdom that the two trade off against each other.
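To make "leaf-node proximity" concrete, here is a minimal sketch of how a proximity matrix can be built from leaf assignments. It is an illustration only, not the implementation in kmerf.py: the function name `leaf_proximity` and the toy leaf indices are hypothetical. It assumes the common definition in which the proximity of two samples is the fraction of trees where they land in the same leaf. In scikit-learn, leaf indices of this shape can be obtained from a fitted forest with its `apply(X)` method.

```python
from typing import List, Sequence


def leaf_proximity(leaves: Sequence[Sequence[int]]) -> List[List[float]]:
    """Return the n x n proximity matrix for n samples.

    leaves[i][t] is the index of the leaf that sample i reaches in tree t.
    Entry (i, j) is the fraction of trees in which samples i and j share a leaf.
    """
    n = len(leaves)
    n_trees = len(leaves[0])
    prox = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Count the trees where both samples fall into the same leaf.
            shared = sum(a == b for a, b in zip(leaves[i], leaves[j]))
            prox[i][j] = prox[j][i] = shared / n_trees
    return prox


# Hypothetical leaf assignments: 4 samples (rows), 3 trees (columns).
leaves = [
    [1, 4, 2],
    [1, 4, 3],
    [2, 4, 3],
    [2, 5, 6],
]
K = leaf_proximity(leaves)
```

The resulting matrix is symmetric with ones on the diagonal, so it can be passed to a kernel-based test in place of a kernel chosen a priori. How KMERF turns this kind of matrix into a test statistic is described in the paper, not in this sketch.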

Notes

The real data figure in the manuscript was created by modifying the following MATLAB script and running our test:

https://github.com/neurodata/MGC-paper/blob/master/Code/Experiments/run_realData3.m

The analysis has been reproduced in the real_data.ipynb file within this repo.

Folders and files

figs
independence-power-vs-d
independence-power-vs-n
two-sample-power-vs-d
two-sample-power-vs-n
.gitignore
README.md
independence-power-dimension.py
independence-power-sample-size.py
kmerf.py
plt-figures.ipynb
refactor.py
simulations.py
two-sample-power-dimension.py
two-sample-power-sample-size.py
