# Feature Extraction with KNN
Python implementation of feature extraction with KNN.
@momijiame has packaged an updated version of my implementation; I recommend using it:
https://github.com/momijiame/gokinjo
```
pip install gokinjo
```
An R implementation is described here:
http://davpinto.com/fastknn/articles/knn-extraction.html#understanding-the-knn-features
Requirements:

- Python 3.x
- numpy
- scikit-learn
- scipy
```
git clone git@github.com:upura/knnFeat.git
cd knnFeat
pip install -r requirements.txt
```
The notebook version can be seen here.
```python
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt

# Synthetic XOR-like data: label 1 when both coordinates share a sign
x0 = np.random.rand(500) - 0.5
x1 = np.random.rand(500) - 0.5
X = np.array(list(zip(x0, x1)))
y = np.array([1 if i0 * i1 > 0 else 0 for (i0, i1) in zip(x0, x1)])
```

```python
from knnFeat import knnExtract
newX = knnExtract(X, y, k=1, folds=5)
```
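Since the snippet already imports matplotlib, the XOR-like layout of the synthetic data can be checked with a quick scatter plot. This is plain matplotlib, nothing knnFeat-specific; the fixed seed and the `Agg` backend are my own choices for reproducibility:

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless-safe backend for this sketch
import matplotlib.pyplot as plt

rng = np.random.RandomState(0)
x0 = rng.rand(500) - 0.5
x1 = rng.rand(500) - 0.5
X = np.stack([x0, x1], axis=1)
y = (x0 * x1 > 0).astype(int)  # label 1 in the first/third quadrants

fig, ax = plt.subplots()
for label in (0, 1):
    ax.scatter(X[y == label, 0], X[y == label, 1], s=8, label=f"class {label}")
ax.legend()
fig.savefig("xor_data.png")
```

The two classes form a checkerboard pattern that no single linear boundary separates, which is exactly the situation where distance-based features help.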
The following explanation is quoted from the fastknn article linked above.
> It generates k * c new features, where c is the number of class labels. The new features are computed from the distances between the observations and their k nearest neighbors inside each class, as follows:
>
> - The first test feature contains the distances between each test instance and its nearest neighbor inside the first class.
> - The second test feature contains the sums of distances between each test instance and its 2 nearest neighbors inside the first class.
> - The third test feature contains the sums of distances between each test instance and its 3 nearest neighbors inside the first class.
> - And so on.
>
> This procedure repeats for each class label, generating k * c new features. Then, the new training features are generated using an n-fold CV approach, in order to avoid overfitting.
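The procedure above can be sketched directly with scikit-learn. This is an illustrative reimplementation, not the code in this repository; the function name `knn_features` and the use of `StratifiedKFold` are my own assumptions:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.neighbors import NearestNeighbors


def knn_features(X, y, k=1, folds=5):
    """Sketch of KNN feature extraction: for each class c and each
    i in 1..k, the feature is the sum of distances from a point to
    its i nearest neighbors of class c, computed out-of-fold."""
    classes = np.unique(y)
    new_X = np.zeros((X.shape[0], k * len(classes)))
    skf = StratifiedKFold(n_splits=folds, shuffle=True, random_state=0)
    for train_idx, test_idx in skf.split(X, y):
        for ci, c in enumerate(classes):
            # Neighbor pool: training-fold points of class c only
            X_c = X[train_idx][y[train_idx] == c]
            nn = NearestNeighbors(n_neighbors=k).fit(X_c)
            dist, _ = nn.kneighbors(X[test_idx])  # (n_test, k), sorted
            # Cumulative sums give the 1-NN distance, the 2-NN sum, ...
            new_X[test_idx, ci * k:(ci + 1) * k] = np.cumsum(dist, axis=1)
    return new_X


rng = np.random.RandomState(42)
X = rng.rand(100, 2) - 0.5
y = (X[:, 0] * X[:, 1] > 0).astype(int)
F = knn_features(X, y, k=2, folds=5)  # shape (100, k * c) = (100, 4)
```

Each sample gets its features from the fold in which it is held out, which is the out-of-fold trick the quote describes for avoiding overfitting.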
```
flake8 .
pytest
pytest -v -m 'success' --cov=.
```