Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

subset_data does not converge #8

Open
MattScicluna opened this issue Apr 13, 2022 · 0 comments
Open

subset_data does not converge #8

MattScicluna opened this issue Apr 13, 2022 · 0 comments

Comments

@MattScicluna
Copy link

MattScicluna commented Apr 13, 2022

Hi, I have a large dataset (>100k samples) that contains a lot of duplicates.
MSPHATE does not converge during the Calculating partitions... step.

I can't share the dataset in question, but I think I replicated the effect with some randomly generated data. See the following code and output:

import numpy as np
from multiscale_phate import compress, diffuse, condense

np.random.seed(42)

# spoof data
data = np.random.uniform(size=(10001, 200))
data = np.vstack([data, data, data, data, data, data, data, data, data, data])  # highly redundant

# spoof MSPHATE compress step
N, features = data.shape
n_pca = 200
partitions = None

# Computing compression features
n_pca, partitions = compress.get_compression_features(
    N, features, n_pca, partitions, landmarks=2000
)

# modified to display np.max(cluster_counts) and np.ceil(N / desired_num_clusters)
_ = compress.subset_data(data, desired_num_clusters=partitions, n_jobs=8, num_cluster=100, random_state=None)

output:

Calculating partitions...
np.max(cluster_counts):  3930
np.ceil(N / desired_num_clusters):  6.0
np.max(cluster_counts):  1120
np.ceil(N / desired_num_clusters):  6.0
np.max(cluster_counts):  70
np.ceil(N / desired_num_clusters):  6.0
np.max(cluster_counts):  10
np.ceil(N / desired_num_clusters):  6.0
np.max(cluster_counts):  10
np.ceil(N / desired_num_clusters):  6.0
np.max(cluster_counts):  10
np.ceil(N / desired_num_clusters):  6.0
np.max(cluster_counts):  10

The output is the same after many iterations.

Note: I am using python 3.8 and installed using pip install git+https://github.com/KrishnaswamyLab/Multiscale_PHATE

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant