Hi, I have a large dataset (>100k samples) that contains a lot of duplicates. MSPHATE does not converge during the Calculating partitions... step.
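A quick way to quantify "a lot of duplicates" is to count how many distinct rows the data has and how often the most common row repeats. This is a minimal NumPy sketch; `duplicate_summary` is a hypothetical helper, not part of Multiscale PHATE, and it only detects exact duplicates (near-identical rows are counted as distinct).

```python
import numpy as np

def duplicate_summary(data):
    """Return (n_rows, n_unique_rows, max_multiplicity) for a 2-D array.

    Rows are compared for exact equality; near-duplicates are not merged.
    """
    _, counts = np.unique(data, axis=0, return_counts=True)
    return data.shape[0], counts.size, int(counts.max())

# Toy example: 3 distinct rows, one of which appears 4 times.
X = np.array([[0.0, 1.0],
              [0.0, 1.0],
              [2.0, 3.0],
              [0.0, 1.0],
              [4.0, 5.0],
              [0.0, 1.0]])
print(duplicate_summary(X))  # (6, 3, 4)
```

On the spoofed data in the reproduction, where every row is stacked 10 times, the maximum multiplicity would be 10.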
I can't share the dataset in question, but I think I replicated the effect with some randomly generated data. See the following code and output:
import numpy as np
from multiscale_phate import compress, diffuse, condense

np.random.seed(42)

# spoof data
data = np.random.uniform(size=(10001, 200))
data = np.vstack([data, data, data, data, data, data, data, data, data, data])  # highly redundant

# spoof MSPHATE compress step
N, features = data.shape
n_pca = 200
partitions = None

# Computing compression features
n_pca, partitions = compress.get_compression_features(
    N, features, n_pca, partitions, landmarks=2000
)

# modified to display np.max(cluster_counts) and np.ceil(N / desired_num_clusters)
_ = compress.subset_data(data, desired_num_clusters=partitions, n_jobs=8, num_cluster=100, random_state=None)
Output:

Calculating partitions...
np.max(cluster_counts): 3930
np.ceil(N / desired_num_clusters): 6.0
np.max(cluster_counts): 1120
np.ceil(N / desired_num_clusters): 6.0
np.max(cluster_counts): 70
np.ceil(N / desired_num_clusters): 6.0
np.max(cluster_counts): 10
np.ceil(N / desired_num_clusters): 6.0
np.max(cluster_counts): 10
np.ceil(N / desired_num_clusters): 6.0
np.max(cluster_counts): 10
np.ceil(N / desired_num_clusters): 6.0
np.max(cluster_counts): 10
The output is the same after many iterations.
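The stall at 10 is consistent with a simple lower bound. Exact duplicates have identical feature vectors, so any partitioning step that assigns a point based only on its values (nearest centroid, for example) must put every copy of a row in the same cluster. With each row repeated 10 times, `np.max(cluster_counts)` can never fall below 10, while the printed target `np.ceil(N / desired_num_clusters)` is 6.0. If the loop keeps re-partitioning until the first value reaches the second (the printed pair suggests this, but the issue does not confirm it), it cannot terminate. The sketch below illustrates the bound with a plain NumPy nearest-centroid assignment standing in for the library's partitioning; `assign`, the random centroids, and the toy sizes are hypothetical and are not the Multiscale PHATE implementation.

```python
import numpy as np

def assign(data, centroids):
    """Label each row with the index of its nearest centroid (squared Euclidean).

    Hypothetical stand-in for a partitioning step: identical rows yield
    identical distance vectors and therefore always receive the same label.
    """
    dists = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

rng = np.random.default_rng(0)
base = rng.uniform(size=(50, 5))
data = np.vstack([base] * 10)             # each distinct row appears 10 times, N = 500
k = 100                                   # requested number of clusters
target = int(np.ceil(data.shape[0] / k))  # size cap, analogous to ceil(N / desired_num_clusters)

# Try 20 random centroid sets: the largest cluster always holds at least 10 points,
# because the 10 copies of any row share a label.
largest = [
    np.bincount(assign(data, rng.uniform(size=(k, 5))), minlength=k).max()
    for _ in range(20)
]
print(target, min(largest) >= 10)  # 5 True
```

Re-running or re-seeding the clustering cannot help here, since the bound comes from the data rather than from the initialization.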
Note: I am using Python 3.8 and installed the package with:
pip install git+https://github.com/KrishnaswamyLab/Multiscale_PHATE
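One possible workaround, not something the library is known to provide and not tested against MSPHATE, is to run the expensive step on the distinct rows only and copy each result back to all of its duplicates. This is a minimal sketch under that assumption: `run_on_unique_rows` and `fn` are hypothetical placeholders, and `fn` stands for any step that returns one output row per input row. Dropping duplicates also removes their density information, which may change what a density-sensitive method produces, so the multiplicities are returned alongside the output.

```python
import numpy as np

def run_on_unique_rows(data, fn):
    """Run fn on the distinct rows of data and copy each result back to every duplicate.

    fn is a hypothetical placeholder for the expensive step; it must return
    one output row per input row, in the same order.
    Returns the expanded output and the multiplicity of each distinct row.
    """
    unique_rows, inverse, counts = np.unique(
        data, axis=0, return_inverse=True, return_counts=True
    )
    out_unique = fn(unique_rows)
    # Flatten to 1-D indices (defensive; harmless if already 1-D).
    return out_unique[inverse.reshape(-1)], counts

X = np.array([[0.0, 1.0], [0.0, 1.0], [2.0, 3.0]])
out, counts = run_on_unique_rows(X, lambda rows: rows.sum(axis=1, keepdims=True))
print(out.ravel(), counts)  # [1. 1. 5.] [2 1]
```

For the reproduction above, this would reduce the input from 100,010 rows to 10,001 distinct rows before partitioning.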