When running the DiscrimTwoSample.test function, I get different results depending on the order of my x1 and x2 inputs. Please see code and output below.
After looking into the code, I think the problem probably lies with the removal of isolates in the data. It seems that when providing two matrices, the isolates are only removed from the first matrix. When removing the isolates before passing the matrices to the DiscrimTwoSample.test function, I receive the expected results.
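A minimal sketch of the workaround described above, assuming an isolate is a sample whose subject ID occurs only once in the label vector. That definition and the helper name `drop_isolates` are assumptions for illustration; the helper is not part of hyppo.

```python
import numpy as np


def drop_isolates(dist_1, dist_2, labels):
    """Drop isolated samples from both distance matrices at once.

    Assumption: an "isolate" is a sample whose label appears exactly once
    in `labels`. Hypothetical helper, not part of hyppo.
    """
    labels = np.asarray(labels)
    uniques, counts = np.unique(labels, return_counts=True)
    # True for samples whose label is shared with at least one other sample.
    keep = np.isin(labels, uniques[counts > 1])
    idx = np.flatnonzero(keep)
    # Apply the same row/column selection to both matrices so they stay aligned.
    sel = np.ix_(idx, idx)
    return np.asarray(dist_1)[sel], np.asarray(dist_2)[sel], labels[keep]


# Toy data: subject "c" has a single measurement, so it counts as an isolate.
labels = np.array(["a", "a", "b", "b", "c"])
rng = np.random.default_rng(0)
d1 = rng.random((5, 5))
d2 = rng.random((5, 5))

f1, f2, y = drop_isolates(d1, d2, labels)
print(f1.shape, f2.shape, y)
```

The filtered matrices and labels could then be passed to DiscrimTwoSample.test in either order; with the isolates already removed from both matrices, the result should no longer depend on which matrix comes first.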
Reproducing code example:
import pandas as pd
import numpy as np
import re
from hyppo.discrim import DiscrimTwoSample


def get_discrim_two_sample(dice_1: str, dice_2: str):
    dices_1 = pd.read_csv(dice_1, index_col=0, na_values=[""]).values
    distances_1 = 1 - dices_1
    subject_ids_1 = pd.read_csv(dice_1, nrows=0).columns.values[1:]
    subject_ids_1 = np.array(
        [re.sub(r"sub-", "", col) for col in subject_ids_1]
    )

    dices_2 = pd.read_csv(dice_2, index_col=0, na_values=[""]).values
    distances_2 = 1 - dices_2
    subject_ids_2 = pd.read_csv(dice_2, nrows=0).columns.values[1:]
    subject_ids_2 = np.array(
        [re.sub(r"sub-", "", col) for col in subject_ids_2]
    )

    # Remove rows and columns from the distance matrix that only contain NaNs
    # This will exclude runs that the bundle of interest couldn't be reconstructed for
    rows_to_keep_1 = ~np.isnan(distances_1).all(axis=1)
    distances_1 = distances_1[rows_to_keep_1]
    distances_1 = distances_1[:, rows_to_keep_1]
    subject_ids_1 = subject_ids_1[rows_to_keep_1]

    rows_to_keep_2 = ~np.isnan(distances_2).all(axis=1)
    distances_2 = distances_2[rows_to_keep_2]
    distances_2 = distances_2[:, rows_to_keep_2]
    subject_ids_2 = subject_ids_2[rows_to_keep_2]

    # Remove all subID-run combos that are not present in both distance matrices
    # Step 1: Find and sort the common subject-run combinations for stable indexing
    common_subids = np.intersect1d(subject_ids_1, subject_ids_2)
    common_subids.sort()  # Sort to ensure consistency in ordering

    # Step 2: Find indices of common combinations in both vectors using sorted common_subids
    indices1 = np.searchsorted(subject_ids_1, common_subids)
    indices2 = np.searchsorted(subject_ids_2, common_subids)

    # Step 3: Filter matrices accordingly, ensuring consistent ordering
    filtered_distances_1 = distances_1[np.ix_(indices1, indices1)]
    filtered_distances_2 = distances_2[np.ix_(indices2, indices2)]

    # Remove the run from the subject IDs so that they can be converted to float
    common_subids = np.array(
        [re.sub(r"\_run-\d+", "", col) for col in common_subids]
    )

    two_sample_output = DiscrimTwoSample(is_dist=True, remove_isolates=True).test(
        filtered_distances_1, filtered_distances_2, common_subids, workers=1
    )
    print(two_sample_output)

    two_sample_output = DiscrimTwoSample(is_dist=True, remove_isolates=True).test(
        filtered_distances_2, filtered_distances_1, common_subids, workers=1
    )
    print(two_sample_output)

    return two_sample_output


dice1 = '/Users/amelier/Code/dice_GQI/ProjectionBrainstemDentatorubrothalamicTractlr.csv'
dice2 = "/Users/amelier/Code/dice_SS3T/ProjectionBrainstemDentatorubrothalamicTractlr.csv"
get_discrim_two_sample(dice1, dice2)
Output:
Version information