Fixed an error of data_drop_duplicates_index with remove duplicate indexes to retain only the highest expressed genes #45
Starlitnightly · Jul 5, 2024 · 5899f85
1 parent d39da81
commit 5899f85

Showing 2 changed files with 6 additions and 2 deletions.
diff --git a/omicverse/bulk/_Deseq2.py b/omicverse/bulk/_Deseq2.py
@@ -156,8 +156,11 @@ def data_drop_duplicates_index(data:pd.DataFrame)->pd.DataFrame:
     Returns:
         data: The data after dropping the duplicated index.
     """
-    index=data.index
-    data=data.loc[~index.duplicated(keep='first')]
+    # Sort the data by the sum of counts in descending order
+    data = data.loc[data.sum(axis=1).sort_values(ascending=False).index]
+
+    # Drop duplicates, keeping the first occurrence (which is the highest due to sorting)
+    data = data.loc[~data.index.duplicated(keep='first')]
     return data
 
 class pyDEG(object):

diff --git a/omicverse_guide/docs/Release_notes.md b/omicverse_guide/docs/Release_notes.md
@@ -403,3 +403,4 @@ Support Raw Windows platform
 - Optimised pyGSEA's `geneset_plot` visualisation of coordinate effects
 - Fixed an error of `pyTCGA.survival_analysis` when the matrix is sparse. #62, #68, #95
 - Added tqdm to visualize the process of `pyTCGA.survial_analysis_all`
+- Fixed an error of `data_drop_duplicates_index` with remove duplicate indexes to retain only the highest expressed genes #45
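
Review note on the `data_drop_duplicates_index` change: the goal stated in the commit is to keep, for each duplicated gene label, the row with the highest total counts. Label-based `.loc` on a non-unique index returns every row matching each requested label, so reordering by label may not put the highest row first for a duplicated gene. The sketch below reorders by position instead. It is a minimal illustration under that assumption, not the omicverse implementation; the helper name `drop_duplicates_keep_highest` and the toy `counts` table are hypothetical.

```python
import pandas as pd


def drop_duplicates_keep_highest(data: pd.DataFrame) -> pd.DataFrame:
    """Keep, for each duplicated index label, the row with the largest row sum.

    Hypothetical helper for illustration only; not part of omicverse.
    """
    # Row totals as a plain array, aligned by position with `data`
    # (index labels may repeat, so we avoid label-based alignment).
    totals = data.sum(axis=1).to_numpy()
    # Positions sorted by descending total; a stable sort keeps the
    # original order among rows with equal totals.
    order = (-totals).argsort(kind="stable")
    # Reorder by position, so repeated labels are never looked up by name.
    ranked = data.iloc[order]
    # The first occurrence of each label is now its highest-total row.
    return ranked.loc[~ranked.index.duplicated(keep="first")]


# Toy count matrix with a duplicated gene label (made-up values).
counts = pd.DataFrame(
    {"s1": [1, 50, 3], "s2": [2, 60, 4]},
    index=["GeneA", "GeneA", "GeneB"],
)
result = drop_duplicates_keep_highest(counts)
print(result)
```

As with the committed version, the returned rows are ordered by descending total rather than by their original position; a caller that needs the original gene order would have to restore it afterwards.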