Commit

Fixed an error of data_drop_duplicates_index with remove duplicate indexes to retain only the highest expressed genes #45
Starlitnightly committed Jul 5, 2024
1 parent d39da81 commit 5899f85
Showing 2 changed files with 6 additions and 2 deletions.
7 changes: 5 additions & 2 deletions omicverse/bulk/_Deseq2.py
@@ -156,8 +156,11 @@ def data_drop_duplicates_index(data:pd.DataFrame)->pd.DataFrame:
     Returns:
         data: The data after dropping the duplicated index.
     """
-    index=data.index
-    data=data.loc[~index.duplicated(keep='first')]
+    # Sort the data by the sum of counts in descending order
+    data = data.loc[data.sum(axis=1).sort_values(ascending=False).index]
+
+    # Drop duplicates, keeping the first occurrence (which is the highest due to sorting)
+    data = data.loc[~data.index.duplicated(keep='first')]
     return data

 class pyDEG(object):
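
Review note: the sort step in this hunk reorders rows with label-based `.loc`. On a non-unique index, pandas returns every row matching each requested label, in original order, so the first occurrence after that step is not guaranteed to be the highest-sum row. Below is a minimal sketch of a positional alternative, assuming every column holds numeric counts. The name `drop_duplicates_keep_highest` is hypothetical and not part of omicverse.

```python
import numpy as np
import pandas as pd


def drop_duplicates_keep_highest(data: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: for each duplicated index label,
    keep only the row with the largest row sum."""
    # Row positions ordered from highest to lowest total count.
    # A stable sort preserves the original order among ties.
    order = np.argsort(-data.sum(axis=1).to_numpy(), kind="stable")

    # Reorder by position (iloc), which is unambiguous even when labels repeat.
    ranked = data.iloc[order]

    # The first occurrence of each label is now its highest-sum row.
    return ranked.loc[~ranked.index.duplicated(keep="first")]


# Usage with a toy count matrix in which GeneA appears twice.
counts = pd.DataFrame(
    {"sample1": [1, 10, 3], "sample2": [2, 20, 4]},
    index=["GeneA", "GeneA", "GeneB"],
)
deduped = drop_duplicates_keep_highest(counts)
```

Note that the result is ordered by descending row sum rather than by the original row order. If the original order matters, the kept positions could be sorted before the final selection.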
1 change: 1 addition & 0 deletions omicverse_guide/docs/Release_notes.md
@@ -403,3 +403,4 @@ Support Raw Windows platform
 - Optimised pyGSEA's `geneset_plot` visualisation of coordinate effects
 - Fixed an error of `pyTCGA.survival_analysis` when the matrix is sparse. #62, #68, #95
 - Added tqdm to visualize the process of `pyTCGA.survial_analysis_all`
+- Fixed an error of `data_drop_duplicates_index` with remove duplicate indexes to retain only the highest expressed genes #45
