Protein function prediction with GO - Part 3 #64

Draft
wants to merge 29 commits into
base: dev
29 commits
bdba442
script to evaluate go predictions
aditya0by0 Nov 4, 2024
264bd94
Merge branch 'dev' into protein_prediction
aditya0by0 Nov 4, 2024
6c0fce1
add fmax to evaluation script
aditya0by0 Nov 4, 2024
154e827
Merge branch 'dev' into protein_prediction
aditya0by0 Nov 4, 2024
58ae92d
add base code for deep_go data migration
aditya0by0 Nov 5, 2024
78a38de
vary fmax threshold as per paper
aditya0by0 Nov 5, 2024
3a4e007
go_uniprot: add sequence len to docstring
aditya0by0 Nov 5, 2024
227a014
update experiment evidence codes as per DeepGo SE
aditya0by0 Nov 6, 2024
33436e8
Merge branch 'dev' into protein_prediction
aditya0by0 Nov 6, 2024
c6d60cd
consider `X` as a valid amino acid as per DeepGO-SE
aditya0by0 Nov 6, 2024
ca5461f
deepgo se migration: add class to migrate
aditya0by0 Nov 6, 2024
af54954
Merge branch 'dev' into protein_prediction
aditya0by0 Nov 6, 2024
dfb9430
migration: rectify errors
aditya0by0 Nov 7, 2024
085b13b
protein trigram containing tokens with `X`
aditya0by0 Nov 7, 2024
3e0bae0
protein token unigram contain `X`
aditya0by0 Nov 7, 2024
99b5af1
add migration for deepgo1 - 2018 paper
aditya0by0 Nov 11, 2024
a15d492
deepgo1: create non-exclusive val set as a placeholder
aditya0by0 Nov 12, 2024
e0a8524
deepgo1: further split train set into train and val for
aditya0by0 Nov 13, 2024
093be28
migration script update
aditya0by0 Nov 13, 2024
14db9d6
add classes to use migrated deepgo data
aditya0by0 Nov 13, 2024
8922d4d
deepgo: minor code change
aditya0by0 Nov 13, 2024
796356c
modify prints to display actual file name
aditya0by0 Nov 13, 2024
3c11a69
create sub dir for deepgo dataset and move rel files
aditya0by0 Nov 17, 2024
2b571c5
update imports as per new deepGO dir
aditya0by0 Nov 17, 2024
f75e30b
update import dir for pretrain test
aditya0by0 Nov 17, 2024
1b8b270
migration fix : truncate seq and save data with labels
aditya0by0 Dec 4, 2024
bcda11c
Delete protein_protein_interactions.py
aditya0by0 Dec 4, 2024
85c47a0
migration: replace invalid amino acid with "X" notation
aditya0by0 Dec 4, 2024
fbb5c58
update deepgo configs
aditya0by0 Dec 4, 2024
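Commits 6c0fce1 and 78a38de mention an Fmax metric in the evaluation script, but the script itself is not part of the diff shown here. The following is only a minimal sketch of the protein-centric Fmax as it is commonly defined for GO prediction benchmarks. The function name `fmax`, the list-of-lists input layout, and the threshold grid from 0.01 to 0.99 are assumptions for illustration, not the PR's code.

```python
from typing import Sequence, Tuple


def fmax(
    y_true: Sequence[Sequence[int]],
    y_score: Sequence[Sequence[float]],
    thresholds: Sequence[float] = tuple(t / 100 for t in range(1, 100)),
) -> Tuple[float, float]:
    """Return (best F-score, threshold at which it is first reached).

    y_true[i][j] is 1 if protein i is annotated with GO class j, else 0.
    y_score[i][j] is the predicted score for the same pair.
    The default threshold grid is an assumption, not taken from the PR.
    """
    best_f, best_t = 0.0, 0.0
    for t in thresholds:
        precisions, recalls = [], []
        for truth, scores in zip(y_true, y_score):
            pred = [s >= t for s in scores]
            n_pred = sum(pred)
            n_true = sum(truth)
            tp = sum(1 for p, y in zip(pred, truth) if p and y)
            # Precision is averaged only over proteins with at least one prediction.
            if n_pred > 0:
                precisions.append(tp / n_pred)
            # Recall is averaged over all proteins that carry at least one annotation.
            if n_true > 0:
                recalls.append(tp / n_true)
        if not precisions or not recalls:
            continue
        precision = sum(precisions) / len(precisions)
        recall = sum(recalls) / len(recalls)
        if precision + recall == 0:
            continue
        f = 2 * precision * recall / (precision + recall)
        if f > best_f:
            best_f, best_t = f, t
    return best_f, best_t
```

Scanning a grid of thresholds and keeping the maximum is what distinguishes Fmax from an F-score at a fixed cut-off; the grid resolution is a design choice and may differ from the one used in the PR.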
1 change: 1 addition & 0 deletions chebai/preprocessing/bin/protein_token/tokens.txt
@@ -18,3 +18,4 @@ W
E
V
H
X
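The added `X` token goes together with commits c6d60cd, 85c47a0 and 1b8b270, which describe treating `X` as a valid amino acid, replacing invalid amino acids with `X`, and truncating sequences. The migration code is not shown in this diff, so the sketch below only illustrates the idea. The alphabet (20 standard amino acids plus `X`), the helper name `sanitize_sequence`, and the `max_len` default are hypothetical.

```python
# Assumed alphabet: the 20 standard amino acids plus the unknown-residue code "X".
VALID_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY" + "X")


def sanitize_sequence(sequence: str, max_len: int = 1000) -> str:
    """Replace residues outside the assumed alphabet with "X", then truncate.

    max_len is an illustrative default, not a value taken from the PR.
    """
    cleaned = "".join(
        aa if aa in VALID_AMINO_ACIDS else "X" for aa in sequence.upper()
    )
    return cleaned[:max_len]
```

Mapping unknown residues to a single placeholder keeps every position in the sequence tokenizable, which is why the placeholder itself has to be present in `tokens.txt`.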
359 changes: 359 additions & 0 deletions chebai/preprocessing/bin/protein_token_3_gram/tokens.txt
@@ -7998,3 +7998,362 @@ WWC
WCC
WCH
WWM
TAX
AXD
XDR
IEX
EXV
QAX
AXX
XXE
XES
MXN
XNF
NRX
RXX
XXX
XXR
XRI
SAX
AXG
XGG
PRX
RXR
XRX
RXE
XEF
QEX
EXQ
XQR
REX
EXR
RXQ
XQQ
DRX
RXP
XPG
QMX
MXT
XTX
TXR
XRM
APX
PXX
XXG
XGI
NLX
LXX
XXM
XMA
LNX
NXE
XEA
GTX
TXN
XND
LIX
IXI
XIM
MVX
VXX
XXK
XKT
GLX
LXP
XPP
QGX
GXD
XDL
XAP
QNX
NXM
XMN
VAX
XGV
IKX
KXY
KEX
EXL
XLY
GQX
QXE
XEP
PLX
XKC
PVX
XKE
RXI
XIR
AXL
XLN
LLX
LXD
XDA
AXE
XEL
GGX
GXG
KAX
XXA
XAG
XWS
SPX
PXC
XCD
GWX
WXH
XHF
MPX
ESX
SXN
XNK
DLX
LXN
XNS
QXG
XGD
ITX
XRG
NEX
EXA
XAL
LDX
DXI
XII
TPX
PXM
XMR
NXG
XGY
ASX
SXV
XVE
TKX
KXA
KRX
XXT
XTL
IDX
DXX
XXL
XLV
AKX
KXX
QHX
HXV
XVN
NSX
SXX
XKX
XDP
DAX
AXK
XKQ
PIX
IXX
XXF
VLX
XDI
DIX
IXL
XLK
LKX
KXV
XVA
DNX
NXD
ILX
LXK
XKV
VYX
YXE
XEI
RXS
XSH
KGX
XGF
AVX
VXY
XYG
HVX
XXI
XID
TVX
XXS
XSA
ENX
NXX
XMD
IIX
XMQ
AEX
EXX
XME
PGX
GXP
XPR
SKX
KXF
XFT
HRX
XSW
PQX
XGR
QQX
VTX
XRP
PSX
SXP
XPL
VGX
GXY
RSX
SXS
XSL
VSX
XST
AXV
XVL
AGX
GXX
XTK
KLX
LXR
XRV
AHX
HXC
XCS
LVX
VXN
XNR
NGX
GXL
TSX
SXQ
XQN
KXL
XLL
VIX
IXG
XGA
GFX
FXG
XGL
PTX
TXT
XTS
EMX
MXQ
SXY
XYA
IQX
QXY
XYR
TXK
IGX
XPS
PXT
XTG
NXQ
VKX
KXS
XSN
GVX
VXE
GRX
XRE
YKX
KXE
XEE
EEX
EXT
XTI
EHX
HXN
XNL
NDX
DXD
IAX
KSX
SXL
RRX
XRK
DDX
DXE
RXG
VXL
XLS
DTX
TXG
VXF
XFA
XIG
VXT
XTA
ISX
SXR
XRY
VQX
QXP
XPC
LGX
GXS
HGX
XGH
XXD
XDD
KKX
XXV
PKX
XLT
XSP
XLD
RAX
AXS
XSI
IYX
YXX
XXP
XPI
MSX
SXT
GEX
XHP
LFX
FXX
VXI
XIW
QTX
TXX
XXQ
XQA
FLX
DXN
XNC
MXS
XSR
YLX
EQX
QXS
TMX
MXC
XCY
NXA
XAV
EXE
XEQ
HPX
PXP
LMX
MXX
KTX
XKK
XXH
XHS
MKX
XIH
WRX
XKS
EXY
XYQ
QKX
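The first added entries (`TAX`, `AXD`, `XDR`) are consistent with an overlapping sliding window of width 3 moving over a sequence that contains `TAXDR`. That is an inference from the token order, not something the diff states. A minimal sketch of such a trigram split, with the hypothetical helper name `to_trigrams`:

```python
from typing import List


def to_trigrams(sequence: str) -> List[str]:
    """Return all overlapping 3-grams of a sequence, in order of appearance."""
    # Sequences shorter than 3 residues yield an empty list.
    return [sequence[i : i + 3] for i in range(len(sequence) - 2)]


print(to_trigrams("TAXDR"))  # ['TAX', 'AXD', 'XDR']
```

Under this scheme a single `X` in a sequence produces up to three previously unseen trigrams, which would explain why admitting `X` adds several hundred entries to the 3-gram vocabulary.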
9 changes: 6 additions & 3 deletions chebai/preprocessing/datasets/base.py
@@ -728,7 +728,7 @@ def prepare_data(self, *args: Any, **kwargs: Any) -> None:
 
         processed_name = self.processed_main_file_names_dict["data"]
         if not os.path.isfile(os.path.join(self.processed_dir_main, processed_name)):
-            print("Missing processed data file (`data.pkl` file)")
+            print(f"Missing processed data file (`{processed_name}` file)")
             os.makedirs(self.processed_dir_main, exist_ok=True)
             data_path = self._download_required_data()
             g = self._extract_class_hierarchy(data_path)
@@ -812,12 +812,15 @@ def setup_processed(self) -> None:
             None
         """
         os.makedirs(self.processed_dir, exist_ok=True)
-        print("Missing transformed data (`data.pt` file). Transforming data.... ")
+        processed_main_file_name = self.processed_main_file_names_dict["data"]
+        print(
+            f"Missing transformed data (`{processed_main_file_name}` file). Transforming data.... "
+        )
         torch.save(
             self._load_data_from_file(
                 os.path.join(
                     self.processed_dir_main,
-                    self.processed_main_file_names_dict["data"],
+                    processed_main_file_name,
                 )
             ),
             os.path.join(self.processed_dir, self.processed_file_names_dict["data"]),
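The change above replaces hard-coded file names in the log messages with the value looked up from `processed_main_file_names_dict`. The sketch below shows why that matters when a subclass stores its data under a different name. Both classes, the `missing_file_message` helper, and the file name `migrated_data.pkl` are hypothetical stand-ins, not code from `base.py`.

```python
class BaseDataset:
    """Hypothetical stand-in for the dataset base class."""

    @property
    def processed_main_file_names_dict(self) -> dict:
        return {"data": "data.pkl"}

    def missing_file_message(self) -> str:
        # Look the name up instead of hard-coding it, so subclasses are reported correctly.
        processed_name = self.processed_main_file_names_dict["data"]
        return f"Missing processed data file (`{processed_name}` file)"


class MigratedDataset(BaseDataset):
    """Hypothetical subclass that stores its data under a different file name."""

    @property
    def processed_main_file_names_dict(self) -> dict:
        return {"data": "migrated_data.pkl"}


print(BaseDataset().missing_file_message())
print(MigratedDataset().missing_file_message())
```

With the hard-coded message, the second call would still have reported `data.pkl` even though the subclass looks for a different file.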
Empty file.