get_latest_source_version() being called for data sources that aren't needed for graph being built. #236

Open
DnlRKorn opened this issue Jul 23, 2024 · 1 comment
Labels
bug (Something isn't working), help wanted (Extra attention is needed)

Comments

@DnlRKorn
Contributor

If you run build_manager on Testing_Baseline
python build_manager.py Testing_Baseline
with a slightly modified version of testing-graph-spec.yml, you see the following output:

2024-07-23 11:49:46,328 - get_latest_source_version(): Retrieving latest source version for CTD...
2024-07-23 11:49:46,518 - get_latest_source_version(): Found latest source version for CTD: June_2024
2024-07-23 11:49:46,623 - get_latest_source_version(): Retrieving latest source version for GtoPdb...
2024-07-23 11:49:47,735 - get_latest_source_version(): Found latest source version for GtoPdb: 2024.2
2024-07-23 11:49:47,736 - build_graph(): Building graph Testing_Baseline. Checking dependencies...
2024-07-23 11:49:47,738 - build_graph(): Building graph Testing_Baseline. Dependencies are ready...

GtoPdb has its latest source version looked up even though it is not used by the Testing_Baseline graph_spec.
Digging deeper, here is a traceback from when get_latest_source_version() is called on GtoPdb:

    graph_builder = GraphBuilder()
  File "/home/dkorn/BUILD_COMPARE/ORION/build_manager.py", line 41, in __init__
    self.graph_specs = self.load_graph_specs()  # list of graphs to build (GraphSpec objects)
  File "/home/dkorn/BUILD_COMPARE/ORION/build_manager.py", line 314, in load_graph_specs
    return self.parse_graph_spec(graph_spec_yaml)
  File "/home/dkorn/BUILD_COMPARE/ORION/build_manager.py", line 339, in parse_graph_spec
    data_sources = [self.parse_data_source_spec(data_source) for data_source in graph_yaml['sources']] \
  File "/home/dkorn/BUILD_COMPARE/ORION/build_manager.py", line 426, in parse_data_source_spec
    else self.source_data_manager.get_latest_source_version(source_id)
  File "/home/dkorn/BUILD_COMPARE/ORION/Common/load_manager.py", line 129, in get_latest_source_version
    if source_id in self.latest_source_version_lookup:

Every graph_spec in the YAML file read by build_manager.py is parsed when GraphBuilder is constructed (self.graph_specs = self.load_graph_specs(), then return self.parse_graph_spec(graph_spec_yaml)). The problem is that parsing can call get_latest_source_version (else self.source_data_manager.get_latest_source_version(source_id)) for a source even when that source is not needed for the graph_id being built. This call is probably out of scope for the parser, as it often requires downloading the current version of the file.
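
One possible direction, as a minimal sketch rather than ORION's actual code: have the parser store a lazy placeholder for each source and only resolve the latest version when something reads it. Everything here is hypothetical and for illustration only: the LazySourceVersion class, the parse_graph_specs helper, the fake_lookup stand-in for get_latest_source_version, and the "source_version" key used for a pinned version.

```python
from typing import Callable, Optional


class LazySourceVersion:
    """Hypothetical wrapper: defers the version lookup until first read."""

    def __init__(self, source_id: str, lookup: Callable[[str], str],
                 pinned_version: Optional[str] = None):
        self.source_id = source_id
        self._lookup = lookup
        self._version = pinned_version  # None means "latest, not resolved yet"

    @property
    def version(self) -> str:
        # The potentially expensive lookup only runs here, and only once.
        if self._version is None:
            self._version = self._lookup(self.source_id)
        return self._version


def parse_graph_specs(graph_yamls, lookup):
    """Parse every graph spec without resolving any source versions."""
    return {
        graph["graph_id"]: [
            LazySourceVersion(src["source_id"], lookup, src.get("source_version"))
            for src in graph["sources"]
        ]
        for graph in graph_yamls
    }


calls = []  # records which sources actually triggered a lookup


def fake_lookup(source_id: str) -> str:
    # Stand-in for get_latest_source_version(); the real one may download data.
    calls.append(source_id)
    return f"{source_id}_latest"


specs = parse_graph_specs(
    [
        {"graph_id": "Testing_Baseline", "sources": [{"source_id": "CTD"}]},
        {"graph_id": "Other_Graph", "sources": [{"source_id": "GtoPdb"}]},
    ],
    fake_lookup,
)

# Only the graph being built touches .version, so only CTD is looked up.
versions = [source.version for source in specs["Testing_Baseline"]]
```

With this shape, parsing the whole spec file stays cheap, and the lookup for GtoPdb never runs unless a graph that uses it is built. An alternative would be to parse only the graph_spec matching the requested graph_id, which avoids the wrapper but changes what load_graph_specs returns.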

@DnlRKorn added the bug and help wanted labels on Jul 23, 2024
@EvanDietzMorris
Contributor

This is point number 3 of #227. It's definitely nasty and would be easy to fix, but I haven't gotten around to it.
