Optimizations #114

Open
6 of 9 tasks
nsbgn opened this issue Jul 15, 2023 · 0 comments

nsbgn commented Jul 15, 2023

SPARQL can only go so far, so we probably need to be smarter to make this properly scalable. That said, for https://github.com/quangis/quangis-workflow, we need to get rid of memory errors asap, and so:

  • Be smarter about selecting on the bag-of-types by only keeping the most specific types and not having SPARQL unions unless absolutely necessary (implemented as of e0a1a7d, 30bc861, 499d255, but poorly thought through, poorly implemented and poorly tested). In essence, we're now removing some constraints that are already guaranteed to hold in the presence of other constraints.
  • Also eliminate pointless UNIONs in ordered data (671de1f, a258e93)
  • Order the bag-of-types such that the most specific types come first
  • Use subqueries to limit ordered queries
  • Separate `:contains` predicates for types and operators, so that the search is more directed.
  • Annotate the transformation graphs as directly as possible (saving all supertypes of each conceptual step on the step itself), so that we can write `?workflow :containsType <A>` and `?step :subtypeOf <B>` instead of, respectively, `?workflow :containsType ?A. ?A rdfs:subClassOf* <A>` and `?step :type ?B. ?B rdfs:subClassOf* <B>`. This is the biggest improvement.
  • Every step in the transformation graph should record from which steps it is reachable/which steps it depends on. Then we don't need property paths and can just select on type, select on reachability, done. Should make for another huge improvement. (Note: if we were using trees, we could also record the "path" on each step, but that gets exponential for DAGs)
  • Given the above, we can drop steps that themselves depend on other steps that match.
  • Record distance from output on every step. That way, we can force breadth-first search even on SPARQL.
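A minimal Python sketch of the bag-of-types reduction from the first and third items above: drop every type that is already implied by a more specific type in the same bag, then put the most specific types first. The hierarchy, the type names, and the rule "more supertypes means more specific" are assumptions for illustration, not how quangis-workflow actually measures specificity.

```python
from collections.abc import Iterable

# Hypothetical hierarchy: each type maps to its direct supertypes,
# standing in for rdfs:subClassOf edges. Type names are illustrative.
HIERARCHY: dict[str, set[str]] = {
    "Raster": {"Field"},
    "Field": {"Data"},
    "Vector": {"Data"},
}


def ancestors(t: str, h: dict[str, set[str]]) -> set[str]:
    """All strict supertypes of t (the transitive closure of subClassOf)."""
    seen: set[str] = set()
    stack = list(h.get(t, ()))
    while stack:
        s = stack.pop()
        if s not in seen:
            seen.add(s)
            stack.extend(h.get(s, ()))
    return seen


def most_specific(bag: Iterable[str], h: dict[str, set[str]]) -> list[str]:
    """Keep only types not implied by another type in the bag, most specific first."""
    types = set(bag)
    # A constraint on a subtype already guarantees every supertype constraint.
    implied = set().union(*(ancestors(t, h) for t in types))
    kept = types - implied
    # Assumed specificity measure: more supertypes = more specific.
    # Ties are broken by name so the order is deterministic.
    return sorted(kept, key=lambda t: (-len(ancestors(t, h)), t))


print(most_specific(["Data", "Field", "Raster", "Vector", "Amount"], HIERARCHY))
# ['Raster', 'Vector', 'Amount']
```

Only `Raster`, `Vector` and `Amount` would then need to become query patterns; `Field` and `Data` hold automatically. Listing the most specific types first is meant to let the most selective patterns cut down candidates early, assuming the SPARQL engine evaluates patterns roughly in the given order.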
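To make the annotation item concrete, here is a Python sketch that computes each step's supertype closure once, when the graph is built, so that later lookups are plain membership tests rather than `rdfs:subClassOf*` traversals. The dictionaries stand in for RDF triples; the step and type names are hypothetical, and `:subtypeOf`/`:containsType` appear only in comments, as named in the issue.

```python
# Hypothetical data: direct supertypes per type, and the type of each step.
SUPER = {"Raster": {"Field"}, "Field": {"Data"}, "Vector": {"Data"}}
STEP_TYPE = {"s1": "Raster", "s2": "Vector", "s3": "Data"}


def with_supertypes(t, h):
    """t together with all its supertypes: what rdfs:subClassOf* would match."""
    out, stack = set(), [t]
    while stack:
        x = stack.pop()
        if x not in out:
            out.add(x)
            stack.extend(h.get(x, ()))
    return out


# Annotate once, at graph-construction time: every step stores each type
# it is a subtype of (one ":subtypeOf" fact per type).
SUBTYPE_OF = {s: with_supertypes(t, SUPER) for s, t in STEP_TYPE.items()}

# Workflow-level annotation: every type that any step is a subtype of
# (one ":containsType" fact per type).
CONTAINS_TYPE = set().union(*SUBTYPE_OF.values())


def steps_of_type(b):
    """Selecting steps of type b is now a membership test, with no path traversal."""
    return sorted(s for s, types in SUBTYPE_OF.items() if b in types)


print(steps_of_type("Field"), steps_of_type("Data"), "Field" in CONTAINS_TYPE)
# ['s1'] ['s1', 's2', 's3'] True
```

The trade-off is storage: each step carries one fact per supertype, so the annotated graph grows with the depth of the type hierarchy, in exchange for not evaluating property paths at query time.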
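The reachability item, together with the pruning idea that follows it, can be sketched in Python by storing on every step the set of steps it transitively depends on. The graph below is a hypothetical DAG, and `prune` encodes one reading of "drop steps that themselves depend on other steps that match": among matching steps, keep only those that do not depend on another match.

```python
# Hypothetical transformation graph: each step lists the steps whose output
# it consumes directly. The graph is assumed to be a DAG.
INPUTS = {
    "a": [],
    "b": ["a"],
    "c": ["a"],
    "d": ["b", "c"],
}


def depends_on_all(inputs):
    """Materialize, for every step, the set of steps it transitively depends on."""
    memo = {}

    def visit(s):
        if s not in memo:
            deps = set()
            for i in inputs[s]:
                deps.add(i)
                deps |= visit(i)
            memo[s] = deps
        return memo[s]

    for s in inputs:
        visit(s)
    return memo


DEPENDS_ON = depends_on_all(INPUTS)


def matches_after(later, earlier):
    """Steps in `later` that depend on some step in `earlier`: one set
    intersection per step instead of a property-path traversal."""
    earlier = set(earlier)
    return sorted(s for s in later if DEPENDS_ON[s] & earlier)


def prune(matching):
    """Drop matching steps that depend on another matching step."""
    m = set(matching)
    return sorted(s for s in m if not (DEPENDS_ON[s] & m))


print(matches_after(["d", "b"], ["c"]), prune(["b", "c", "d"]))
# ['d'] ['b', 'c']
```

Storing dependency sets costs at most one fact per pair of steps, so it stays quadratic in the number of steps, unlike recording every path, which can grow exponentially in a DAG.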
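For the last item, a Python sketch of recording each step's distance from the output with a breadth-first traversal over the input edges. Sorting candidates by that stored number then yields breadth-first order; in SPARQL this would amount to an `ORDER BY` on a stored distance value (the predicate for it is not named in the issue, so any name would be hypothetical). The graph and step names are assumptions for illustration.

```python
from collections import deque

# Hypothetical transformation graph: each step lists its direct inputs.
INPUTS = {"out": ["x", "y"], "x": ["z"], "y": ["z"], "z": ["src"], "src": []}


def distance_from_output(inputs, output):
    """BFS from the output towards the sources; each step gets the length
    of the shortest chain of input edges between it and the output."""
    dist = {output: 0}
    queue = deque([output])
    while queue:
        s = queue.popleft()
        for i in inputs[s]:
            if i not in dist:
                dist[i] = dist[s] + 1
                queue.append(i)
    return dist


DIST = distance_from_output(INPUTS, "out")


def breadth_first(steps):
    """Visiting candidates by ascending stored distance gives breadth-first order."""
    return sorted(steps, key=lambda s: (DIST[s], s))


print(breadth_first(["src", "z", "x", "out", "y"]))
# ['out', 'x', 'y', 'z', 'src']
```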
nsbgn added a commit that referenced this issue Jul 22, 2023
nsbgn added a commit that referenced this issue Jul 23, 2023
Attempt at optimization, #114