Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement PathFindingOperator #93

Open
Dtenwolde opened this issue Feb 15, 2024 · 3 comments · May be fixed by #135
Open

Implement PathFindingOperator #93

Dtenwolde opened this issue Feb 15, 2024 · 3 comments · May be fixed by #135
Labels
enhancement New feature or request experiment For experimenting existing features path-finding Issues related to path-finding, such as the path mode or path prefix.

Comments

@Dtenwolde
Copy link
Contributor

Dtenwolde commented Feb 15, 2024

This issue serves as a way to track the progress on the PathFindingOperator

Working on in https://github.com/cwida/duckpgq-extension/tree/pathfindingoperator and https://github.com/cwida/duckdb-pgq/tree/pathfindingoperator (Make sure to be on the correct branch in both repositories)

The idea is to create a path-finding operator with two sinks. This acts similarly to the IEJoin. We insert that in this function, instead of the iterativelength() UDF. For this binding phase, we generate a logical query plan, so there cannot be a physical path-finding operator inserted quite yet. We need to create the two sinks here. One side is the src, and dst pairs (tasks) and the other side is the CSR. Importantly without the CREATE_CSR_EDGE() UDF because that will be done in one of the sinks of the new operator.

Can include optimizations such as #23

Plan for now:

  1. Get the CSR as the first sink to this new operator.
  2. Get the (src,dst)-pairs as the second sink.
  3. Implement the path-finding algorithm
  4. Look into how to parallelize.

Potential optimizations:

  1. Duplicate (src,dst)-pairs -> only execute it once, and later blow it up to get all the results again.
  2. Same src many times -> collapses into one src, then fully explore the graph

An example query for what we have for now (initial idea):

SELECT *
FROM pairs AS p
WHERE p.src BETWEEN (select csr_id from (SELECT
            0 as csr_id,
            (SELECT count(a.id) FROM Student a),
            CAST (
                (SELECT sum(CREATE_CSR_VERTEX(0,
                    (SELECT count(a.id) FROM Student a),
                      sub.dense_id,
                      sub.cnt)
                )
                FROM (
                    SELECT a.rowid as dense_id, count(k.src) as cnt
                    FROM Student a
                    LEFT JOIN Knows k ON k.src = a.id
                    GROUP BY a.rowid) sub
            ) AS BIGINT),
            a.rowid,
            c.rowid,
            k.rowid    FROM Knows k
                        JOIN student a on a.id = k.src
                        JOIN student c on c.id = k.dst)) AND p.dst;
@Dtenwolde Dtenwolde added enhancement New feature or request experiment For experimenting existing features path-finding Issues related to path-finding, such as the path mode or path prefix. labels Feb 15, 2024
@Dtenwolde
Copy link
Contributor Author

TODO: Figure out how to include the lower and upper bound into the query

@Dtenwolde
Copy link
Contributor Author

In the current implementation, when doing the following query, it first computes all shortest paths and only then filters out the pairs:

-FROM GRAPH_TABLE (pg
    MATCH p = ANY SHORTEST (a:Person)-[k:Knows]->{2,3}(b:Person)
    WHERE (a.id, b.id) in (SELECT (src, dst) FROM pairs)
    COLUMNS (a.id AS id1, b.id AS id2, element_id(p))
    ) tmp
    ORDER BY tmp.id1, tmp.id2;

It should ideally first do the filter on the pairs, and only then do the shortest path function. This could be a potential optimization rule if we can detect this.

@Dtenwolde Dtenwolde linked a pull request Aug 15, 2024 that will close this issue
@github-actions github-actions bot added the stale label Sep 3, 2024
@Dtenwolde Dtenwolde removed the stale label Sep 3, 2024
@github-actions github-actions bot added the stale label Dec 3, 2024
@Dtenwolde Dtenwolde removed the stale label Dec 3, 2024
@cwida cwida deleted a comment from github-actions bot Dec 5, 2024
@cwida cwida deleted a comment from github-actions bot Dec 5, 2024
@Dtenwolde
Copy link
Contributor Author

select src, list(dst) from pairs group by src;

One way to deduplicate in case there are many duplicate sources.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request experiment For experimenting existing features path-finding Issues related to path-finding, such as the path mode or path prefix.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant