
Multilink #296

Merged: 18 commits into master from multilink on Jan 24, 2024
Conversation

@jreadey (Member) commented Jan 8, 2024

Added a POST method for links and multi-link support for DELETE link

@jreadey jreadey requested a review from mattjala January 8, 2024 15:10
Resolved review threads (outdated): hsds/link_dn.py, hsds/link_sn.py, hsds/servicenode_lib.py
@mattjala (Contributor) commented Jan 8, 2024

There's a REST VOL url-encoding test with a group named url_encoding_group !*'():@&=+$,?#[]-.<>\\^`{}|~, and it fails with a 404 when trying to re-open the group after this PR. I'll see if I can track down the exact cause.

@mattjala (Contributor) commented Jan 9, 2024

The issue with URL encoding is actually on the REST VOL's side. This PR fixed an issue that was obscuring it. LGTM for now. Details on the encoding issue are below.

The issue comes from backslashes in the link name. Before this PR, the link name url_encoding_group !*'():@&=+$,?#[]-.<>\^`{}|~ was erroneously truncated before the ? during both the creation of the link and the GET request to check for its existence, so the issue went unnoticed.

After this PR, the link is successfully created with the entire name. When creating the link, the name is provided in the body of a POST_Group request, and its value contains two literal backslash characters. When making the subsequent GET_Link request, the same link name string is provided as part of the destination URL, and is parsed in GET_Domain with four literal backslash characters.

I think this is because backslashes in the URL don't need to be escaped, since they are (or should be) already percent-encoded, but backslashes in JSON strings do need to be escaped with another backslash. So when GET_Domain receives two backslash characters, each is interpreted as a logical backslash character, resulting in four total literal backslashes. The issue is actually on the REST VOL's side, since it should properly encode strings for JSON.
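The escaping difference can be reproduced with Python's standard library (a generic sketch using a simplified link name; this is not HSDS or REST VOL code):

```python
import json
from urllib.parse import quote, unquote

# Simplified link name containing a single literal backslash character
name = "url_encoding_group \\tail"
assert name.count("\\") == 1

# Percent-encoding for a URL leaves the backslash count unchanged
assert unquote(quote(name, safe="")) == name

# JSON serialization escapes the backslash, doubling it on the wire ...
wire = json.dumps(name)
assert wire.count("\\") == 2

# ... and a correct json.loads restores the single backslash.
assert json.loads(wire) == name
```

If the client instead pastes the already-escaped text into a new JSON body without decoding it first, each backslash doubles again, matching the two-to-four growth described above.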

mattjala previously approved these changes Jan 9, 2024
Resolved review thread (outdated): hsds/util/attrUtil.py
@mattjala (Contributor) commented:

This looks good - I'll approve once I have a build of the REST VOL working with the new responses.

obj_json["class"] = getObjectClass(obj_id)

json_objs[h5path] = obj_json

@mattjala (Contributor) commented:

Some response parsing callbacks in the REST VOL use hrefs to get information about the target domain, so POST_Domain not returning any hrefs becomes a problem. As a first pass, this solved the problem by attaching hrefs to each returned object individually (though a complete version might need to handle when some objects have different domains due to external links):


    # Build the standard set of domain-level hrefs (as GET_Domain does)
    hrefs = []
    hrefs.append({"rel": "self", "href": getHref(request, "/")})
    if "root" in domain_json:
        root_uuid = domain_json["root"]
        href = getHref(request, "/datasets")
        hrefs.append({"rel": "database", "href": href})
        href = getHref(request, "/groups")
        hrefs.append({"rel": "groupbase", "href": href})
        href = getHref(request, "/datatypes")
        hrefs.append({"rel": "typebase", "href": href})
        href = getHref(request, "/groups/" + root_uuid)
        hrefs.append({"rel": "root", "href": href})
        href = getHref(request, "/")
        hrefs.append({"rel": "home", "href": href})

    hrefs.append({"rel": "acls", "href": getHref(request, "/acls")})
    parent_domain = getParentDomain(domain)
    # A domain is top-level if it has no parent, or the parent is the root path
    if not parent_domain or getPathForDomain(parent_domain) == "/":
        is_toplevel = True
    else:
        is_toplevel = False

    if not is_toplevel:
        href = getHref(request, "/", domain=parent_domain)
        hrefs.append({"rel": "parent", "href": href})

    # Attach the same href list to every returned object
    for h5path in json_objs:
        json_objs[h5path]["hrefs"] = hrefs

@jreadey (Member, Author) commented:

I've come to think of the hrefs as mainly for human consumption - i.e. clicking on a link that shows up in the browser. As such, hrefs for POST and PUT don't seem as useful.
Another issue is that some of the POST responses could potentially involve a lot of hrefs in the response - e.g. multiple hrefs for each link in POST_links response.
Could you describe what info you are currently pulling out of the hrefs? How about we just include whatever that is in the POST response as regular JSON?

@mattjala (Contributor) commented:

The REST VOL gets the domain that contains the object from the hrefs, to satisfy some library API calls that retrieve the object's file number (H5Oget_info3). It stores the hash of the domain name as a file number, so that comparisons between different objects' file numbers still work.

Returning the Domain that contains the object directly could also solve this. The REST VOL uses this information in GET_Datatype/Group/Dataset and POST_Domain requests.
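The file-number scheme described above can be sketched as follows (a hypothetical Python helper; the actual REST VOL is written in C and may use a different hash):

```python
import hashlib

def domain_file_number(domain: str) -> int:
    """Map a domain name to a stable 64-bit 'file number'.

    Equal domains always hash to the same number, so file-number
    comparisons between objects in the same domain remain valid.
    """
    digest = hashlib.sha256(domain.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

# Objects in the same domain compare equal; different domains differ
assert domain_file_number("/home/test/a.h5") == domain_file_number("/home/test/a.h5")
assert domain_file_number("/home/test/a.h5") != domain_file_number("/home/test/b.h5")
```

A hash keeps the file number fixed-width regardless of domain-name length, at the cost of a (vanishingly small) collision probability between distinct domains.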

@mattjala (Contributor) commented:

Besides my two comments, this seems good to go in. I have a draft PR to the REST VOL which works with a version of this PR that incorporates the changes I requested.

@jreadey (Member, Author) commented Jan 18, 2024

I've checked in some new code to let you return links based on a regex-like pattern. Also added a follow_links param to GET_Links.
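The pattern matching is glob-style (the review comments below refer to a globmatch utility); its behavior can be approximated with Python's standard fnmatch, using link titles borrowed from the tests in this PR:

```python
from fnmatch import fnmatchcase

# Link titles as they appear in the PR's test cases
titles = ["dset1.1.1", "dset1.1.2", "slink", "extlink", "g1.2.1"]

# "ext*" matches only the external link title
assert [t for t in titles if fnmatchcase(t, "ext*")] == ["extlink"]

# "dset*" matches the dataset link titles
assert [t for t in titles if fnmatchcase(t, "dset*")] == ["dset1.1.1", "dset1.1.2"]
```

fnmatch is a stand-in here; the server-side filter may differ in detail (case handling, escaping), but the wildcard semantics are the same idea.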

expected_dset_links = ("dset1.1.1", "dset1.1.2", "dset2.1", "dset2.2")
expected_soft_links = ("slink", )
expected_external_links = ("extlink", )
self.assertEqual(len(grp_links), 6)
@mattjala (Contributor) commented:

Nitpick: It's not clear what this 6 corresponds to, since expected_group_links has 5 entries.
Does grp_links refer to 'links under the targeted group in the request' rather than 'links to groups'? Renaming grp_links would make this more clear.

hrefs = rspJson["hrefs"]
self.assertEqual(len(hrefs), 3)
self.assertTrue("links" in rspJson)
grp_links = rspJson["links"]
@mattjala (Contributor) commented:

Same comment as above about grp_links

@@ -507,6 +507,23 @@ def testGet(self):
expected_uuid = helper.getUUIDByPath(domain, "/g1/g1.2/g1.2.1", session=self.session)
self.assertEqual(expected_uuid, g1_2_1_uuid)

# do get with a regex pattern
params = {"pattern": "ext*"}
@mattjala (Contributor) commented:

This section should probably be moved under testGetPattern

@@ -1115,7 +1259,7 @@ def testPostLinkMultiple(self):
obj_links = rspJson["links"]
self.assertEqual(len(obj_links), 6)
expected_group_links = ("g1", "g2", "g1.1", "g1.2", "g1.2.1", )
expected_dset_links = ("dset1.2", "dset2.2", "dset1.1.1", "dset1.1.2", "dset2.1", )
@mattjala (Contributor) commented:

Was there a reason to remove dset1.2 from this test? It doesn't cause any issues when added back in.

hsds/dset_lib.py Outdated

# get all chunk ids for chunks that have been allocated
chunk_ids = await getAllocatedChunkIds(app, dset_id, bucket=bucket)
chunk_ids.sort()
@mattjala (Contributor) commented:

Double sort

hsds/link_dn.py Outdated
@@ -109,6 +114,16 @@ async def GET_Links(request):
titles.sort() # sort by key
@mattjala (Contributor) commented:

globmatch returns the link titles in arbitrary order, so the glob filtering should probably be moved before this sort
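The ordering concern can be illustrated with a small sketch (fnmatch standing in for globmatch, with assumed example data):

```python
from fnmatch import fnmatchcase

# Iteration order over a set (or a dict assembled asynchronously) is
# not guaranteed, so filtering after the sort can scramble the result
grp_titles = {"slink", "dset2.1", "dset1.1.1", "extlink"}

# Filtering first and sorting afterwards yields a deterministic response
titles = sorted(t for t in grp_titles if fnmatchcase(t, "dset*"))
assert titles == ["dset1.1.1", "dset2.1"]
```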

ret_links = []
for link in grp_links:
title = link["title"]
if globmatch(title, pattern):
@mattjala (Contributor) commented:

Is this use of globmatch redundant, since domain_crawl.sn/get_links -> getLinks -> GET_Links (DN) already does the pattern matching?

@jreadey merged commit 9fa156c into master on Jan 24, 2024
9 of 10 checks passed
@jreadey deleted the multilink branch January 24, 2024 03:34
2 participants