
Multilink #296

Merged: 18 commits into master from multilink on Jan 24, 2024
Conversation

@jreadey (Member) commented Jan 8, 2024

Added a POST method for links and multi-link support for DELETE link

@jreadey jreadey requested a review from mattjala January 8, 2024 15:10
Resolved review threads (outdated): hsds/link_dn.py, hsds/link_sn.py, hsds/servicenode_lib.py
@mattjala (Contributor) commented Jan 8, 2024

There's a REST VOL url-encoding test with a group named url_encoding_group !*'():@&=+$,?#[]-.<>\\^`{}|~, and it fails with a 404 when trying to re-open the group after this PR. I'll see if I can track down the exact cause.

@mattjala (Contributor) commented Jan 9, 2024

The issue with URL encoding is actually on the REST VOL's side. This PR fixed an issue that was obscuring it. LGTM for now. Details on the encoding issue are below.

The issue comes from backslashes in the link name. Before this PR, the link name url_encoding_group !*'():@&=+$,?#[]-.<>\^`{}|~ was erroneously truncated before the ? during both the creation of the link and the GET request to check for its existence, so the issue went unnoticed.

After this PR, the link is successfully created with the entire name. When creating the link, the name is provided in the body of a POST_Group request, and its value contains two literal backslash characters. When making the subsequent GET_Link request, the same link name string is provided as part of the destination URL, and is parsed in GET_Domain with four literal backslash characters.

I think this is because backslashes in the URL don't need to be escaped, since they are (or should be) already percent-encoded, but backslashes in JSON strings do need to be escaped with another backslash. So when GET_Domain receives two backslash characters, each is interpreted as a logical backslash character, resulting in four total literal backslashes. The issue is actually on the REST VOL's side, since it should properly encode strings for JSON.
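The escaping difference can be reproduced with Python's standard library (a generic sketch using a simplified link name; this is not HSDS or REST VOL code):

```python
import json
from urllib.parse import quote, unquote

# Simplified link name containing a single literal backslash character
name = "url_encoding_group \\tail"
assert name.count("\\") == 1

# Percent-encoding for a URL leaves the backslash count unchanged
assert unquote(quote(name, safe="")) == name

# JSON serialization escapes the backslash, doubling it on the wire ...
wire = json.dumps(name)
assert wire.count("\\") == 2

# ... and a correct json.loads restores the single backslash.
assert json.loads(wire) == name
```

If the client instead pastes the already-escaped text into a new JSON body without decoding it first, each backslash doubles again, matching the two-to-four growth described above.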

mattjala previously approved these changes Jan 9, 2024
Resolved review thread (outdated): hsds/util/attrUtil.py
@mattjala (Contributor) commented:

This looks good - I'll approve once I have a build of the REST VOL working with the new responses.

obj_json["class"] = getObjectClass(obj_id)

json_objs[h5path] = obj_json

@mattjala (Contributor) commented:

Some response parsing callbacks in the REST VOL use hrefs to get information about the target domain, so POST_Domain not returning any hrefs becomes a problem. As a first pass, this solved the problem by attaching hrefs to each returned object individually (though a complete version might need to handle when some objects have different domains due to external links):


    # Build the standard set of domain-level hrefs (as GET_Domain does)
    hrefs = []
    hrefs.append({"rel": "self", "href": getHref(request, "/")})
    if "root" in domain_json:
        root_uuid = domain_json["root"]
        href = getHref(request, "/datasets")
        hrefs.append({"rel": "database", "href": href})
        href = getHref(request, "/groups")
        hrefs.append({"rel": "groupbase", "href": href})
        href = getHref(request, "/datatypes")
        hrefs.append({"rel": "typebase", "href": href})
        href = getHref(request, "/groups/" + root_uuid)
        hrefs.append({"rel": "root", "href": href})
        href = getHref(request, "/")
        hrefs.append({"rel": "home", "href": href})

    hrefs.append({"rel": "acls", "href": getHref(request, "/acls")})
    parent_domain = getParentDomain(domain)
    # A domain is top-level if it has no parent, or the parent is the root path
    if not parent_domain or getPathForDomain(parent_domain) == "/":
        is_toplevel = True
    else:
        is_toplevel = False

    if not is_toplevel:
        href = getHref(request, "/", domain=parent_domain)
        hrefs.append({"rel": "parent", "href": href})

    # Attach the same href list to every returned object
    for h5path in json_objs:
        json_objs[h5path]["hrefs"] = hrefs

@jreadey (Member, Author) commented:

I've come to think of the hrefs as mainly for human consumption - i.e. clicking on a link that shows up in the browser. As such, hrefs for POST and PUT don't seem as useful.
Another issue is that some of the POST responses could potentially involve a lot of hrefs in the response - e.g. multiple hrefs for each link in POST_links response.
Could you describe what info you are currently pulling out of the hrefs? How about we just include whatever that is in the POST response as regular JSON?

@mattjala (Contributor) commented:

The REST VOL gets the domain that contains the object from the hrefs, to satisfy some library API calls that retrieve the object's file number (H5Oget_info3). It stores the hash of the domain name as a file number, so that comparisons between different objects' file numbers still work.

Returning the Domain that contains the object directly could also solve this. The REST VOL uses this information in GET_Datatype/Group/Dataset and POST_Domain requests.
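The file-number scheme described above can be sketched as follows (a hypothetical Python helper; the actual REST VOL is written in C and may use a different hash):

```python
import hashlib

def domain_file_number(domain: str) -> int:
    """Map a domain name to a stable 64-bit 'file number'.

    Equal domains always hash to the same number, so file-number
    comparisons between objects in the same domain remain valid.
    """
    digest = hashlib.sha256(domain.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

# Objects in the same domain compare equal; different domains differ
assert domain_file_number("/home/test/a.h5") == domain_file_number("/home/test/a.h5")
assert domain_file_number("/home/test/a.h5") != domain_file_number("/home/test/b.h5")
```

A hash keeps the file number fixed-width regardless of domain-name length, at the cost of a (vanishingly small) collision probability between distinct domains.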

@mattjala (Contributor) commented:

Besides my two comments, this seems good to go in. I have a draft PR to the REST VOL which works with a version of this PR that incorporates the changes I requested.

@jreadey (Member, Author) commented Jan 18, 2024

I've checked in some new code to let you return links based on a regex-like pattern. Also added a follow_links param to GET_Links.
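The pattern matching is glob-style (the review comments below refer to a globmatch utility); its behavior can be approximated with Python's standard fnmatch, using link titles borrowed from the tests in this PR:

```python
from fnmatch import fnmatchcase

# Link titles as they appear in the PR's test cases
titles = ["dset1.1.1", "dset1.1.2", "slink", "extlink", "g1.2.1"]

# "ext*" matches only the external link title
assert [t for t in titles if fnmatchcase(t, "ext*")] == ["extlink"]

# "dset*" matches the dataset link titles
assert [t for t in titles if fnmatchcase(t, "dset*")] == ["dset1.1.1", "dset1.1.2"]
```

fnmatch is a stand-in here; the server-side filter may differ in detail (case handling, escaping), but the wildcard semantics are the same idea.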

expected_dset_links = ("dset1.1.1", "dset1.1.2", "dset2.1", "dset2.2")
expected_soft_links = ("slink", )
expected_external_links = ("extlink", )
self.assertEqual(len(grp_links), 6)
@mattjala (Contributor) commented:

Nitpick: It's not clear what this 6 corresponds to, since expected_group_links has 5 entries.
Does grp_links refer to 'links under the targeted group in the request' rather than 'links to groups'? Renaming grp_links would make this more clear.

hrefs = rspJson["hrefs"]
self.assertEqual(len(hrefs), 3)
self.assertTrue("links" in rspJson)
grp_links = rspJson["links"]
@mattjala (Contributor) commented:

Same comment as above about grp_links

@@ -507,6 +507,23 @@ def testGet(self):
expected_uuid = helper.getUUIDByPath(domain, "/g1/g1.2/g1.2.1", session=self.session)
self.assertEqual(expected_uuid, g1_2_1_uuid)

# do get with a regex pattern
params = {"pattern": "ext*"}
@mattjala (Contributor) commented:

This section should probably be moved under testGetPattern

@@ -1115,7 +1259,7 @@ def testPostLinkMultiple(self):
obj_links = rspJson["links"]
self.assertEqual(len(obj_links), 6)
expected_group_links = ("g1", "g2", "g1.1", "g1.2", "g1.2.1", )
expected_dset_links = ("dset1.2", "dset2.2", "dset1.1.1", "dset1.1.2", "dset2.1", )
@mattjala (Contributor) commented:

Was there a reason to remove dset1.2 from this test? It doesn't cause any issues when added back in.

hsds/dset_lib.py Outdated

# get all chunk ids for chunks that have been allocated
chunk_ids = await getAllocatedChunkIds(app, dset_id, bucket=bucket)
chunk_ids.sort()
@mattjala (Contributor) commented:

Double sort

hsds/link_dn.py Outdated
@@ -109,6 +114,16 @@ async def GET_Links(request):
titles.sort() # sort by key
@mattjala (Contributor) commented:

globmatch returns the link titles in arbitrary order, so the glob filtering should probably be moved before this sort
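The ordering concern can be illustrated with a small sketch (fnmatch standing in for globmatch, with assumed example data):

```python
from fnmatch import fnmatchcase

# Iteration order over a set (or a dict assembled asynchronously) is
# not guaranteed, so filtering after the sort can scramble the result
grp_titles = {"slink", "dset2.1", "dset1.1.1", "extlink"}

# Filtering first and sorting afterwards yields a deterministic response
titles = sorted(t for t in grp_titles if fnmatchcase(t, "dset*"))
assert titles == ["dset1.1.1", "dset2.1"]
```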

ret_links = []
for link in grp_links:
title = link["title"]
if globmatch(title, pattern):
@mattjala (Contributor) commented:

Is this use of globmatch redundant, since domain_crawl.sn/get_links -> getLinks -> GET_Links (DN) already does the pattern matching?

@jreadey merged commit 9fa156c into master on Jan 24, 2024
9 of 10 checks passed
@jreadey deleted the multilink branch January 24, 2024 03:34
2 participants