-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Link queue wrapper to record occupancy information in a file #129
Link queue wrapper to record occupancy information in a file #129
Conversation
@@ -19,7 +27,7 @@ export class ActorExtractLinksAll extends ActorExtractLinks { | |||
return { | |||
links: await ActorExtractLinks.collectStream(action.metadata, (quad, links) => { | |||
for (const link of getNamedNodes(getTerms(quad))) { | |||
links.push({ url: link.value }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current approach with annotateLinkWithTheReachabilityCriteria
and reachabilityLabel
seems a bit complicated.
Could we just do something like links.push({ url: link.value, metadata: { actorExtractLinks: this.name } })
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was not using the name because some actors such has predicate have multiple common variation (common, ldp, solidStorage) and I felt this was it was capture better. Another example would be the type index, maybe we would want to indicate that it is with inference or not or with other actor with notable properties.
Also I wanted to make the annotation optional hence why it is inside a method to not add to much if statement and since it will be same everywhere I placed it inside the abstract class.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
some actors such has predicate have multiple common variation (common, ldp, solidStorage)
Perhaps those can be passed as additional metadata fields?
The main concern I have with this approach is that all actors have a new (optional) param now, and there is semantic redundancy with the name of the actor.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I could create a reachability criteria object documenting the specificity of the criteria (with custom equal function for when you care about the "major" reachability type or the full reachability).
For the optional new parameters, the reason it is there is to not modify the previous behavior where the metadata was empty. I defined this new parameter in the bus/abstract class to not disturbed too much the current and future implementation. I could make it so it is always in the links.
/** | ||
* @param args - @defaultNested {<default_bus> a <cc:components/Bus.jsonld#Bus>} bus | ||
*/ | ||
public constructor(args: IActorExtractLinksArgs) { | ||
super(args); | ||
this.labelLinksWithReachability = args.labelLinksWithReachability ?? false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure we need this one. I guess we could just use the actor name this.name
?
Before I review in detail, could you @RubenEschauzier first have a look at this as well? |
I don't know why the CI doesn't pass on my end it does. Even the solidbench the one which is the one failing. |
The failure looks unrelated to your PR. |
Another thought I have that would simplify this PR: Instead of writing to file, we could just write to the logger (e.g. at TRACE level). An external script could then be used to filter the log messages related to the link queue. This would make this PR work in the browser as well. |
Looks pretty good, it is a much more refined version of what I used for the link queue analysis paper. I'm unsure if it has much to do with prioritization, as (as far as I can tell) it is mainly for analyzing query execution in hindsight. |
I think it is the part about the labeling of the links with the reachability criteria. At least for my work I plan to use those label to prioritize some links. |
Yes, I think it is much better! I will be also able to provide the information in a streaming manner (instead of dumping it all at every step, but I think I will provide the option to do so). I will make another repo to parse this information has a file because it make it easy for my own investigation and debugging. |
Objective
The objective of this actor is to collect information about the link queue and to serialize it into a file. The information collected are the following:
It is also possible to add custom link queue properties via the
IOptionalLinkQueueParameters
, but for their parsing the code has to be modified.A label for each current reachability criteria also has been added. They can optionally label each link coming out of the extract-link actors. This feature doesn't only have the purpose to document the link queue occupancy but can be use in future work for engine optimization such has link queue ordering.
The
@comunica/actor-rdf-resolve-hypermedia-links-queue-wrapper-info-occupance:query-identifier
property can also be used to identify the query.Example output
Config of the wrapper actor
Config of the engine
output
If an identifier was given for the query than the
query
field will be this identifier.Future works