Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate Rule graph #493

Merged
merged 3 commits into from
Jun 6, 2023
Merged

Validate Rule graph #493

merged 3 commits into from
Jun 6, 2023

Conversation

ketkarameya
Copy link
Collaborator

@ketkarameya ketkarameya commented Jun 4, 2023

This PR introduces very basic validation for our RuleGraphs. At this moment we make sure that (i) All tree-sitter queries are syntactically correct (ii) Filter fields are configured correctly (this is was something we previously checked)

This PR also ensures that the the graph is validated irrespective of where it is loaded from - toml, Rust API, or Python API (Have added test for each API)

Copy link
Collaborator Author

@ketkarameya ketkarameya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self reviewed

@@ -197,7 +197,7 @@ query = """
(
(prefix_expression (boolean_literal) @exp) @prefix_expression
(#eq? @exp "false")
(#match @prefix_expression "!.*")
(#match? @prefix_expression "!.*")
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Found this bug (and the one below) :)

let piranha_args = &self.piranha_arguments;
parser
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

un related to this PR :| , but I observed that i was not using the piranha_args.language() to get the parser

}

if *self.not_contains() != default_not_contains_queries() {
self.not_contains().iter().try_for_each(|x| x.validate())?
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Such ergonomic APIs make me like Rust :) (try_for_each)

Copy link
Collaborator Author

@ketkarameya ketkarameya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self reviewed

@@ -24,3 +24,7 @@ pub(crate) mod rule_graph;
pub(crate) mod rule_store;
pub(crate) mod scopes;
pub(crate) mod source_code_unit;

pub(crate) trait Validator {
fn validate(&self) -> Result<(), String>;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as our validation enriches, I will modify this Type parameter

let mut parser = Parser::new();
parser
.set_language(tree_sitter_query::language())
.expect("Could not set the language for the parser.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand what's going on here. Are we creating a new parser and expecting it to be set to an invalid language? Why? (This at the very least needs documentation)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aah maybe its just bad api naming in rustlang.
We are setting a language, and if the setter throws an exception we panic with "Could not set the language for the parser."

.expect() ~= .orElseThrow() in Java

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. Also, a bit surprised that tree_sitter_query doesn't need any import. For a second I thought this was of the same type piranha_args.language(), i.e. PiranhaLanguage, and was wondering why this wasn't simply tree_sitter_query::language().parser(), but I see the difference now. I still think this all merits a short comment here explaining what this code does (uses tree sitter to construct a parser for its own DSL without all the extensions we have made to Piranha-supported language, correct?)

Copy link
Collaborator Author

@ketkarameya ketkarameya Jun 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aah i see the the source of confusion.
Yes u r correct, we are using ts to parse its own dsl.
I will address this as a comment in the code.

Ya it works with out import because we are referring to the crate directly

.parse(self.get_query(), None)
.filter(|x| number_of_errors(&x.root_node()) == 0)
.map(|_| Ok(()))
.unwrap_or(Err(format!("Cannot parse - {}", self.get_query())))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we report what exactly we failed to parse? i.e. for the missing parenthesis "Unexpected end of query string, unmatched ( at position x:y" or similar?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this moment we do not report why the query is incorrect (or any diagnostics around it)
I will add diagnostics for this as a follow up. I think it just reporting error does not make sense, without diagnostics. Tracking this issue #494

fn validate(&self) -> Result<(), String> {
match self.rules().iter().try_for_each(|rule| rule.validate()) {
Ok(()) => Ok(()),
Err(e) => Err(format!("Incorrect Rule Graph - {}", e)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question: Do we report the first failing rule or all syntax errors in every rule in the rule graph? (Neither would be right or wrong, specially at this point, just want to make sure. I'd think it's the first case looking at this, but nor sure what the semantics of try_for_each are.

Copy link
Collaborator Author

@ketkarameya ketkarameya Jun 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are short-circuiting at the first exception.
https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.try_for_each

Thinking about it, I should eventually accumulate these errors and report them.
Expecting someone to fix them One-by-one is cruel

Copy link
Collaborator Author

@ketkarameya ketkarameya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replies inline

let mut parser = Parser::new();
parser
.set_language(tree_sitter_query::language())
.expect("Could not set the language for the parser.");
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aah maybe its just bad api naming in rustlang.
We are setting a language, and if the setter throws an exception we panic with "Could not set the language for the parser."

.expect() ~= .orElseThrow() in Java

.parse(self.get_query(), None)
.filter(|x| number_of_errors(&x.root_node()) == 0)
.map(|_| Ok(()))
.unwrap_or(Err(format!("Cannot parse - {}", self.get_query())))
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this moment we do not report why the query is incorrect (or any diagnostics around it)
I will add diagnostics for this as a follow up. I think it just reporting error does not make sense, without diagnostics. Tracking this issue #494

fn validate(&self) -> Result<(), String> {
match self.rules().iter().try_for_each(|rule| rule.validate()) {
Ok(()) => Ok(()),
Err(e) => Err(format!("Incorrect Rule Graph - {}", e)),
Copy link
Collaborator Author

@ketkarameya ketkarameya Jun 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are short-circuiting at the first exception.
https://doc.rust-lang.org/std/iter/trait.Iterator.html#method.try_for_each

Thinking about it, I should eventually accumulate these errors and report them.
Expecting someone to fix them One-by-one is cruel

Copy link
Collaborator Author

@ketkarameya ketkarameya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replies inline

Copy link
Collaborator Author

@ketkarameya ketkarameya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replies inline

Copy link
Collaborator Author

@ketkarameya ketkarameya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replies inline

Copy link
Collaborator Author

@ketkarameya ketkarameya left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replies inline

Copy link
Contributor

@lazaroclapp lazaroclapp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment above about documenting the tree_sitter_query::language() bit, but otherwise this LGTM.

@ketkarameya ketkarameya merged commit 5238295 into master Jun 6, 2023
@ketkarameya ketkarameya deleted the validate_rule_graphs branch July 13, 2023 18:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants