Support textual searching via regular expressions #302

Open
gibsramen opened this issue May 4, 2020 · 3 comments
Labels
enhancement New feature or request

Comments

@gibsramen
Collaborator

Came up during discussion with @fedarko

It might be useful to let the user specify a regular expression with which to search the feature space. This should allow users to run (almost?) any textual query they like, even if it is not explicitly supported in Qurro.

Examples:

Given the features k__Bacteria, k__Whackteria, and k__Smackteria:

  • k__(Bac|Whack)teria should match k__Bacteria and k__Whackteria but not k__Smackteria
  • k__.*teria should match all three
  • k__[^h]*teria should match k__Bacteria and k__Smackteria but not k__Whackteria
  • k__[\w]{5}teria should match k__Whackteria and k__Smackteria but not k__Bacteria

I think this could be done a couple of ways; the easiest might be to add a dropdown option for regex filtering.
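
The examples above can be checked with a small sketch. The helper name `filterFeaturesByRegex` is hypothetical and not part of Qurro; it only assumes that feature IDs are available as an array of strings and that the search is unanchored (the pattern may match anywhere in the ID).

```javascript
// Hypothetical helper (not part of Qurro): return the feature IDs that
// match a user-supplied regular expression string.
function filterFeaturesByRegex(featureIDs, pattern) {
    const re = new RegExp(pattern);
    // test() performs an unanchored search. No "g" flag is set, so the
    // RegExp object keeps no state between calls.
    return featureIDs.filter((id) => re.test(id));
}

const features = ["k__Bacteria", "k__Whackteria", "k__Smackteria"];

console.log(filterFeaturesByRegex(features, "k__(Bac|Whack)teria"));
console.log(filterFeaturesByRegex(features, "k__.*teria"));
console.log(filterFeaturesByRegex(features, "k__[^h]*teria"));
// In a JS string literal the backslash must be doubled; text typed into
// an input field would arrive with a single backslash already.
console.log(filterFeaturesByRegex(features, "k__[\\w]{5}teria"));
```

If whole-string matching were preferred, the pattern could be wrapped as `^(?:...)$` before compiling; which behavior users expect is an open design question.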

gibsramen added the enhancement (New feature or request) label on May 4, 2020
@fedarko
Collaborator

fedarko commented May 4, 2020

Thanks for opening this! Agreed that this would be really useful. From doing some googling, it looks like it's actually not that bad to convert strings to regex objects in JavaScript:

> var re = new RegExp("k__[^h]*teria");
> "k__Bacteria".match(re);
["k__Bacteria", index: 0, input: "k__Bacteria", groups: undefined]
> "k__Whackteria".match(re);
null

So this should be surprisingly feasible. The devil will probably be in the details, and in making users aware of how JavaScript regular expressions behave. (The most foolproof solution is probably to link the MDN docs on regular expressions from the selection tutorial and say "hey, these will be slightly different from Python/Perl/etc., we recommend going over this".)

@gibsramen
Collaborator Author

One thing I just thought of that might also be a wrinkle is unit testing. There are a lot of possibilities for user input, and it's probably not feasible to account for all of them.
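
Covering every possible input is not realistic, but a table of representative cases is one way to pin down the expected behavior. The sketch below reuses the feature IDs and patterns from the examples in this issue; `findFailingCases` is a hypothetical name and nothing here is taken from Qurro's actual test suite.

```javascript
// Feature IDs and expectations taken from the examples in this issue.
const featureIDs = ["k__Bacteria", "k__Whackteria", "k__Smackteria"];
const cases = [
    { pattern: "k__(Bac|Whack)teria", expected: ["k__Bacteria", "k__Whackteria"] },
    { pattern: "k__.*teria", expected: ["k__Bacteria", "k__Whackteria", "k__Smackteria"] },
    { pattern: "k__[^h]*teria", expected: ["k__Bacteria", "k__Smackteria"] },
    { pattern: "k__[\\w]{5}teria", expected: ["k__Whackteria", "k__Smackteria"] },
];

// Hypothetical check: return the patterns whose matches differ from the
// expected list. An empty result means every case passed.
function findFailingCases(testCases, ids) {
    return testCases
        .filter(({ pattern, expected }) => {
            const re = new RegExp(pattern);
            const actual = ids.filter((id) => re.test(id));
            return JSON.stringify(actual) !== JSON.stringify(expected);
        })
        .map(({ pattern }) => pattern);
}

console.log(findFailingCases(cases, featureIDs));
```

New edge cases (backslashes, anchors, empty patterns) can then be added as extra rows in `cases` rather than as separate test functions.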

@fedarko
Collaborator

fedarko commented May 7, 2020

That might be tricky, yeah. I think the main challenge will just be in testing that weird characters like backslashes are being interpreted correctly from the user's input; once that is guaranteed, actually just taking the resulting RegExp object and matching it against strings in the dataset shouldn't be too bad. (...Although I'm realizing now that we'll also have to account for what happens if the user passes in an invalid regular expression string -- for example, strings with unclosed operators like ( or [, either of which currently causes a JS SyntaxError when you try to create a RegExp object from it.)
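
One way to handle invalid input is to wrap the `RegExp` constructor in a `try`/`catch` and report the error instead of letting it propagate. This is a minimal sketch; the helper name `compileUserRegex` and its return shape are assumptions for illustration, not existing Qurro code.

```javascript
// Hypothetical helper: try to compile user input into a RegExp.
// Returns { re, error }, where exactly one of the two is non-null.
function compileUserRegex(text) {
    try {
        return { re: new RegExp(text), error: null };
    } catch (e) {
        if (e instanceof SyntaxError) {
            // e.g. an unclosed "(" or "[" in the user's input
            return { re: null, error: e.message };
        }
        throw e; // anything else is unexpected, so re-throw it
    }
}

const ok = compileUserRegex("k__[^h]*teria");
console.log(ok.error === null && ok.re.test("k__Bacteria"));

const broken = compileUserRegex("k__(Bac");
console.log(broken.re === null, broken.error);
```

The error message text differs between JavaScript engines, so it is probably better shown to the user as a hint than compared against in tests.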
