query: rework as attrs-based model #1

Open
synrg opened this issue Jan 2, 2023 · 4 comments

@synrg
Contributor

synrg commented Jan 2, 2023

After studying pyinaturalist's models, I'd like Query to be based on pyinaturalist.base.models, as they provide robust abstractions that improve on what we've made so far. While pyinat does not have a user's query as a concept distinct from the API requests needed to fulfill it, it does have core classes that we can use to make one.

A Query is not, itself, a description of a single API request. It is a text description of a number of parameters sent to iNaturalist to produce a single display with two parts: a description of the base entity of the request, and zero or more individual results relating to that entity (e.g. the total number of observations for the query, and per-observer counts of observations and species for the query).

For example, the query "fungi by me from ns in prj lichens atlantic on today" could be realized as pyinat RequestParams with individual params as follows (slightly simplified for illustration purposes):

  • taxon = Taxon(id=47170)
    • the taxon here is the first record returned from /v1/taxa/autocomplete?q=fungi
  • user = User(id=545640)
    • the special keyword me refers to the user's own ID
  • place = Place(id=6853)
    • ns is looked up in a table associated with the user's command context
  • project = Project(id=62291)
    • lichens atlantic is the first matching record from /v1/projects/autocomplete?q=lichens+atlantic
  • observed_on = datetime.today()
    • today is parsed via dateparser.parse()

For each of the entities retrieved from local tables, only a partial object is needed, just so that place.id, etc. will work.
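To illustrate the point about partial objects, here is a minimal sketch of the looked-up query as a container of ID-only entities that can be flattened into request parameters. It uses plain dataclasses rather than pyinaturalist's attrs-based models purely to stay self-contained; PartialEntity, MappedQuery, and to_params() are hypothetical names, not existing Dronefly or pyinaturalist API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class PartialEntity:
    """Hypothetical stand-in for a partially populated model: only the ID is required."""
    id: int
    name: Optional[str] = None


@dataclass
class MappedQuery:
    """Hypothetical container for the query after each term has been looked up."""
    taxon: Optional[PartialEntity] = None
    user: Optional[PartialEntity] = None
    place: Optional[PartialEntity] = None
    project: Optional[PartialEntity] = None
    observed_on: Optional[date] = None

    def to_params(self) -> dict:
        """Flatten the mapped entities into API-style request parameters."""
        params = {}
        for key in ("taxon", "user", "place", "project"):
            entity = getattr(self, key)
            if entity is not None:
                params[f"{key}_id"] = entity.id  # only .id is needed here
        if self.observed_on is not None:
            params["observed_on"] = self.observed_on.isoformat()
        return params


mapped = MappedQuery(
    taxon=PartialEntity(47170),
    user=PartialEntity(545640),
    place=PartialEntity(6853),
    project=PartialEntity(62291),
    observed_on=date(2023, 1, 2),
)
print(mapped.to_params())
# {'taxon_id': 47170, 'user_id': 545640, 'place_id': 6853, 'project_id': 62291, 'observed_on': '2023-01-02'}
```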

In the dronefly codebase, this "fully parsed" query is called a QueryResponse (a name I'm not entirely happy with). It is still, however, only a template for the one or more primary requests needed to fill the page with content.

Which requests are performed depends on which command handles the query. For example, consider this simplified rendering of a taxon display with the above query arguments:

Kingdom Fungi (Fungi Including Lichens)
is a kingdom with 52 observations in Lichens of Atlantic Canada from Nova Scotia, CA observed on Jan 2, 2023.
obs# (spp#) by user:
3 (2) benarmstrong

Several distinct API requests based on the query would be needed to fill in all the parts above, including at least:

  • /v1/observations?taxon_id=47170&place_id=6853&project_id=62291&verifiable=any&observed_on=2023-01-02&per_page=0
    • to get the total_results for "52 observations"
  • /v1/observations?taxon_id=47170&place_id=6853&project_id=62291&verifiable=any&observed_on=2023-01-02&user_id=545640&per_page=0
    AND
    /v1/observations/species_counts?taxon_id=47170&place_id=6853&project_id=62291&verifiable=any&observed_on=2023-01-02&user_id=545640&per_page=0
    • to get "3 (2)"
  • with luck, the User record for benarmstrong is already cached; otherwise a request to /v1/users/545640 might be needed to obtain the login name from user.login
  • plus the Place and Project API requests to obtain their names (though they, too, might already be cached)
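The count requests above differ only in whether a user_id filter is added, so they can all be derived from one set of base parameters. A minimal sketch using only the standard library; count_request_urls() is a hypothetical helper, and it only builds the URLs listed above rather than calling the API:

```python
from typing import List, Optional
from urllib.parse import urlencode

API_ROOT = "https://api.inaturalist.org/v1"


def count_request_urls(base_params: dict, user_id: Optional[int] = None) -> List[str]:
    """Build the count-only request URLs for one part of the display (hypothetical helper).

    per_page=0 mirrors the requests listed above: only total_results is wanted, not records.
    """
    params = {**base_params, "verifiable": "any"}
    if user_id is None:
        # Total number of observations matching the query.
        return [f"{API_ROOT}/observations?{urlencode({**params, 'per_page': 0})}"]
    # Per-user row: observation count and species count for the same filters plus the user.
    query = urlencode({**params, "user_id": user_id, "per_page": 0})
    return [
        f"{API_ROOT}/observations?{query}",
        f"{API_ROOT}/observations/species_counts?{query}",
    ]


base = {"taxon_id": 47170, "place_id": 6853, "project_id": 62291, "observed_on": "2023-01-02"}
for url in count_request_urls(base) + count_request_urls(base, user_id=545640):
    print(url)
```

Each additional user row in the table would add one more pair of requests with a different user_id.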

Finally, it should be possible to map a command with a query argument to the URL of the web page that best represents its base request, as well as to URLs for any other parts of the page (usually counts of each entity, which link to searches for those entities on the web).
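A sketch of that mapping for the base request, assuming (not verified here) that the iNaturalist website's observation search accepts the same parameter names as the API, and that paging parameters should be dropped from a shareable link. web_observations_url() and API_ONLY_PARAMS are hypothetical:

```python
from urllib.parse import urlencode

WEB_ROOT = "https://www.inaturalist.org"
# Hypothetical: parameters that only make sense for API calls, not for a shareable web link.
API_ONLY_PARAMS = {"per_page", "page"}


def web_observations_url(params: dict) -> str:
    """Map API-style query params to an observations search link on the website.

    Assumes the web search accepts the same parameter names as the API, which
    should be checked per parameter before relying on it.
    """
    web_params = {key: value for key, value in params.items() if key not in API_ONLY_PARAMS}
    return f"{WEB_ROOT}/observations?{urlencode(web_params)}"


print(web_observations_url({"taxon_id": 47170, "place_id": 6853, "per_page": 0}))
```

The same function could link each count in the table to its own search by adding that row's user_id or place_id to the parameters.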

With all this in mind, the Query class should represent all of these arguments in a way that more closely resembles existing pyinaturalist models.

Here is this progression, from text to parsed query to a validated query that is finally ready to be used in a command, shown as the dict-like result of each step:

>>> query = Query.parse("fungi by me from ns in prj lichens atlantic on today")
{
  "taxon": "fungi",
  "user": "me",
  "place": "ns",
  "project": "lichens atlantic",
  "observed_on": "today",
}
>>> validated_query = query.validate()
{
  "taxon": {"id": 47170},
  "user": {"id": 545640},
  "place": {"id": 6853},
  "project": {"id": 62291},
  "observed_on": "2023-01-02",
}

I'm still not sure of QueryResponse vs. some better name. Maybe ValidatedQuery?
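The first step, Query.parse(), only splits the text into unresolved terms. A deliberately naive sketch that handles just the keywords in this example query; Dronefly's real parser is not shown here, and parse_query() and KEYWORDS are hypothetical:

```python
import re

# Hypothetical keyword table covering only the keywords in the example query.
KEYWORDS = {"by": "user", "from": "place", "in prj": "project", "on": "observed_on"}
# A capturing group makes re.split() keep each matched keyword in the result list.
KEYWORD_PATTERN = re.compile(r"\b(by|from|in prj|on)\b")


def parse_query(text: str) -> dict:
    """Split query text into unresolved terms; nothing is looked up yet."""
    parts = KEYWORD_PATTERN.split(text)
    terms = {}
    taxon = parts[0].strip()
    if taxon:
        terms["taxon"] = taxon  # text before the first keyword names the taxon
    # parts alternates: [taxon, keyword, value, keyword, value, ...]
    for keyword, value in zip(parts[1::2], parts[2::2]):
        terms[KEYWORDS[keyword]] = value.strip()
    return terms


print(parse_query("fungi by me from ns in prj lichens atlantic on today"))
# {'taxon': 'fungi', 'user': 'me', 'place': 'ns', 'project': 'lichens atlantic', 'observed_on': 'today'}
```

Because it splits on whole words, a taxon name that itself contains one of the keywords (e.g. the word "on") would be split incorrectly, which is one reason a real parser needs more care than this.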

@synrg
Contributor Author

synrg commented Nov 21, 2024

I think "validation" is the wrong word. Really, these are just stages of evaluation in a sort of assembly-line fashion. The Query itself is mostly just an expression of a set of filters to apply to the result set. It is not fully realizable until used in a command, which provides the "verb" and determines which presentation is appropriate for the results: e.g. "search" (implicitly for observations) produces a result set containing all matching observations, in a paginated display.

While the classes for these concepts aren't firm yet, this is a rough outline of the parts of a command as they stand today:

  • A Dronefly Command is a request for a result set containing the entity or entities asked for by the Command (verb) and qualified by one or more preposition+noun phrases that make up the Query.
  • Those phrases are mostly filters that narrow down which subset of entities is shown, but they may also contain presentation qualifiers, such as "sort by obs" and "desc", that alter the sort order.
  • The QueryResponse is an intermediate object that contains one or more preposition + noun phrases (modelled as option + argument pairs). At this point, everything in the Command string that needed to be looked up has been looked up, and the rest awaits further processing to produce the final result set.
  • More accurately, we might call such a thing a QueryResults, except that it does not yet hold the full results the user asked for. Those are only obtained by applying the Command verb to it.
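The distinction between filters and presentation qualifiers can be sketched as a split applied to the option + argument pairs once lookups are done. The option names sort_by and order, and the PreparedOptions and split_options() names, are hypothetical:

```python
from dataclasses import dataclass, field

# Hypothetical option names for presentation qualifiers such as "sort by obs" and "desc".
PRESENTATION_OPTIONS = {"sort_by", "order"}


@dataclass
class PreparedOptions:
    filters: dict = field(default_factory=dict)       # narrow down which entities are shown
    presentation: dict = field(default_factory=dict)  # alter how they are shown


def split_options(options: dict) -> PreparedOptions:
    """Separate filter options from presentation qualifiers (a sketch)."""
    prepared = PreparedOptions()
    for key, value in options.items():
        target = prepared.presentation if key in PRESENTATION_OPTIONS else prepared.filters
        target[key] = value
    return prepared


print(split_options({"taxon_id": 47170, "sort_by": "obs", "order": "desc"}))
```

Keeping the two apart means the filters can be turned into API parameters directly, while the presentation options are left for the command verb to apply.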

Perhaps QueryResults isn't such a bad name after all, as long as we remember it only becomes concrete when used in a Command, so:

query = Query.parse("fungi by me from ns in prj lichens atlantic on today")
taxon_command = Command("taxon")
counts_command = Command("taxon counts")
query_results = query.prepare()  # => QueryResults
taxon = taxon_command.realize(query_results)  # => Taxon
taxon_counts = counts_command.realize(query_results)  # => TaxonCounts
menu = TaxonMenuWithCounts(taxon_results=taxon, taxon_counts_list=[taxon_counts])
menu.start()

This example captures the fact that the resulting display has a primary result (the taxon) followed by the counts table below it. Therefore, it is actually two commands in one, each taking the prepared results as input.

@synrg
Contributor Author

synrg commented Nov 21, 2024

Alternatively, I might prefer to call it a PreparedQuery, which shifts the emphasis away from the "results" aspect (i.e. these aren't yet the final results, but an intermediate stage before fetching them), and thus:

query = Query.parse("fungi by me from ns in prj lichens atlantic on today")
taxon_command = Command("taxon")
counts_command = Command("taxon counts")
prepared_query = query.prepare()  # => PreparedQuery
taxon = taxon_command.realize(prepared_query)  # => Taxon
user_or_place_taxon_counts = counts_command.realize(prepared_query)  # => Union[UserTaxonCounts, PlaceTaxonCounts]
menu = TaxonMenuWithCounts(taxon=taxon, counts=[user_or_place_taxon_counts])
menu.start()

@synrg synrg closed this as completed Nov 21, 2024
@synrg synrg reopened this Nov 21, 2024
@synrg
Contributor Author

synrg commented Nov 21, 2024

Oops, misclick. I'm still not happy with this and will return to it later. "Realizing" a command seems wrong. Perhaps the two inputs to the menu should instead just be the results of functions applied to the prepared_query.
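A minimal sketch of that alternative, where the menu's two inputs come from plain functions applied to the same (immutable) prepared query instead of from Command.realize(). PreparedQuery here is a hypothetical frozen dataclass, and the lambdas stand in for the real taxon lookup and count requests:

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass(frozen=True)
class PreparedQuery:
    """Hypothetical: every lookup already resolved to an ID; frozen so functions can't alter it."""
    taxon_id: int
    user_id: Optional[int] = None
    place_id: Optional[int] = None


def menu_inputs(
    prepared: PreparedQuery,
    get_taxon: Callable[[PreparedQuery], Any],
    get_counts: Callable[[PreparedQuery], Any],
) -> Dict[str, Any]:
    """Apply plain functions to the same prepared query; no Command.realize() step."""
    return {"taxon": get_taxon(prepared), "counts": get_counts(prepared)}


prepared = PreparedQuery(taxon_id=47170, user_id=545640, place_id=6853)
inputs = menu_inputs(
    prepared,
    get_taxon=lambda q: f"taxon {q.taxon_id}",  # stand-in for a taxon lookup
    get_counts=lambda q: [] if q.user_id is None else [q.user_id],  # stand-in for count requests
)
print(inputs)
# {'taxon': 'taxon 47170', 'counts': [545640]}
```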

@synrg
Contributor Author

synrg commented Nov 21, 2024

First, I think I shouldn't buck convention, and should keep Query at the front of the name, so QueryMappedEntities is the best replacement for QueryResponse I have so far. The "Query" prefix helps it collate, making it easier to find and strengthening its relatedness to the original Query, while "MappedEntities" focuses on the remapping of bits of text in the query to at least partial Models and to more precise qualifiers, like expanding "today" to today's date, expanding macros, etc.
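As an example of the "more precise qualifiers" part of the mapping, here is a tiny stand-in for the date expansion that dateparser.parse() performs in Dronefly. It handles only "today", "yesterday", and ISO dates; expand_date() is hypothetical:

```python
from datetime import date, timedelta
from typing import Optional


def expand_date(term: str, today: Optional[date] = None) -> date:
    """Map a date term from the query to a concrete date (tiny stand-in for dateparser.parse()).

    Handles only "today", "yesterday", and ISO dates such as "2023-01-02".
    """
    if today is None:
        today = date.today()
    days_ago = {"today": 0, "yesterday": 1}
    key = term.strip().lower()
    if key in days_ago:
        return today - timedelta(days=days_ago[key])
    return date.fromisoformat(key)  # raises ValueError for anything else


print(expand_date("today", today=date(2023, 1, 2)))
# 2023-01-02
```

Passing today explicitly keeps the expansion deterministic, which matters if a QueryMappedEntities is built once and reused across several requests.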

This refinement of my earlier ideas above relies on far fewer new Dronefly objects, and instead directly makes use of the existing pyinat TaxonCounts and TaxonCount models. What follows also serves as a bit of an overview of how Command, Context, Query, QueryMappedEntities, Source, and Menu should fit together to form the overall structure of most Dronefly commands. Much of this is already written (at least partially), but perhaps not everything here is fully articulated elsewhere.

OK. I'm back to treating the taxon command as a single command. We don't need a separate taxon counts command, just some preparatory steps to add the counts to the data source passed to the front-end. The taxon command body proceeds, bucket-brigade style, through three stages: it parses and prepares the query*, then packages up the specific arguments for the front-end, then starts the front-end (the menu) so the user can see the results and interact with them. By the time the menu starts, everything needed to produce the display should either already be looked up or be wrapped in a generator that fetches it as needed, a page at a time, often with a pyinat Paginator at the bottom layer. In this example, however, the counts list starts out empty or with just one entity, and can grow or shrink as users add or remove their own stats via menu button presses.

* Certain real-world details are left out of the following to keep it simple, e.g. how the Context and Command objects are created. In practice, parsing and preparation of the query would also be handled within a context manager that provides ctx to the command block.
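The "generator that fetches it as needed, a page at a time" can be sketched as follows. This is a hypothetical helper, not pyinaturalist's Paginator (whose interface isn't reproduced here); fetch_page stands in for whatever call returns one page of results:

```python
from typing import Callable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_pages(fetch_page: Callable[[int], List[T]]) -> Iterator[List[T]]:
    """Yield one page at a time, calling fetch_page(page) only when the next page is needed.

    Stops at the first empty page. Hypothetical stand-in for the paginating layer.
    """
    page = 1
    while True:
        items = fetch_page(page)
        if not items:
            return
        yield items
        page += 1


data = list(range(7))
pages = iter_pages(lambda page: data[(page - 1) * 3 : page * 3])  # 3 items per page
print(next(pages))  # only page 1 has been fetched at this point
```

Because nothing is fetched until the menu asks for the next page, the menu can start immediately even when the full result set is large.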

Here's what this simplified restructuring of our current taxon command might look like:

from dronefly.core.commands import Command, Context
from dronefly.core.query import Query
from dronefly.core.menus import TaxonCountsSource
from dronefly.discord.menus import TaxonWithCountsMenu

ctx = Context()
ctx.command = Command("taxon")
try:
    # parsing and preparation:
    query = Query.parse("fungi by me from ns in prj lichens atlantic on today")
    ctx.query_entities = ctx.command.prepare(query)  # => QueryMappedEntities

    # get menu arguments:
    taxon = ctx.query_entities.taxon()  # => single Taxon that best matches the query
    taxon_count_type = ctx.command.preferred_count_type(ctx.query_entities)  # => Union[Type[Place], Type[User], None]
    if taxon_count_type is None:
        counts = None
    else:
        # get_taxon_counts() is the general-purpose helper described below
        counts = get_taxon_counts(ctx.query_entities, taxon_count_type)  # => collection of TaxonCounts for counted type
    counts_source = TaxonCountsSource(ctx, taxon=taxon, taxon_counts=counts, taxon_count_type=taxon_count_type)

    # start the menu:
    menu = TaxonWithCountsMenu(ctx.cog, taxon=taxon, counts_source=counts_source)
    menu.start(ctx)
except Exception:
    # error handling for malformed query, no matching taxon, no matching place, user, etc.
    pass  # placeholder: report the error to the user
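As the footnote above says, in practice this error handling would live in a context manager that provides ctx to the command block. A minimal sketch with contextlib.contextmanager, assuming a hypothetical QueryError exception and a report callback that stands in for sending a message to the user:

```python
from contextlib import contextmanager
from types import SimpleNamespace
from typing import Callable, Iterator


class QueryError(Exception):
    """Hypothetical error for a malformed query or an unmatched taxon, place, user, etc."""


@contextmanager
def command_context(command_name: str, report: Callable[[str], None]) -> Iterator[SimpleNamespace]:
    """Provide ctx to the command block and turn QueryError into a message for the user."""
    ctx = SimpleNamespace(command=command_name, query_entities=None)
    try:
        yield ctx
    except QueryError as err:
        # Reported, then suppressed; any other exception type propagates unchanged.
        report(f"{command_name}: {err}")


messages = []
with command_context("taxon", messages.append) as ctx:
    raise QueryError("No matching taxon found")
print(messages)
# ['taxon: No matching taxon found']
```

This keeps each command body free of the try/except boilerplate shown above.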

A bit of logic in the current ,taxon command that wasn't designed to my satisfaction is split up here into command.preferred_count_type(), which tells us whether we're counting users or places, and get_taxon_counts(), a general-purpose helper that counts either entity, taking just the query_entities and the taxon_count_type as arguments. For instance, following our current ,taxon command behaviour, if the query_entities contain a place or a user, then taxon_count_type is Place or User respectively, and that determines whether counts are initially fetched from the API and, if so, which kind. If both are specified, which one takes priority depends on the command: for ,taxon it is User, but other commands may differ, which justifies making preferred_count_type() a method (or attribute) of the command rather than of the query_entities.
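A sketch of preferred_count_type() as a method driven by a per-command priority attribute, so that ,taxon prefers User while another command could prefer Place. Place, User, QueryEntities, and CountingCommand are hypothetical stand-ins, not the pyinaturalist models or Dronefly classes:

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Type


class Place:
    """Hypothetical stand-in for the pyinaturalist Place model."""


class User:
    """Hypothetical stand-in for the pyinaturalist User model."""


@dataclass
class QueryEntities:
    """Hypothetical: only the fields relevant to choosing a count type."""
    place: Optional[Place] = None
    user: Optional[User] = None


@dataclass
class CountingCommand:
    """Hypothetical command whose count priority is an attribute, so commands can differ."""
    name: str
    count_priority: Tuple[str, ...] = ("user", "place")  # ,taxon behaviour: User wins over Place

    def preferred_count_type(self, entities: QueryEntities) -> Optional[Type]:
        count_types = {"user": User, "place": Place}
        for attr in self.count_priority:
            if getattr(entities, attr) is not None:
                return count_types[attr]
        return None  # neither given: no counts are fetched initially


taxon_command = CountingCommand("taxon")
print(taxon_command.preferred_count_type(QueryEntities(place=Place(), user=User())).__name__)
# User
```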

Now we have the few data items that the menu will operate on: taxon, taxon_counts, and taxon_count_type. These are bundled together in a TaxonCountsSource that is passed to the menu when it is created. That allows the menu to be written fairly generically, improving reuse across commands. It shouldn't even need to consult the query_entities, since the source already has everything it needs from them. The source presents the menu with an interface for retrieving the main content of the taxon display (name, conservation status, etc.), paging through the associated list of user or place taxon counts, and updating that list via methods that can be attached to menu buttons. The menu itself doesn't need to know how the source provides all of this, or even whether any additional filters (date/time, place, etc.) came into play. It just asks the source for everything it needs to pour into the view, and apart from that can be written quite generally, cutting down on the amount of custom code written per command.
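A minimal sketch of such a source, showing the paging and list-updating methods a menu could call or attach to buttons without ever seeing the query. SimpleTaxonCountsSource and CountRow are hypothetical (CountRow is deliberately not named TaxonCount, to avoid confusion with the pyinat model); the row format mirrors the "3 (2) benarmstrong" line shown earlier:

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional, Type


@dataclass
class CountRow:
    """Hypothetical stand-in for one row: a user or place with its counts."""
    name: str
    observations: int
    species: int


@dataclass
class SimpleTaxonCountsSource:
    """Hypothetical source: the menu asks it for content and never sees the query."""
    taxon: Any
    taxon_count_type: Optional[Type] = None
    taxon_counts: List[CountRow] = field(default_factory=list)
    per_page: int = 10

    def page_count(self) -> int:
        # Ceiling division; an empty list still has one (empty) page to display.
        return max(1, -(-len(self.taxon_counts) // self.per_page))

    def page(self, number: int) -> List[CountRow]:
        start = number * self.per_page  # pages are numbered from 0
        return self.taxon_counts[start : start + self.per_page]

    def format_row(self, row: CountRow) -> str:
        return f"{row.observations} ({row.species}) {row.name}"

    # Methods a menu button could be wired to:
    def add_count(self, row: CountRow) -> None:
        if all(existing.name != row.name for existing in self.taxon_counts):
            self.taxon_counts.append(row)

    def remove_count(self, name: str) -> None:
        self.taxon_counts = [row for row in self.taxon_counts if row.name != name]


source = SimpleTaxonCountsSource(taxon="Fungi", per_page=2)
source.add_count(CountRow("benarmstrong", 3, 2))
print([source.format_row(row) for row in source.page(0)])
# ['3 (2) benarmstrong']
```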

With this arrangement, front-end UI elements like buttons can even be attached to handlers in the source that update the query, as in our ,life command, where a new root taxon can be chosen, or an entirely different tree generated based on different rank filters, etc. All of this works in the current codebase for ,life, but other commands like ,taxon haven't yet received this treatment; this issue has been one of the blockers.
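A sketch of a button handler living in the source: it replaces the immutable query with an updated copy (via dataclasses.replace) and invalidates cached results, so the next render fetches fresh data. LifeQuery and QueryBackedSource are hypothetical, and fetch stands in for the real API calls:

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class LifeQuery:
    """Hypothetical query for a life-list style display."""
    root_taxon_id: int
    ranks: Tuple[str, ...] = ("kingdom", "phylum", "class", "order", "family", "genus", "species")
    user_id: Optional[int] = None


class QueryBackedSource:
    """Hypothetical source whose button handlers replace the query and refresh results."""

    def __init__(self, query: LifeQuery, fetch: Callable[[LifeQuery], List[str]]):
        self.query = query
        self._fetch = fetch
        self._results: Optional[List[str]] = None

    def results(self) -> List[str]:
        if self._results is None:  # fetch lazily, once per query
            self._results = self._fetch(self.query)
        return self._results

    def set_root_taxon(self, taxon_id: int) -> None:  # e.g. wired to a "choose root" button
        self.query = replace(self.query, root_taxon_id=taxon_id)
        self._results = None  # invalidate cached results

    def set_ranks(self, *ranks: str) -> None:  # e.g. wired to a rank-filter button
        self.query = replace(self.query, ranks=ranks)
        self._results = None
```

Since the menu only calls results() and the handlers, it never needs to know which query fields changed.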

@synrg synrg self-assigned this Nov 21, 2024