[Proposal] Entity List Extensions
This document outlines a proposed extension to the entity list construct.
It is not implemented, and should not be presumed to be supported in any future version.
There are many circumstances in which users want to display a single choice to a user which is dependent on the existence of 0-N items, rather than the existence of a single item.
Imagine the basic construct of "Display all mothers with children who need an upcoming vaccination", where each person (mother/child) is a case type, and all child cases are children of the parent mother case.
In CommCare currently, the way this would need to be displayed would require multiple transforms to be written, for both filtering and display. For example, selecting all child cases which require a vaccination may be expressed as

```
instance('casedb')/casedb/case[@case_type = 'child'][next_immunization_due < today()]
```
But getting a list of parent cases which meet this filter requires a significantly more convoluted transform:

```
instance('casedb')/casedb/case[@case_type = 'parent'][
    count(
        instance('casedb')/casedb/case[@case_type = 'child']
            [index/parent = current()/@case_id]
            [next_immunization_due < today()]
    ) > 0]
```
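To make the cost of this pattern concrete, here is a minimal Python sketch of what the `count(...) > 0` filter does, using plain dicts as a stand-in for case records (the data and the function name `parents_with_due_children` are illustrative, not part of any implementation):

```python
# Hypothetical in-memory model of the casedb, for illustration only.
from datetime import date

cases = [
    {"case_id": "p1", "case_type": "parent"},
    {"case_id": "p2", "case_type": "parent"},
    {"case_id": "c1", "case_type": "child", "parent": "p1",
     "next_immunization_due": date(2020, 1, 1)},
    {"case_id": "c2", "case_type": "child", "parent": "p2",
     "next_immunization_due": date(2099, 1, 1)},
]

def parents_with_due_children(cases, today):
    """Naive equivalent of the count(...) > 0 filter: for every parent
    case, rescan the whole case list looking for matching children."""
    result = []
    for parent in (c for c in cases if c["case_type"] == "parent"):
        due_children = [c for c in cases
                        if c["case_type"] == "child"
                        and c.get("parent") == parent["case_id"]
                        and c["next_immunization_due"] < today]
        if due_children:  # count(...) > 0
            result.append(parent)
    return result

print([p["case_id"] for p in parents_with_due_children(cases, date(2024, 1, 1))])
# only p1 has a child whose immunization is overdue
```

Note the nested loop: without static analysis to bulk-fetch children, every parent pays a full scan (or per-child database lookup), which is the performance concern the proposal is addressing.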
To be performant, this transform also requires a highly complex static analysis to bulk-fetch child cases; otherwise, the query will require a significant number of database lookups to track down each child case.
After the initial filter, any future displays of information which depend on the "Child Set" will also require consistent duplication of the filter set `(type, index/parent, next_immunization_due)`, which is both error-prone and difficult to optimize.
To improve the simplicity, expressiveness, and comprehensibility of this process, the entity list screen could rely on a reduction of the data set based on matching keys.
The incoming data production to the entity list would remain a core nodeset `input[]`, which is iterated over a single time to produce the transformed entity objects.
The entity list would support a new construct internally, `reduce`:

```
<reduce group-by=''>
    ...
</reduce>
```

`reduce` requires an input XPath expression (`group-by`) which will produce a scalar value when executed with an item in the nodeset as its context (in the same way as a field or variable). If a `reduce` is present, the entity list will only display one item per unique value in the reduced set. For the rest of this document we will refer to the resultant value as the `reduction_id`.
Example: with an input set `case[]` and a group function `index/parent`:

nodeset:

```
instance('casedb')/casedb/case[@case_type = 'child'][filter = 'value'] => input[]

instance('casedb')/casedb/case[34]
instance('casedb')/casedb/case[50]
instance('casedb')/casedb/case[52]
instance('casedb')/casedb/case[90]
```

reduce():

```
instance('casedb')/casedb/case[34]/index/parent => parent_one
instance('casedb')/casedb/case[50]/index/parent => parent_two
instance('casedb')/casedb/case[52]/index/parent => parent_one
instance('casedb')/casedb/case[90]/index/parent => parent_one
```
The resulting entity list would contain two values, one for each `reduction_id`. By default it should be assumed that the first matching result will be what is displayed in the list, so with no other changes the context for the two items used for the `<field>` mapping transforms would be

```
instance('casedb')/casedb/case[34]
instance('casedb')/casedb/case[50]
```
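The "first match wins" semantics of the proposed reduce step can be sketched in a few lines of Python; the function `reduce_nodeset` and the dict-based items are purely illustrative stand-ins for the nodeset machinery:

```python
def reduce_nodeset(nodeset, group_by):
    """Sketch of the proposed reduce step: evaluate group_by against
    each item and keep the first item seen per unique reduction_id."""
    representatives = {}
    for item in nodeset:
        reduction_id = group_by(item)
        # setdefault keeps the first item encountered for each key
        representatives.setdefault(reduction_id, item)
    return representatives

# Mirrors the example above: four child cases, two distinct parents.
nodeset = [
    {"case": "case[34]", "index/parent": "parent_one"},
    {"case": "case[50]", "index/parent": "parent_two"},
    {"case": "case[52]", "index/parent": "parent_one"},
    {"case": "case[90]", "index/parent": "parent_one"},
]
reduced = reduce_nodeset(nodeset, lambda item: item["index/parent"])
print(sorted(r["case"] for r in reduced.values()))
# case[34] and case[50] are the first matches for their reduction_ids
```

The point of the sketch is that this is a single linear pass over the input nodeset, with no repeated scans.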
In addition to allowing the list to essentially be filtered, the reduction block will also allow fold functions to be defined which can perform a vector -> scalar transform for the items being grouped. These functions will set a variable value in the entity node:

```
<variable_name base="" fold=""/>
```

Here `base` is an XPath expression which will be executed to initialize the variable value for the first nodeset result matching its `reduction_id`, and `fold` is an XPath expression which will be executed against each later nodeset result matching that `reduction_id`. Each variable is then made available for use inside of its own `fold` function.
Two examples are provided here:

```
<reduce group-by='index/parent'>
    <total_amount base="./amount" fold="$total_amount + ./amount"/>
    <first_invoice_due base="./invoice_due" fold="min(date($first_invoice_due), date(./invoice_due))"/>
</reduce>
```
In the first example, `$total_amount` is produced for a given `reduction_id` by adding to an ongoing accumulating counter. The counter is initialized by `base` to the first case's amount, which prevents the need for a check within the fold function to initialize the value. In the second example, the `first_invoice_due` value is set to the smallest date value across all of the cases matching the `reduction_id`.
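The base/fold mechanics above can be sketched as follows; `run_folds` and the tuple-of-lambdas representation of each variable are illustrative assumptions, not a proposed API:

```python
def run_folds(nodeset, group_by, folds):
    """Sketch of base/fold evaluation: `base` runs on the first item
    seen for a reduction_id, `fold` runs on every later item with the
    variable's current value as its accumulator."""
    state = {}  # reduction_id -> {variable_name: value}
    for item in nodeset:
        rid = group_by(item)
        if rid not in state:
            # First match for this reduction_id: evaluate each base.
            state[rid] = {name: base(item)
                          for name, (base, _fold) in folds.items()}
        else:
            # Later matches: evaluate each fold against the accumulator.
            for name, (_base, fold) in folds.items():
                state[rid][name] = fold(state[rid][name], item)
    return state

invoices = [
    {"parent": "p1", "amount": 40, "invoice_due": "2024-02-01"},
    {"parent": "p1", "amount": 60, "invoice_due": "2024-01-15"},
    {"parent": "p2", "amount": 100, "invoice_due": "2024-03-01"},
]
folds = {
    # Mirrors <total_amount base="./amount" fold="$total_amount + ./amount"/>
    "total_amount": (lambda i: i["amount"],
                     lambda acc, i: acc + i["amount"]),
    # Mirrors <first_invoice_due base="./invoice_due" fold="min(...)"/>
    # (ISO date strings compare correctly as strings)
    "first_invoice_due": (lambda i: i["invoice_due"],
                          lambda acc, i: min(acc, i["invoice_due"])),
}
print(run_folds(invoices, lambda i: i["parent"], folds))
# p1 accumulates total_amount 100 and first_invoice_due 2024-01-15
```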
Imagine a set of cases exists of type `invoice`, where each invoice is recorded against a potentially large list of locations, which are represented by a fixture. A reasonable task would be to show a list of locations with a count of invoices open against them, where only locations with open invoices are included, such as

```
Location | Open Invoices | Total Amount
---------------------------------------
Central  | 3             | $120
West     | 1             | $100
North    | 2             | $80
```

This
would require a setup like

nodeset:

```
instance('locations')/locations/location[@type = 'client'][count(
    instance('casedb')/casedb/case[@case_type = 'invoice'][location_assigned = current()/@id]
) > 0]
```

Location:

```
./name
```

Open Invoices:

```
count(instance('casedb')/casedb/case[@case_type = 'invoice'][location_assigned = current()/@id])
```

Total Amount:

```
sum(instance('casedb')/casedb/case[@case_type = 'invoice'][location_assigned = current()/@id]/total)
```
This breaks down to be quite repetitive (and thus easy to make mistakes in), and also very difficult to optimize, since the app needs to scan the assigned invoices in three different places. Each of those scans requires a full walk of the invoice set.
In the new paradigm this could be expressed as

nodeset:

```
instance('casedb')/casedb/case[@case_type = 'invoice']
```

reduction step:

```
group-by: ./location_assigned
folds:
    count: count + 1
    total: total + ./amount
```

Location:

```
instance('locations')/locations/location[@id = $reduction_id]/name
```

Open Invoices:

```
$count
```

Total Amount:

```
$total
```
This process only requires a linear walk that is as large as the set of invoice cases.
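As a sanity check, here is a minimal Python sketch of that single linear walk producing the table above; the function name `reduce_invoices` and the sample data are assumptions for illustration:

```python
def reduce_invoices(invoices):
    """One linear pass over the invoice cases, accumulating the count
    and total per location (the reduction_id), as in the folds above."""
    rows = {}
    for inv in invoices:
        rid = inv["location_assigned"]
        row = rows.setdefault(rid, {"count": 0, "total": 0})
        row["count"] += 1
        row["total"] += inv["amount"]
    return rows

# Sample data chosen to reproduce the Central/West/North table.
invoices = [
    {"location_assigned": "central", "amount": 40},
    {"location_assigned": "central", "amount": 50},
    {"location_assigned": "central", "amount": 30},
    {"location_assigned": "west", "amount": 100},
    {"location_assigned": "north", "amount": 50},
    {"location_assigned": "north", "amount": 30},
]
for rid, row in reduce_invoices(invoices).items():
    print(rid, row["count"], row["total"])
```

Compare with the earlier setup: the three separate full walks (filter, count, sum) collapse into one pass, and locations with no invoices never appear because no row is ever created for them.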
Since the full list will need to be evaluated at least one time before each folded variable is set, this format will require at least two passes over the full list of values: one pass over the full `input[]` nodeset to produce the folded variables, and then another over the elements of the nodeset which represent their unique `reduction_id`.
The `<reduce>` block variables will only be available within their own `fold` function, or to the entity item itself (its display fields) after the reduce step has completed; they cannot be referenced in their intermediate state by the other variable definitions during the reduce step.
Since it is very likely that there will be a transform from the `reduction_id` to another set of models (say, mapping a list of child cases to a set of parent cases) after the `reduce` step completes, it is essential for platform implementations to register `$reduction_id` as a model set.
CS: Currently the most important issue here is that we're moving to a world where we have better static assumptions about what data is represented by selections, and this potentially gets us further away. In HQ for instance we'd likely define the nodeset in the short term as a case select over children, but the output case would end up being a parent case, which would be super confusing, and likely needs to be accounted for somewhere in the data structure.
CS: We probably need to allow for a filter over the reduced set transform in this, but it's unclear precisely where it is best applied. Probably after reduction, but potentially would be relevant before :/
In order to maintain reasonable performance, the variables being used in evaluation will need to be made available as a flavored model set. I.e.: in the same way that `current()/@case_id` is currently identified as a model set which can have a transform provided against it to ensure that both child and parent cases only do a single db scan, it could be desirable for the reduced nodeset to be calculated for all values first (especially for indices) and have that set be retained as a case model set.