The proposal is about using Dhall to combine small configuration files into one file.
The Dhall features that we use in this document:
Dhall records are like Javascript objects, with slightly different syntax, replacing :
with =
. For example,
{
name = "AvgDailyPM2.5Exposure",
mapping = {
dataset = "env",
column = "pm2.5"
}
}
We can define types for records, Dhall checks the type to ensure that a value is well-typed. This allows us to rule out type errors such as missing fields, misspelt fields, and fields with wrong type in records. Each field can have a primitive type such as Text
, Bool
, Integer
, Number
or a record type.
For example we can define:
{
name: Text,
mapping: {
dataset: Text,
column: Text
},
}
The :
indicates that this is a record type.
We can name a definition and use it in another definition.
let pm25mapping = {
dataset = "env",
column = "pm2.5"
}
in {
name = "AvgDailyPM2.5Exposure",
mapping = pm25mapping
}
Here we name record { dataset = "env", column = "pm2.5" }
pm25mapping
, and use it in our definition.
Dhall allows defining enums in the form of union types. Each alternative is allowed to have an optional value of a type. For example,
let BinningStrategy = < Cut : Integer | QCut : Integer | NoBinning >
We define a type for binning strategies. Cut
allows an integer number of binning. Same for QCut
. NoBinning
doesn't need any. For example, a value of this type can be Cut 4
, Cut 5
, QCut 4
, or NoBinning
.
Dhall allows defining functions similar to Javascript's lambda. Instead of using the x => ...
syntax, it uses the \x : t -> ...
syntax where t
is the type of x
. For example,
let cut = \ x : Integer -> Cut x
Parentheses are not needed to apply a function to a value. For example,
cut 4
For each ICEES variables A
, create a A.dhall
file.
let FeatureVariable = {
name: Text, -- variable name
mapping: Mapping, -- mapping
binning_strategies: Optional (List BinningStrategy), -- binning strategies
feature: ICEESFeature, -- ICEES feature
identifiers: Identifiers -- identifiers
}
Mapping
is defined as follow:
let Mapping = < GenericFHIRMapping | SpecializedFHIRMapping | EnvironmentalMapping | GEOIDMapping | NearestPointMapping | NearestFeatureMapping >
Currently there are six main sources FHIR-PIT maps feature variables:
- Generic FHIR mapping (Condition, Procedure, MedicationRequest, Observation)
- Specialized FHIR mapping (demographic, visit)
- Environmental data
- From geoid mapping (ACS)
- From lat, lon to nearest point (CAFO, landfill)
- From lat, lon to nearest feature (nearest road)
We define the following six record types:
let GenericFHIRMapping = {
resource: Text, -- FHIR resource type
system: Text, -- FHIR system
code: Text, -- FHIR code
system_is_regex: Optional Bool, -- system contains a regular expression
code_is_regex: Optional Bool -- code contains a regular expression
}
For generic FHIR mappings, we match data to ICEES feature variables according to FHIR resource, system, and code. By default, if either system
or code
contains *
, it is considered as a glob, which matches zero or more characters. This allows us to map, for example, ICD 10 codes such as E50.*
. This behavior can be modified by setting the optional system_is_regex
or code_is_regex
fields. When they are set to True
, the system and code are parsed as regular expressions. When they are set to False
, the system and code are not parsed as regular expressions or globs.
let SpecializeFHIRMapping = < Visit: FHIRMappingVisit | Age | Race | Sex | Ethnicity | Weight | Height | BMI >
let FHIRMappingVisit = List Text -- a list of diagnoses for filtering visits that count towards the ICEES feature variable
We enumerate specialized FHIR mappings.
let EnvironmentalMapping = {
dataset: Text, -- the dataset where the data come from
column: Text, -- the column in the dataset
statistics: List StatisticVariant -- the statistics of the data that we calculate
}
let StatisticVariant = {
statistic: Statistic, -- the statistic to calculate
rename: Optional RenameAs -- optional rename of feature variable for referencing this statistic
}
let Statistic = < Max | Min | Avg | StdDev | PrevDate >
let RenameAs = < Suffix: Text | Replace: Text >
For environmental mapping, we match data from a table. We allow multiple datasets. Each dataset is referenced by its name. A column in the dataset is specified for mapping to the ICEES feature variable. A list of statistics is specified to calulate the feature variables. We support the following statistics: maximum, minimum, average, standard deviation, and previous date. Previous date is used for calculating the value the day before an encounter.
let GEOIDMapping = {
dataset: Text, -- the dataset where the data come from
column: Text, -- the column in the dataset
datatype: FeatureType -- the type of the data
}
let FeatureType = < string | number >
For geoid mapping, we match data from a table, with one column being the geoid, and other columns being the features. We allow multiple datasets, for example, ACS and ACS ur. Each dataset is referenced by its name. A column in the dataset is specified for mapping to the ICEES feature variable. A type is specified to parse the data.
let NearestPointMapping = < FeatureAttribute : FeatureAttribute | Distance : Distance >
let FeatureAttribute = {
dataset: Text, -- the dataset where the data come from
name: Text, -- feature attribute name
datatype: FeatureType -- the type of the data
}
let Distance = {
dataset: Text, -- the dataset where the data come from
maximum: Double, -- the maximum distance
}
For nearest point mapping, we match data from a shapefile. We allow mapping a special ICEES feature variable from the distance. For the nearest point, we allow mapping attributes to ICEES feature variables.
let NearestFeatureMapping = < FeatureAttribute : FeatureAttribute | Distance >
let FeatureAttribute = {
dataset: Text, -- the dataset where the data come from
name: Text, -- feature attribute name
datatype: FeatureType -- the type of the data
}
For nearest feature mapping, it is similar to nearest point mapping.
ICEES features are defined as follows:
let ICEESFeature = {
feature_type: ICEESAPIType, -- ICEES API type
biolink_types: List BiolinkType -- biolink types
}
let BiolinkType = Text
let ICEESAPIType = <String: TypeString | Integer : TypeInteger | Number >
let TypeInteger = {
minimum : Optional Integer,
maximum : Optional Integer
}
let TypeString = {
enum : Optional (List Text)
}
For each icees feature, we define a feature type and a biolink type. The feature type can be String
, Integer
, or Number
. For String
, we can specify and optional list of values. For Integer
, we can specify an optional maximum and an optional minimum.
Identifers are defined as follows:
let Identifiers = List Text
The identifiers type is a list of identifiers.
let BinningStrategy = {
method: BinningMethod, -- binning method
suffix: Optional Text -- optional suffix for feature variable referencing this binning strategy
}
let BinningMethod = < Cut : Integer | QCut : Integer | Bins : Bins | NoBinning >
let Bins = < RangeBins : RangeBins | String : List (List String) >
let RangeBins = { -- see pandas.cut()
bins: List Double, -- bin edges
right: Bool, -- include the rightmost edges
include_lowest: Bool, -- whether the first bin should be left-inclusive or not
labels: List Text -- bin labels. should be of the same number as the bins
}
let BinString = List Text
We support three binning strategies: equal-widthed bins, quantiles, and customized bins. For customized bins, we support bin as range or discrete bin.
The files can be stored in nested directories. They can be stored according to their biolink types as Patrick suggested. On the top level, we can have a package.dhall
. For example:
config
|-package.dhall
|-Drug
| \-Prednisone.dhall
\-Disease
|-AsthmaDx.dhall
|-ReactiveAirwayDx.dhall
|-DILIDx.dhall
\-CroupDx.dhall
The package.dhall
contains all feature variable references so that we can simply include one file when we assemble configurations files.
{
Albuterol = ./Drug/Albuterol.dhall,
Prednisone = ./Drug/Albuterol.dhall,
AsthmaDx = ./Disease/Asthma.dhall,
ReactiveAirwayDx = ./Disease/Albuterol.dhall,
CroupDx = ./Disease/CroupDx.dhall,
DILIDx = ./Disease/DILIDx.dhall
}
A combine.dhall
will be written with scripts for combining a selected list of ICEES feature variables into a configuration file. We will define three function pit_mapping
for generating the mapping file for FHIR PIT, icees_features
for generating icees features file for the ICEES API, and trapi_identifiers
for generating identifiers file.
For each configuration file, a Dhall file will be written. These file are the same for every variable set.
For mapping, create a fhir_pit_mapping.dhall
:
let combine = ./combine.dhall
let variables = ./variables.dhall
in combine pit_mapping variables
For identifiers, create a trapi_identifiers.dhall
:
let combine = ./combine.dhall
let variables = ./variables.dhall
in combine.trapi_identifiers variables
For ICEES features, create a icees_api_features.dhall
:
let combine = ./combine.dhall
let variables = ./variables.dhall
in combine.icees_features variables
When deploy a new instance, we can define a selection of feature varibles as follows:
- Create a
variables.dhall
file:
let v = ./config/package.dhall
in [
v.Albuterol,
v.ReactiveAirwayDx,
v.CroupDx,
v.AsthmaDx
]
- Run
dhall-to-yaml
:
dhall-to-yaml --file pit_mapping.dhall --output pit_mappings.yml
To add a new ICEES Feature Variable, for example AvgDailyPM2.5Exposure
- add
AvgDailyPM2.5Exposure.dhall
under the appropriate direction, for example,./config/ChemicalSubstance/AvgDailyPM2.5Exposure.dhall
.
{
name = "AvgDailyPM2.5Exposure",
mapping = Mapping.EnvironmentalMapping {
dataset = "cmaq",
column = "pm2.5",
statitics = [
{
statistic = Avg,
rename = Suffix "_StudyAvg"
},
{
statistic = Max,
rename = Suffix "_StudyMax"
},
{
statistic = PrevDate,
rename = Replace "Avg24hPM2.5Exposure"
}
]
},
binningStrategies = [
{
strategy = Cut 5
},
{
strategy = QCut 5,
suffix = "_qcut"
}
],
feature = {
feature_type = ICEESAPIType.Integer {
minimum = 1,
maximum = 5
},
biolink_types = [
"biolink:ChemicalSubstance"
]
},
identifiers = [
"MESH:D052638"
]
}
- Add a field in
./config/package.json
{
...
AvgDailyPM25Exposure = ./ChemicalSubstance/AvgDailyPM2.5Exposure.dhall,
...
}