diff --git a/paper-data-platforms.pdf b/paper-data-platforms.pdf index d82a431..c30a669 100644 Binary files a/paper-data-platforms.pdf and b/paper-data-platforms.pdf differ diff --git a/paper-data-platforms.qmd b/paper-data-platforms.qmd index 389db24..00844cf 100644 --- a/paper-data-platforms.qmd +++ b/paper-data-platforms.qmd @@ -130,13 +130,13 @@ Besides the two case studies, we will discuss how the design relates to other re ### Open standards: using FHIR as the common data model -The recent convergence to FHIR as the de facto standard for information exchange has fuelled the development of OpenHIE. FHIR is currently used both for routine healthcare settings40 and clinical research settings [@duda2022hl7;@vorisek2022fast] and is increasingly being used in LMICs as well. The FHIR-native OpenSRP platform [@mehl2020open] has been deployed in 14 countries targeting various patient populations, amongst which a reference implementation of the WHO antenatal and neonatal care guidelines for midwives in Lombok, Indonesia [@summitinstitutefordevelopment2023bunda;@kurniawan2019midwife]. In India, FHIR is used as the underlying technology for the open Health Claims Exchange protocol specification, which has been adopted by the Indian government as the standard for e-claims handling [@hcx]. Additionally, the guidelines and standards of the African Union explicitly state FHIR is to be used as the messaging standard [@2023african]. This range of utilizations showcase the standards’ widespread applicability. The proceedings of the OpenHIE conference 2023 attest to the fact that FHIR and open source technologies are embraced as critical enablers in implementing health information exchanges in LMICs [@ohie23]. +The recent convergence to FHIR as the de facto standard for information exchange has fuelled the development of OpenHIE. 
FHIR is currently used both for routine healthcare settings [@ayaz2021fast] and clinical research settings [@duda2022hl7;@vorisek2022fast] and is increasingly being used in LMICs as well. The FHIR-native OpenSRP platform [@mehl2020open] has been deployed in 14 countries targeting various patient populations, among them a reference implementation of the WHO antenatal and neonatal care guidelines for midwives in Lombok, Indonesia [@summitinstitutefordevelopment2023bunda;@kurniawan2019midwife]. In India, FHIR is used as the underlying technology for the open Health Claims Exchange protocol specification, which has been adopted by the Indian government as the standard for e-claims handling [@hcx]. Additionally, the guidelines and standards of the African Union explicitly state FHIR is to be used as the messaging standard [@2023african]. This range of uses showcases the standard's widespread applicability. The proceedings of the OpenHIE conference 2023 attest to the fact that FHIR and open source technologies are embraced as critical enablers in implementing health information exchanges in LMICs [@ohie23]. However, despite the increased use of FHIR as a common data model, various studies have investigated its merits and performance vis-a-vis other healthcare standards. Comparisons between OpenEHR, ISO 13606, OMOP and FHIR have been made [@ayaz2023transforming;@mullie2023coda;@rinaldi2021openehr;@cremonesi2023need;@sinaci2023data]. A study involving 10 experts comparing OpenEHR, ISO 13606 and FHIR concluded that i) these three standards are functionally and technically compatible, and therefore can be used side by side; and that ii) each of these standards have their strengths and limitations that correlate with their intended use as summarized in the @tbl-comparison. 
![Comparison of OpenEHR, ISO 13606 and FHIR standards](images/comparison-ehr-standards.png){width=auto #tbl-comparison} -For an infectious diseases dataset with a limited scope, OpenEHR, OMOP and FHIR have been compared and found all to be equally suitable [@rinaldi2021openehr]. Comparing OMOP and FHIR, the latter has been found to support more granular mappings required for analytics and was therefore chosen as the standard for the CODA project[@mullie2023coda]. +For an infectious diseases dataset with a limited scope, OpenEHR, OMOP and FHIR have been compared and found all to be equally suitable [@rinaldi2021openehr]. Comparing OMOP and FHIR, the latter has been found to support more granular mappings required for analytics and was therefore chosen as the standard for the CODA project [@mullie2023coda]. Although FHIR was originally designed only for exchange between systems, we propose to use it as the common data model for the design presented here for the following reasons: @@ -149,9 +149,9 @@ Although FHIR was originally designed only for exchange between systems, we prop Data management and analytics platforms have undergone significant changes since the first generation of data warehouses were introduced. Recent studies have shown that the current practice has converged towards the lakehouse as one of the most commonly used solution designs [@armbrust2021lakehouse;@hai2023data;@harby2022data]. Lakehouses typically have a zonal architecture [@hai2023data] where data is ingested from the source systems in bulk (E), delivered to storage with aligned schemas (L) and transformed into a format ready for analysis (T). The discerning characteristic of the lakehouse architecture is its foundation on low-cost and directly-accessible storage that also provides traditional analytical DBMS management and performance features such as ACID transactions, data versioning, auditing, indexing, caching, and query optimization [@armbrust2021lakehouse]. 
Lakehouses thus combine the key benefits of data lakes and data warehouses: low-cost storage in an open format accessible by a variety of systems from the former, and powerful management and optimization features from the latter. -With respect to current implementations of lakehouse data platform, we observe a proliferation of tools with as yet limited standards to improve technical interoperability. In the analysis of Pedreira et al. [@pedreira2023composable] the requirement for specialization in data management systems has evolved faster than our software development practices. This situation has created a siloed landscape composed of hundreds of products developed and maintained as monoliths, with limited reuse between systems. It has also affected the end users, who are often required to learn the idiosyncrasies of dozens of incompatible SQL and non-SQL API dialects, and settle for systems with incomplete functionality and inconsistent semantics. To remedy this, Pedreira et al. call to (re-)design and implement modern data platforms in terms of a 'composable data stack’ as a means to decrease development and maintenance cost and pick-up the speed of innovation. +With respect to current implementations of lakehouse data platforms, we observe a proliferation of tools with as yet limited standards to improve technical interoperability. In the analysis of Pedreira et al. [@pedreira2023composable] the requirement for specialization in data management systems has evolved faster than our software development practices. This situation has created a siloed landscape composed of hundreds of products developed and maintained as monoliths, with limited reuse between systems. It has also affected the end users, who are often required to learn the idiosyncrasies of dozens of incompatible SQL and non-SQL API dialects, and settle for systems with incomplete functionality and inconsistent semantics. To remedy this, Pedreira et al. 
call to (re-)design and implement modern data platforms in terms of a 'composable data stack' as a means to decrease development and maintenance cost and pick up the speed of innovation. -While the lakehouse architecture separating the concerns of compute and storage, the composable data stack takes the separation of concerns is taken one step further. A composable data system (@fig-composable-data-stack), not only separates the storage (layer 3) and execution (layer 2), but also separates the user interface (layer 1) from the execution engine by introducing standards for Intermediate Representation (standard A) and Connectivity (standard B). The composable data stack can be implemented with current open source technologies (@fig-cds-examples). As an example, the Ibis user interface is currently sufficiently mature to offer a standardized dataframe interface to 19 different execution engines. +While the lakehouse architecture separates the concerns of compute and storage, the composable data stack takes the separation of concerns one step further. A composable data system (@fig-composable-data-stack) not only separates the storage (layer 3) and execution (layer 2), but also separates the user interface (layer 1) from the execution engine by introducing standards for Intermediate Representation (standard A) and Connectivity (standard B). The composable data stack can be implemented with current open source technologies (@fig-cds-examples). As an example, the Ibis user interface is currently sufficiently mature to offer a standardized dataframe interface to 19 different execution engines. ![Composable data stack](images/composable-data-stack.png){#fig-composable-data-stack} @@ -170,7 +170,7 @@ Following Hai we take a subset of the core functionalities of a data lake. Becau ### Open technologies: available digital public goods -Many components of the OpenHIE specification are now available as a digital public good. 
@tbl-digital-public-goods lists components that are currently available for implementing the OpenHIE framework using open source, digital public goods that are compliant with the FHIR standard, illustrating the maturity of this ecosystem and development community. With the launch of the Instant OpenHIE configuration toolkit54, it has become easier to set up, explore and develop HIEs thereby reducing costs and skills required for software developers to deploy an OpenHIE architecture for quicker solution testing and as a starting point for faster production implementation and customisation. Several frameworks are available that offer a set of preconfigured components out of the box, such as for example: +Many components of the OpenHIE specification are now available as digital public goods. @tbl-digital-public-goods lists components that are currently available for implementing the OpenHIE framework using open source, digital public goods that are compliant with the FHIR standard, illustrating the maturity of this ecosystem and development community. With the launch of the Instant OpenHIE configuration toolkit [@InstantOpenHIEv2], it has become easier to set up, explore and develop HIEs, thereby reducing the costs and skills required for software developers to deploy an OpenHIE architecture for quicker solution testing and as a starting point for faster production implementation and customisation. 
Several frameworks are available that offer a set of preconfigured components out of the box, such as for example: - the “Open Smart Register Platform” (OpenSRP), that focuses on providing a mobile-first platform, including a FHIR native app designed to support the WHO Smart Guidelines - the OpenHIM Platform, a reference implementation of the Instant OpenHIE framework, providing an easy way to set up, manage and operate various HIE configurations diff --git a/paper-data-platforms.tex b/paper-data-platforms.tex index 023b8dd..314384f 100644 --- a/paper-data-platforms.tex +++ b/paper-data-platforms.tex @@ -474,12 +474,13 @@ \subsection{Open standards: using FHIR as the common data The recent convergence to FHIR as the de facto standard for information exchange has fuelled the development of OpenHIE. FHIR is currently used -both for routine healthcare settings40 and clinical research settings -\citep{duda2022hl7, vorisek2022fast} and is increasingly being used in -LMICs as well. The FHIR-native OpenSRP platform \citep{mehl2020open} has -been deployed in 14 countries targeting various patient populations, -amongst which a reference implementation of the WHO antenatal and -neonatal care guidelines for midwives in Lombok, Indonesia +both for routine healthcare settings\citep{ayaz2021fast} and clinical +research settings \citep{duda2022hl7, vorisek2022fast} and is +increasingly being used in LMICs as well. The FHIR-native OpenSRP +platform \citep{mehl2020open} has been deployed in 14 countries +targeting various patient populations, amongst which a reference +implementation of the WHO antenatal and neonatal care guidelines for +midwives in Lombok, Indonesia \citep{summitinstitutefordevelopment2023bunda, kurniawan2019midwife}. 
In India, FHIR is used as the underlying technology for the open Health Claims Exchange protocol specification, which has been adopted by the @@ -520,8 +521,8 @@ \subsection{Open standards: using FHIR as the common data and FHIR have been compared and found all to be equally suitable \citep{rinaldi2021openehr}. Comparing OMOP and FHIR, the latter has been found to support more granular mappings required for analytics and was -therefore chosen as the standard for the CODA -project\citep{mullie2023coda}. +therefore chosen as the standard for the CODA project +\citep{mullie2023coda}. Although FHIR was originally designed only for exchange between systems, we propose to use it as the common data model for the design presented @@ -577,7 +578,7 @@ \subsection{Open architecture: extending OpenHIE framework with a former, and powerful management and optimization features from the latter. -With respect to current implementations of lakehouse data platform, we +With respect to current implementations of lakehouse data platforms, we observe a proliferation of tools with as yet limited standards to improve technical interoperability. In the analysis of Pedreira et al. \citep{pedreira2023composable} the requirement for specialization in @@ -592,7 +593,7 @@ \subsection{Open architecture: extending OpenHIE framework with a a `composable data stack' as a means to decrease development and maintenance cost and pick-up the speed of innovation. -While the lakehouse architecture separating the concerns of compute and +While the lakehouse architecture separates the concerns of compute and storage, the composable data stack takes the separation of concerns is taken one step further. 
A composable data system (Figure~\ref{fig-composable-data-stack}), not only separates the storage @@ -658,17 +659,18 @@ \subsection{Open technologies: available digital public goods}\label{open-technologies-available-digital-public-goods} Many components of the OpenHIE specification are now available as a -digital public good. Table~\ref{tbl-digital-public-goods} lists +digital public goods. Table~\ref{tbl-digital-public-goods} lists components that are currently available for implementing the OpenHIE framework using open source, digital public goods that are compliant with the FHIR standard, illustrating the maturity of this ecosystem and development community. With the launch of the Instant OpenHIE -configuration toolkit54, it has become easier to set up, explore and -develop HIEs thereby reducing costs and skills required for software -developers to deploy an OpenHIE architecture for quicker solution -testing and as a starting point for faster production implementation and -customisation. Several frameworks are available that offer a set of -preconfigured components out of the box, such as for example: +configuration toolkit\citep{InstantOpenHIEv2}, it has become easier to +set up, explore and develop HIEs thereby reducing costs and skills +required for software developers to deploy an OpenHIE architecture for +quicker solution testing and as a starting point for faster production +implementation and customisation. Several frameworks are available that +offer a set of preconfigured components out of the box, such as for +example: \begin{itemize} \tightlist