Superglue Architecture
Superglue is a collection of components, each of which contributes to solving our core problems. In describing our architecture, we take special care to point out which components are clients and which are libraries, in order to clarify the entry points and extension points of the system.
![Architecture](.github/assets/Screen Shot 2020-10-01 at 9.55.28 AM.png)
Library components
Library components are not themselves executable; instead, they expose a programmatic interface which can be consumed by clients in order to invoke their functionality. Below, we’ll give a high-level overview of the problem each library component solves.
Data-Access-Objects (DAO)
The DAO component defines programmatic access to Superglue’s data model, and abstracts over the persistence mechanism used in a given deployment. This library is used by any client that needs to read or edit the state of the system.
Parser
The parser is used to analyze SQL scripts and identify usages of table names, as well as whether those tables are used as inputs (e.g. SELECT) or outputs (e.g. CREATE TABLE). This metadata is later used to construct a lineage graph of data moving through those tables.
Input: SQL scripts
Output: Mapping of scripts to their input and output tables
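The input/output contract above can be illustrated with a minimal Python sketch. This is not Superglue's parser: the function names `table_usages` and `parse_scripts` and the regular expressions are assumptions made for illustration, and a regex approach only covers simple statements (it ignores subqueries, CTEs, comments, and dialect-specific syntax).

```python
import re
from typing import Dict, Set

# Hypothetical patterns for illustration only; a real SQL parser
# handles far more syntax than these two expressions.
_OUTPUT = re.compile(
    r"\b(?:CREATE\s+TABLE(?:\s+IF\s+NOT\s+EXISTS)?|INSERT\s+INTO)\s+([\w.]+)",
    re.IGNORECASE,
)
_INPUT = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def table_usages(sql: str) -> Dict[str, Set[str]]:
    """Return the tables a single script reads from and writes to."""
    return {
        "inputs": {name.lower() for name in _INPUT.findall(sql)},
        "outputs": {name.lower() for name in _OUTPUT.findall(sql)},
    }


def parse_scripts(scripts: Dict[str, str]) -> Dict[str, Dict[str, Set[str]]]:
    """Map each script name to its input and output tables."""
    return {name: table_usages(sql) for name, sql in scripts.items()}


if __name__ == "__main__":
    example = {
        "daily_sales.sql": (
            "CREATE TABLE daily_sales AS "
            "SELECT * FROM orders o JOIN customers c ON o.cid = c.id"
        )
    }
    print(parse_scripts(example))
```

The resulting mapping (script name to sets of input and output tables) has the shape described under "Output" above, which is what a lineage graph can later be built from.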
Service
The service component provides an assortment of functionality which is used for fulfilling client requests.
Lineage
The LineageService class serves requests for lineage by stitching together a graph from the metadata collected by the parser.
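As a rough illustration of what "stitching together a graph" can mean, the Python sketch below builds a directed graph from per-script input and output tables and walks it downstream from a given table. It is a sketch under stated assumptions, not the LineageService implementation: the names `build_graph` and `downstream`, the `table:`/`script:` node prefixes, and the metadata shape are all hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Set

# Assumed metadata shape: script name -> {"inputs": {...}, "outputs": {...}}
Metadata = Dict[str, Dict[str, Set[str]]]


def build_graph(metadata: Metadata) -> Dict[str, Set[str]]:
    """Build edges input table -> script -> output table."""
    edges: Dict[str, Set[str]] = defaultdict(set)
    for script, usage in metadata.items():
        script_node = f"script:{script}"
        for table in usage["inputs"]:
            edges[f"table:{table}"].add(script_node)
        for table in usage["outputs"]:
            edges[script_node].add(f"table:{table}")
    return edges


def downstream(edges: Dict[str, Set[str]], start: str) -> Set[str]:
    """Breadth-first walk collecting every node reachable from start."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


if __name__ == "__main__":
    metadata = {
        "a.sql": {"inputs": {"raw"}, "outputs": {"clean"}},
        "b.sql": {"inputs": {"clean"}, "outputs": {"report"}},
    }
    graph = build_graph(metadata)
    print(sorted(downstream(graph, "table:raw")))
```

Prefixing node names keeps a table and a script with the same name from colliding in the graph; upstream lineage could be obtained the same way by reversing the edges.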
Elasticsearch
The ElasticService class provides an Elasticsearch client and useful methods for manipulating Superglue data in Elasticsearch (ES), including creating indices, uploading documents, and managing aliases.
Client components
Client components are executables which consume the exposed interfaces of the library components to actually invoke the functionality provided.
Command-line interface
The command-line component is arguably the simplest way to interact with Superglue. It provides a handful of flags and parameters that can be used to configure what actions to take. A good example use-case is using the CLI to run the parser over a collection of scripts.
REST API
The REST API provides an HTTP interface to Superglue’s functionality. Key among its capabilities is the ability for web clients to request the lineage of a given table or job, as well as to request execution statistics for active jobs. This API is the interface that supports the Superglue UI, a web application that visually presents the lineage and execution data.