NSA is using GraphQL to define a common data model (CDM) and event/message formats. The event or message formats are defined as GraphQL queries from the CDM.
Using GraphQL allows a contract developer to describe both the data model and message format at the same time, rather than needing two sets of semantics. This is useful when an attribute may be optional on the underlying data model, but required when that model is used in a specific message.
The primary purpose of NSA is to generate code and schemas in multiple languages, all based on the root definition using GraphQL. The outputs can be other schema languages, such as protobuf or JSON Schema, or code, with Go, Ruby, and Python currently supported.
The benefit of a common data model comes from the ability to easily disseminate its implementation across multiple teams and services. A build pipeline will watch for schema changes on a feature branch, then launch a secondary pipeline to generate the output for all target languages. That output is then committed back to the feature branch, where a developer can review the changes before merging to the main branch. All relevant, language-specific output packages are rebuilt, versioned and tagged.
GraphQL Syntax Used for a Novel Approach to Schema Validation and Code Generation
Schema languages are not new, and come by the dozens: SQL, XML Schema, json-schema, Protobuf, GraphQL...
What's key to understand is that a schema language is (in general) perfectly capable of expressing a data model or a message format but not both at the same time, and that's the catch.
A message format, in relation to a data model, must be expressed as a projection of the data model. As such you need some additional semantics to express that projection.
For instance a given data entity may have 5 required fields to be considered valid. However, when a message carries that entity, it can convey an arbitrary number of attributes. Conversely, an attribute may be optional in a data schema, but in a particular context, a message schema may require that attribute. There is just no way around it, you need two distinct sets of semantics to express a data model schema and the message formats that carry entities of that data model.
In general, message schemas (e.g. json-schema) should be self-standing (without imports) as they specify type projections which vary from message to message. It may also cumbersome and brittle to include a library of files to validate a single payload.
GraphQL provides a unique set of semantics and tools that make it easy to create a Schema Architecture:
- a Schema Language
- a Query language which has "projection semantics"
- custom directives to annotate the Schema or Query language
- parsers for both syntaxes
- modern editors with syntax coloring, validation...
- Documentation generators
In essence, all that's left to do is to implement schema generators that will take message format definitions (as GraphQL Queries) and generate other schema languages such as Protobuf, json-schema, XML Schema, ... and language specific client libraries. Just to be clear, there is absolutely no graphql engine running in NSA. We only use graphql as a syntax to capture the data model and message formats. One of the big benefits that we get with that approach is that most developers (backend and frontend) are familiar with graphql.
Metadata
+----------------------------------------------+
| +----------+ +----------+ +----------+ | +-------------+ +----------+ +----------+ +----------+ +----------+
| | | | | | | | | | | Schema 1 | | Schema 1 | | msg_1.go | | msg_1.rb |
| | Query 1 | | Query 2 | ... | Query N | | | | | | | | | | | |
| | | | | | | | | | +----------+ +----------+ +----------+ +----------+
| +-----+----+ +----+-----+ +-----+----+ | | | +----------+ +----------+ +----------+ +----------+
| | | | | | Generator | | Schema 2 | | Schema 2 | | msg_2.go | | msg_2.rb |
| +--------/--+------------------+ | ----\ | | ----\ | | | | | | | |
| / | ----/ | | ----/ +----------+ +----------+ +----------+ +----------+
| +------------/------------+ | | | ... ... ... ...
| | | | | | +----------+ +----------+ +----------+ +----------+
| | Integration Schema | | | | | Schema N | | Schema N | | msg_3.go | | msg_3.rb |
| | | | | | | | | | | | | |
| +-------------------------+ | +-------------+ +----------+ +----------+ +----------+ +----------+
| | | json-schema protobuf *.go *.ruby
+-------------|--------------------------------+
|
|
|
| +--------------------+ +--------------------+
| | | | |
| | Voyager or | | Playground |
| | GraphQLDoc | | (query editor) |
| | | | |
| +--------------------+ +--------------------+
| | |
| | |
| +--------------------+ |
| | S3 Web Server | |
+-------------------->| or | |
| Apollo-Server |<-------------------------- +
+--------------------+
The code generator is meant to be run in the gitlab built pipeline and in any case, you should never commit code to the output/
directory. However if you want to run the code generator locally, install nodejs and npm.
The exception is the package specifiction files: go.mod
, nav-schema-architecture.gemspec
, and setup.py
. The version numbers in these files are managed by the pipelines and should not be changed, but other entries such as the project description may be changed manually.
npm install
npm run generate-output:generate
The generate script uses rubocop (Ruby), unify, autoflake (Python) and goimports (Golang)
gem install rubocop
pip install unify
pip install --upgrade autoflake
go get golang.org/x/tools/cmd/goimports
The project is modular. The list of current generators (json-schema, golang, protobuf, ruby and python) can be extended easily while some of the generators may be turned off
It assumes the schema is in ./schema/
and the message definitions are in ./schema/message-schema-definitions
You can also specify a set of environment variables to point to a different set of directories
Env variable | Example | Description |
---|---|---|
SCHEMA | schema/integration-schema.gql | the location of the GraphQL schema (common data model) |
MESSAGE_DEFINITION_DIR | schema/message-schema-definitions | the parent directory where the message definitions are located |
OUTPUT_DIR_JSON_SCHEMA | output/json-schema | json-schema output directory |
OUTPUT_DIR_GO_STRUCT | output/go | Go struct output directory |
OUTPUT_DIR_RUBY | output/ruby | Ruby output directory |
GIT_ROOT | git.nav.com/engineering | The root of your git repo |
The schema architecture is using the GraphQL schema notation to model the Common Data Model (CDM) data types.
Scalars represent leaves data types
. Currently the generator supports simple type translations and regex validations but this could be extended in the future to more sophisticated schemes. Scalars can also be used to map to native data types, while retaining the option to add more validation later. Scalar definitions are specified in the schema/scalars
directory (one file per code generator instance)
Currently a scalar definion contains a type and optionally a regex pattern or an import statement:
const ZIPCode = {
type: 'string',
pattern: '^[0-9]{5}(?:-[0-9]{4})?$',
}
const DateTime = {
type: 'google.protobuf.Timestamp',
import: 'google/protobuf/timestamp.proto',
}
module.exports = {
DateTime,
CurrencyCent,
UUID,
Phone,
ZIPCode, // <-- all scalar type definition needs to be exported
Any, // <-- reserved scalar type
}
The Any
type is a reserved scalar type that is implemented without validation and should accept any (JSON) type. In Golang the Any
type is implemented as a json.RawMessage
.
The CDM types are defined as expected using GraphQL Types and type extensions. The CDM will be broken down in the future to enable teams for focus on their domain. That's for instance when type extensions will come handy as it will enable a team to add properties on a type without interfering the the work of others.
type Person {
address: [Address] # Array
firstName: String! # required
lastName: String! # required
phone: Phone # optional
}
extends type Person {
height: Float
}
Enums are supported as expected and the corresponding property values are validated accordingly
enum ProfileSource {
USERREPORTED
CREDITREPORT
CASHFLOW
CLOVER
}
Currently Message definitions associated to a single types (for instance accountCreated
event -> Account
type). It is preferable to use entity types when possible, but sometimes complex events contain a series of entity information. In that case, it is perfectly ok to create a composite type (for instance AccountAndSubscription
).
The first step to specify a message format is to declare it in the Query type of the schema (since we are using the GraphQL Query DSL to specify it):
# message definitions
type Query {
myTweets: User
stats: Tweet
profile: User
# ....
}
The next step is to create the message specification in the corresponding directory:
schema/
|
+-- message-schema-definitions/
|
+-- business-profile/
|
| business-revenue-changed-v1.graphql
The corresponding message definition is a valid graphql query on the type declared about (though no validation is currently done on the query):
{
profile
@namespace(value: "api.tweet.profile")
@title(value: "User profile payload")
@description(value: "Sample user profile")
@version(value: 1) {
id @field(order: 1, required: true)
username @field(order: 2, required: true)
firstName @field(order: 3)
lastName @field(order: 4)
bio @field(order: 5)
birthdate @field(order: 6)
email @field(order: 7, required: true)
accountType @field(order: 8)
verified @field(order: 9)
}
}
The directives that currently used are:
@namespace
that map to the json-schema namespace, the golang package and the ruby module associated to generated code/schema.
@title
and @description
are just comments added to the generated code.
@version
should be used to refer to the major version of the schema (SemVer). Message definitions should be designed with compatibility in mind for the minor versions.
The following values of the @field
directive have been implemented:
order
maps to the field order value in the protobuf code generatorrequired
overrides the CDM schema field required attribute. When a field is required in the schema, it can be made optional in the payload definition (required: false
). Conversely, when a field is required in the schema is can be made optional in the payload definition (required: true
). When no changes are desired, then the field required attribute is not needed.
Message payload definitions may also be implemented with a node module that must return a parsed gql query.
- If you are reusing the same type in a payload, you have to use the same attributes. For instance if Address is reused twice (business address, person address), it should use the same attributes (i.e. the same projection).
NSA's implementation is relatively simple. The index.js shows three simple steps:
- parse (schema and message payload definitions)
- generate code
- save output files
The code generators are modular, all using a core generator
The core generator is in charge of computing the projections between the graphql queries and the graphql schema while orchestrating the code generation (enums, structs, root,...).
The graphql parser produces a convoluted AST for both the schema and the message payload definition. The parse.js library is a set of helper functions that transformed the graphql AST into code gen ready AST. The last library, util.js is made up of a series of general helper functions.
The target code generators are all structured the same way with a series of call backs for the core generator
:
enumDef
structDef
generate
(generates the root element of the message payload definition)
The code generators are all using their corresponding scalar file.
Finally, the ./src/generators
directory contains all the target code generators (go, python, ruby, json-schema and protobuf).
The target code generators are activated via the ./src/generators/index.js
file which exports an array of configurations:
{
generate: rubyGenerate,
extension: 'rb',
toDir: env('OUTPUT_DIR_RUBY', 'output/ruby/nsa'),
staticDir: env('STATIC_DIR_RUBY', 'static/ruby/nsa'),
type: 'ruby',
outputFormatter: R.identity,
postProcessStep: rubyPostProcessStep,
}
The output formatter is a simple function which can be used to make final touches to the generated code (for instance for go-lang):
const stringFormatter = (p, context) => {
switch (p) {
case 'iD':
return 'ID'
}
return p
}
The post processing step can be used to generate ancillary files such as the require file in Ruby.
The staticDir contains static files that need to be added to the generated directory structure.