The platform SDK starter is a way to configure your Flatfile Workbook in code.
Visit the Guides to learn more about using Flatfile and the Platform SDK Starter.
Visit CLI quickstart and follow instructions to sign up for a Flatfile account and run this code locally.
Here is an example of a fully functional Flatfile Workbook that we'll use in this example to demonstrate importing employee data.
const Employees = new Sheet(
'Employees',
{
firstName: TextField({
label: 'First Name',
required: true,
description: 'Given Name'
}),
lastName: TextField(),
fullName: TextField(),
stillEmployed: BooleanField(),
department: OptionField({
label: 'Department',
options: {
engineering: { label: 'Engineering' },
hr: 'People Ops',
sales: 'Revenue',
},
matchStrategy: 'exact',
}),
fromHttp: TextField({ label: 'Set by batchRecordCompute' }),
salary: NumberField({
label: 'Salary',
description: 'Annual Salary in USD',
required: true,
validate: (salary: number) => {
const minSalary = 30_000
if (salary < minSalary) {
return [
new Message(
`${salary} is less than minimum wage ${minSalary}`,
'warn',
'validate'
),
]
}
},
}),
},
{
allowCustomFields: true,
recordCompute: (record) => {
const fullName = `{record.get('firstName')} {record.get('lastName')}`
record.set('fullName', fullName)
return record
},
batchRecordsCompute: async (payload: FlatfileRecords<any>) => {
const response = await fetch('https://api.us.flatfile.io/health', {
method: 'GET',
headers: {
Accept: 'application/json',
},
})
const result = await response.json()
payload.records.map(async (record: FlatfileRecord) => {
record.set('fromHttp', result.info.postgres.status)
})
},
}
)
const BaseSheet = new Sheet(
'BaseSheet',
{
firstName: TextField(),
middleName: TextField('Middle'),
lastName: TextField(),
email: TextField({
unique: true,
primary: true
})
},
{
previewFieldKey: 'email',
}
)
const LinkedSheet = new Sheet(
'LinkedSheet',
{
email: LinkedField({
unique: true,
label: 'Email',
primary: true,
sheet: BaseSheet,
upsert: false,
}),
firstName: TextField(),
middleName: TextField('Middle'),
lastName: TextField(),
},
)
...
The above code will generate a Workbook that looks like this:
Follow these steps to deploy a Workbook and view it in your Dashboard. Because your secret keys are being used, Flatfile will know to create your first Workspace in the correct place.
- From the root directory of this project run
npm install
to install all necessary packages. - Run
npm run deploy
- To see the newly deployed Workbook, open your Dashboard and have a look.
Now, let's take a closer look at the example Workbook we just deployed, starting with the Fields. This Workbook uses all five built-in Flatfile fields to represent a Workbook used to receive employee data:
TextField
: a string of textNumberField
: a numerical valueDateField
: a dateOptionField
: a field with a set of pre-defined valuesBooleanField
: a true / false fieldLinkedField
: a field that links two sheets together
Let's take a closer look at some of the options we've set on these fields:
firstName: TextField({
required: true,
description: 'Given name',
})
Here we've indicated that the firstName
field is required, and we've given it a human-readable description.
salary: NumberField({
label: 'Salary',
description: 'Annual Salary in USD',
required: true,
validate: (salary: number) => {
const minSalary = 30_000
if (salary < minSalary) {
return [
new Message(
`${salary} is less than minimum wage ${minSalary}`,
'warn',
'validate'
),
]
}
},
}),
Here we've indicated that the salary
field is required, and we've given it a human-readable description.
We also provide a validate
function that defines what we consider to be a valid value for this field. In this case, we've decided that salary
must be greater than or equal to $30,000. We also provide a human-readable message to be displayed when the validation criterion is not met.
department: OptionField({
label: 'Department',
options: {
engineering: { label: 'Engineering' },
hr: 'People Ops',
sales: 'Revenue',
},
matchStrategy: 'exact',
}),
Here we provide a pre-defined list of values that this field can have. We have also included the matchStrategy
flag, which determines whether Flatfile should only accept exact matches when automatically matching your OptionField options, or whether Flatfile should also use historical matches and fuzzy matches while automatically matching your options. If this is not set, your OptionField will use historical and fuzzy matches to automatically match options for this field.
If a default
value is provided for an OptionField
we check that is a value provided in options
.
If you want to have an option value of the empty string ''
, you must set that as a default like this.
emptyExample : OptionField({
options:{eng:"Eng_Display_Label4", hr:"HR", "":"empty"},
default: ''
})
email: LinkedField({
label: 'First Name',
sheet: BaseSheet,
upsert: false
}),
Here we define which field is linked to another template, along with the sheet this field should be linked to. We have also set upsert: false
which disables the default upsert behavior and will display an error on this sheet if a value imported in the LinkedField does not exist on the parent sheet. For more information about our Relational Data feature, visit our Relational Data Guide.
const BaseSheet = new Sheet(
'BaseSheet',
{
firstName: TextField({
unique: true,
primary: true,
}),
middleName: TextField('Middle'),
lastName: TextField(),
email: TextField(),
},
{
previewFieldKey: 'email',
}
)
Here we define the sheet we are linking to, and on the sheet set the previewFieldKey
option that will display on the original template. Note: LinkedField
can currently only be implemented in Workspaces, which is why this example doesn't have a portal deployed for the LinkedSheet and BaseSheet sheets.
Next, let's look at the options that we've set on the sheet itself:
const Employees = new Sheet(
'Employees',
...
{
allowCustomFields: true,
recordCompute: (record) => {
const fullName = `{record.get('firstName')} {record.get('lastName')}`
record.set('fullName', fullName)
return record
},
batchRecordsCompute: async (payload: FlatfileRecords<any>) => {
const response = await fetch('https://api.us.flatfile.io/health', {
method: 'GET',
headers: {
Accept: 'application/json',
},
})
const result = await response.json()
payload.records.map(async (record: FlatfileRecord) => {
record.set('fromHttp', result.info.postgres.status)
})
},
}
)
...
Here we define our Data Hooks®, functions which operate on user-inputted data and perform transformations on it to meet your system's needs. There are two types of Data Hooks® in this example: a recordCompute
hook and a batchRecordsCompute
hook.
recordCompute: (record) => {
const fullName = `{record.get('firstName')} {record.get('lastName')}`
record.set('fullName', fullName)
return record
}
recordCompute
is a Data Hook which runs synchronously for each record. In this example, we use the values of the firstName
and lastName
fields to generate and set the derived fullName
field.
batchRecordsCompute: async (payload: FlatfileRecords<any>) => {
const response = await fetch('https://api.us.flatfile.io/health', {
method: 'GET',
headers: {
Accept: 'application/json',
},
})
const result = await response.json()
payload.records.map(async (record: FlatfileRecord) => {
record.set('fromHttp', result.info.postgres.status)
})
}
batchRecordsCompute
is a Data Hook which runs after all recordCompute
calls have finished and receives the full list of processed rows, at which point you can perform bulk operations on them. This is a good place to put slower operations that benefit from batching, such as external HTTP requests. In this example, we fetch a single value from an API and write it to each record.
validate
should be used in most cases where you want to confirm that user-inputted data matches your specifications. For most fields, you probably want to use validate
. This function gets the proper type per field, and lets you add messages to the cell, including errors, warnings, and rejections, so the user can correct errors themselves. Validate can’t change data, it can only send messages. Compute must be used if you want to perform any type of transformation
recordCompute
and batchRecordsCompute
should only be used for cases where you must modify user-inputted data or generate new values not provided by the user but needed for your systems. For simple row work (that doesn't make HTTP calls) use recordCompute
. If you need to make an a call to an external API, reach for batchRecordsCompute
on sheet, as this allows you to request info about multiple values at once for increased performance.
We have written sensible default implementations of cast functions for TextField, NumberField, and DateField. We wrote extensive tests to document and verify their behavior. Refer to the CastFunction tests to see more.
When our default cast
function can't parse an incoming value in a reliable way, the cast function throws an error. The error message shows up in the UI and the original value is stored in the table so users can edit that value into a proper type.
field.compute
should be idempotent and converge to the same value after calling on the same input, calling the same compute function on the output from the last invocation should return the original input.
compute:(v:string) => {return v.toLocaleLowerCase()}
is a good function, since compute("ASDF") === compute('asdf') === 'asdf'
.
We are big believers in Test Driven Development at Flatfile. Well written tests help you reason about the behavior of complex systems.
We extensively used tests while developing this SDK, look here . We encourage you to use our testing system to speed up your development.
Running tests on a Sheet or Workbook is much faster than deploying to Flatfile and manually uploading data to verify behavior. Another added benefit is your tests will stay in this repo and help you make better decisions as you update your Sheets and Workbooks to reflect changing business requirements.
Stay tuned for future releases where we will add even more capabilities to our testing system.
The Flatfile Data Hook system has been designed to enable fine-grained functions to be used in combination to perform regular data validation and normalization tasks. Flatfile is building out a comprehensive standard library so that developers can plug in the proper functions without having to write them from scratch. This standard library is leveraged by HDDL to describe implementations concisely.
The Flatfile data pipeline orders data transformations so that functions at each point can be strictly typed with the most strictly prescribed functionality. This strict typing leads to more reliable functions that don't have surprise missing, undefined, or weird values.
- Matching takes place. At this point we have rows of fields with names mapped to sheet field names. Currently there is no ability to influence matching from the SDK
field:cast
: functions here take a string or undefined and return either the primitive type specified by the field, null, or throw an error.field:default
: ifcast
returned null (but didn't throw an error), a default value is filled in for the fieldfield:compute
: functions receive a single fully present value and return a value of the same typerecordCompute
: functions receive a row with all required fields fully present and optional fields typedoptional?:string
. Best used to compute derived values, but can also be used to update existing fields.batchRecordsCompute
: asynchronous (HTTP/api) calls are made to fill in values from external services. This takesrecords
so it is easier to make bulk calls.field:validate
: functions receive a fully present value and return annotations for the corresponding cell
If any of the above functions for a field throws an error, further processing is stopped for that field.
The most common custom written hooks that we expect to see are row compute and field validate.
We expect users to very rarely write their own cast functions; these are some of the easiest and most important to add to FFL.
recordCompute
is synchronous and only operates on one row at a time. In practice this isn't a big limitation because synchronous functions generally run extremely quickly in the node runtime.
batchRecordsCompute
runs after all recordCompute
have finished, and only operates on batches of records. We made this engineering decision to encourage bulk operations when making external HTTP calls which tend to be slow.
We are writing this SDK to enable skilled practitioners to quickly implement powerful transformations that take unstructured data from untrusted sources and shape that data into a clean, normalized format for input into many systems. This is the core of what Flatfile builds and we take it seriously. We also take our relationship with customers seriously, balancing putting tools in your hands quickly with supporting existing use cases. We are here to listen to your feedback and build tools for you.
This initial release of the SDK is purposely limited in scope to include only the pieces we are most sure about. We intend to rapidly release new functionality to the SDK and our platform as our understanding grows and we have time to put the best tools in front of you.
When releasing pieces to the SDK our thought process is guided by the following principles:
- Does this solve a problem in an extensible way? Will we paint ourselves into a corner to solve a current problem?
- Can we support this code for the next 6 months until a breaking release?
- Does this work as we expect it to?
Links to more in-depth documentation we have written about features.
- How can I lowercase an email field anytime input is provided by a file or manual entry?
- This is a good use for field
compute
. This function will be idempotent (running it over and over on the same input produces the same output and state)
- This is a good use for field
- Why can't I check for nulls in a
validate
function?- You can't check for
null
orundefined
in a validate function, because validate functions are never called with a value that isn't of the field's type. The way to check fornull
is to setrequired:true
on the field, this works for 95% of field use cases, this will flag an error at the review stage for fields provided with anull
value. Having this strict typing makesvalidate
functions less error prone and less repetitious. If we allowednull
orundefined
to propagate tovalidate
, every user provided function would have to start with checking fornull
orundefined
, users who didn't do this would either see a typing error, or worse suffer unreliable code that was deployed. - What do I do if I want to check for null or undefined in a validate function?
- The only place the built in
required
behavior doesn't work is when arecordCompute
orbatchRecordsCompute
function was expected to provide a value for a field, and failed. In that case, you are already writing arecordCompute
orbatchRecordsCompute
function, check for thenull
there.
- The only place the built in
- You can't check for
- How can check the type and size of an url and return an error if the linked file is > 5mb or not an image?
- Currently this is best accomplished with a text field named
s3_url
that will match to the URL provided in the upload, and a rowcompute
that stores the size of the download tos3_url_size
,s3_url_size
should have anvalidate
of less than 5mb. - In the near future this will be handled with a computed field that takes
s3_url
as an input and outputss3_url_size
. - In the longer term this will be handled by a
SmartURLCapture
field that stores URLs to a synced s3 bucket and provides metadata about the download including size. Thevalidate
will run on the size property of theSmartURLCapture
field
- Currently this is best accomplished with a text field named
- Using a domain name (www.flatfile.com) field, how can I call the clearbit API and populate another field with an image url to their logo?
- Currently this is best accomplished via a row
compute
. - Eventually we will build in Clearbit-integrated fields.
- Currently this is best accomplished via a row
- When adding an EmailField, if I specify “showSpamScore: true” I want a second, non-matcheable, readonly column to be added immediately after
email
called Spam Score. On input of any email I want to update the spam score field with the results from an API.- To get similar behavior use row
compute
to fake a computed field like the previous three examples. - The result is possible but this implementation is currently not possible with the SDK.
- Implementing this behavior requires composite fields or changes to the SchemaDDL so that a single Field call can emit to underlying fields
- To get similar behavior use row
- Can I round a number to two decimal places and simultaneously add a warning saying “this number was rounded to two decimal places: original number 90.090293
- This needs to be done via
recordCompute
orbatchRecordsCompute
. We currently only allow mutating data and simultaneously adding message in those hooks.
- This needs to be done via
- How can I generate an error if an email is an invalid format that says “Email is invalid format”
- This would be accomplished with
validate
.
- This would be accomplished with
- How can I normalize a phone number and use a nearby country field to “hint” which country the phone number may belong to get a more accurate result? The country field must be normalized to an ISO code first.
- At first glance this could work as a row
compute
to add Country code to phone numbers. - In a future version of the SDK, this probably requires hooks to influence the matching system.
- At first glance this could work as a row
- Can this be used to customize the schema based on the user that is logging in?
- No. That wouldn't result in the proper user experience because the schema is for every user.
- To customize behavior per user requires writing custom react editors for fields that specialize based on the logged-in user, then tying these editors or other custom functionality in with SchemaIL. We haven't written these custom fields, nor tied them in with SchemaIL.
- Can the SDK be used to generate dynamic schemas tied to my ORM or database definition?
- Currently this may be possible, but it isn't recommended.
- Eventually this will be possible by writing tools that translate from ORM or database DDL to schemaIL. We are currently solidifying the core functionality of the platform and this will remain out of scope for the foreseeable future.
A Sheet
object describes the desired characteristics or "shape" of data that you expect for an individual CSV file or sheet in an excel file. A Sheet can be thought of as roughly analogous to a database table.
A Field
object represents a column of data in a Sheet
. They are similar to columns in a database. Fields can be configured to clean, transform and validate incoming data through options and hooks.
Data Hooks® are the Flatfile copyrighted term for code that runs on Field
s and Sheet
s to transform and validate data.
Field hooks are Data Hook®s that run on individual fields.
A Workbook is a collection of Sheets. The Sheets of a Workbook can be optionally linked together via ForeignKey
. A Workbook is similar to a database schema.
Technically taken from SchemaIL interfaces
The base fields closely mirror primitive types and vary primarily by their default cast functions. Read the CastFunction tests for explicit understanding of cast function behavior.
Persists strings. Defaults to StringCast
cast
.
Persists numbers. Defaults to NumberCast
cast
Persists booleans. Defaults to BooleanCast
cast
Persists Dates. Defaults to DateCast
cast
Presents the user with discrete options. Accepts a specific option of
options:{'dbValue': 'Label displayed to user'}
options:{'dbValue': {label: 'Label displayed to user', futureOption1: undefined}
It is called like this
department: OptionField({
label: 'Department',
options: {
engineering: 'Engineering',
hr: 'People Ops',
sales: 'Revenue',
},
}),
or
department: OptionField({
label: 'Department',
options: {
engineering: {label:'Engineering'},
hr: {label:'People Ops'},
sales: {label:'Revenue'}
},
}),
interface FieldOptions {
label: string
primary: boolean
required: boolean
unique: boolean
cast: (value: Dirty<T>) => Nullable<T>
default: Nullable<T>
compute: (value: T) => T
validate: (value: T) => void | Message[]
}
(Field Options Definitions)[https://github.com/FlatFilers/platform-sdk-mono/blob/main/packages/configure/src/ddl/Field.ts#L43-L69] Every field can set at least the above properties
Controls the label displayed for the field in the UI
Is a value for this field required after the default
stage of field hooks. If set to true, an error will be thrown and registered as such on the cell for that value, no further processing will take place for that field.
Is this field the primary key for a sheet, or part of a composite primary key (not currently supported). Primary implies unique too.
Is this field required to be unique across the whole sheet. We have chosen to treat a field with multiple null
s as still unique. Tests and comments
cast
transforms input into the type specified by the field.
The default value for this field
compute
takes the type specified by the field and returns the type specified by the field.
validate
takes the type specified by the field and returns validation messages. This is the most commonly used field hook. Validate can’t change data, it can only send messages. Compute must be used if you want to perform any type of transformation
Long form description that appears in the UI upon hover of the field name.
Annotations are automatically filled in messages that the platform sdk provides when default
or compute
changes a value. Following independent options set as an object - {}
If set to true
insertions of the default
value will be annotated with an info message of defaultMessage
The message to use when a default
value is inserted. If none specified, defaults to 'This field was automatically given a default value of'
If set to true
instances where compute
changes the value of a field will be annotated with an info message of computeMessage
The message to use when a compute
changes a value. if none specified, defaults to 'This value was automatically reformatted - original data:'
controls what parts of mapping/review/export a field occurs in
When set to false
this field will not appear for matching in the mapping stage.
This field will not appear in the review stage
This field will not be exported.
export interface SheetOptions<FC> {
allowCustomFields: boolean
recordCompute: (record:FlatfileRecord<any>, logger?:any): void
batchRecordsCompute: (records: FlatfileRecords<any>) => Promise<void>
}
Allows the end user to create additional fields from their upload when the incoming column does not match with any existing field for the Sheet.
Function that receives a row with all required fields fully present and optional fields typed optional?:string
. Best used to compute derived values, can also be used to update existing fields.
Asynchronous function that is best for HTTP/API calls. External calls can be made to fill in values from external services. This takes records
so it is easier to make bulk calls.
This helper utility gives you tools to test real records and fields locally against your field hooks (default
, cast
, compute
and validate
) and record-level compute functions (recordCompute
and batchRecordsCompute
). This runs the exact same logic that is done on Flatfile's production servers so it will produce the same results locally for quick testing before deploying.
For example, with this TestSheet:
const TestSheet = new Sheet(
'TestSheet',
{
firstName: TextField({
required: true,
description: 'foo',
compute: (v) => v.toUpperCase(),
}),
age: NumberField(),
testBoolean: BooleanField({ default: false }),
},
{
recordCompute: (record, _session, _logger) => {
const age = record.get('age')
const newAge = typeof age === 'number' ? age * 2 : 0
record.set('age', newAge)
},
}
)
This takes a field name and value and returns the transformed value based on all sheet operations:
expect(await testSheet.transformField('age', '10')).toEqual(20)
This takes a full record and returns the transformed record based on all sheet operations:
const inputRow = { firstName: 'foo', age: '10', testBoolean: 'true' }
const expectedOutputRow = { age: 20, firstName: 'FOO', testBoolean: true }
const res = await testSheet.testRecord(inputRow)
expect(res).toMatchObject(expectedOutputRow)
This takes an array of full records and returns the transformed records based on all sheet operations:
const inputRows = [
{ firstName: 'foo', age: '10', testBoolean: 'true' },
{ firstName: 'bar', age: '8', testBoolean: 'true' },
]
const expectedOutputRows = [
{ age: 20, firstName: 'FOO', testBoolean: true },
{ age: 16, firstName: 'BAR', testBoolean: true },
]
const results = await testSheet.testRecords(inputRows)
expect(results).toMatchObject(expectedOutputRows)