AE Data Flow

This is the primary repository for the data pipelines of the Application Engineering (AE) team at the NYC Department of City Planning (DCP).

These pipelines are used to populate the databases used by our APIs and are called "data flows".

Documentation

An overview of the data flow design is in the documentation folder.

Local setup

Note

These instructions depend on Docker and Docker Compose. If you need to install Docker Compose, follow these instructions.
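As an optional sanity check (these commands are not part of the original instructions), both tools should report a version when correctly installed:

docker --version
docker compose version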

Set environment variables

Create a file called .env in the root folder of the project and copy the contents of sample.env into that new file.

Next, fill in the blank values.

Note: Omit https:// from the spaces endpoint, leaving only the domain itself.
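As a purely illustrative sketch (the real variable names and values come from sample.env; the name SPACES_ENDPOINT and the domain below are placeholders):

# Incorrect: includes the scheme
SPACES_ENDPOINT=https://example.nyc3.digitaloceanspaces.com
# Correct: domain only
SPACES_ENDPOINT=example.nyc3.digitaloceanspaces.com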

Install dependencies

The data flow is controlled primarily through Node.js modules. Configure these dependencies by:

  • Using Node v20: nvm use
  • Installing node modules: npm i
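Run these from the repository root; the version check is only an optional sanity step:

nvm use        # selects Node v20 (nvm reads the version from the project's .nvmrc, if present)
node --version # should report a v20.x release
npm i          # installs the node modules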

Run the local zoning api database

The data-flow steps are run against the zoning-api database. Locally, this relies on the two containers running on the same Docker network: the zoning-api creates the data network, which the data-flow db container then joins. Before continuing with the data-flow setup, follow the steps within nycplanning/ae-zoning-api to get its database running in a container on a data Docker network.
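If it is unclear whether that setup succeeded, these standard Docker commands (not part of the original instructions) list the available networks and show which containers have joined data:

docker network ls
docker network inspect data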

Run the local data flow

After setting up the zoning-api, return to this repository and run the data flow.

Build and run the flow database container

docker compose up --build -d

Note: If you built a previous version of the data flow database, it may be in an incompatible state. To clear this state, run docker compose down and run the above "up" command again.
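Put together, a full reset and rebuild of the flow database looks like this:

docker compose down            # remove the old, possibly incompatible containers
docker compose up --build -d   # rebuild and start the flow database in the background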

Run the data flow process to populate the zoning api database with data

BUILD=all npm run flow

The "BUILD" environment variable specifies which domain to update. Initial database seeding should use "all". Subsequent runs may want to only update specific domains. The BUILD domain options are: admin, pluto, and capital-planning.

Run pieces of the local data flow

The data flow may fail at one of the steps. To pick up the data flow from an intermediate step, reference the design to run individual steps or groups of steps.

If the data flow database is in a broken or irreparable state, it can be wiped using docker compose down.

If the data flow is incomplete but needs to be paused and resumed later, the database can be stopped without losing its state by running docker compose stop.
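The two maintenance commands side by side; stop preserves the database's state for a later restart, while down discards the containers:

docker compose stop    # pause the database, keeping its data for later
docker compose start   # resume a previously stopped database
docker compose down    # remove the containers to clear a bad state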
