Search and Typesense

Alex Ball edited this page Oct 31, 2022 · 2 revisions

Our bill search page is built on Typesense. A cloud function syncs the bills collection from Firestore to Typesense. On the frontend, we use React InstantSearch Hooks for its pre-built Typesense integration and customize the UI.

See Production Environments for information on deploying.

Design

  • Run our own instance of Typesense in the cloud using Kubernetes.
  • Sync our Firestore documents to Typesense using cloud functions.
  • Update the frontend to query the Typesense server.

Implementation

There are 3 types of cloud functions:

  • checkSearchIndexVersion: This runs after each deployment. For each collection, it compares a hash of the schema to the version of the alias in Typesense. If they differ, it triggers a schema upgrade. It is triggered by a Pub/Sub topic that we publish to from our GitHub Actions deployment workflow.
  • upgrade(Bill)SearchIndex: This upgrades the search index schema for a single collection, such as bills or testimony. It bulk-upserts all documents from the Firestore collection into the new Typesense collection, then updates the alias to point to the new Typesense collection. It is triggered by document creation in a specific Firestore collection, similar to scraper batches.
  • sync(Bill)ToSearchIndex: This is triggered on writes to the Firestore collection. It upserts an individual document into the current Typesense collection, or deletes it from that collection. It only upserts the document if the indexed fields have changed.
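
The decision made by sync(Bill)ToSearchIndex can be sketched as a pure function. This is a minimal illustration under stated assumptions, not the project's actual code: `planSync`, `pickIndexed`, and the list of indexed fields are hypothetical names, and `before`/`after` stand in for the document data that a Firestore write trigger provides.

```typescript
// Hypothetical sketch: decide what a Firestore write means for the search index.
type Doc = Record<string, unknown>;

type SyncAction =
  | { kind: "upsert"; document: Doc }
  | { kind: "delete" }
  | { kind: "skip" };

// Keep only the fields that are part of the search schema.
function pickIndexed(doc: Doc, fields: readonly string[]): Doc {
  const picked: Doc = {};
  for (const field of fields) {
    if (field in doc) picked[field] = doc[field];
  }
  return picked;
}

// `before` is undefined on create; `after` is undefined on delete.
function planSync(
  before: Doc | undefined,
  after: Doc | undefined,
  fields: readonly string[],
): SyncAction {
  if (after === undefined) {
    return before === undefined ? { kind: "skip" } : { kind: "delete" };
  }
  const next = pickIndexed(after, fields);
  if (before !== undefined) {
    const previous = pickIndexed(before, fields);
    // Both objects are built in `fields` order, so top-level key order matches.
    // Nested objects with reordered keys would compare as changed, which only
    // costs a redundant upsert.
    if (JSON.stringify(previous) === JSON.stringify(next)) {
      return { kind: "skip" };
    }
  }
  return { kind: "upsert", document: next };
}
```

Keeping the decision separate from the trigger and the Typesense client makes the "only upsert when indexed fields change" rule easy to unit test without a running search server.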

Syncing Firestore Documents

Whenever a document changes, we need to upload it to the search service so it can be indexed. We can use Firestore function triggers to respond to document changes. Additionally, we need to backfill the search index initially to pick up the current state of all documents.

Typesense organizes documents into collections, and requires us to specify the structure of each collection in a schema when we create it. Collection schemas cannot be altered, so when we change the fields we want to search on, we need to update the schema, create a new collection with that schema, and rerun the backfill operation to populate the index. Fortunately, we can use collection aliases to provide a consistent name for the frontend to use for the collection, so updating the collection only requires changes on the backend.
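
The order of operations in such an upgrade matters, and can be sketched as follows. This is an illustration only: `SearchBackend` and `upgradeIndex` are hypothetical names, not part of Typesense or of this codebase. With the Typesense JavaScript client, the three methods would roughly correspond to creating a collection, importing documents with the upsert action, and upserting an alias.

```typescript
// Hypothetical minimal interface over the search service, so the upgrade
// sequence can be shown (and tested) without a running Typesense server.
interface FieldSpec {
  name: string;
  type: string;
}

interface SearchBackend {
  createCollection(name: string, fields: FieldSpec[]): Promise<void>;
  importDocuments(collection: string, docs: Record<string, unknown>[]): Promise<void>;
  pointAlias(alias: string, collection: string): Promise<void>;
}

async function upgradeIndex(
  backend: SearchBackend,
  alias: string,
  newCollection: string,
  fields: FieldSpec[],
  docs: Record<string, unknown>[],
): Promise<void> {
  // 1. Create the new collection with the updated schema.
  await backend.createCollection(newCollection, fields);
  // 2. Backfill it with the current state of all documents.
  await backend.importDocuments(newCollection, docs);
  // 3. Repoint the alias last, so queries through the alias never see a
  //    partially populated collection. If a step above throws, the alias
  //    still points at the old collection.
  await backend.pointAlias(alias, newCollection);
}
```

Because the frontend only ever queries the alias name, nothing on the frontend needs to change when step 3 runs.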

A cloud function automates upgrades and is triggered by a Pub/Sub event. We include a hash of the configuration in the collection name. The upgrader compares the current configuration to the existing collections and backfills a new collection if necessary.
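
A minimal sketch of the naming and comparison step, assuming a SHA-256 hash truncated to eight hex characters and an `alias_hash` naming pattern. Both choices, along with the names `CollectionConfig`, `versionedName`, and `needsUpgrade`, are assumptions made for illustration. The actual hash and naming scheme may differ.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of a collection's configuration.
interface CollectionConfig {
  alias: string;
  fields: { name: string; type: string; facet?: boolean }[];
}

// Derive a collection name that changes whenever the configuration changes.
// JSON.stringify is sensitive to key order, so the config should be built
// the same way on every deployment.
function versionedName(config: CollectionConfig): string {
  const hash = createHash("sha256")
    .update(JSON.stringify(config))
    .digest("hex")
    .slice(0, 8);
  return `${config.alias}_${hash}`;
}

// A backfill is needed only when no existing collection carries the
// name derived from the current configuration.
function needsUpgrade(
  config: CollectionConfig,
  existingCollections: readonly string[],
): boolean {
  return !existingCollections.includes(versionedName(config));
}
```

With this scheme, redeploying an unchanged configuration produces the same collection name, so the upgrader can tell that no backfill is required.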