Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

New Tool: Events indexing service - Milestone 4 #108

Merged
merged 1 commit into from
Nov 6, 2021

Conversation

chriswayoub
Copy link
Contributor

This PR is for issue #15

Submission Links & Documents

Full indexing service:

https://github.com/chriswayoub/flow-scanner

Full example deployment:

https://github.com/chriswayoub/flow-scanner-example

All milestone 4 requirements are met:

  • A user can configure the events they wish to watch using a list of Cadence event IDs. The service will continuously monitor all events in this list. The event list can only be changed upon startup — the service does not need to support dynamic configuration (e.g. adding event IDs using an API call).

Use CADENCE_EVENT_TYPES in the .env file of the example deployment

  • The service should watch for events in every block. It can (and should!) use the GetEventsByHeight range to query multiple blocks at once.

This is implemented in the EventScanner service

  • The service must not miss any events; if a failure occurs in a block, the event indexer should not proceed to the next block until all events have been fetched.

This is handled in various places, notable in the EventScanner class and in the EventBroadcaster classes

  • At a minimum, the service should support event delivery via HTTP webhooks. In this scenario, the consuming application will provide an HTTP endpoint as part of the startup configuration. The consumer should respond with appropriate status consumes to indicate a successful or unsuccessful delivery — these status codes are defined by the indexer.

HTTP delivery is supported as well as SNS and SQS. The architecture allows plugins to be written for any other backend as well.

  • The service only needs to support delivery to a single consumer.

Multiple consumers can be configured in a single instance, or multiple instances of the indexer can be run

  • All events should be delivered in the same order they were emitted on-chain

This is true, and events are grouped by transactionId if multiple events are being listened for

  • Each event should be delivered exactly once. The indexer will need to implement logic to ensure idempotency of events.

This is an optional feature that can be enabled in the service itself, or you can rely on external services to do it (SQS/SNS/etc)

  • Message Queueing

Currently supported services are SQS and SNS, and can be extended to additional services easily

  • Multiple Consumers

Multiple consumers are supported through the MulticastEventBroadcaster configuration

@kerrywei kerrywei requested a review from nvdtf November 1, 2021 17:34
@nvdtf
Copy link
Member

nvdtf commented Nov 2, 2021

Thank you for submitting flow-scanner, seems exciting! My initial feedback:

  • Can we merge the example repo into the main repo under /examples? This is a pattern we use across our repos and makes it easier to figure out how to use a project.
  • The documentation written on the example repo should be on the main repo (usage guide, installation, environment variables, ...). Is this because of the "package" vs "service" comment on the previous milestone? Is the example the actual event watching service? If this is the case we might need to reformat the repositories a bit to make the service the main tool provided. The library can then move into lib or somewhere similar. Developers will be interested in the service they can deploy or run, vs. importing code to write it themselves.

@chriswayoub
Copy link
Contributor Author

Yes, the example repo is the actual standalone "service". The library repository (flow-scanner) is meant to be dropped into an existing codebase if you don't want to run the standalone service. Personally, I think that this separation is beneficial since the actual service implementation has a lot more concrete dependencies (dotenv, tslog, etc). Since these are all optional and swappable, it made sense to me to have the library be a separate repository that doesn't require any of those.

Ideally, I'd even like to split out some of the concrete implementations of the things are that included in the flow-scanner repository. For instance, having a @flow-scanner scope that has plugins for things like @flow-scanner/cloudwatch-metrics, that way you can have a minimal set of dependencies that only include the things you are actually using in your project. The "example" repo (which based on your feedback I think should be renamed) would depend on a good set of default packages and implement the Docker container to make it easy to launch with minimal configuration.

Let me know your thoughts and I'll come up with an actionable list of items!

@nvdtf
Copy link
Member

nvdtf commented Nov 2, 2021

Makes sense. Renaming can help that cause here I guess to keep the separation.
flow-scanner and flow-scanner-service (was prev flow-scanner-example)
or
flow-scanner-lib (was previously flow-scanner) and flow-scanner (was previously flow-scanner-example)
I personally like the second one as the name "flow scanner" sounds like an image or service vs a library.

@chriswayoub
Copy link
Contributor Author

The repositories have been restructured and the flow-scanner service has been updated to be run as a standalone service out-of-the-box as opposed to requiring any code customization.

The service is now in the following repo:
https://github.com/chriswayoub/flow-scanner

The supporting library is now:
https://github.com/chriswayoub/flow-scanner-lib

If you have cloned anything, make sure you update those git remotes.

  • There is now an option to install the flow-scanner service using npm install -g so you can run it with flow-scanner. Instructions are in the repo, and the process will be simplified once the public npm package is available. This is in addition to the Docker container that was already available, I just wanted there to be as little friction as possible if someone wants to run it in one of the supported configurations. I'm not sure if native packages (like the Homebrew flow-emulator package) would be beneficial, but it shouldn't be hard to add if so.
  • The documentation has been updated to reflect all of the new configuration values, you can view the readme of the flow-scanner repo
  • Payloads schema for the different consumers is now documented
  • An optional HMAC-SHA512 signature is now available for the HTTP event broadcaster (details in the documentation)
  • The scanner will now start at the latest block height if you do not provide a starting block height (I had a few people get hung up on this when trying to launch for the first time)
  • The service now tries to gracefully exit on termination (beneficial if you are running in an ECS Container or similar)

The main work-in-progress right now is the technical documentation for the underlying flow-scanner-lib repo, but I tried to cover all of the basics at least for now.

Let me know your thoughts or any additional feedback!

@nvdtf
Copy link
Member

nvdtf commented Nov 4, 2021

Thanks for the restructuting. It looks great! A few points of feedback on the documentation:

  • The README file on flow-scanner-lib requires a bit of rewriting after the restructure. It needs to be written from the point of view of the library only.
  • Please add CONTRIBUTING docs to both repos, covering how an interested developer can contribute to the work here. The doc might also cover how a local dev loop is set up.
  • While going through the installation instructions I had to install rimraf:
    > rimraf build && tsc -p .
    sh: rimraf: command not found
    
    So I did
    npm install rimraf
    
    I'm not sure if my case was exceptional, might be helpful to add a requirements section to the guide and include this (and maybe more apparent ones like npm).

@chriswayoub
Copy link
Contributor Author

I'll get these docs updated! rimraf should have already been a dev dependency, maybe it was missed somehow. I'll make sure it is fixed, it should not require manual installation (beyond the main npm install in the readme)

@chriswayoub
Copy link
Contributor Author

Can you point me to where you saw the rimraf commands in the instructions? The only place it should appear is in the package.json "build" script, but that should use the rimraf version that was already in the devDependencies.

The flow-scanner-lib repo already had a CONTRIBUTING.md file, and I added one to flow-scanner as well. I also updated the flow-scanner-lib docs to correctly reference the main flow-scanner repo and make it clear that the flow-scanner-lib repo is a supporting library.

The /docs/ folder on the flow-scanner-lib repository has the beginning of the full technical writeup on the internals, which hopefully is a good guide on getting started for anyone interested in contributing. I'm planning on expanding that over time, and I'm sure it will continually evolve as the project grows.

Let me know if there is anything else I can do to address your comments!

@nvdtf
Copy link
Member

nvdtf commented Nov 5, 2021

Thanks. As you pointed out, I encountered the rimraf error during the build phase. I don't know why it was not installed automatically. Maybe this is because my node config is with root user and requires sudo for install?

@chriswayoub
Copy link
Contributor Author

Just to clarify, did you run npm build or did you try to execute rimraf build && tsc -p . from your shell directly? If it was the former, then I'm not really sure what happened unless you hadn't run npm install yet in the repo folder. If it was the latter, then your error makes sense if you didn't have rimraf installed globally.

Let me know if I missed anything actionable on this PR.

@nvdtf
Copy link
Member

nvdtf commented Nov 5, 2021

Just npm build. Anyway this sounds like a minor config issue on my side. Thanks for actioning the previous points. I guess we good here.

@chriswayoub
Copy link
Contributor Author

There are some additional things I considered including, but didn't want to give you a ton to review (or make it too overwhelming for newcomers).

  • Add built-in "deploy to Cloud infrastructure" commands - My only concern here is that if someone is doing this, they are probably technically proficient and will be making this part of a bigger deployment, so I'm not sure how much value it really adds on its own
  • Add a second adapter for the FlowClient class that uses a separate service I created in Go to query the blockchain for events. This adds the ability to query events before the current Spork height and also takes care of what I think is a bug that I reported in fcl when querying for events past the latest block. This would just be a drop-in replacement, but would make the deployment more complex since it will involve the separate Go service. I can make it easy with docker-compose once the Docker images/npm packages are public

Any thoughts on whether those would be applicable to the scope of this issue, or if they make it more intimidating for a new user?

@nvdtf
Copy link
Member

nvdtf commented Nov 5, 2021

My thoughts are:

  • I think the ability to properly containerize and run the project should be enough on this front. Showcasing how a containerized service works with other services (like a database and app) through a sample docker-compose.yml should be enough to demonstrate how one can run this service in a containerized cloud environment alongside other micro services.
  • I consider going back through sporks an advanced [and more importantly] one-time operation for most projects that are looking to sync to an older state. It's not absolutely needed but certainly desirable. It's best to implement in a way that doesn't make the current project more complex, maybe in another repo now that we have a separate lib repo.

@chriswayoub
Copy link
Contributor Author

Agreed on both points, thanks for the feedback. I’ll think about the second item and see if I can come up with a way to make it as frictionless as possible to make it an option.

@kerrywei kerrywei merged commit 6afa0e1 into onflow:main Nov 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants