[13pt] Build a script so developers can kick off their own UAT runs against AWS Step Functions #1366
Labels
AWS
Fix or Contribution for running HAND FIM in AWS
CI/CD
CI/CD - devOps related
Med Priority
Sys Admin
Note: Now part of 1377 EPIC: FIM Sys Admin Tasks (and a few related FIM tasks)
Right now, kicking off a UAT or BED run using AWS Step Functions requires Rob to make hard-coded changes to various AWS tools and kick it off manually in AWS. PS: developers could use their own HUC lists if they like.
This can be scripted: a script in our FIM git repo that a developer makes small edits to and then kicks off themselves. When and how often this can be done will be a policy issue, but it is certainly far cheaper and quicker than running on EC2 instances, and it drops the reliance on Rob and the technical knowledge needed to do it.
At a minimum, creating this script for Rob or advanced AWS developers has a fair bit of value.
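A minimal sketch of what such a kick-off script might look like, assuming Python and boto3. `start_execution` is the real boto3 Step Functions call, but the input keys (`run_type`, `hucs`, `requested_by`), the naming scheme, and the function names are hypothetical; the real state machine defines the input it expects.

```python
import json
from datetime import datetime, timezone


def build_execution_request(state_machine_arn, developer, hucs, run_type="UAT"):
    """Build keyword arguments for the Step Functions StartExecution call.

    The payload keys below are placeholders (assumptions), not the input
    format the existing FIM state machine actually uses.
    """
    if not hucs:
        raise ValueError("at least one HUC is required")
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    # Execution names are limited to 80 characters; the timestamp keeps
    # repeated runs by the same developer from colliding.
    name = f"{run_type}-{developer}-{stamp}"[:80]
    payload = {"run_type": run_type, "hucs": list(hucs), "requested_by": developer}
    return {
        "stateMachineArn": state_machine_arn,
        "name": name,
        "input": json.dumps(payload),
    }


def start_run(sfn_client, request):
    """Start the execution and return its ARN for later status checks."""
    response = sfn_client.start_execution(**request)
    return response["executionArn"]
```

A developer would call `start_run(boto3.client("stepfunctions"), build_execution_request(arn, "their_id", their_huc_list))` and keep the returned execution ARN. Passing the client in keeps the request-building logic testable without AWS credentials.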
Note: This could become very inexpensive if we rebuild how we process HUCs in AWS. See Issue [1365](https://github.com/NOAA-OWP/inundation-mapping/issues/1365). This task does not require that to be done, but if we do 1365, it becomes even quicker and cheaper, giving developers quick and valuable data for debugging. It would still be policy driven.
The hardest part is finding a way to notify the developer of a failure or completion, plus a way, or a document, explaining how to find the error when it is an AWS setup error rather than a code error. That is not common, but as it stands right now, I (Rob) have to manually watch that the AWS Step Functions run successfully completes key milestones. After that, it requires occasional checking to see when it is done, as there is no automated way to tell. (This could arguably be a separate card, as it is very intensive: keeping a very close eye on a Step Function run for the first 30 mins, then checking randomly for up to 10 hours to figure out whether it is done.) I have created card number 1367 for this task alone, which has a lot of value even if we do almost nothing else. This task is easy to do, but without 1367, developers need some moderate training to see / find / fix errors.
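One possible approach to the notification piece, sketched under assumptions: Step Functions emits "Step Functions Execution Status Change" events to EventBridge, so a rule can forward terminal statuses to an SNS topic that developers subscribe to. `put_rule` and `put_targets` are real boto3 EventBridge calls; the rule name, target id, and the existence of an SNS topic whose policy allows EventBridge to publish are all assumptions here.

```python
import json

# Statuses that mean the run is over, successfully or not.
TERMINAL_STATUSES = ["SUCCEEDED", "FAILED", "TIMED_OUT", "ABORTED"]


def build_event_pattern(state_machine_arn, statuses=TERMINAL_STATUSES):
    """Event pattern matching finished executions of one state machine."""
    return {
        "source": ["aws.states"],
        "detail-type": ["Step Functions Execution Status Change"],
        "detail": {
            "stateMachineArn": [state_machine_arn],
            "status": list(statuses),
        },
    }


def create_notification_rule(events_client, rule_name, state_machine_arn, sns_topic_arn):
    """Create the rule and point it at an SNS topic (topic assumed to exist)."""
    events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(build_event_pattern(state_machine_arn)),
        State="ENABLED",
    )
    events_client.put_targets(
        Rule=rule_name,
        # "notify-developers" is a hypothetical target id.
        Targets=[{"Id": "notify-developers", "Arn": sns_topic_arn}],
    )
```

This only reports that a run ended and with what status. It does not explain an AWS setup error, so the training or documentation mentioned above would still be needed.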
Update: Dec 12, 2024: I wonder if the AWS APIs have a way for us to query a running Step Function job. If so, we could create a script a developer can run periodically (especially a handful of times in the first 20 to 30 mins) instead of going into the AWS Console and finding the right page. If it exists, it likely won't give us 100% of what we need to know to see if it is running, but it would make things a lot easier. Certainly worth looking into. It wouldn't be great or comprehensive, but better than what we have.
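For what it is worth, the Step Functions API does expose `DescribeExecution` and `GetExecutionHistory`, available in boto3 as `describe_execution` and `get_execution_history`. A rough sketch of a status check a developer could run periodically follows; the function name and the shape of the returned summary are illustrative assumptions, and it only reports top-level execution state, not per-HUC progress.

```python
def summarize_execution(sfn_client, execution_arn, recent_events=5):
    """Return the overall status and the most recent history event types."""
    desc = sfn_client.describe_execution(executionArn=execution_arn)
    history = sfn_client.get_execution_history(
        executionArn=execution_arn,
        maxResults=recent_events,
        reverseOrder=True,  # newest events first
    )
    return {
        "status": desc["status"],  # e.g. RUNNING, SUCCEEDED, FAILED
        "started": desc["startDate"],
        "stopped": desc.get("stopDate"),  # absent while still running
        "error": desc.get("error"),  # only present on failure
        "cause": desc.get("cause"),
        "latest_events": [event["type"] for event in history["events"]],
    }
```

Called as `summarize_execution(boto3.client("stepfunctions"), execution_arn)`, this would replace the trip to the AWS Console for the routine "is it still running" check, though a failed run would likely still need someone to look at the logs.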