Data Providers are one of the most important stakeholders when it comes to the Data NFT ecosystem. The objective of this template is to help data providers run their data gathering scripts automatically and save their results in a "set it and forget it" way.
This project template aims to help data providers run automated scripts that update their datasets daily under the same URL.
This is done by using a GitHub Action that runs the Data Provider's Python script periodically and uploads the output to an AWS S3 bucket.
- A GitHub account; you can create one here
- Available GitHub Actions usage minutes; public repositories have unlimited minutes by default, while private repositories include 2000 free minutes per month. Usage is rounded up to the nearest minute per run, so a script that takes one minute (or less) to run can be run 2000 times per month. If you need more minutes you can set up a custom billing plan, although running your script once a day is unlikely to get you there. Note that GitHub Actions minutes are cumulative per account, so if you have multiple repositories, the minutes are shared between them
- An AWS account; you can create one here
- Git installed locally; you can get it here
- GitHub CLI installed locally; you can get it here
- A Python script that outputs its results to a file (or multiple files); a minimal example is sketched below
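For reference, something as small as the following sketch would satisfy the last prerequisite. The file names and the data it writes are placeholders for illustration only; your real data-gathering logic goes inside gather_data():

# example_script.py - a minimal, hypothetical data-gathering script.
# It writes a timestamped JSON result to a file in an "output" folder,
# which is the general pattern this template expects.
import json
import os
from datetime import datetime, timezone

def gather_data():
    # Replace this placeholder with your real data-gathering logic.
    return {"updated_at": datetime.now(timezone.utc).isoformat(), "values": [1, 2, 3]}

if __name__ == "__main__":
    os.makedirs("output", exist_ok=True)
    with open(os.path.join("output", "results.json"), "w", encoding="utf-8") as f:
        json.dump(gather_data(), f, indent=2)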
Note: You can skip this step if you have already installed and used Git on your local machine. Open a terminal (command-line interface) and run the following commands:
git config --global user.name "Your name here"
git config --global user.email "Your mail here"
gh auth login
After the third command, follow the prompts to log in to your GitHub account.
Note: You can skip this step if you already have an AWS S3 bucket.
- Go to AWS S3 and click on "Sign in to the Console" in the top right corner of the page. If you are not logged in, you will be asked to log in with your AWS account.
- Once logged in to the AWS Console, use the search bar to search for "S3" and click on the first result. Once on the S3 page, click on "Create bucket". Please note that if you are eventually going to use a custom domain name to sit in front of your AWS S3 Data Stream, then you HAVE to name your S3 bucket the EXACT same name as your intended full domain path (subdomain + domain name, e.g. dataassets.alice-datanft-bucket.com). To learn more about this, head over to this guide
- Select a name for your bucket and choose a preferred region for its hosting. For the "Object Ownership" option, choose "ACLs enabled". For "Block Public Access settings for this bucket", make sure all the checkboxes are UNchecked (except for the one asking you to acknowledge that these settings might result in this bucket becoming public). You can leave the other options as default. Click on "Create bucket". We are allowing public access to all contents of our S3 bucket because the Data Stream we host in this bucket needs to be publicly accessible. DO NOT use this S3 bucket to manually store any of your other files or personal content, as they will become publicly accessible and can be easily discovered by anyone in the world. ONLY use this bucket to host your Data NFT's Data Stream, which gets automatically updated via the GitHub template and script steps given below.
- Once your new S3 bucket has been successfully created as per the last step, find and select it in your S3 console, then in your bucket's top menu go to "Permissions" and scroll down until you see "Cross-origin resource sharing (CORS)". Click on "Edit" and paste the following text there:
[
    {
        "AllowedHeaders": [
            "*"
        ],
        "AllowedMethods": [
            "GET"
        ],
        "AllowedOrigins": [
            "*"
        ],
        "ExposeHeaders": [
            "Access-Control-Allow-Origin"
        ]
    }
]
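If you prefer configuring the bucket from code instead of the console, the same CORS rules can be applied with the AWS SDK for Python (boto3). This is an optional sketch only; the bucket name is a placeholder, and it assumes boto3 is installed and your AWS credentials are configured locally:

# Optional: apply the same CORS rules programmatically with boto3.
# "my-data-nft-bucket" is a placeholder; use your own bucket name.
import boto3

cors_configuration = {
    "CORSRules": [
        {
            "AllowedHeaders": ["*"],
            "AllowedMethods": ["GET"],
            "AllowedOrigins": ["*"],
            "ExposeHeaders": ["Access-Control-Allow-Origin"],
        }
    ]
}

s3 = boto3.client("s3")
s3.put_bucket_cors(Bucket="my-data-nft-bucket", CORSConfiguration=cors_configuration)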
Note: AWS S3 is (at the time of writing this guide) offering a free tier that allows you to store up to 5GB of data for free for 12 months. If you need more storage, you can set up a custom billing plan. You can find out more about AWS Free Tier here and about AWS pricing here.
- Go on GitHub and "Create a new repository" under your desired account using this template (click on the green "Use this template" button in the top right corner of this page). Select a name for your repository and make sure you make it a Private repository, as you will be using it to update a public AWS S3 bucket and you want to protect this. Keep all the other settings as is and hit "Create repository from template". Wait a few seconds for your repository to be generated.
- Once your repository is generated, click on the green "Code" button and copy the HTTPS link that appears. Open a terminal on your local machine and run the following command:
git clone YOUR_REPOSITORY_URL
This will clone the repository to your local machine.
- Go back to your repository page on GitHub. Click on "Settings". Then, under the "Security" section, click on "Secrets and variables" and then on "Actions".
- We will have to create three required "Secrets" (plus one optional) that will be used inside our GitHub Action. We use secrets instead of placing these values directly in the code in order to keep them private. Click on "New repository secret" and create the following secrets:
- Name: S3_BUCKET_NAME, Secret: the name of the AWS S3 bucket you created in the previous step (if you don't remember it, you can find it by going to your account's AWS S3 page)
- Name: S3_KEY_ID, Secret: the AWS Access Key ID of your AWS account. You can generate one by clicking on your account name in the top right corner in AWS and then clicking on "Security credentials". Then, scroll down to "Access keys" and click "Create access key". After accepting the prompt and clicking "Create access key" again, your key will be created. The characters under "Access key" are your key ID; there is a button you can use to copy it. Don't close this page after that, as we will also need the secret access key for the next secret.
- Name: S3_ACCESS_KEY, Secret: the "Secret access key" corresponding to your AWS Access Key ID. You can find it on the same AWS page you used for the previous secret.
- Name: S3_OUTPUT_FOLDER (optional), Secret: if you want to target a specific output folder within your new S3 bucket to store your output file, you can do so with this optional GitHub Secret (e.g. S3_OUTPUT_FOLDER = subfolder/anothersubfolder will instruct the script to put your output file in "subfolder/anothersubfolder/trailblazer.json").
IMPORTANT: the AWS Access key we just created in the above step is highly confidential and allows anyone who holds it to access your AWS account and run servers or other infrastructure that may result in you having to pay for these resources. DO NOT share the above "Access key" and "Secret access key" values WITH ANYONE under ANY CIRCUMSTANCE. ONLY use them per this guide and store them as GitHub Secrets in your private repository.
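To make it clearer what these secrets are for: conceptually, the GitHub Action uses them roughly like the boto3 sketch below, reading each secret from an environment variable and uploading the generated output file to your bucket. This is an illustration of the idea only, not the template's actual upload code:

# Illustrative sketch of how the four secrets map onto an S3 upload.
# The real upload is performed by the template's GitHub Action; the
# environment variable names below simply mirror the secret names.
import os
import boto3

s3 = boto3.client(
    "s3",
    aws_access_key_id=os.environ["S3_KEY_ID"],
    aws_secret_access_key=os.environ["S3_ACCESS_KEY"],
)

bucket = os.environ["S3_BUCKET_NAME"]
prefix = os.environ.get("S3_OUTPUT_FOLDER", "")  # optional subfolder inside the bucket
key = f"{prefix}/trailblazer.json" if prefix else "trailblazer.json"

s3.upload_file("output/trailblazer.json", bucket, key)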
- You can now open the repository you cloned on your local machine. For this template to work flawlessly, it is important to adapt your script to the following standards (a minimal sketch that follows them is shown after this list):
- The main file that is triggered via the command line should be named "main.py". Feel free to delete the current main.py file and replace it with your own. You can have more files in your repository, but make sure that the main file is named "main.py" and that it is in the root of the repository.
- Make sure that all the packages that are not included by default in Python are listed in the "requirements.txt" file, one per line. Feel free to delete the current requirements that are listed there as an example for now.
- In your "input" folder you should see a file titled "trailblazer.csv". This is the CSV file (what is a CSV file?) that you can easily update to manage the content of the TrailBlazer Data Stream. You can learn more about the content options here: https://docs.itheum.io/product-docs/product/trailblazer
- The "output" folder has a "trailblazer.json" file within it as a sample that you can view. Note that each time you update the "trailblazer.csv" file in the "input" folder and then "add and commit your file to GitHub" as below, the "main.py" file will automatically generate the output file in the "output" folder, so you don't need to do anything specific here.
- Before you run the next step, do a final check to ensure you did not break the CSV file format. To do this, open your CSV file in software like Excel or Sheets and ensure it's formatted correctly in columns and rows and that you don't get any error messages, alerts, or issues with the layout.
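To tie the conventions above together, a main.py along the lines of the following sketch would fit the template. It is not the template's actual script, and the way it packages the CSV rows into JSON is a simplification of what the real script does:

# main.py - minimal sketch that follows the template's conventions:
# read the CSV from the "input" folder, write the JSON Data Stream
# to the "output" folder. The JSON structure here is a simplification.
import csv
import json
import os

INPUT_FILE = os.path.join("input", "trailblazer.csv")
OUTPUT_FILE = os.path.join("output", "trailblazer.json")

def main():
    with open(INPUT_FILE, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))  # each CSV row becomes a dict keyed by the header row

    os.makedirs("output", exist_ok=True)
    with open(OUTPUT_FILE, "w", encoding="utf-8") as f:
        json.dump({"data": rows}, f, indent=2)

if __name__ == "__main__":
    main()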
- Once you are done with this, your script is ready to go into GitHub. Go back to your terminal and run the following commands:
git add .
git commit -m "Enter any custom commit message here as a note on what changes you made"
git push
- Congrats! Now everything is ready and the script will run daily at 23:30 UTC or whenever you push new code to your repository (which we just did in the above steps). Whenever you make a change to your code locally, you can use the three commands above to push the changes to GitHub and the script will run again. You can also check the logs of the script by going to your repository page on GitHub, clicking on "Actions" and then on the latest workflow run. You can also check the output of the script by going to your AWS S3 bucket and checking the files in the "output" folder.
If the workflow run was successful, you can now go to your AWS S3 console and click on your bucket. You should be able to see your objects there (an object is a "file" in AWS S3, and in this case it's the file we updated via the GitHub process above). Clicking on your object will show you a link (called "Object URL") that can be used to access that object (file). You can now use that link to create a Data NFT!
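As a quick sanity check that the Data Stream is publicly reachable, you can fetch the Object URL from any machine, for example with the short snippet below (the URL shown is a placeholder; use your own Object URL):

# Quick check that the uploaded object is publicly accessible.
# Replace the placeholder URL with the "Object URL" shown in your S3 console.
import json
from urllib.request import urlopen

OBJECT_URL = "https://my-data-nft-bucket.s3.amazonaws.com/trailblazer.json"  # placeholder

with urlopen(OBJECT_URL) as response:
    payload = json.load(response)

print("Successfully fetched the Data Stream; first 200 characters:")
print(json.dumps(payload)[:200])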
- Note that for your TrailBlazer output file, you can change the "name", "creator" and "created_on" information by changing the values in the "config.yaml" file. Note that the value for "name" is also used as the output file name (e.g. trailblazer.json). As mentioned above, if you want to target a specific output folder within your S3 bucket, use the optional S3_OUTPUT_FOLDER GitHub Secret.
You can customize the workflow (for example, its schedule) by changing the update-data.yaml file inside the .github/workflows folder. If you don't see the .github folder, make sure hidden files are visible in your file explorer. More about GitHub Actions here
Feel free to contact the development team if you wish to contribute or if you have any questions. If you find any issues, please report them in the Issues section of the repository. You can also create your own pull requests, which will be analyzed by the team.
This open-source SOFTWARE PRODUCT is provided by THE PROVIDER "as is" and "with all faults." THE PROVIDER makes no representations or warranties of any kind concerning the safety, suitability, lack of viruses, inaccuracies, typographical errors, or other harmful components of this SOFTWARE PRODUCT. There are inherent dangers in the use of any software, and you are solely responsible for determining whether this SOFTWARE PRODUCT is compatible with your needs. You are also solely responsible for the protection of your equipment, your private credentials (e.g. AWS account key and secret, GitHub access credentials, etc.) and the backup of your data, and THE PROVIDER will not be liable for any damages you may suffer in connection with using, modifying, or distributing this SOFTWARE PRODUCT.