The goal of this homework is to familiarize users with workflow orchestration.
Start with the orchestrate.py file in the 03-orchestration/3.4 folder of the course repo: https://github.com/DataTalksClub/mlops-zoomcamp/blob/main/03-orchestration/3.4/orchestrate.py
You’d like to give the first task, read_data, a nicely formatted name.
How can you specify a task name?
Hint: look in the docs at https://docs.prefect.io or check out the doc string in a code editor.
@task(retries=3, retry_delay_seconds=2, name="Read taxi data")
@task(retries=3, retry_delay_seconds=2, task_name="Read taxi data")
@task(retries=3, retry_delay_seconds=2, task-name="Read taxi data")
@task(retries=3, retry_delay_seconds=2, task_name_function=lambda x: f"Read taxi data")
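Following the hint, you can list the keyword arguments a decorator accepts with Python's standard `inspect` module instead of guessing. The helper below is a hypothetical sketch (`keyword_params` and `example_decorator` are illustrative names, not part of Prefect). It is demonstrated on a stand-in function so it runs without Prefect installed.

```python
import inspect
from typing import Callable, List


def keyword_params(func: Callable) -> List[str]:
    """Return the names of parameters that can be passed as name=value."""
    sig = inspect.signature(func)
    return [
        name
        for name, param in sig.parameters.items()
        # Both of these kinds accept keyword syntax in a call
        if param.kind in (inspect.Parameter.POSITIONAL_OR_KEYWORD,
                          inspect.Parameter.KEYWORD_ONLY)
    ]


def example_decorator(fn=None, *, retries: int = 0,
                      retry_delay_seconds: int = 0, label: str = ""):
    """Hypothetical stand-in for a decorator factory such as @task."""
    return fn


print(keyword_params(example_decorator))
# ['fn', 'retries', 'retry_delay_seconds', 'label']
```

With Prefect installed in your environment, calling `keyword_params(prefect.task)`, or `help(prefect.task)` for the full docstring, should show which keyword arguments `@task` really accepts.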
Cron is a common scheduling specification for workflows.
Using the flow in orchestrate.py, create a deployment.
Schedule your deployment to run on the third day of every month at 9am UTC.
What’s the cron schedule for that?
0 9 3 * *
0 0 9 3 *
9 * 3 0 *
* * 9 3 0
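A standard cron expression has five space-separated fields, in this order: minute, hour, day of month, month, and day of week. The small helper below (a hypothetical illustration, not a Prefect API) labels each field so you can sanity-check an expression before attaching it to a deployment schedule. The example deliberately uses a different schedule from the question. Since the question asks for 9am UTC, also check the timezone configured on the schedule.

```python
from typing import Dict

# Standard five-field cron order
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_cron(expr: str) -> Dict[str, str]:
    """Map each field of a five-field cron expression to its name."""
    parts = expr.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}: {expr!r}")
    return dict(zip(CRON_FIELDS, parts))


# 14:30 on the 1st day of every month
print(describe_cron("30 14 1 * *"))
# {'minute': '30', 'hour': '14', 'day_of_month': '1', 'month': '*', 'day_of_week': '*'}
```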
Download the January 2023 Green Taxi data and use it for your training data. Download the February 2023 Green Taxi data and use it for your validation data.
Make sure you upload the data to GitHub so it is available for your deployment.
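The monthly files follow a predictable naming scheme, so a small helper can fetch both months into a folder that you then commit. This is a sketch under assumptions: the base URL below is the location the NYC TLC "Trip Record Data" page links to at the time of writing (verify it before use), and the `data` folder and helper names are hypothetical.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Assumed download location for NYC TLC trip-record Parquet files
BASE_URL = "https://d37ci6vzurychx.cloudfront.net/trip-data"


def green_taxi_filename(year: int, month: int) -> str:
    """File name for one month of Green Taxi data, e.g. 2023-01."""
    return f"green_tripdata_{year:04d}-{month:02d}.parquet"


def green_taxi_url(year: int, month: int) -> str:
    return f"{BASE_URL}/{green_taxi_filename(year, month)}"


def download_green_taxi(year: int, month: int, dest_dir: str = "data") -> Path:
    """Download one month into dest_dir (the folder you commit to GitHub)."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    target = dest / green_taxi_filename(year, month)
    urlretrieve(green_taxi_url(year, month), str(target))
    return target
```

For this question you would call `download_green_taxi(2023, 1)` for training and `download_green_taxi(2023, 2)` for validation, then commit and push the `data` folder.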
Create a custom flow run of your deployment from the UI. Choose Custom Run for the flow and enter the file paths as strings on the JSON tab under Parameters.
Make sure you have a worker running and polling the correct work pool.
View the results in the UI.
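The JSON tab expects an object whose keys are the flow function's parameter names. The sketch below assumes those parameters are called `train_path` and `val_path`, as in the course's orchestrate.py; check the flow signature in your copy. Relative paths are typically resolved from the working directory the worker uses after pulling your repository. `custom_run_parameters` is a hypothetical helper.

```python
import json


def custom_run_parameters(train_path: str, val_path: str) -> str:
    """Build the JSON object to paste into Parameters > JSON.

    The keys must match the flow's parameter names; train_path and val_path
    are assumptions here.
    """
    return json.dumps({"train_path": train_path, "val_path": val_path}, indent=2)


print(custom_run_parameters(
    "./data/green_tripdata_2023-01.parquet",
    "./data/green_tripdata_2023-02.parquet",
))
```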
What’s the final RMSE to five decimal places?
- 6.67433
- 5.19931
- 8.89443
- 9.12250
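RMSE is the square root of the mean squared difference between predicted and actual values on the validation set. The flow computes it for you. The standard-library sketch below, using toy numbers rather than taxi data, makes the definition and the five-decimal rounding concrete.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error: sqrt(mean((y_true - y_pred) ** 2))."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


# Errors are 1, -1 and 2, so MSE = 6 / 3 = 2 and RMSE = sqrt(2)
print(round(rmse([10.0, 12.0, 9.0], [9.0, 13.0, 7.0]), 5))  # 1.41421
```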
Download the February 2023 Green Taxi data and use it for your training data. Download the March 2023 Green Taxi data and use it for your validation data.
Create a Prefect Markdown artifact that displays the RMSE for the validation data. Create a deployment and run it.
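A Markdown artifact is just a Markdown string published from a task or flow. The sketch below only builds that string, so it runs without Prefect. The publishing call is shown in a comment and assumes Prefect 2.x, where `prefect.artifacts.create_markdown_artifact` accepts `key` and `markdown` arguments. The helper name, report layout, and artifact key are illustrative choices.

```python
def rmse_markdown(rmse: float, val_label: str) -> str:
    """Render a small Markdown report showing the validation RMSE."""
    return (
        "# RMSE Report\n\n"
        f"Duration prediction on {val_label} validation data\n\n"
        "| Metric | Value |\n"
        "|:-------|------:|\n"
        f"| RMSE | {rmse:.2f} |\n"
    )


# Inside a Prefect task or flow (assumed Prefect 2.x API):
#     from prefect.artifacts import create_markdown_artifact
#     create_markdown_artifact(
#         key="duration-model-report",  # keys: lowercase letters, digits, dashes
#         markdown=rmse_markdown(rmse, "March 2023"),
#     )
print(rmse_markdown(1.23456, "toy"))
```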
What’s the RMSE in the artifact to two decimal places?
- 9.71
- 12.02
- 15.33
- 5.37
It’s often helpful to be notified when something in your dataflow doesn’t work as planned. Create an email notification to use with your own Prefect server instance. In your virtual environment, install the prefect-email integration with
pip install prefect-email
Make sure you are connected to a running Prefect server instance through your Prefect profile. See the docs if needed: https://docs.prefect.io/latest/concepts/settings/#configuration-profiles
Register the new block with your server with
prefect block register -m prefect_email
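The package you install with pip is named `prefect-email`, while the module you import and pass to `prefect block register -m` is `prefect_email`. Run inside the same virtual environment, a quick standard-library check confirms the module is visible before you register the block. `is_importable` is a hypothetical helper name.

```python
import importlib.util


def is_importable(module_name: str) -> bool:
    """True if a top-level module can be found in the current environment."""
    return importlib.util.find_spec(module_name) is not None


# pip name "prefect-email" -> module name "prefect_email"
print(is_importable("prefect_email"))
```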
Remember that a block is a Prefect class with a nice UI form interface. Block objects live on the server and can be created and accessed in your Python code.
See the docs for how to authenticate by saving your email credentials to a block. Note that Gmail and some other services require an App Password to send email; follow the instructions in the docs to create one.
Create and save an EmailServerCredentials notification block.
Use the credentials block to send an email.
Test the notification functionality by running a deployment.
What is the name of the pre-built prefect-email task function?
send_email_message
email_send_message
send_email
send_message
The hosted Prefect Cloud lets you avoid running your own Prefect server and has automations that allow you to get notifications when certain events occur or don’t occur.
Create a free-forever Prefect Cloud account at app.prefect.cloud and connect your workspace to it by following the steps in the UI when you sign up.
Set up an Automation from the UI that will send you an email when a flow run completes. Run one of your existing deployments and check your email to see the notification.
Make sure your active profile points to Prefect Cloud and that you have an active worker.
What is the name of the second step in the Automation creation process?
- Details
- Trigger
- Actions
- The end
- Submit your results here: https://forms.gle/nVSYH5fGGamdY1LaA
- You can submit your solution multiple times. In this case, only the last submission will be used
- If your answer doesn't match options exactly, select the closest one
The deadline for submitting is 12 June (Monday), 23:00 CEST (Berlin time).
After that, the form will be closed.