Commit

Videos linked
pedrogk committed Nov 17, 2024
1 parent f71a5bb commit 983a599
Showing 42 changed files with 77 additions and 47 deletions.
@@ -12,6 +12,7 @@ timeslot: 100
gridarea: "11/4/13/5"
images:
- /images/sessions/2024/aip-63.jpg
+video: https://youtu.be/Di1nhrr4-VM
---

Join us as we check in on the current status of AIP-63: DAG Versioning. This session will explore the motivations behind AIP-63, the challenges faced by Airflow users in understanding and managing DAG history, and how it aims to address them. From tracking TaskInstance history to improving DAG representation in the UI, we'll examine what we’ve already done and what’s next. We'll also touch upon the potential future steps outlined in AIP-66 regarding the execution of specific DAG versions.
@@ -12,6 +12,7 @@ timeslot: 101
gridarea: "12/2/13/3"
images:
- /images/sessions/2024/using-operational-data.jpg
+video: https://youtu.be/-jDfKP5fNxo
---

Cost management is a continuous challenge for our data teams at Astronomer. Understanding the expenses associated with running our workflows is not always straightforward, and identifying which process ran a query causing unexpected usage on a given day can be time-consuming.
@@ -15,7 +15,7 @@ gridarea: "13/2/14/3"
images:
- /images/sessions/2024/customizing-llm.jpg
slides: 2024/91-customizing-llms.pdf
-video:
+video: https://youtu.be/T8Leid9EoUI
---

Laurel provides an AI-driven timekeeping solution tailored for accounting and legal firms, automating timesheet creation by capturing digital work activities. This session highlights two notable AI projects:
@@ -15,7 +15,7 @@ gridarea: "13/3/15/4"
images:
- /images/sessions/2024/hello-quality.jpg
slides: 2024/95-hello-quality.pdf
-video:
+video: https://youtu.be/lWCp-r-bQEI
---

Airflow operators are a core feature of Apache Airflow, and it’s extremely important that we maintain high operator quality and prevent regressions. We also help developers by providing automated test results to double-check that their changes don’t cause regressions or backward-incompatible changes, and we give Airflow release managers the information they need to decide whether a given version of a provider should be released or not yet.
2 changes: 1 addition & 1 deletion content/sessions/2024/106-refactoring-dags.md
@@ -11,7 +11,7 @@ day: 20243
timeslot: 106
gridarea: "14/2/15/3"
slides: 2024/92-refactoring-dags.pdf
-video:
+video: https://youtu.be/W0YDfUAQNRU
---

Feeling trapped in a maze of duplicate Airflow DAG code? We were too! That's why we embarked on a journey to build a centralized library, eliminating redundancy and unlocking delightful efficiency.
@@ -14,7 +14,7 @@ gridarea: "14/3/15/4"
images:
- /images/sessions/2024/overcoming-python.jpg
slides: 2024/28-Overcoming-Custom-Python-Package-Hurdles-in-Airflow.pdf
-video:
+video: https://youtu.be/Vw5D4wm3JaM
---

DAG Authors, while constructing DAGs, generally use native libraries provided by Airflow in conjunction with python libraries available over public PyPI repositories.
@@ -15,7 +15,7 @@ gridarea: "14/5/15/6"
images:
- /images/sessions/2024/robinhood.jpg
slides: 2024/34-Optimizing-Critical-Operations.pdf
-video:
+video: https://youtu.be/0-_m1CTKE1I
---

Airflow is widely used within Robinhood. In addition to traditional offline analytics use cases (to schedule ingestion and analytics workloads that populate our data lake), we also use Airflow in our backend services to orchestrate various workflows that are highly critical for the business, e.g: compliance and regulatory reporting, user facing reports and more.
@@ -14,7 +14,7 @@ gridarea: "4/3/6/4"
images:
- /images/sessions/2024/overcoming-performance.jpg
slides: 2024/40-Overcoming-Performance-Hurdles-in-Integrating-dbt-with-Airflow.pdf
-video:
+video: https://youtu.be/gnJPFGvqLzU
---

The integration between dbt and Airflow is a popular topic in the community: in previous editions of Airflow Summit, at Coalesce, and in the #airflow-dbt Slack channel.
@@ -11,11 +11,11 @@ room: Elizabethan A+B
day: 20242
timeslot: 45
gridarea: "4/4/5/5"

images:
- /images/sessions/2024/airflow-self-service-ingestion.jpg
+video: https://youtu.be/34BIya-YT40
---

Our idea to platformize ingestion pipelines is driven by Airflow in the background and streamlines the entire ingestion process for self-service.

With customer experience on top of it, and with making data ingestion foolproof as part of the Analytics data team, Airflow complements our vision.
@@ -15,7 +15,7 @@ gridarea: "5/4/6/5"
images:
- /images/sessions/2024/behaviour-driven.jpg
slides: 2024/44-Behaviour-Driven-Development-in-Airflow.pdf
-video:
+video: https://youtu.be/sUYsWG3jqMU
---

Behaviour Driven Development can, in the simplest of terms, be described as Test Driven Development, only readable. It is of course more than that, but that is not the aim of this talk. This talk aims to show:
@@ -16,7 +16,7 @@ gridarea: "6/2/8/3"
images:
- /images/sessions/2024/linkedin.jpg
slides: 2024/38-Linkedins-Continuous-Deployment.pdf
-video:
+video: https://youtu.be/GW7n3E3c3Gg
---

LinkedIn Continuous Deployment (LCD) started with the goal of improving the deployment experience and expanding its outreach to all LinkedIn systems. LCD delivers a modern deployment UX and easy-to-customize pipelines that enable all LinkedIn applications to declare their deployment pipelines.
@@ -15,7 +15,7 @@ gridarea: "6/3/8/4"
images:
- /images/sessions/2024/weathering.jpg
slides: 2024/41-Weathering-the-Cloud-Storms-With-Multi-Region-Airflow-Workflows.pdf
-video:
+video: https://youtu.be/IH8aU9jfaQk
---

Cloud availability zones and regions are not immune to outages. These zones regularly go down, and regions become unavailable due to natural disasters or human-caused incidents. Thus, if an availability zone or region goes down, so do your Airflow workflows and applications… unless your Airflow workflows function across multiple geographic locations.
@@ -16,7 +16,7 @@ gridarea: "6/4/8/5"
images:
- /images/sessions/2024/a-new-dag-paradigm.jpg
slides: 2024/45-A-New-DAG-Paradigm.pdf
-video:
+video: https://youtu.be/1cqFkbKftzI
---

Astronomer’s data team recently underwent a major shift in how we work with Airflow. We’ll deep dive into the challenges which prompted that change, how we addressed them and where we are now.
@@ -16,7 +16,7 @@ gridarea: "8/2/10/3"
images:
- /images/sessions/2024/stripe.jpg
slides: 2024/39-Stress-Free-Airflow-development.pdf
-video:
+video: https://youtu.be/kKUY2Vx76-0
---

At Stripe, compliance with regulations is of utmost importance, and ensuring the integrity of production data is crucial. To address this challenge, Stripe developed a powerful system called User Scope Mode (USM), which allows users to safely and efficiently test new or existing Airflow pipelines without the risk of corrupting production data.
@@ -15,7 +15,7 @@ gridarea: "8/3/10/4"
images:
- /images/sessions/2024/investigating-many-loops.jpg
slides: 2024/42-Investigating-the-Many-Loops-of-the-Airflow-Scheduler.pdf
-video:
+video: https://youtu.be/BjWx2r3P3GA
---

The scheduler is unarguably the most important component of an Airflow cluster. It is also the most complex and misunderstood by practitioners and administrators alike.
@@ -13,14 +13,11 @@ timeslot: 57
gridarea: "8/4/10/5"
images:
- /images/sessions/2024/why-fail.jpg
+video: https://youtu.be/Mqnmx9FbqNI
---

There are 3 certainties in life: death, taxes, and data pipelines failing. Pipelines may fail for a number of reasons: you may run out of memory, your credentials may expire, an upstream data source may not be reliable, etc. But there are patterns we can learn from!

Join us as we walk through an analysis we've done on a massive dataset of Airflow failure logs. We'll show how we used natural language processing and dimensionality reduction methods to explore the latent space of Airflow task failures in order to cluster, visualize, and understand failures.

We'll conclude the talk by walking through mitigation methods for common task failure reasons, and walk through how we can use Airflow to build an MLOps platform to turn this one-time analysis into a reliable, recurring activity.
@@ -14,7 +14,7 @@ gridarea: "11/2/12/3"
images:
- /images/sessions/2024/scalable-development.jpg
slides: 2024/52-Scalable-Development-of-Event-Driven-Airflow-DAGs.pdf
-video:
+video: https://youtu.be/Lnb0RV6GCMc
---

This use case shows how we deal with data of different varieties from different sources. Each source sends data in different layouts, timings, structures, location patterns, and sizes. The goal is to process the files within SLA and send them out. This is a complex multi-step processing pipeline that involves multiple Spark jobs, API-based integrations with microservices, resolving unique IDs, deduplication and filtering. Note that this is an event-driven system, but not a streaming data system. The files are of gigabyte scale, and each day the data being processed is of terabyte scale.
@@ -14,7 +14,7 @@ gridarea: "11/4/12/5"
images:
- /images/sessions/2024/custom-executor.jpg
slides: 2024/58-The-Essentials-of-Custom-Executor-Development.pdf
-video:
+video: https://youtu.be/VK-2OO5-MNA
---

Since version 2.7 and the advent of AIP-51, Airflow has started to fully support the creation of custom executors. Before we dive into the components of an executor and how they work, we will briefly discuss the Executor Decoupling initiative which allowed this new feature. Once we understand the parts required, we will explore the process of crafting our own executors, using real-world examples, and demonstrations of executors developed within the Amazon Provider Package as a guide. By demystifying the process of executor creation and emphasizing the opportunities for contribution, we aim to empower Airflow users and providers to harness the full potential of custom executors, enriching the Airflow ecosystem as a whole!
@@ -13,6 +13,7 @@ timeslot: 63
gridarea: "12/2/13/3"
images:
- /images/sessions/2024/game-constant-learning.jpg
+video: https://youtu.be/TH0SSdAZGPg
---

When developing Machine Learning (ML) models, the biggest challenges are often infrastructural. How do we deploy our model and expose an inference API? How can we retrain? Can we continuously evaluate performance and monitor model drift?
@@ -16,7 +16,7 @@ gridarea: "12/3/13/4"
images:
- /images/sessions/2024/building-awareness.jpg
slides: 2024/56-Building-in-Resource-Awareness-and-Event-Dependency-into-Airflow.pdf
-video:
+video: https://youtu.be/9etsu9HA_HU
---

In this talk, we will explore how adding custom dependency checks into Airflow’s scheduling system can elevate Airflow’s performance.
@@ -13,11 +13,9 @@ gridarea: "12/4/13/5"
images:
- /images/sessions/2024/hybrid-executors.jpg
slides: 2024/59-Hybrid-Executors.pdf
-video:
+video: https://youtu.be/3iKM28Ww73w
---

Executors are a core concept in Apache Airflow and they are an essential piece to the execution of DAGs. They continue to see investment and innovation including a new feature launching this year: Hybrid Execution.

This talk will give a brief overview of executors, how they work and what they are responsible for, followed by a description of Hybrid Executors (AIP-61), a new feature to allow multiple executors to be used natively and seamlessly side by side within a single Airflow environment. We’ll deep dive into how this feature works, how users can make use of it, and how it compares to what was available before, and finish with a demo to see it in action. Don’t miss this chance to learn about the cutting edge capabilities of executors in Apache Airflow!
@@ -13,7 +13,7 @@ gridarea: "13/3/14/4"
images:
- /images/sessions/2024/converting-legacy.jpg
slides: 2024/57-Converting-Legacy-Schedulers-to-Airflow.pdf
-video:
+video: https://youtu.be/bDP11dIaVl8
---

Having helped many customers to migrate thousands of workloads, we will discuss the process of migrations, and how we built an open-source framework to migrate legacy scheduler workflows via standard sets of patterns to Airflow Projects.
@@ -13,7 +13,7 @@ gridarea: "13/4/14/5"
images:
- /images/sessions/2024/what-if.jpg
slides: 2024/60-what-if-running-airflow-tasks-without-the-workers.pdf
-video:
+video: https://youtu.be/WkljjYtqu8Q
---

Airflow executes all tasks on the workers, including deferrable operators that must run on the workers before deferring to the triggerer. However, running some tasks directly from the triggerer can be beneficial in certain situations. This presentation will explain how deferrable operators function and examine ways to modify the Airflow implementation to enable tasks to run directly from the triggerer.
@@ -14,6 +14,7 @@ timeslot: 70
gridarea: "15/2/16/3"
images:
- /images/sessions/2024/unlocking-ai-ford.jpg
+video: https://youtu.be/s5Hxx5ux2rc
---

Ford Motor Company is undergoing a significant transformation, embracing AI and Machine Learning to power its smart mobility strategy, enhance customer experiences, and drive innovation in the automotive industry. Mach1ML, Ford's multi-million dollar ML platform, plays a crucial role in this journey by empowering data scientists and engineers to efficiently build, deploy, and manage ML models at scale. This presentation will delve into how Mach1ML leverages Apache Airflow as its orchestration layer to tackle the challenges of complex ML workflows that include disparate systems, manual processes, security concerns, and deployment complexities. We will explore the benefits of using Airflow, such as increased efficiency, improved reliability, enhanced scalability, and faster time-to-value. Additionally, we will showcase how Mach1ML utilizes Airflow capabilities to generate reusable templates and streamline environment promotions to further empower Ford's AI practitioners and accelerate the delivery of cutting-edge AI-powered solutions supporting the next generation of vehicles.
@@ -13,9 +13,9 @@ gridarea: "15/3/16/4"
images:
- /images/sessions/2024/profiling-tasks.jpg
slides: 2024/64-Profiling-Airflow-tasks-with-Memray.pdf
-video:
+video: https://youtu.be/QHLedv-j8Hc
---

Profiling Airflow tasks can be difficult, especially in remote environments. In this talk I will demonstrate how we can leverage the capabilities of Airflow's plugin mechanism to selectively run Airflow tasks within the context of a profiler and, with the help of operator links and custom views, make the results available to the user.

The content of this talk can provide inspiration on how Airflow may in the future allow the gathering of custom task metrics and make those metrics easily accessible.
@@ -13,7 +13,7 @@ gridarea: "16/3/17/4"
images:
- /images/sessions/2024/airflow-multi-cluster.jpg
slides: 2024/65-Airflow-and-multi-cluster-Slurm-working-together.pdf
-video:
+video: https://youtu.be/ol6k7df3Kr0
---

Meteosim provides environmental services, mainly based on weather and air quality intelligence, and helps customers make operational and tactical decisions and understand their companies' environmental impact. We introduced Airflow a couple of years ago to replace a huge Crontab file and we currently have around 7000 DAG Runs per day.
@@ -13,7 +13,7 @@ gridarea: "16/4/17/5"
images:
- /images/sessions/2024/simplified-user-management.jpg
slides: 2024/68-Simplified-user-management-in-Airflow.pdf
-video:
+video: https://youtu.be/9kXnpoN8qMA
---

Before Airflow 2.9, user management was part of core Airflow, so modifying or customizing it to fit user needs was not an easy process. Authentication and authorization managers (auth managers) are a new concept introduced in Airflow 2.9. They were introduced as extensible user management (AIP-56), giving Airflow users a flexible way to integrate with their organization's identity services. Organizations want a single place to manage permissions, and FAB (Flask App Builder) made that difficult to achieve. In this talk, after explaining the concept of auth managers and why we built this, we will show you how you can leverage the new auth manager interface to build an authorization service for Airflow based on your existing identity provider. We will see that auth managers can be leveraged to change considerably how users and their permissions are managed in an Airflow environment.
@@ -13,14 +13,12 @@ gridarea: "17/2/18/3"
images:
- /images/sessions/2024/100-airflow-environments.jpg
slides: 2024/63-How-we-run-100-Airflow-environments-and-millions-of-Tasks-as-a-Part-Time-job-using-Kubernetes.pdf
-video:
+video: https://youtu.be/GKkUtH9H57E
---

Balyasny Asset Management (BAM) is a diversified global investment firm founded in 2001 with over $20 billion in assets under management. We have more than 100 teams who run a variety of workloads that benefit from Orchestration and parallelization.

Platform Engineers working for companies with K8s ecosystems can use their Kubernetes knowledge and leverage their platform to run Airflow and troubleshoot problems successfully. BAM’s Kubernetes Platform provides production-ready Airflow environments that automatically get Logging, Metrics, Alerting, Scalability, Storage from a range of File Systems, Authentication, Dashboards, Secrets Management, and specialized compute including GPU, CPU Optimized, Memory Optimized and even Windows. If you can run thousands of Pods on your Kubernetes Cluster then you can run thousands of Tasks without needing to do anything! The intention of this talk is to cover:

- Why K8s and Airflow work so well together

@@ -13,7 +13,7 @@ gridarea: "17/3/18/4"
images:
- /images/sessions/2024/bronco.jpg
slides: 2024/66-Bronco.pdf
-video:
+video: https://youtu.be/pWvdP1ukcXs
---

Airflow is not just purpose-built for data applications. It is a job scheduler on steroids. This is exactly what a cloud platform team needs: a configurable and scalable automation tool that can handle thousands of administrative tasks.