slides 2024
AlbertoRJ committed Sep 25, 2024
1 parent 4370403 commit 3f7e3f1
Showing 92 changed files with 89 additions and 5 deletions.

@@ -14,6 +14,8 @@ timeslot: 10
gridarea: "6/3/7/4"
images:
- /images/sessions/2024/streamlining-airflow-dag.jpg
slides: 2024/6-Unleash-the-Power-of-AI.pdf
video:
---

Nowadays, conversational AI is no longer exclusive to large enterprises. It has become more accessible and affordable, opening up new possibilities and business opportunities. In this session, discover how you can leverage Generative AI as your AI pair programmer to suggest DAG code and recommend entire functions in real-time, directly from your editor. Visualize how to harness the power of ML, trained on billions of lines of code, to transform natural language prompts into coding suggestions. Seamlessly cycle through lines of code, complete function suggestions, and choose to accept, reject, or edit them. Witness firsthand how Generative AI provides recommendations based on the project's context and style conventions. The objective is to equip you with techniques that allow you to spend less time on boilerplate and repetitive code patterns, and more time on what truly matters: building exceptional orchestration software.

@@ -12,6 +12,8 @@ timeslot: 102
gridarea: "12/3/13/4"
images:
- /images/sessions/2024/airflow-engine.jpg
slides: 2024/94-Airflow-as-an-Engine.pdf
video:
---

Airflow is often used for running data pipelines, which themselves connect with other services through the provider system. However, it is also increasingly used as an engine under-the-hood for other projects building on top of the DAG primitive. For example, Cosmos is a framework for automatically transforming dbt DAGs into Airflow DAGs, so that users can supplement the developer experience of dbt with the power of Airflow.

@@ -14,6 +14,8 @@ timeslot: 103
gridarea: "13/2/14/3"
images:
- /images/sessions/2024/customizing-llm.jpg
slides: 2024/91-customizing-llms.pdf
video:
---

Laurel provides an AI-driven timekeeping solution tailored for accounting and legal firms, automating timesheet creation by capturing digital work activities. This session highlights two notable AI projects:

@@ -14,6 +14,8 @@ timeslot: 104
gridarea: "13/3/15/4"
images:
- /images/sessions/2024/hello-quality.jpg
slides: 2024/95-hello-quality.pdf
video:
---

Airflow operators are a core feature of Apache Airflow, and it’s extremely important that we maintain their quality and prevent regressions. Automated test results help developers double-check that their changes don’t introduce regressions or backward-incompatible changes, and give Airflow release managers the information they need to decide whether a given version of a provider is ready to be released.

@@ -13,6 +13,8 @@ timeslot: 105
gridarea: "13/4/14/5"
images:
- /images/sessions/2024/using-ray.jpg
slides: 2024/97-Using-the-power-of-Apache-Airflow-and-Ray-for-Scalable-AI-deployments.pdf
video:
---

Many organizations struggle to create a well-orchestrated AI infrastructure, using separate and disconnected platforms for data processing, model training, and inference, which slows down development and increases costs. There's a clear need for a unified system that can handle all aspects of AI development and deployment, regardless of the size of data or models.

3 changes: 2 additions and 1 deletion content/sessions/2024/106-refactoring-dags.md
@@ -10,7 +10,8 @@ track: Airflow intro talks
day: 20243
timeslot: 106
gridarea: "14/2/15/3"

slides: 2024/92-refactoring-dags.pdf
video:
---

Feeling trapped in a maze of duplicate Airflow DAG code? We were too! That's why we embarked on a journey to build a centralized library, eliminating redundancy and unlocking delightful efficiency.

@@ -12,6 +12,8 @@ timeslot: 108
gridarea: "14/4/15/5"
images:
- /images/sessions/2024/exploring-dag.jpg
slides: 2024/98-Exploring-DAG-Design-Patterns-in-Apache-Airflow.pdf
video:
---

This talk delves into advanced Directed Acyclic Graph (DAG) design patterns that are pivotal for optimizing data pipeline management and boosting efficiency. We'll cover dynamic DAG generation, which allows for flexible, scalable workflow creation based on real-time data and configurations. Learn about task grouping and SubDAGs to enhance readability and maintainability of complex workflows. We'll also explore parameterized DAGs for injecting runtime parameters into tasks, enabling versatile and adaptable pipeline configurations. Additionally, the session will address branching and conditional execution to manage workflow paths dynamically based on data conditions or external triggers. Lastly, understand how to leverage parallelism and concurrency to maximize resource utilization and reduce execution times. This session is designed for intermediate to advanced users who are familiar with the basics of Airflow and looking to deepen their understanding of its more sophisticated capabilities.

@@ -13,6 +13,8 @@ timeslot: 11
gridarea: "6/4/7/5"
images:
- /images/sessions/2024/running-anywhere.jpg
slides: 2024/8-Running-Airflow-tasks-anywhere-in-any-language.pdf
video:
---

Imagine a world where writing Airflow tasks in languages like Go, R, Julia, or maybe even Rust is not just a dream but a native capability. Say goodbye to BashOperators; welcome to the future of Airflow task execution.

@@ -14,6 +14,8 @@ gridarea: "6/5/7/6"

images:
- /images/sessions/2024/activating-operational.jpg
slides: 2024/10-Activating-operational-metadata-with-Airflow-Atlan-and-OpenLineage.pdf
video:
---

OpenLineage is an open standard for lineage data collection, integrated into the Airflow codebase, facilitating lineage collection across providers like Google, Amazon, and more.

@@ -12,6 +12,8 @@ timeslot: 15
gridarea: "8/3/9/4"
images:
- /images/sessions/2024/unlocking-fmops.jpg
slides: 2024/14-Unlocking-FMOps-LLMOps-using-Apache-Airflow.pdf
video:
---

In the last few years, Large Language Models (LLMs) have risen to prominence as outstanding tools capable of transforming businesses. However, bringing such solutions and models into business-as-usual operations is not an easy task. In this session, we delve into the operationalization of generative AI applications using MLOps principles, leading to the introduction of foundation model operations (FMOps), or LLM operations, using Apache Airflow. We further zoom into aspects of expected people and process mindsets, new techniques for model selection and evaluation, data privacy, and model deployment. Additionally, learn how you can use the prescriptive features of Apache Airflow to aid your operational journey. Whether you are building with out-of-the-box models (open-source or proprietary), creating new foundation models from scratch, or fine-tuning an existing model, the structured approaches described will help you effectively integrate LLMs into your operations, enhancing efficiency and productivity without causing disruptions in the cloud or on-premises.

@@ -12,6 +12,7 @@ timeslot: 17
gridarea: "8/5/9/6"
images:
- /images/sessions/2024/optimize-dags.jpg
slides: 2024/20-Optimize-Your-DAGs-Embrace-Dag-Params-for-Efficiency-and-Simplicity.pdf
---

In the realm of data engineering, there is a prevalent tendency for professionals to develop similar Directed Acyclic Graphs (DAGs) to manage analogous tasks. Leveraging Dag Params presents an effective strategy for mitigating redundancy within these DAGs. Moreover, the utilization of Dag Params facilitates seamless enforcement of user inputs, thereby streamlining the process of incorporating validations into the DAG codebase.

@@ -12,6 +12,8 @@ timeslot: 19
gridarea: "9/3/10/4"
images:
- /images/sessions/2024/from-oops.jpg
slides: 2024/15-From-Oops-to-Ops.pdf
video:
---

This session reveals an experimental venture integrating OpenAI's AI technologies with Airflow, aimed at advancing error diagnosis.

@@ -12,6 +12,8 @@ timeslot: 24
gridarea: "10/4/11/5"
images:
- /images/sessions/2024/deep-dive-configuration.jpg
slides: 2024/19-A-deep-dive-into-Airflow-configuration-options-for-scalability.pdf
video:
---

Apache Airflow has a lot of configuration options. A change in some of these options can affect the performance of Airflow.

@@ -12,6 +12,8 @@ timeslot: 29
gridarea: "12/4/13/5"
images:
- /images/sessions/2024/empowering-teams.jpg
slides: 2024/29-Empowering-more-teams-in-your-organization-to-self-service-their-Airflow-needs.pdf
video:
---

Does your organization feel like the responsibility to write Airflow DAGs, handle the Airflow infrastructure administration, debug failing tasks, and keep up with new features and best practices is too much for too few people? Perhaps you only have one data team that owns all of that; or you have too many teams that have too many permissions into other teams' DAGs.

@@ -13,6 +13,8 @@ timeslot: 30
gridarea: "12/5/13/6"
images:
- /images/sessions/2024/optimizing-performance.jpg
slides: 2024/32-Optimizing-Airflow-Performance.pdf
video:
---

Airflow, an open-source platform for orchestrating complex data workflows, is widely adopted for its flexibility and scalability. However, as workflows grow in complexity and scale, optimizing Airflow performance becomes crucial for efficient execution and resource utilization. This session delves into the importance of optimizing Airflow performance and provides strategies, techniques, and best practices to enhance workflow execution speed, reduce resource consumption, and improve system efficiency. Attendees will gain insights into identifying performance bottlenecks, fine-tuning workflow configurations, leveraging advanced features, and implementing optimization strategies to maximize pipeline throughput. Whether you're a seasoned Airflow user or just getting started, this session equips you with the knowledge and tools needed to optimize your Airflow deployments for optimal performance and scalability. We'll also explore topics such as DAG writing best practices, monitoring and updating Airflow configurations, and database performance optimization, covering unused indexes, missing indexes, and minimizing table and index bloat.

@@ -13,6 +13,8 @@ timeslot: 36
gridarea: "14/3/15/4"
images:
- /images/sessions/2024/overcoming-python.jpg
slides: 2024/28-Overcoming-Custom-Python-Package-Hurdles-in-Airflow.pdf
video:
---

DAG authors, while constructing DAGs, generally use native libraries provided by Airflow in conjunction with Python libraries available from public PyPI repositories.

@@ -12,6 +12,8 @@ timeslot: 37
gridarea: "14/4/15/5"
images:
- /images/sessions/2024/maxime-preset.jpg
slides: 2024/31-AI-Reality-Checkpoint.pdf
video:
---

In the past 18 months, artificial intelligence has not just entered our workspaces – it has taken over. As we stand at the crossroads of innovation and automation, it's time for a candid reflection on how AI has reshaped our professional lives, and to talk about where it's been a game changer, where it's falling short, and what's about to shift dramatically in the short term.

@@ -14,6 +14,8 @@ timeslot: 38
gridarea: "14/5/15/6"
images:
- /images/sessions/2024/robinhood.jpg
slides: 2024/34-Optimizing-Critical-Operations.pdf
video:
---

Airflow is widely used within Robinhood. In addition to traditional offline analytics use cases (scheduling the ingestion and analytics workloads that populate our data lake), we also use Airflow in our backend services to orchestrate various workflows that are highly critical for the business, e.g., compliance and regulatory reporting, user-facing reports, and more.

@@ -13,6 +13,8 @@ timeslot: 44
gridarea: "4/3/6/4"
images:
- /images/sessions/2024/overcoming-performance.jpg
slides: 2024/40-Overcoming-Performance-Hurdles-in-Integrating-dbt-with-Airflow.pdf
video:
---

The integration between dbt and Airflow is a popular topic in the community, both at previous editions of Airflow Summit and Coalesce, and in the #airflow-dbt Slack channel.

@@ -14,6 +14,8 @@ gridarea: "5/4/6/5"

images:
- /images/sessions/2024/behaviour-driven.jpg
slides: 2024/44-Behaviour-Driven-Development-in-Airflow.pdf
video:
---

Behaviour Driven Development can, in the simplest of terms, be described as Test Driven Development, only readable. It is of course more than that, but that is not the aim of this talk. This talk aims to show:

2 changes: 2 additions and 0 deletions content/sessions/2024/49-linkedin-s-continuous-deployment.md
@@ -15,6 +15,8 @@ gridarea: "6/2/8/3"

images:
- /images/sessions/2024/linkedin.jpg
slides: 2024/38-Linkedins-Continuous-Deployment.pdf
video:
---

LinkedIn Continuous Deployment (LCD) started with the goal of improving the deployment experience and expanding its reach to all LinkedIn systems. LCD delivers a modern deployment UX and easy-to-customize pipelines, which enable all LinkedIn applications to declare their deployment pipelines.

@@ -14,6 +14,8 @@ gridarea: "6/3/8/4"

images:
- /images/sessions/2024/weathering.jpg
slides: 2024/41-Weathering-the-Cloud-Storms-With-Multi-Region-Airflow-Workflows.pdf
video:
---

Cloud availability zones and regions are not immune to outages. These zones regularly go down, and regions become unavailable due to natural disasters or human-caused incidents. Thus, if an availability zone or region goes down, so do your Airflow workflows and applications… unless your Airflow workflows function across multiple geographic locations.

@@ -15,6 +15,8 @@ gridarea: "6/4/8/5"

images:
- /images/sessions/2024/a-new-dag-paradigm.jpg
slides: 2024/45-A-New-DAG-Paradigm.pdf
video:
---

Astronomer’s data team recently underwent a major shift in how we work with Airflow. We’ll deep dive into the challenges which prompted that change, how we addressed them and where we are now.

@@ -15,6 +15,8 @@ gridarea: "8/2/10/3"

images:
- /images/sessions/2024/stripe.jpg
slides: 2024/39-Stress-Free-Airflow-development.pdf
video:
---

At Stripe, compliance with regulations is of utmost importance, and ensuring the integrity of production data is crucial. To address this challenge, Stripe developed a powerful system called User Scope Mode (USM), which allows users to safely and efficiently test new or existing Airflow pipelines without the risk of corrupting production data.

@@ -14,6 +14,8 @@ gridarea: "8/3/10/4"

images:
- /images/sessions/2024/investigating-many-loops.jpg
slides: 2024/42-Investigating-the-Many-Loops-of-the-Airflow-Scheduler.pdf
video:
---

The scheduler is unarguably the most important component of an Airflow cluster. It is also the most complex and misunderstood by practitioners and administrators alike.

@@ -13,6 +13,8 @@ timeslot: 62
gridarea: "11/4/12/5"
images:
- /images/sessions/2024/custom-executor.jpg
slides: 2024/58-The-Essentials-of-Custom-Executor-Development.pdf
video:
---

Since version 2.7 and the advent of AIP-51, Airflow has started to fully support the creation of custom executors. Before we dive into the components of an executor and how they work, we will briefly discuss the Executor Decoupling initiative which allowed this new feature. Once we understand the parts required, we will explore the process of crafting our own executors, using real-world examples and demonstrations of executors developed within the Amazon Provider Package as a guide. By demystifying the process of executor creation and emphasizing the opportunities for contribution, we aim to empower Airflow users and providers to harness the full potential of custom executors, enriching the Airflow ecosystem as a whole!

@@ -12,6 +12,8 @@ timeslot: 65
gridarea: "12/4/13/5"
images:
- /images/sessions/2024/hybrid-executors.jpg
slides: 2024/59-Hybrid-Executors.pdf
video:
---

Executors are a core concept in Apache Airflow and they are an essential piece to the execution of DAGs. They continue to see investment and innovation including a new feature launching this year: Hybrid Execution.

@@ -12,6 +12,8 @@ timeslot: 67
gridarea: "13/3/14/4"
images:
- /images/sessions/2024/converting-legacy.jpg
slides: 2024/57-Converting-Legacy-Schedulers-to-Airflow.pdf
video:
---

Having helped many customers to migrate thousands of workloads, we will discuss the process of migrations, and how we built an open-source framework to migrate legacy scheduler workflows via standard sets of patterns to Airflow Projects.

@@ -12,6 +12,8 @@ timeslot: 68
gridarea: "13/4/14/5"
images:
- /images/sessions/2024/what-if.jpg
slides: 2024/60-what-if-running-airflow-tasks-without-the-workers.pdf
video:
---

Airflow executes all tasks on the workers, including deferrable operators that must run on the workers before deferring to the triggerer. However, running some tasks directly from the triggerer can be beneficial in certain situations. This presentation will explain how deferrable operators function and examine ways to modify the Airflow implementation to enable tasks to run directly from the triggerer.

@@ -12,6 +12,8 @@ timeslot: 71
gridarea: "15/3/16/4"
images:
- /images/sessions/2024/profiling-tasks.jpg
slides: 2024/64-Profiling-Airflow-tasks-with-Memray.pdf
video:
---

Profiling Airflow tasks can be difficult, especially in remote environments. In this talk, I will demonstrate how we can leverage Airflow's plugin mechanism to selectively run Airflow tasks within the context of a profiler and, with the help of operator links and custom views, make the results available to the user.

@@ -12,6 +12,8 @@ timeslot: 73
gridarea: "16/2/17/3"
images:
- /images/sessions/2024/tuned-airflow.jpg
slides: 2024/62-how-we-tuned-our-airflow-to-make-1-2-million-dag-runs-per-day.pdf
video:
---

As we deployed Airflow in our enterprise, connected to various event sources to implement our data-driven pipelines, we were hit by event storms a couple of times. Because these event storms were often unplanned and arrived in waves of increasing load, we tuned the setup over multiple iterations. At times we were in panic mode and had to add some quick workarounds.

@@ -12,6 +12,8 @@ timeslot: 74
gridarea: "16/3/17/4"
images:
- /images/sessions/2024/airflow-multi-cluster.jpg
slides: 2024/65-Airflow-and-multi-cluster-Slurm-working-together.pdf
video:
---

Meteosim provides environmental services, mainly based on weather and air quality intelligence, and helps customers make operational and tactical decisions and understand their companies' environmental impact. We introduced Airflow a couple of years ago to replace a huge Crontab file and we currently have around 7000 DAG Runs per day.

@@ -12,6 +12,8 @@ timeslot: 77
gridarea: "17/3/18/4"
images:
- /images/sessions/2024/bronco.jpg
slides: 2024/66-Bronco.pdf
video:
---

Airflow is not just purpose-built for data applications. It is a job scheduler on steroids. This is exactly what a cloud platform team needs: a configurable and scalable automation tool that can handle thousands of administrative tasks.

@@ -12,6 +12,8 @@ timeslot: 78
gridarea: "17/4/18/5"
images:
- /images/sessions/2024/mastering-advanced.jpg
slides: 2024/69-Mastering-Advanced-Dataset-Scheduling-in-Apache-Airflow.pdf
video:
---

Are you looking to harness the full potential of data-driven pipelines with Apache Airflow? This session will dive into the newly introduced conditional expressions for advanced dataset scheduling in Airflow - a feature highly requested by the Airflow community. Attendees will learn how to effectively use logical operators to create complex dependencies that trigger DAGs based on the dataset updates in real-world scenarios. We'll also explore the innovative DatasetOrTimeSchedule, which combines time-based and dataset-triggered scheduling for unparalleled flexibility. Furthermore, attendees will discover the latest API endpoints that facilitate external updates and resets of dataset events, streamlining workflow management across different deployments.