---
layout: default
title: Airflow
description: Easily build reproducible data pipelines with Airflow and lakeFS using commits, without modifying the code or logic of your job.
parent: Using lakeFS with...
nav_order: 10
has_children: false
---

# Using lakeFS with Airflow

There are two aspects we will need to handle in order to run Airflow with lakeFS:

## Access and insert data through lakeFS
Since lakeFS supports the AWS S3 API, it works seamlessly with all operators that work on top of S3, such as the SparkSubmitOperator, S3FileTransformOperator, and others.

All we need to do is set lakeFS as the endpoint URL and use our lakeFS credentials instead of our S3 credentials.
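For example, here is a minimal sketch of pointing a plain boto3 client at lakeFS. The endpoint URL, access keys, repository, and branch names below are placeholders, not values from this guide:

```python
import boto3

# A sketch, not an official lakeFS client setup: any S3 client works once
# its endpoint points at the lakeFS server. All values here are placeholders.
s3 = boto3.client(
    's3',
    endpoint_url='https://lakefs.example.com',   # your lakeFS server
    aws_access_key_id='AKIAIOSFODNN7EXAMPLE',    # lakeFS access key
    aws_secret_access_key='EXAMPLE_SECRET_KEY',  # lakeFS secret key
)

# Repositories appear as buckets, and the first path element is the branch.
s3.list_objects_v2(Bucket='example-repo', Prefix='main/path/to/data/')
```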
We can then run tasks on lakeFS using the lakeFS path convention:

```s3://[REPOSITORY]/[BRANCH]/PATH/TO/OBJECT```
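For instance, with the hypothetical repository and branch used later in this page, an object would be addressed as ```s3://example_repo/example_dag_branch/path/to/object```.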
The lakeFS docs contain explanations and examples of how to use lakeFS with the [AWS CLI](aws_cli.md), [Spark](spark.md), [Presto](presto.md), and many more.

## Run lakeFS commands such as creating branches, committing, and merging
We currently have two options to run lakeFS commands with Airflow: using the SimpleHttpOperator to send [API requests](../reference/api.md) to lakeFS, or using the BashOperator with [lakectl](../quickstart/lakefs_cli.md) commands. Examples of both follow.

For example, a commit task using the BashOperator:

```python
# In Airflow 2+, the import is: from airflow.operators.bash import BashOperator
from airflow.operators.bash_operator import BashOperator

commit_extract = BashOperator(
    task_id='commit_extract',
    bash_command='lakectl commit lakefs://example_repo@example_dag_branch -m "extract data"',
    dag=dag,
)
```
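
And a sketch of the API-based alternative, creating a branch with the SimpleHttpOperator. This is an illustration rather than an official example: the `lakefs` connection id, repository, and branch names are hypothetical, and the endpoint path should be verified against the [API reference](../reference/api.md):

```python
import json

# In Airflow 2+, the import is:
# from airflow.providers.http.operators.http import SimpleHttpOperator
from airflow.operators.http_operator import SimpleHttpOperator

# Assumes an Airflow HTTP connection named 'lakefs' that points at the lakeFS
# server, with authentication configured on the connection. The endpoint path
# is our assumption; check it against the lakeFS API reference.
create_branch = SimpleHttpOperator(
    task_id='create_branch',
    http_conn_id='lakefs',
    method='POST',
    endpoint='api/v1/repositories/example_repo/branches',
    data=json.dumps({'name': 'example_dag_branch', 'source': 'main'}),
    headers={'Content-Type': 'application/json'},
    dag=dag,
)
```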