Commit

docs: Explain zero-copy branching and use the term throughout the docs (
talSofer authored Jan 4, 2024
1 parent 15d3399 commit b091577
Showing 5 changed files with 12 additions and 10 deletions.
12 changes: 6 additions & 6 deletions docs/index.md
@@ -18,10 +18,10 @@ With lakeFS, you can use concepts on your data lake such as **branch** to create

## How Do I Get Started?

**[The hands-on quickstart](./quickstart) guides you through some of core features of lakeFS**.
**[The hands-on quickstart](./quickstart) guides you through some core features of lakeFS**.

These include [branching](./quickstart/branch.html), [merging](quickstart/commit-and-merge.html), and [rolling back changes](quickstart/rollback.html) to data.

{: .note}
You can use the [30-day free trial of lakeFS Cloud](https://lakefs.cloud/register) if you want to try out lakeFS without installing anything.

@@ -30,7 +30,7 @@ You can use the [30-day free trial of lakeFS Cloud](https://lakefs.cloud/registe
* It is format-agnostic.
* It works with numerous data tools and platforms.
* Your data stays in place.
* It minimizes data duplication via a copy-on-write mechanism.
* It eliminates the need for data duplication using [zero-copy branching](understand/model.md#zero-copy-branching).
* It maintains high performance over data lakes of any size.
* It includes configurable garbage collection capabilities.
* It is proven in production and has an active community.
@@ -108,9 +108,9 @@ Following this pattern, lakeFS facilitates a streamlined data deployment workflo

lakeFS helps you maintain a tidy data lake in several ways, including:

### Isolated Dev/Test Environments with copy-on-write
### Isolated Dev/Test Environments with zero-copy branching

lakeFS makes creating isolated dev/test environments for ETL testing instantaneous, and through its use of copy-on-write, cheap. This enables you to test and validate code changes on production data without impacting it, as well as run analysis and experiments on production data in an isolated clone.
lakeFS makes creating isolated dev/test environments for ETL testing instantaneous, and through its use of zero-copy branching, cheap. This enables you to test and validate code changes on production data without impacting it, as well as run analysis and experiments on production data in an isolated clone.

👉🏻 [Read more](./understand/use_cases/etl_testing.html)

@@ -122,7 +122,7 @@ Being able to look at data as it was at a given point is particularly useful in

ML experimentation is usually an iterative process, and being able to reproduce a specific iteration is important.

With lakeFS you can version all components of an ML experiment including its data, as well as make use of copy-on-write to minimise the footprint of versions of the data
With lakeFS you can version all components of an ML experiment including its data, as well as make use of zero-copy branching to minimise the footprint of versions of the data

2. Troubleshooting production problems

2 changes: 1 addition & 1 deletion docs/quickstart/branch.md
@@ -9,7 +9,7 @@ previous: ["Query the pre-populated data", "./query.html"]

# Create a Branch

lakeFS uses branches in a similar way to Git. It's a great way to isolate changes until, or if, we are ready to re-integrate them. lakeFS uses a copy-on-write technique which means that it's very efficient to create branches of your data.
lakeFS uses branches in a similar way to Git. It's a great way to isolate changes until, or if, we are ready to re-integrate them. lakeFS uses a zero-copy branching technique which means that it's very efficient to create branches of your data.

Having seen the lakes data in the previous step we're now going to create a new dataset to hold data only for lakes in Denmark. Why? Well, because :)

2 changes: 1 addition & 1 deletion docs/understand/faq.md
@@ -12,7 +12,7 @@ redirect_from:
lakeFS is completely free, open-source, and licensed under the [Apache 2.0](https://www.apache.org/licenses/LICENSE-2.0) License. We maintain a public [product roadmap][roadmap] and a [Slack channel](https://lakefs.io/slack) for open discussions.

### 2. How does lakeFS data versioning work?
lakeFS uses a copy-on-write mechanism to avoid data duplication. For example, creating a new branch is a metadata-only operation: no objects are actually copied. Only when an object changes does lakeFS create another [version of the data](https://lakefs.io/blog/data-versioning/) in the storage. For more information, see [Versioning internals]({% link understand/how/versioning-internals.md %}).
lakeFS uses zero-copy branching to avoid data duplication. That is, creating a new branch is a metadata-only operation: no objects are actually copied. Only when an object changes does lakeFS create another [version of the data](https://lakefs.io/blog/data-versioning/) in the storage. For more information, see [Versioning internals]({% link understand/how/versioning-internals.md %}).

### 3. How do I get support for my lakeFS installation?
We are extremely responsive on our Slack channel, and we make sure to prioritize the most pressing issues for the community. For SLA-based support, please contact us at [[email protected]](mailto:[email protected]).
4 changes: 3 additions & 1 deletion docs/understand/model.md
@@ -61,8 +61,10 @@ Branches in lakeFS allow users to create their own "isolated" view of the reposi

Changes on one branch do not appear on other branches. Users can take changes from one branch and apply it to another by [merging](#merge) them.

Under the hood, branches are simply a pointer to a [commit](#commits) along with a set of uncommitted changes.
#### Zero-copy branching

Under the hood, branches are simply a pointer to a [commit](#commits) along with a set of uncommitted changes.
Creating a branch is a **zero-copy operation**; instead of duplicating data, it involves creating a pointer to the source commit for the branch.

### Tags

2 changes: 1 addition & 1 deletion docs/understand/use_cases/etl_testing.md
@@ -25,7 +25,7 @@ Without lakeFS, the challenge with this approach is that it can be time-consumin

## How does lakeFS help with Dev/Test environments?

lakeFS makes creating isolated dev/test environments for ETL testing quick and cheap. lakeFS uses Copy-on-Write which means that there is no duplication of data when you create a new environment. This frees you from spending time on environment maintenance and makes it possible to create as many environments as needed.
lakeFS makes creating isolated dev/test environments for ETL testing quick and cheap. lakeFS uses zero-copy branching which means that there is no duplication of data when you create a new environment. This frees you from spending time on environment maintenance and makes it possible to create as many environments as needed.

In a lakeFS repository, data is always located on a `branch`. You can think of each `branch` in lakeFS as its own environment. This is because branches are isolated, meaning changes on one branch have no effect other branches.
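
Note on the terminology introduced by this commit: the `docs/understand/model.md` hunk describes a branch as a pointer to a commit plus a set of uncommitted changes, and the FAQ hunk says that creating a branch is metadata-only, with a new version stored only when an object changes. The toy model below is one way to picture that behaviour. It is a sketch under stated assumptions, not lakeFS code: `Commit`, `Branch`, `Repo`, and every method and address format here are hypothetical names made up for illustration.

```python
# Toy in-memory model of zero-copy branching.
# ASSUMPTION: all classes and methods here are hypothetical illustrations;
# none of this is the lakeFS implementation or its API.
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Commit:
    # Maps a logical path to the address of an immutable stored object.
    entries: Dict[str, str]


@dataclass
class Branch:
    head: Commit  # pointer to a commit
    staged: Dict[str, str] = field(default_factory=dict)  # uncommitted changes

    def read(self, path: str) -> str:
        # Uncommitted changes shadow whatever the head commit points to.
        if path in self.staged:
            return self.staged[path]
        return self.head.entries[path]


class Repo:
    def __init__(self) -> None:
        self.objects: Dict[str, bytes] = {}  # stand-in for the object store
        self.branches: Dict[str, Branch] = {}

    def put_object(self, data: bytes) -> str:
        address = f"obj-{len(self.objects)}"
        self.objects[address] = data
        return address

    def create_branch(self, name: str, source: str) -> Branch:
        # Metadata-only: the new branch points at the source branch's head
        # commit. Nothing is added to self.objects.
        self.branches[name] = Branch(head=self.branches[source].head)
        return self.branches[name]

    def write(self, branch: str, path: str, data: bytes) -> None:
        # Only a changed object produces a new stored version.
        self.branches[branch].staged[path] = self.put_object(data)


repo = Repo()
addr = repo.put_object(b"all lakes")
repo.branches["main"] = Branch(head=Commit(entries={"lakes.parquet": addr}))

objects_before = len(repo.objects)
repo.create_branch("denmark-lakes", source="main")
objects_after_branch = len(repo.objects)  # unchanged: no object was copied

repo.write("denmark-lakes", "lakes.parquet", b"lakes in Denmark only")
```

In this sketch the object count is the same before and after `create_branch`, and `main` keeps reading the original object after the write on `denmark-lakes`, which is the isolation property the docs attribute to branches.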

Expand Down
