[doc] Add Spec of Index Manifest
JingsongLi committed Aug 16, 2024
1 parent 6646a4f commit c8ff93f
Showing 1 changed file with 41 additions and 1 deletion.
42 changes: 41 additions & 1 deletion docs/content/concepts/spec/manifest.md
@@ -28,6 +28,11 @@ under the License.

## Manifest List

```shell
├── manifest
└── manifest-list-51c16f7b-421c-4bc0-80a0-17677f343358-1
```

A Manifest List includes the meta of several manifest files. Its name contains a UUID, and it is an Avro file. The schema is:

1. fileName: manifest file name.
@@ -40,12 +45,22 @@ Manifest List includes meta of several manifest files. Its name contains UUID, i

## Manifest

A Manifest includes the meta of several data files, changelog files, or table-index files. Its name contains a UUID,
and it is an Avro file.

A manifest records file changes: each entry either adds or deletes a file. Manifests must be read in order,
because the same file may be added or deleted multiple times, and only the last entry for a file reflects its
current state. This design keeps commits lightweight and supports the file deletions generated by compaction.
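
The "last version wins" rule can be illustrated with a minimal Python sketch. It is not Paimon's implementation: the `ManifestEntry` class, the `merge_entries` function, and the choice of (partition, bucket, file name) as the file identity are hypothetical simplifications made for this example.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class ManifestEntry:
    """Hypothetical, simplified manifest entry (not the real Avro schema)."""
    kind: str       # "ADD" or "DELETE"
    partition: str  # stand-in for the BinaryRow partition
    bucket: int
    file_name: str


def merge_entries(entries: Iterable[ManifestEntry]) -> List[ManifestEntry]:
    """Replay entries in order and return the files that are still live.

    Assumption: a file is identified by (partition, bucket, file_name).
    """
    live: Dict[Tuple[str, int, str], ManifestEntry] = {}
    for entry in entries:
        key = (entry.partition, entry.bucket, entry.file_name)
        if entry.kind == "ADD":
            # Drop any earlier entry first so a re-added file moves to the end.
            live.pop(key, None)
            live[key] = entry
        elif entry.kind == "DELETE":
            # A later DELETE cancels an earlier ADD of the same file.
            live.pop(key, None)
        else:
            raise ValueError(f"unknown kind: {entry.kind!r}")
    return list(live.values())
```

Replaying the entries in a different order can change the result, which is why the manifests have to be read in order.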

### Data Manifest

Data Manifest includes meta of several data files or changelog files.

```shell
├── manifest
└── manifest-6758823b-2010-4d06-aef0-3b1b597723d6-0
```

The schema is:

1. kind: ADD or DELETE,
@@ -71,3 +86,28 @@ The data file meta is:
13. creationTime: creation time of this file.
14. deleteRowCount: rowCount = addRowCount + deleteRowCount.
15. embeddedIndex: if the data file index is small enough, the index is stored in the manifest.

### Index Manifest

Index Manifest includes meta of several [table-index]({{< ref "concepts/spec/tableindex" >}}) files.

```shell
├── manifest
└── index-manifest-5d670043-da25-4265-9a26-e31affc98039-0
```

The schema is:

1. kind: ADD or DELETE.
2. partition: partition spec, a BinaryRow.
3. bucket: bucket of this file.
4. indexFile: index file meta.

The index file meta is:

1. indexType: string, "HASH" or "DELETION_VECTORS".
2. fileName: file name.
3. fileSize: file size.
4. rowCount: total number of rows.
5. deletionVectorsRanges: metadata used only by "DELETION_VECTORS". It stores the offset and length for each data file.
The schema is `ARRAY<ROW<f0: STRING, f1: INT, f2: INT>>`.
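
As a rough illustration of the two schemas above, the following Python sketch models an index manifest entry and looks up the range recorded for one data file. The class and function names are hypothetical, and the reading of `f0` as the data file name, `f1` as the offset, and `f2` as the length is an assumption inferred from the field description, not something the schema states explicitly.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class IndexFileMeta:
    index_type: str  # "HASH" or "DELETION_VECTORS"
    file_name: str
    file_size: int
    row_count: int
    # ARRAY<ROW<f0: STRING, f1: INT, f2: INT>>
    # Assumed meaning: (data file name, offset, length).
    deletion_vectors_ranges: Optional[List[Tuple[str, int, int]]] = None


@dataclass
class IndexManifestEntry:
    kind: str         # "ADD" or "DELETE"
    partition: bytes  # stand-in for the serialized BinaryRow
    bucket: int
    index_file: IndexFileMeta


def find_range(meta: IndexFileMeta, data_file: str) -> Optional[Tuple[int, int]]:
    """Return (offset, length) for a data file, or None if none is recorded."""
    if meta.index_type != "DELETION_VECTORS" or not meta.deletion_vectors_ranges:
        return None  # the ranges are only used by "DELETION_VECTORS"
    for name, offset, length in meta.deletion_vectors_ranges:
        if name == data_file:
            return offset, length
    return None
```

A "HASH" index file simply leaves `deletion_vectors_ranges` unset, so `find_range` returns `None` for it.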
