[doc] Document Spec: table index and file index
JingsongLi committed Aug 7, 2024
1 parent c11b950 commit b118e63
Showing 6 changed files with 197 additions and 44 deletions.
138 changes: 138 additions & 0 deletions docs/content/concepts/spec/fileindex.md
@@ -0,0 +1,138 @@
---
title: "File Index"
weight: 7
type: docs
aliases:
- /concepts/spec/fileindex.html
---
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# File index

When `file-index.${index_type}.columns` is defined, Paimon creates a corresponding index file for each data file. If the
index file is small enough, it is stored directly in the manifest; otherwise it is stored in the directory of the data
file. Each data file corresponds to one index file, which has its own file format and can contain different types of
indexes for multiple columns.

## Index File

The file index file format is shown below. All column names and index offsets are stored in the header.

<pre>
 _____________________________________    _____________________
|     magic     |version| head length |
|-------------------------------------|
|            column number            |
|-------------------------------------|
|     column 1     |   index number   |
|-------------------------------------|
| index name 1 | start pos |  length  |
|-------------------------------------|
| index name 2 | start pos |  length  |
|-------------------------------------|
| index name 3 | start pos |  length  |
|-------------------------------------|            HEAD
|     column 2     |   index number   |
|-------------------------------------|
| index name 1 | start pos |  length  |
|-------------------------------------|
| index name 2 | start pos |  length  |
|-------------------------------------|
| index name 3 | start pos |  length  |
|-------------------------------------|
|                 ...                 |
|-------------------------------------|
|                 ...                 |
|-------------------------------------|
| redundant length | redundant bytes  |
|-------------------------------------|    ---------------------
|                BODY                 |
|                BODY                 |
|                BODY                 |            BODY
|                BODY                 |
|_____________________________________|    _____________________
*
magic: 8 bytes long, value is 1493475289347502L, BIG_ENDIAN
version: 4 bytes int, BIG_ENDIAN
head length: 4 bytes int, BIG_ENDIAN
column number: 4 bytes int, BIG_ENDIAN
column x name: 2 bytes short length (BIG_ENDIAN) followed by Java modified-utf-8 bytes
index number: 4 bytes int (how many index items follow for this column), BIG_ENDIAN
index name x: 2 bytes short length (BIG_ENDIAN) followed by Java modified-utf-8 bytes
start pos: 4 bytes int, BIG_ENDIAN
length: 4 bytes int, BIG_ENDIAN
redundant length: 4 bytes int (for compatibility with later versions; in this version, the value is zero)
redundant bytes: var bytes (for compatibility with later versions; in this version, it is empty)
BODY: column index bytes + column index bytes + column index bytes + .......
</pre>
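
As a rough illustration of the HEAD layout above, the following Java sketch reads the header sequentially. It relies on `DataInputStream`, which reads big-endian values and whose `readUTF()` consumes a 2-byte length followed by modified UTF-8 bytes. The class `FileIndexHeadSketch`, the method names, the sample column name `order_id`, and the sample version, positions and lengths are all assumptions made for this example; they are not Paimon APIs or real values.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not a Paimon class.
final class FileIndexHeadSketch {

    static final long MAGIC = 1493475289347502L;

    /** Returns column name -> (index name -> {start pos, length}). */
    static Map<String, Map<String, int[]>> readHead(byte[] file) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
            if (in.readLong() != MAGIC) {
                throw new IOException("bad magic, not a file index file");
            }
            in.readInt(); // version, not interpreted in this sketch
            in.readInt(); // head length, not needed for a sequential read
            int columnNumber = in.readInt();
            Map<String, Map<String, int[]>> head = new LinkedHashMap<>();
            for (int c = 0; c < columnNumber; c++) {
                String column = in.readUTF();
                int indexNumber = in.readInt();
                Map<String, int[]> indexes = new LinkedHashMap<>();
                for (int i = 0; i < indexNumber; i++) {
                    String indexName = in.readUTF();
                    int startPos = in.readInt();
                    int length = in.readInt();
                    indexes.put(indexName, new int[] {startPos, length});
                }
                head.put(column, indexes);
            }
            int redundantLength = in.readInt();
            in.skipBytes(redundantLength); // empty in the current version
            return head;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a tiny synthetic head (one column, one index); all values are made up. */
    static byte[] sampleHead() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeLong(MAGIC);
            out.writeInt(1);   // version (arbitrary sample value)
            out.writeInt(0);   // head length placeholder, ignored by readHead
            out.writeInt(1);   // column number
            out.writeUTF("order_id");
            out.writeInt(1);   // index number
            out.writeUTF("bloom-filter");
            out.writeInt(64);  // start pos
            out.writeInt(128); // length
            out.writeInt(0);   // redundant length
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int[] entry = readHead(sampleHead()).get("order_id").get("bloom-filter");
        System.out.println("start=" + entry[0] + " length=" + entry[1]);
    }
}
```

A reader would then use each `start pos` and `length` pair to slice the matching column index bytes out of the file.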

## Column Index Bytes: BloomFilter

To enable this index, define `'file-index.bloom-filter.columns'`.

The content of the bloom filter index is simple:
- numHashFunctions: 4 bytes int, BIG_ENDIAN
- bloom filter bytes

The bloom filter uses a 64-bit (long) hash. Only the number of hash functions (one integer) and the bit set bytes are
stored. Byte-based types (such as varchar and binary) are hashed with xxHash; numeric types are hashed with a
[specified number hash](http://web.archive.org/web/20071223173210/http://www.concentric.net/~Ttwang/tech/inthash.htm).
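
The two-part layout above can be sketched in Java as follows. This is a minimal illustration rather than the Paimon implementation: the class `BloomFilterBytesSketch` and its methods are hypothetical, and the bit set is treated as an opaque byte array. `ByteBuffer` is used because its default byte order is big-endian.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper for illustration only; not a Paimon class.
final class BloomFilterBytesSketch {

    /** Writes numHashFunctions as a 4-byte big-endian int, followed by the bit set bytes. */
    static byte[] serialize(int numHashFunctions, byte[] bitSet) {
        // ByteBuffer defaults to big-endian byte order.
        return ByteBuffer.allocate(Integer.BYTES + bitSet.length)
                .putInt(numHashFunctions)
                .put(bitSet)
                .array();
    }

    /** Reads the leading 4-byte big-endian int. */
    static int readNumHashFunctions(byte[] serialized) {
        return ByteBuffer.wrap(serialized).getInt();
    }

    /** Returns everything after the leading int, i.e. the bloom filter bytes. */
    static byte[] readBitSet(byte[] serialized) {
        return Arrays.copyOfRange(serialized, Integer.BYTES, serialized.length);
    }

    public static void main(String[] args) {
        byte[] bytes = serialize(3, new byte[] {(byte) 0xF0, 0x0F});
        System.out.println(Arrays.toString(bytes));
    }
}
```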

## Column Index Bytes: Bitmap

To enable this index, define `'file-index.bitmap.columns'`.

Bitmap file index format (V1):

<pre>
Bitmap file index format (V1)
+-------------------------------------------------+-----------------
|                version (1 byte)                 |
+-------------------------------------------------+
|             row count (4 bytes int)             |
+-------------------------------------------------+
|   non-null value bitmap number (4 bytes int)    |
+-------------------------------------------------+
|             has null value (1 byte)             |
+-------------------------------------------------+
|  null value offset (4 bytes if has null value)  |       HEAD
+-------------------------------------------------+
|        value 1         |        offset 1        |
+-------------------------------------------------+
|        value 2         |        offset 2        |
+-------------------------------------------------+
|        value 3         |        offset 3        |
+-------------------------------------------------+
|                       ...                       |
+-------------------------------------------------+-----------------
|               serialized bitmap 1               |
+-------------------------------------------------+
|               serialized bitmap 2               |
+-------------------------------------------------+       BODY
|               serialized bitmap 3               |
+-------------------------------------------------+
|                       ...                       |
+-------------------------------------------------+-----------------
*
value x: var bytes for any data type (used as the bitmap identifier)
offset: 4 bytes int (when it is negative, the value occurs in only one row,
        and the row position is the inverse of the negative value)
</pre>

All integers are BIG_ENDIAN.
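
To make the fixed-size start of the bitmap index HEAD concrete, here is a small Java sketch that reads only the fields up to the optional null value offset. It stops before the `value x | offset x` entries because their encoding depends on the column data type. The class `BitmapIndexPrefixSketch` is hypothetical, the sample values are made up, and treating any non-zero `has null value` byte as true is an assumption of this example.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration only; not a Paimon class.
final class BitmapIndexPrefixSketch {

    final int version;
    final int rowCount;
    final int nonNullBitmapNumber;
    final boolean hasNullValue;
    final int nullValueOffset; // meaningful only when hasNullValue is true

    private BitmapIndexPrefixSketch(ByteBuffer buf) {
        // ByteBuffer defaults to big-endian byte order.
        version = buf.get();
        rowCount = buf.getInt();
        nonNullBitmapNumber = buf.getInt();
        hasNullValue = buf.get() != 0; // assumption: non-zero means "has null value"
        nullValueOffset = hasNullValue ? buf.getInt() : 0;
    }

    static BitmapIndexPrefixSketch read(byte[] indexBytes) {
        return new BitmapIndexPrefixSketch(ByteBuffer.wrap(indexBytes));
    }

    /** Builds a synthetic prefix; all values are arbitrary sample data. */
    static byte[] sample() {
        return ByteBuffer.allocate(14)
                .put((byte) 1) // version
                .putInt(1000)  // row count
                .putInt(3)     // non-null value bitmap number
                .put((byte) 1) // has null value
                .putInt(42)    // null value offset
                .array();
    }

    public static void main(String[] args) {
        BitmapIndexPrefixSketch prefix = read(sample());
        System.out.println(prefix.rowCount + " rows, " + prefix.nonNullBitmapNumber + " bitmaps");
    }
}
```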
42 changes: 0 additions & 42 deletions docs/content/concepts/spec/indexfile.md

This file was deleted.

2 changes: 1 addition & 1 deletion docs/content/concepts/spec/snapshot.md
@@ -53,7 +53,7 @@ Snapshot File is JSON, it includes:
 4. baseManifestList: a manifest list recording all changes from the previous snapshots.
 5. deltaManifestList: a manifest list recording all new changes occurred in this snapshot.
 6. changelogManifestList: a manifest list recording all changelog produced in this snapshot, null if no changelog is produced.
-7. indexManifest: a manifest recording all index files of this table, null if no index file.
+7. indexManifest: a manifest recording all index files of this table, null if no table index file.
 8. commitUser: usually generated by UUID, it is used for recovery of streaming writes, one stream write job with one user.
 9. commitIdentifier: transaction id corresponding to streaming write, each transaction may result in multiple commits for different commitKinds.
 10. commitKind: type of changes in this snapshot, including append, compact, overwrite and analyze.
57 changes: 57 additions & 0 deletions docs/content/concepts/spec/tableindex.md
@@ -0,0 +1,57 @@
---
title: "Table Index"
weight: 6
type: docs
aliases:
- /concepts/spec/tableindex.html
---
<!--
Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
-->

# Table index

Table index files are stored in the `index` directory.

## Dynamic Bucket Index

The dynamic bucket index stores the correspondence between the hash value of the primary key and the bucket.

Its structure is very simple; the file stores only hash values:

HASH_VALUE | HASH_VALUE | HASH_VALUE | HASH_VALUE | ...

HASH_VALUE is the hash value of the primary key. It is 4 bytes, BIG_ENDIAN.
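
Because the file is just a run of 4-byte big-endian integers, reading it back is straightforward. The Java sketch below illustrates this; the class `HashIndexSketch` and its method names are hypothetical and not part of Paimon.

```java
import java.nio.ByteBuffer;

// Hypothetical helper for illustration only; not a Paimon class.
final class HashIndexSketch {

    /** Decodes the file content into hash values, 4 bytes each, big-endian. */
    static int[] readHashes(byte[] content) {
        if (content.length % Integer.BYTES != 0) {
            throw new IllegalArgumentException("content length must be a multiple of 4");
        }
        ByteBuffer buf = ByteBuffer.wrap(content); // big-endian by default
        int[] hashes = new int[content.length / Integer.BYTES];
        for (int i = 0; i < hashes.length; i++) {
            hashes[i] = buf.getInt();
        }
        return hashes;
    }

    /** Encodes hash values in the same layout. */
    static byte[] writeHashes(int... hashes) {
        ByteBuffer buf = ByteBuffer.allocate(hashes.length * Integer.BYTES);
        for (int hash : hashes) {
            buf.putInt(hash);
        }
        return buf.array();
    }

    public static void main(String[] args) {
        for (int hash : readHashes(writeHashes(1, -2, 123456789))) {
            System.out.println(hash);
        }
    }
}
```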

## Deletion Vectors

A deletion file stores the positions of deleted records for each data file. For primary key tables, each bucket has one
deletion file.

{{< img src="/img/deletion-file.png">}}

The deletion file is a binary file with the following format:

- First, the version is recorded in one byte. The current version is 1.
- Then, a sequence of `<size of serialized bin, serialized bin, checksum of serialized bin>` entries is recorded.
- Size and checksum are BIG_ENDIAN integers.

For each serialized bin:

- First, a constant magic number is recorded as an int (BIG_ENDIAN). Currently, the magic number is 1581511376.
- Then, the serialized bitmap is recorded. It is a [RoaringBitmap](https://github.com/RoaringBitmap/RoaringBitmap) (org.roaringbitmap.RoaringBitmap).
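
The layout described above can be walked with a short Java sketch. It is a minimal illustration under stated assumptions: the class `DeletionFileSketch` is hypothetical, the size is taken to cover the whole serialized bin (magic number plus bitmap bytes), the checksum is read but not verified because the checksum algorithm is not stated here, and the bitmap payload is returned as raw bytes instead of being deserialized with the RoaringBitmap library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not a Paimon class.
final class DeletionFileSketch {

    static final int MAGIC = 1581511376;

    /** Returns the serialized bitmap bytes of every bin, with the magic number stripped. */
    static List<byte[]> readBins(byte[] file) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
            int version = in.readByte();
            if (version != 1) {
                throw new IOException("unsupported version " + version);
            }
            List<byte[]> bitmaps = new ArrayList<>();
            while (in.available() > 0) {
                int size = in.readInt(); // assumed to cover magic + bitmap bytes
                byte[] bin = new byte[size];
                in.readFully(bin);
                in.readInt();            // checksum, read but not verified in this sketch
                if (ByteBuffer.wrap(bin).getInt() != MAGIC) {
                    throw new IOException("bad magic number in serialized bin");
                }
                bitmaps.add(Arrays.copyOfRange(bin, Integer.BYTES, bin.length));
            }
            return bitmaps;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a synthetic file with one bin; the payload and checksum are placeholders. */
    static byte[] sample(byte[] fakeBitmap) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeByte(1);                                // version
            out.writeInt(Integer.BYTES + fakeBitmap.length); // size of serialized bin
            out.writeInt(MAGIC);                             // bin starts with the magic number
            out.write(fakeBitmap);                           // stand-in for RoaringBitmap bytes
            out.writeInt(0);                                 // checksum placeholder
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<byte[]> bins = readBins(sample(new byte[] {1, 2, 3}));
        System.out.println(bins.size() + " bin(s), first payload " + Arrays.toString(bins.get(0)));
    }
}
```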
Binary file added docs/static/img/deletion-file.png
@@ -101,7 +101,7 @@ public void write(Object key) {
     public byte[] serializedBytes() {
         int numHashFunctions = filter.getNumHashFunctions();
         byte[] serialized = new byte[filter.getBitSet().bitSize() / Byte.SIZE + Integer.BYTES];
-        // little endian
+        // big endian
         serialized[0] = (byte) ((numHashFunctions >>> 24) & 0xFF);
         serialized[1] = (byte) ((numHashFunctions >>> 16) & 0xFF);
         serialized[2] = (byte) ((numHashFunctions >>> 8) & 0xFF);