diff --git a/docs/content/concepts/spec/fileindex.md b/docs/content/concepts/spec/fileindex.md
new file mode 100644
index 000000000000..6a8169aefee3
--- /dev/null
+++ b/docs/content/concepts/spec/fileindex.md
@@ -0,0 +1,138 @@
+---
+title: "File Index"
+weight: 7
+type: docs
+aliases:
+- /concepts/spec/fileindex.html
+---
+
+
+# File index
+
+When `file-index.${index_type}.columns` is defined, Paimon creates a corresponding index file for each data file. If the index
+file is small enough, it is stored directly in the manifest; otherwise it is stored in the directory of the data file. Each data file
+corresponds to one index file, which has its own file format and can contain different types of indexes for
+multiple columns.
+
+## Index File
+
+File index file format. All column names and index offsets are stored in the header.
+
+
+  _____________________________________    _____________________
+|     magic    |version|head length |
+|-------------------------------------|
+|            column number            |
+|-------------------------------------|
+|   column 1        | index number   |
+|-------------------------------------|
+|  index name 1 |start pos |length  |
+|-------------------------------------|
+|  index name 2 |start pos |length  |
+|-------------------------------------|
+|  index name 3 |start pos |length  |
+|-------------------------------------|            HEAD
+|   column 2        | index number   |
+|-------------------------------------|
+|  index name 1 |start pos |length  |
+|-------------------------------------|
+|  index name 2 |start pos |length  |
+|-------------------------------------|
+|  index name 3 |start pos |length  |
+|-------------------------------------|
+|                 ...                 |
+|-------------------------------------|
+|                 ...                 |
+|-------------------------------------|
+|  redundant length |redundant bytes |
+|-------------------------------------|    ---------------------
+|                BODY                 |
+|                BODY                 |
+|                BODY                 |             BODY
+|                BODY                 |
+|_____________________________________|    _____________________
+*
+magic:                            8 bytes long, value is 1493475289347502L, BIG_ENDIAN
+version:                          4 bytes int, BIG_ENDIAN
+head length:                      4 bytes int, BIG_ENDIAN
+column number:                    4 bytes int, BIG_ENDIAN
+column x name:                    2 bytes short (BIG_ENDIAN) length, then Java modified-utf-8 bytes
+index number:                     4 bytes int (how many index items below), BIG_ENDIAN
+index name x:                     2 bytes short (BIG_ENDIAN) length, then Java modified-utf-8 bytes
+start pos:                        4 bytes int, BIG_ENDIAN
+length:                           4 bytes int, BIG_ENDIAN
+redundant length:                 4 bytes int (for compatibility with later versions, in this version, content is zero)
+redundant bytes:                  var bytes (for compatibility with later version, in this version, is empty)
+BODY:                             column index bytes + column index bytes + column index bytes + .......
+
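The HEAD layout above lines up with `java.io.DataInputStream`: `readLong` and `readInt` are big-endian, and `readUTF` reads a 2-byte length followed by modified UTF-8 bytes. The following is a minimal sketch under those assumptions, not Paimon's reader. The class `FileIndexHeadSketch`, the column name `order_id`, the index name, the version value, and the meaning given to `head length` in the sample are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not Paimon's reader: walks the HEAD layout described above.
final class FileIndexHeadSketch {
    static final long MAGIC = 1493475289347502L;

    /** Returns column name -> (index name -> {start pos, length}). */
    public static Map<String, Map<String, int[]>> readHead(byte[] bytes) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            if (in.readLong() != MAGIC) {           // 8 bytes, big-endian
                throw new IllegalArgumentException("bad magic");
            }
            in.readInt();                           // version (not interpreted here)
            in.readInt();                           // head length (not interpreted here)
            int columnNumber = in.readInt();
            Map<String, Map<String, int[]>> head = new LinkedHashMap<>();
            for (int c = 0; c < columnNumber; c++) {
                String column = in.readUTF();       // 2-byte length + modified UTF-8
                int indexNumber = in.readInt();
                Map<String, int[]> indexes = new LinkedHashMap<>();
                for (int i = 0; i < indexNumber; i++) {
                    String indexName = in.readUTF();
                    int startPos = in.readInt();
                    int length = in.readInt();
                    indexes.put(indexName, new int[] {startPos, length});
                }
                head.put(column, indexes);
            }
            in.skipBytes(in.readInt());             // redundant length, then redundant bytes
            return head;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a one-column, one-index head for demonstration; all values are made up. */
    public static byte[] sampleHead() {
        try {
            ByteArrayOutputStream entries = new ByteArrayOutputStream();
            DataOutputStream e = new DataOutputStream(entries);
            e.writeInt(1);                // column number
            e.writeUTF("order_id");       // column 1 name (hypothetical)
            e.writeInt(1);                // index number
            e.writeUTF("bloom-filter");   // index name 1 (hypothetical)
            e.writeInt(64);               // start pos
            e.writeInt(16);               // length
            e.writeInt(0);                // redundant length, so no redundant bytes

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            DataOutputStream o = new DataOutputStream(out);
            o.writeLong(MAGIC);
            o.writeInt(1);                      // version (made up)
            o.writeInt(16 + entries.size());    // head length: assumed to cover the whole head
            o.write(entries.toByteArray());
            return out.toByteArray();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        readHead(sampleHead()).forEach((column, indexes) ->
                indexes.forEach((name, pos) ->
                        System.out.println(column + "/" + name + " @" + pos[0] + " len " + pos[1])));
    }
}
```

The reader only reports `start pos` and `length`; how those positions are anchored (start of file or start of BODY) is not stated in the spec text above, so the sketch does not slice the body.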
+
+## Column Index Bytes: BloomFilter
+
+Define `'file-index.bloom-filter.columns'`.
+
+Content of bloom filter index is simple:
+- numHashFunctions 4 bytes int, BIG_ENDIAN
+- bloom filter bytes
+
+This class uses a 64-bit long hash. Only the number of hash functions (one integer) and the bit set bytes are stored. Byte-based types
+(such as varchar and binary) are hashed with xxHash; numeric types are hashed with a [specified number hash](http://web.archive.org/web/20071223173210/http://www.concentric.net/~Ttwang/tech/inthash.htm).
+
+## Column Index Bytes: Bitmap
+
+Define `'file-index.bitmap.columns'`.
+
+Bitmap file index format (V1):
+
+
+Bitmap file index format (V1)
++-------------------------------------------------+-----------------
+| version (1 byte)                               |
++-------------------------------------------------+
+| row count (4 bytes int)                        |
++-------------------------------------------------+
+| non-null value bitmap number (4 bytes int)     |
++-------------------------------------------------+
+| has null value (1 byte)                        |
++-------------------------------------------------+
+| null value offset (4 bytes if has null value)  |       HEAD
++-------------------------------------------------+
+| value 1 | offset 1                             |
++-------------------------------------------------+
+| value 2 | offset 2                             |
++-------------------------------------------------+
+| value 3 | offset 3                             |
++-------------------------------------------------+
+| ...                                            |
++-------------------------------------------------+-----------------
+| serialized bitmap 1                            |
++-------------------------------------------------+
+| serialized bitmap 2                            |
++-------------------------------------------------+       BODY
+| serialized bitmap 3                            |
++-------------------------------------------------+
+| ...                                            |
++-------------------------------------------------+-----------------
+*
+value x:                       var bytes for any data type (as bitmap identifier)
+offset:                        4 bytes int (when it is negative, it represents that there is only one value
+                                 and its position is the inverse of the negative value)
+
+
+Integers are all BIG_ENDIAN.
diff --git a/docs/content/concepts/spec/indexfile.md b/docs/content/concepts/spec/indexfile.md
deleted file mode 100644
index cfcbcade9a43..000000000000
--- a/docs/content/concepts/spec/indexfile.md
+++ /dev/null
@@ -1,42 +0,0 @@
----
-title: "IndexFile"
-weight: 6
-type: docs
-aliases:
-- /concepts/spec/indexfile.html
----
-
-
-# IndexFile
-
-## Global Index
-
-Global Index is in the index directory, currently, only two places will use global index:
-
-1. bucket = -1 + primary key table: in dynamic bucket mode, the index records the correspondence between the hash value
-   of the primary-key and the bucket, each bucket has an index file.
-2. Deletion Vectors: index stores the deletion file, and each bucket has a deletion file.
-
-## Data File Index
-
-Define `file-index.bloom-filter.columns`, Paimon will create its corresponding index file for each file. If the index
-file is too small, it will be stored directly in the manifest, or in the directory of the data file. Each data file
-corresponds to an index file, which has a separate file definition and can contain different types of indexes with
-multiple columns.
diff --git a/docs/content/concepts/spec/snapshot.md b/docs/content/concepts/spec/snapshot.md
index 5c0f58ac44b0..d1059827243b 100644
--- a/docs/content/concepts/spec/snapshot.md
+++ b/docs/content/concepts/spec/snapshot.md
@@ -53,7 +53,7 @@ Snapshot File is JSON, it includes:
 4. baseManifestList: a manifest list recording all changes from the previous snapshots.
 5. deltaManifestList: a manifest list recording all new changes occurred in this snapshot.
 6. changelogManifestList: a manifest list recording all changelog produced in this snapshot, null if no changelog is produced.
-7. indexManifest: a manifest recording all index files of this table, null if no index file.
+7. indexManifest: a manifest recording all index files of this table, null if no table index file.
 8. commitUser: usually generated by UUID, it is used for recovery of streaming writes, one stream write job with one user.
 9. commitIdentifier: transaction id corresponding to streaming write, each transaction may result in multiple commits for different commitKinds.
 10. commitKind: type of changes in this snapshot, including append, compact, overwrite and analyze.
diff --git a/docs/content/concepts/spec/tableindex.md b/docs/content/concepts/spec/tableindex.md
new file mode 100644
index 000000000000..e88f9e6d3bf2
--- /dev/null
+++ b/docs/content/concepts/spec/tableindex.md
@@ -0,0 +1,57 @@
+---
+title: "Table Index"
+weight: 6
+type: docs
+aliases:
+- /concepts/spec/tableindex.html
+---
+
+
+# Table index
+
+Table index files are in the `index` directory.
+
+## Dynamic Bucket Index
+
+The dynamic bucket index stores the correspondence between the hash value of the primary-key and the bucket.
+
+Its structure is very simple; the file stores only hash values:
+
+HASH_VALUE | HASH_VALUE | HASH_VALUE | HASH_VALUE | ...
+
+HASH_VALUE is the hash value of the primary-key. 4 bytes, BIG_ENDIAN.
+
+## Deletion Vectors
+
+A deletion file stores the positions of deleted records for each data file. For a primary key table, each bucket has
+one deletion file.
+
+{{< img src="/img/deletion-file.png">}}
+
+The deletion file is a binary file, and the format is as follows:
+
+- First, record the version in one byte. The current version is 1.
+- Then, record each serialized bin, together with its size and checksum, in sequence.
+- Size and checksum are BIG_ENDIAN integers.
+
+For each serialized bin:
+
+- First, record a constant magic number as an int (BIG_ENDIAN). The current magic number is 1581511376.
+- Then, record the serialized bitmap, which is a [RoaringBitmap](https://github.com/RoaringBitmap/RoaringBitmap) (org.roaringbitmap.RoaringBitmap).
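The dynamic bucket index layout described in `tableindex.md` (a plain run of 4-byte big-endian hash values) can be decoded with a `ByteBuffer`. This is a minimal sketch that assumes the file holds nothing but hash values, as the text states; `DynamicBucketIndexSketch` and `readHashes` are hypothetical names, not Paimon APIs.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical sketch, not Paimon's reader.
final class DynamicBucketIndexSketch {
    /** Decodes HASH_VALUE | HASH_VALUE | ... into an int array. */
    public static int[] readHashes(byte[] fileBytes) {
        if (fileBytes.length % Integer.BYTES != 0) {
            throw new IllegalArgumentException("length is not a multiple of 4 bytes");
        }
        // BIG_ENDIAN is already ByteBuffer's default; it is set explicitly for clarity.
        ByteBuffer buffer = ByteBuffer.wrap(fileBytes).order(ByteOrder.BIG_ENDIAN);
        int[] hashes = new int[fileBytes.length / Integer.BYTES];
        for (int i = 0; i < hashes.length; i++) {
            hashes[i] = buffer.getInt();
        }
        return hashes;
    }

    public static void main(String[] args) {
        // Two made-up hash values: 7 and -2.
        byte[] sample = {0, 0, 0, 7, (byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0xFE};
        for (int hash : readHashes(sample)) {
            System.out.println(hash);
        }
    }
}
```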
diff --git a/docs/static/img/deletion-file.png b/docs/static/img/deletion-file.png
new file mode 100644
index 000000000000..e66aa43618d6
Binary files /dev/null and b/docs/static/img/deletion-file.png differ
diff --git a/paimon-common/src/main/java/org/apache/paimon/fileindex/bloomfilter/BloomFilterFileIndex.java b/paimon-common/src/main/java/org/apache/paimon/fileindex/bloomfilter/BloomFilterFileIndex.java
index ce7827a98429..3c9dcadba3ec 100644
--- a/paimon-common/src/main/java/org/apache/paimon/fileindex/bloomfilter/BloomFilterFileIndex.java
+++ b/paimon-common/src/main/java/org/apache/paimon/fileindex/bloomfilter/BloomFilterFileIndex.java
@@ -101,7 +101,7 @@ public void write(Object key) {
         public byte[] serializedBytes() {
             int numHashFunctions = filter.getNumHashFunctions();
             byte[] serialized = new byte[filter.getBitSet().bitSize() / Byte.SIZE + Integer.BYTES];
-            // little endian
+            // big endian
             serialized[0] = (byte) ((numHashFunctions >>> 24) & 0xFF);
             serialized[1] = (byte) ((numHashFunctions >>> 16) & 0xFF);
             serialized[2] = (byte) ((numHashFunctions >>> 8) & 0xFF);
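
The shift sequence in `serializedBytes()` writes the most significant byte first, which is why the comment fix to "big endian" is correct. The sketch below compares that shift pattern with `ByteBuffer`, whose default byte order is big-endian. `BigEndianIntSketch` and its methods are hypothetical, and the fourth byte is assumed to follow the same pattern as the three shown in the hunk.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical sketch, not part of Paimon.
final class BigEndianIntSketch {
    /** Same shift pattern as in the hunk above; the last byte is assumed to continue it. */
    public static byte[] encodeWithShifts(int value) {
        return new byte[] {
            (byte) ((value >>> 24) & 0xFF),
            (byte) ((value >>> 16) & 0xFF),
            (byte) ((value >>> 8) & 0xFF),
            (byte) (value & 0xFF)
        };
    }

    /** ByteBuffer uses big-endian order unless told otherwise. */
    public static byte[] encodeWithByteBuffer(int value) {
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    /** Reads the leading 4-byte big-endian int, e.g. numHashFunctions in a bloom filter index. */
    public static int decode(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt();
    }

    public static void main(String[] args) {
        int numHashFunctions = 3; // made-up value
        byte[] shifted = encodeWithShifts(numHashFunctions);
        System.out.println(Arrays.toString(shifted));
        System.out.println(Arrays.equals(shifted, encodeWithByteBuffer(numHashFunctions)));
    }
}
```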