support index commands
AyushGupta3 authored and tushengxia committed Sep 22, 2020
1 parent 20b66f5 commit e9d4b75
Showing 53 changed files with 1,807 additions and 966 deletions.
@@ -20,19 +20,15 @@ public class IndexCacheKey
{
private String path;
private long lastModifiedTime;
private String[] indexTypes;

/**
*
* @param path path to the file the index files should be read for
* @param lastModifiedTime lastModifiedTime of the file, used to validate the indexes
* @param indexTypes only load specified index types, e.g. bloom
*/
public IndexCacheKey(String path, long lastModifiedTime, String... indexTypes)
public IndexCacheKey(String path, long lastModifiedTime)
{
this.path = path;
this.lastModifiedTime = lastModifiedTime;
this.indexTypes = indexTypes;
}

public String getPath()
@@ -73,9 +69,4 @@ public int hashCode()
{
return Objects.hash(path);
}

public String[] getIndexTypes()
{
return indexTypes;
}
}
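This commit drops `indexTypes` from `IndexCacheKey`, which leaves `path` and `lastModifiedTime` as its fields. The hunk shows `hashCode()` hashing `path` only, but the body of `equals()` is not visible in this view. The sketch below is a hypothetical stand-in: `SketchIndexCacheKey` and `SketchIndexCacheDemo` are illustrative names, not openLooKeng classes. It assumes `equals()` compares both fields, so a file rewritten after indexing produces a cache miss instead of a stale hit.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for IndexCacheKey after this commit (no indexTypes field).
final class SketchIndexCacheKey {
    private final String path;
    private final long lastModifiedTime;

    SketchIndexCacheKey(String path, long lastModifiedTime) {
        this.path = path;
        this.lastModifiedTime = lastModifiedTime;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof SketchIndexCacheKey)) {
            return false;
        }
        SketchIndexCacheKey other = (SketchIndexCacheKey) o;
        // Assumption: a key only matches if the file is unchanged since it was indexed.
        return lastModifiedTime == other.lastModifiedTime && path.equals(other.path);
    }

    @Override
    public int hashCode() {
        // Same as the diff: hash on path only.
        return Objects.hash(path);
    }
}

final class SketchIndexCacheDemo {
    // Returns true if a lookup with the given mtime hits an entry cached at mtime 100.
    static boolean hits(long lookupMtime) {
        Map<SketchIndexCacheKey, String> cache = new HashMap<>();
        cache.put(new SketchIndexCacheKey("/warehouse/t/part1/file.orc", 100L), "bloom index");
        return cache.containsKey(new SketchIndexCacheKey("/warehouse/t/part1/file.orc", lookupMtime));
    }

    public static void main(String[] args) {
        System.out.println(hits(100L)); // true: file unchanged
        System.out.println(hits(200L)); // false: file rewritten, cached index is stale
    }
}
```

Hashing on `path` alone is still consistent with an `equals()` that also checks `lastModifiedTime`. Equal keys always share a path, so they always share a hash code. Keys for different versions of the same file simply land in the same bucket.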
73 changes: 25 additions & 48 deletions hetu-docs/en/indexer/indexer-cli.md
@@ -3,70 +3,46 @@

## Usage

The index executable is located in the `bin` directory of the installation.

For example, `<path to installation directory>/bin/index`. It must be executed from the `bin` directory because it uses relative paths by default.
The indexer is used through the `hetu-cli` executable, located in the `bin` directory of the installation.

To create an index, run an SQL statement of the form:
```sql
CREATE INDEX [ IF NOT EXISTS ] index_name
USING [ BITMAP | BLOOM | MINMAX ]
ON tbl_name (col_name)
WITH ( "bloom.fpp" = '0.001' [, …] )
WHERE predicate;
```
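The following Java sketch makes the grammar above concrete by assembling a `CREATE INDEX` statement from its parts. `CreateIndexSql.createIndexSql` is a hypothetical helper, not an openLooKeng API. It double-quotes property keys and single-quotes values, matching the `"bloom.fpp" = '0.001'` form in the syntax block. It appends `WITH` and `WHERE` only when properties or a predicate are supplied, because the `overview.md` example in this commit omits both.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class CreateIndexSql {
    // Hypothetical helper: builds a statement following the documented grammar.
    // No escaping is done, so only use it with trusted, literal inputs.
    static String createIndexSql(boolean ifNotExists, String indexName, String indexType,
                                 String table, String column,
                                 Map<String, String> properties, String predicate) {
        StringBuilder sql = new StringBuilder("CREATE INDEX ");
        if (ifNotExists) {
            sql.append("IF NOT EXISTS ");
        }
        sql.append(indexName)
           .append(" USING ").append(indexType)
           .append(" ON ").append(table)
           .append(" (").append(column).append(")");
        if (!properties.isEmpty()) {
            StringJoiner with = new StringJoiner(", ", " WITH (", ")");
            for (Map.Entry<String, String> e : properties.entrySet()) {
                // Double-quoted key, single-quoted value, as in the syntax block.
                with.add("\"" + e.getKey() + "\" = '" + e.getValue() + "'");
            }
            sql.append(with);
        }
        if (predicate != null && !predicate.isEmpty()) {
            sql.append(" WHERE ").append(predicate);
        }
        return sql.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("bloom.fpp", "0.001");
        // Prints: CREATE INDEX IF NOT EXISTS index_name USING BLOOM ON hive.schema.table (column1) WITH ("bloom.fpp" = '0.001') WHERE p=part1
        System.out.println(createIndexSql(true, "index_name", "BLOOM",
                "hive.schema.table", "column1", props, "p=part1"));
    }
}
```

`LinkedHashMap` keeps the properties in insertion order, so the generated `WITH` list is deterministic when more than one property is given.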

To show all indexes, or a specific index by name:
```sql
SHOW INDEX;
SHOW INDEX index_name;
```
Usage: index [-v] [--debug] [--disableLocking] --table=<table>
[-c=<configDirPath>] [--column=<columns>[,<columns>...]]...
[--partition=<partitions>[,<partitions>...]]...
[--type=<indexTypes>[,<indexTypes>...]]...
[-I=<indexproperties>[,<indexproperties>...]] <command>
Using this index tool, you can CREATE, SHOW and DELETE indexes.
Supported index types: BITMAP, BLOOM, MINMAX
Supported data sources: HIVE using ORC files (must be configured in {--config}/catalog/catalog_name.properties)
<command> command types, e.g. create, delete, show; Note: delete command
works at column level only.
--column=<columns>[,<columns>...]
column, comma separated format for multiple columns
--debug if enabled, the original data for each split will
also be written to a file alongside the index
--disableLocking by default locking is enabled at the table level; if this
is set to false, the user must ensure that the same data
is not indexed by multiple callers at the same time
(indexing different columns or partitions in parallel is
allowed)
--partition=<partitions>[,<partitions>...]
only create index for these partitions, comma separated
format for multiple partitions
--table=<table> fully qualified table name
--type=<indexTypes>[,<indexTypes>...]
index type, comma separated format for multiple types
(supported types: BLOOM, BITMAP, MINMAX)
-c, --config=<configDirPath>
root folder of openLooKeng etc directory (default: ../etc)
-p, --plugins=<plugins>[,<plugins>...]
plugins dir or file (default: ./hetu-heuristic-index/plugins)
-v verbose

To delete an index by name:
```sql
DROP INDEX index_name;
```

## Examples

### Create index

``` shell
$ ./index -v -c ../etc --table hive.schema.table --column column1,column2 --type bloom,minmax,bitmap --partition p=part1 create
$ ./hetu-cli --config ../etc --execute 'CREATE INDEX index_name USING bloom ON hive.schema.table (column1) WITH ("bloom.fpp"="0.01", verbose=true) WHERE p=part1'
```

### Show index

``` shell
$ ./index -v -c ../etc --table hive.schema.table show
$ ./hetu-cli --config ../etc --execute "SHOW INDEX index_name"
```

### Delete index

*Note:* an index can only be deleted at table or column level, i.e. all index types will be deleted.

``` shell
$ ./index -v -c ../etc --table hive.schema.table --column column1 delete
$ ./hetu-cli --config ../etc --execute "DROP INDEX index_name"
```

## Notes on resource usage
@@ -76,6 +52,7 @@ $ ./index -v -c ../etc --table hive.schema.table --column column1 delete
By default, the JVM's default MaxHeapSize is used (`java -XX:+PrintFlagsFinal -version | grep MaxHeapSize`). For better performance, it is recommended to increase the MaxHeapSize. This can be
done by setting the `-Xmx` value:

*Note*: This should be done before running `hetu-cli`.
``` shell
export JAVA_TOOL_OPTIONS="-Xmx100G"
```
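You can check whether a JVM started from the same shell actually picked up the larger heap by querying its maximum heap size. `MaxHeapCheck` is an illustrative helper and is not part of hetu-cli. `Runtime.maxMemory()` reports the most memory the JVM will attempt to use. It follows `-Xmx` closely but may come out slightly lower.

```java
final class MaxHeapCheck {
    // Maximum heap size the running JVM will attempt to use, in MiB.
    // If the JVM has no inherent limit, maxMemory() returns Long.MAX_VALUE.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

After the `export`, compile and run it with `javac MaxHeapCheck.java && java MaxHeapCheck`. With `-Xmx100G` it should report a value close to 102400 MiB. The JVM also prints a `Picked up JAVA_TOOL_OPTIONS: ...` line to stderr whenever the variable is set.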
@@ -84,16 +61,16 @@ In this example the MaxHeapSize will be set to 100G.

### Indexing in parallel

If creating the index for a large table is too slow on one machine, you can create an index for different partitions in parallel on different machines. This requires setting the --disableLocking flag and specifying the partition(s). For example:
If creating the index for a large table is too slow on one machine, you can create an index for different partitions in parallel on different machines. This requires setting the parallelCreation property to true and specifying the partition(s). For example:

On machine 1:

``` bash
$ ./index -v --disableLocking -c ../etc --table hive.schema.table --column column1,column2 --type bloom,minmax,bitmap --partition p=part1 create
$ ./hetu-cli --config ../etc --execute 'CREATE INDEX index_name USING bloom ON hive.schema.table (column1) WITH ("bloom.fpp"="0.01", parallelCreation=true) WHERE p=part1'
```

On machine 2:

``` shell
$ ./index -v --disableLocking -c ../etc --table hive.schema.table --column column1,column2 --type bloom,minmax,bitmap --partition p=part2 create
$ ./hetu-cli --config ../etc --execute 'CREATE INDEX index_name USING bloom ON hive.schema.table (column1) WITH ("bloom.fpp"="0.01", parallelCreation=true) WHERE p=part2'
```
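Every machine runs the same statement with `parallelCreation=true`. Only the `WHERE` clause changes, and it selects that machine's partition(s). The sketch below splits a partition list across machines. It assumes a round-robin assignment and one statement per partition; both are choices made here for illustration, and the documentation prescribes neither. `planParallelCreation` is a hypothetical helper that only builds the statement strings and does not execute them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ParallelIndexPlan {
    // Hypothetical helper: assigns partitions to machines round-robin and returns,
    // for each machine, the CREATE INDEX statements it should run (one per partition).
    static List<List<String>> planParallelCreation(String indexName, String table, String column,
                                                   List<String> partitions, int machines) {
        if (machines <= 0) {
            throw new IllegalArgumentException("machines must be positive");
        }
        List<List<String>> plan = new ArrayList<>();
        for (int m = 0; m < machines; m++) {
            plan.add(new ArrayList<>());
        }
        for (int i = 0; i < partitions.size(); i++) {
            // Same statement shape as the documented example; only the WHERE clause varies.
            String sql = "CREATE INDEX " + indexName + " USING bloom ON " + table + " (" + column + ")"
                    + " WITH (\"bloom.fpp\"=\"0.01\", parallelCreation=true)"
                    + " WHERE " + partitions.get(i);
            plan.get(i % machines).add(sql);
        }
        return plan;
    }

    public static void main(String[] args) {
        List<List<String>> plan = planParallelCreation("index_name", "hive.schema.table", "column1",
                Arrays.asList("p=part1", "p=part2", "p=part3"), 2);
        for (int m = 0; m < plan.size(); m++) {
            for (String sql : plan.get(m)) {
                System.out.println("machine " + (m + 1) + ": " + sql);
            }
        }
    }
}
```

Each machine would then pass its statements to `./hetu-cli --config ../etc --execute '...'`, as in the examples above.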
2 changes: 1 addition & 1 deletion hetu-docs/en/indexer/overview.md
@@ -72,7 +72,7 @@ Note that you may create multiple filesystem profiles in `etc/filesystem`, sever

To write an index to the indexstore specified above, change directory to your hetu installation's `bin` folder, then run:

./index -c <your-etc-folder-directory> --table table1 --column id --type bloom create
./hetu-cli --config <your-etc-folder-directory> --execute 'CREATE INDEX index_name USING bloom ON table1 (column)'

### Run query

69 changes: 25 additions & 44 deletions hetu-docs/zh/indexer/indexer-cli.md
@@ -3,66 +3,47 @@

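## Usage

The index executable is located in the `bin` directory of the installation, e.g. `<installation path>/bin/index`. Because it uses relative paths by default, it must be run from the `bin` directory.
The index command-line interface is integrated into `hetu-cli`, which is run from the `bin` directory of the installation.

The commands are used as follows:
```sql
CREATE INDEX [ IF NOT EXISTS ] index_name
USING [ BITMAP | BLOOM | MINMAX ]
ON tbl_name (col_name)
WITH ( "bloom.fpp" = '0.001' [, …] )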
WHERE predicate;
```

To show all indexes, or a specific index by name:
```sql
SHOW INDEX;
SHOW INDEX index_name;
```
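Usage: index [-v] [--debug] [--disableLocking] --table=<table>
[-c=<configDirPath>] [--column=<columns>[,<columns>...]]...
[--partition=<partitions>[,<partitions>...]]...
[--type=<indexTypes>[,<indexTypes>...]]...
[-I=<indexproperties>[,<indexproperties>...]] <command>
Using this index tool, you can create, show and delete indexes.
Supported index types: BITMAP, BLOOM, MINMAX
Supported index stores: LOCAL, HDFS (must be configured in {--config}/config.properties)
Supported data sources: HIVE using ORC files (must be configured in {--config}/catalog/catalog_name.properties)
<command> command type, e.g. create, delete, show; Note: the delete command works at column level only.
--column=<columns>[,<columns>...]
column; separate multiple columns with commas
--debug if enabled, the original data for each split is also written to a file alongside the index
--disableLocking locking is enabled at table level by default; if this is set to false, the user must ensure that the same data is not indexed by multiple callers at the same time (indexing different columns or partitions in parallel is allowed)
--partition=<partitions>[,<partitions>...]
only create the index for these partitions; separate multiple partitions with commas
--table=<table> fully qualified table name
--type=<indexTypes>[,<indexTypes>...]
index type; separate multiple types with commas (supported types: BLOOM, BITMAP, MINMAX)
-c, --config=<configDirPath>
root folder of the openLooKeng etc directory (default: ../etc)
-p, --plugins=<plugins>[,<plugins>...]
plugins directory or file (default: ./hetu-heuristic-index/plugins)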
-v verbose

To delete an index by name:
```sql
DROP INDEX index_name;
```

## Examples

### Create index

``` shell
$ ./index -v -c ../etc --table hive.schema.table --column column1,column2 --type bloom,minmax,bitmap --partition p=part1 create
$ ./hetu-cli --config ../etc --execute 'CREATE INDEX index_name USING bloom ON hive.schema.table (column1) WITH ("bloom.fpp"="0.01", debugEnabled=true) WHERE p=part1'
```

### Show index

``` shell
$ ./index -v -c ../etc --table hive.schema.table show
$ ./hetu-cli --config ../etc --execute "SHOW INDEX index_name"
$ ./hetu-cli --config ../etc --execute "SHOW INDEX"
```

### Delete index

*Note:* an index can only be deleted at table or column level, i.e. all index types will be deleted.

``` shell
$ ./index -v -c ../etc --table hive.schema.table --column column1 delete
$ ./hetu-cli --config ../etc --execute "DROP INDEX index_name"
```

## Notes on resource usage
@@ -80,16 +61,16 @@ export JAVA_TOOL_OPTIONS="-Xmx100G"

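### Indexing in parallel

If creating the index for a large table is too slow on one machine, you can create an index for different partitions in parallel on different machines. This requires setting the --disableLocking flag and specifying the partition(s). For example:
If creating the index for a large table is too slow on one machine, you can create an index for different partitions in parallel on different machines. This requires setting the parallelCreation flag and specifying the partition(s). For example:

On machine 1:

``` bash
$ ./index -v --disableLocking -c ../etc --table hive.schema.table --column column1,column2 --type bloom,minmax,bitmap --partition p=part1 create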
$ ./hetu-cli --config ../etc --execute 'CREATE INDEX index_name USING bloom ON hive.schema.table (column1) WITH ("bloom.fpp"="0.01", parallelCreation=true) WHERE p=part1'
```

On machine 2:

``` shell
$ ./index -v --disableLocking -c ../etc --table hive.schema.table --column column1,column2 --type bloom,minmax,bitmap --partition p=part2 create
$ ./hetu-cli --config ../etc --execute 'CREATE INDEX index_name USING bloom ON hive.schema.table (column1) WITH ("bloom.fpp"="0.01", parallelCreation=true) WHERE p=part2'
```
2 changes: 1 addition & 1 deletion hetu-docs/zh/indexer/overview.md
@@ -72,7 +72,7 @@

To create an index, first change the working directory to the `bin` folder of the installation directory, then run:

./index -c <your-etc-folder-directory> --table table1 --column id --type bloom create
./hetu-cli --config <your-etc-folder-directory> --execute 'CREATE INDEX index_name USING bloom ON table1 (column)'

### Run query
