doris拆分查询反而能获得三倍的提升 #10535

howardyan93 · 2022-07-01T01:57:54Z

howardyan93
Jul 1, 2022

如下一个表
create table test(
ctime DATETIME,
field1 INT,
field2 STRING,
....
field9 STRING
)
ENGINE=OLAP
DUPLICATE KEY(ctime, fields1') PARTITION BY RANGE(ctime)()
DISTRIBUTED BY HASH(field1)
PROPERTIES
(
....
);

里面保存了20亿数据。
其中field9字段包含某个关键词keyword有个十几条。
如果使用like查询，需要将近6分钟
select * from test where field9 like '%keyword%'。

但是我发现，如果将查询拆分一下，速度反而能加快:
select ctime from test where field9 like '%keyword%'。
结果为t1, t2, t3... tN. 再使用In查询：
select * from test where ctime in (t1, t2, .... tN) and field9 like '%keyword%';
速度会缩短为2分钟。

但是如果使用如下查询：
select * from test where ctime in (
select ctime from test where field9 like '%keyword%'
) and field9 like '%keyword%';

时间还是6分钟左右。

虽然doris是列式存储，查询字段越少速度越快。但是3倍的查询差距仍让人迷惑。
ps: 开启了sql cache和partition cache。

hf200012 · 2022-07-07T10:37:32Z

hf200012
Jul 7, 2022
Collaborator

doris这种类似es的全文检索，我们正在开发，现在你这个case，第一种查询是全量的扫描，然后模糊匹配，效率肯定很低，而且你是读取了所有字段
第二个你只读取了一列数据，这样减少了很多磁盘扫描
第三个sql，你首先使用条件对数据进行了过滤，这样数据集就很小了，在进行模糊匹配效率肯定很高
第四个同理

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

doris拆分查询反而能获得三倍的提升 #10535

{{title}}

Replies: 1 comment

{{title}}

Select a reply

doris拆分查询反而能获得三倍的提升 #10535

howardyan93 Jul 1, 2022

Replies: 1 comment

hf200012 Jul 7, 2022 Collaborator

howardyan93
Jul 1, 2022

hf200012
Jul 7, 2022
Collaborator