华为云云数据库GaussDBMerge Append_云淘科技

云数据库 GaussDB

11 月 23, 2023

96 0

场景描述

当对分区表进行全局排序时，通常SQL引擎的实现方式是先通过Partition Iterator + PartitionScan对分区表做全量扫描然后进行Sort排序操作，这样难以利用数据分区分治的算法思想进行全局排序，假如ORDER BY排序列包含本身就有序的索引，本身局部有序的前提条件则无法利用。针对这类问题，目前分区表支持了分区归并排序执行策略，利用MergeAppend的执行机制改进分区表的排序机制。

示例

分区表MergeAppend的执行机制示例如下：

CREATE TABLE test_range_pt (a INT, b INT, c INT)
PARTITION BY RANGE(a)
(
    PARTITION p1 VALUES LESS THAN (2000),
    PARTITION p2 VALUES LESS THAN (3000),
    PARTITION p3 VALUES LESS THAN (4000),
    PARTITION p4 VALUES LESS THAN (5000),
    PARTITION p5 VALUES LESS THAN (MAXVALUE)
)ENABLE ROW MOVEMENT;
INSERT INTO test_range_pt VALUES (generate_series(1,10000),generate_series(1,10000),generate_series(1,10000));
CREATE INDEX idx_range_b ON test_range_pt(b) LOCAL;
ANALYZE test_range_pt;

gaussdb=# EXPLAIN ANALYZE SELECT * FROM test_range_pt WHERE b >10 AND b   Result  (cost=0.06..480.32 rows=10 width=12) (actual time=0.988..1.036 rows=10 loops=1)
         ->  Merge Append  (cost=0.06..480.32 rows=10 width=12) (actual time=0.985..1.026 rows=10 loops=1)
               Sort Key: b
               ->  Partitioned Index Scan using idx_range_b on test_range_pt  (cost=0.00..44.61 rows=998 width=12) (actual time=0.256..0.284 rows=10 loops=1)
                     Index Cond: ((b > 10) AND (b   Partitioned Index Scan using idx_range_b on test_range_pt  (cost=0.00..44.61 rows=998 width=12) (actual time=0.208..0.208 rows=1 loops=1)
                     Index Cond: ((b > 10) AND (b   Partitioned Index Scan using idx_range_b on test_range_pt  (cost=0.00..44.61 rows=998 width=12) (actual time=0.205..0.205 rows=1 loops=1)
                     Index Cond: ((b > 10) AND (b   Partitioned Index Scan using idx_range_b on test_range_pt  (cost=0.00..44.61 rows=998 width=12) (actual time=0.212..0.212 rows=1 loops=1)
                     Index Cond: ((b > 10) AND (b   Partitioned Index Scan using idx_range_b on test_range_pt  (cost=0.00..44.61 rows=998 width=12) (actual time=0.092..0.092 rows=0 loops=1)
                     Index Cond: ((b > 10) AND (b 10 AND b   Sort  (cost=296.85..309.33 rows=10 width=12) (actual time=33.555..33.557 rows=10 loops=1)
         Sort Key: b
         Sort Method: top-N heapsort  Memory: 26kB
         ->  Partition Iterator  (cost=0.00..189.00 rows=4991 width=12) (actual time=0.352..27.176 rows=4989 loops=1)
               Iterations: 5
               ->  Partitioned Seq Scan on test_range_pt  (cost=0.00..189.00 rows=4991 width=12) (actual time=16.874..25.637 rows=4989 loops=5)
                     Filter: ((b > 10) AND (b < 5000))
                     Rows Removed by Filter: 5011
                     Selected Partitions:  1..5
 Total runtime: 33.877 ms
(11 rows)

MergeAppend执行方式消耗远小于普通执行方式。

注意事项及约束条件

当分区扫描路径为Index/Index Only时，才支持MergeAppend执行机制。
分区剪枝结果大于1时，才支持MergeAppend执行机制。
当分区索引全部有效且为btree索引时，才支持MergeAppend执行机制。
当SQL含有Limit子句时，才支持MergeAppend执行机制。
当分区扫描时如果存在Filter，不支持MergeAppend执行机制。
当GUC参数sql_beta_feature = ‘disable_merge_append_partition’时，不再生成MergeAppend路径。

父主题： 分区算子执行优化

同意关联代理商云淘科技，购买华为云产品更优惠（QQ 78315851）

内容没看懂？不太想学习？想快速解决？有偿解决：联系专家