
Properly Use Parallel DML for ETL

It is no secret that high-performance ETL processes require parallelizing not only queries but also write operations.
But when you make use of parallel DML, is it simply "switch it on and forget"? What do you have to consider? Can it also have negative effects?
After a short reminder of how it works (including the space management methods), this talk presents some patterns, observed in several ETL review and tuning projects, that help answer the following questions:
What is the interaction between PDML and the partitioning of the target table? Can PDML lead to increased fragmentation of the tablespace? Can you control it? How does the hint PQ_DISTRIBUTE help? Do indexes on the target table have any influence?



  1. Properly Use Parallel DML for your ETL
     Andrej Pashchenko
     blog.sqlora.com | @Andrej_SQL
  2. About me
     • Working at Trivadis, Düsseldorf
     • Focusing on Oracle:
       • Data Warehousing
       • Application Development
       • Application Performance
     • Course instructor "Oracle New Features for Developers"
     blog.sqlora.com | @Andrej_SQL
  3. Parallel Processing in Oracle DB
     • Parallel Query: SELECT
     • Parallel DDL: CTAS, CREATE INDEX, ALTER TABLE MOVE, …
     • Parallel DML: parallel IAS (INSERT AS SELECT), parallel MERGE, parallel UPDATE, parallel DELETE
  4. Controlling, Restrictions and Implications
  5. How to enable PDML
     • Parallel Query and Parallel DDL are enabled by default
     • Parallel DML has to be enabled first, at system or session level:
       ALTER SESSION ENABLE PARALLEL DML;
     • In 12c it is also possible with a hint at statement level:
       INSERT /*+ enable_parallel_dml parallel append */ INTO sales
       SELECT /*+ parallel */ * FROM sales_v;
     • Issue with the hint: a hard parse on every execution; be careful with plan stability
     • But enabling PDML does not yet mean that a parallel execution plan will be used
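Besides checking the execution plan, the session status can be inspected directly. A minimal sketch, assuming the PQ_STATUS, PDML_STATUS and PDDL_STATUS columns of V$SESSION as described in the Database Reference:

```sql
-- Show whether parallel query/DML/DDL is enabled or forced
-- for the current session (typical values: DISABLED, ENABLED, FORCED)
SELECT pq_status, pdml_status, pddl_status
  FROM v$session
 WHERE sid = SYS_CONTEXT('USERENV', 'SID');
```

After ALTER SESSION ENABLE PARALLEL DML, PDML_STATUS should change from DISABLED to ENABLED.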
  6. How do I know PDML was used?
     • Check the position of the DML operation, e.g. LOAD AS SELECT, with respect to the query coordinator
     • Check the Note section of the plan
     • Check v$pq_sesstat

     Serial load (LOAD AS SELECT above the PX COORDINATOR):
     ---------------------------------------------
     | Operation                          | Name    |
     ---------------------------------------------
     | INSERT STATEMENT                   |         |
     |  LOAD AS SELECT                    | T1      |
     |   PX COORDINATOR                   |         |
     |    PX SEND QC (RANDOM)             | :TQ1000 |
     |     OPTIMIZER STATISTICS GATHERING |         |
     |      PX BLOCK ITERATOR             |         |
     |       TABLE ACCESS FULL            | T2      |
     ---------------------------------------------
     Note: PDML disabled because object is not decorated with parallel clause

     Parallel load (LOAD AS SELECT below the PX COORDINATOR):
     ---------------------------------------------
     | Operation                           | Name    |
     ---------------------------------------------
     | INSERT STATEMENT                    |         |
     |  PX COORDINATOR                     |         |
     |   PX SEND QC (RANDOM)               | :TQ1000 |
     |    LOAD AS SELECT (HYBRID TSM/HWMB) | T1      |
     |     OPTIMIZER STATISTICS GATHERING  |         |
     |      PX BLOCK ITERATOR              |         |
     |       TABLE ACCESS FULL             | T2      |
     ---------------------------------------------

     SELECT * FROM v$pq_sesstat WHERE statistic LIKE 'DML%';

     STATISTIC                      LAST_QUERY SESSION_TOTAL     CON_ID
     ------------------------------ ---------- ------------- ----------
     DML Parallelized                        1             3          0
  7. How to ensure that PDML is used
     • Statement-level or object-level PARALLEL hint in the INSERT:
       INSERT /*+ parallel */ INTO t_copy t SELECT * FROM t_src;
       INSERT /*+ parallel(t) */ INTO t_copy t SELECT * FROM t_src;
     • Forcing PDML in a session:
       ALTER SESSION FORCE PARALLEL DML;
     • Auto DOP:
       ALTER SESSION SET parallel_degree_policy = AUTO;
     • Parallel clause object decoration:
       CREATE TABLE t_copy (…) PARALLEL;
       ALTER TABLE t_copy PARALLEL;
  8. How to ensure that PDML is used (2)
     • Refer to the table "Parallelization Priority Order" in the documentation
     • But test your ETL scenario!
     • In case of doubt: statement-level hints have the highest priority
  9. Restrictions preventing PDML
     • No PDML on tables with triggers
     • No PDML with enabled foreign keys. Use reliable FK constraints (RELY DISABLE NOVALIDATE): valuable for the CBO, but not disruptive for ETL. Exception: reference partitioning!
     • Not enough parallel servers
     • Parallel DML is not supported on a table with bitmap indexes if the table is not partitioned
     • IMPORTANT: for Partition Exchange Loading (PEL), don't create any indexes on the temporary table before loading it!
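The RELY constraint pattern mentioned above can be sketched as follows; table, column and constraint names are illustrative, not from the talk:

```sql
-- A FK that documents the relationship for the optimizer
-- without being enforced, so it neither slows down the load
-- nor prevents PDML on the child table
ALTER TABLE sales
  ADD CONSTRAINT fk_sales_product
      FOREIGN KEY (product_id) REFERENCES products (product_id)
      RELY DISABLE NOVALIDATE;
```

RELY tells the CBO it may trust the relationship (e.g. for join elimination or query rewrite), while DISABLE NOVALIDATE means neither existing nor new rows are checked.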
 10. Restrictions preventing PDML (2)
     • Distributed transactions, DML on a remote DB
     • The 12.2 documentation lists this as a restriction
     • Indeed, the following seems to work, but it doesn't buy much, because access over the DB link is always serial:

     SQL> insert /*+ enable_parallel_dml parallel */ into t_sdoc
       2  select v.* from V_SDOC@remote_db V;
     2929218 rows created.

     SQL> select * from v$pq_sesstat where statistic like 'DML%';

     STATISTIC           LAST_QUERY SESSION_TOTAL     CON_ID
     ------------------- ---------- ------------- ----------
     DML Parallelized             1             5          0
     1 row selected.

     -------------------------------------------------------
     | Id | Operation                           | Name     |
     -------------------------------------------------------
     |  0 | INSERT STATEMENT                    |          |
     |  1 |  PX COORDINATOR                     |          |
     |  2 |   PX SEND QC (RANDOM)               | :TQ10001 |
     |  3 |    LOAD AS SELECT (HYBRID TSM/HWMB) |          |
     |  4 |     OPTIMIZER STATISTICS GATHERING  |          |
     |  5 |      PX RECEIVE                     |          |
     |  6 |       PX SEND ROUND-ROBIN           | :TQ10000 |
     |  7 |        REMOTE                       | V_SDOC   |
     -------------------------------------------------------
 11. Implications of PDML
     • The PX coordinator and each PX server work in their own transactions
     • The coordinator then uses a two-phase commit
     • Hence, the user transaction is in a special mode: the results of the parallel modifications cannot be read in the same transaction:

     SQL> select count(*) from t_sdoc;
     ORA-12838: cannot read/modify an object after modifying it in parallel

     • Complex ETL processes relying on transaction integrity can be a problem: no PDML can be used for their intermediate steps
     • The same error is raised after a serial direct path INSERT, though, so you cannot use it as a reliable check that PDML was used
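The implication above can be demonstrated with a minimal sketch (t_copy and t_src are illustrative names, not from the talk):

```sql
ALTER SESSION ENABLE PARALLEL DML;

INSERT /*+ parallel */ INTO t_copy SELECT * FROM t_src;
-- SELECT COUNT(*) FROM t_copy;  -- here this raises ORA-12838

COMMIT;
SELECT COUNT(*) FROM t_copy;     -- after the COMMIT the rows are visible
```

Any intermediate step that needs to read back the loaded rows therefore forces a COMMIT first, which is exactly why PDML does not fit ETL steps that must stay inside one transaction.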
 12. Space Management with PDML
 13. Space Management with PDML
     • Multiple concurrent transactions modify the same object
     • What has to be considered for a parallel direct path insert?
     • Can this lead to excessive extent allocation or tablespace fragmentation?
     • It is helpful to have an idea of what happens behind the scenes
     • Fortunately, Oracle 12c makes more information visible:

     --------------------------------------------------------------
     | Id | Operation                           | Name            |
     --------------------------------------------------------------
     |  0 | INSERT STATEMENT                    |                 |
     |  1 |  PX COORDINATOR                     |                 |
     |  2 |   PX SEND QC (RANDOM)               | :TQ10000        |
     |  3 |    LOAD AS SELECT (HYBRID TSM/HWMB) | T_COPY_PARALLEL |
     |  4 |     OPTIMIZER STATISTICS GATHERING  |                 |
     |  5 |      PX BLOCK ITERATOR              |                 |
     |  6 |       TABLE ACCESS FULL             | T_SRC           |
     --------------------------------------------------------------
 14. Uniform vs. System-Allocated Extents: Uniform
     • Tablespace with a uniform extent size: all extents are equally sized
     • The unused space is "inside" the extents: internal fragmentation
     • Full table scans will scan this free space too
     • This free space can be used by conventional inserts
     • But a direct path PDML insert starts filling a new extent every time
 15. Uniform vs. System-Allocated Extents: Autoallocate
     • Autoallocate uses extent sizes of 64K, 1M, 8M, 64M (with an 8K block size)
     • If free space is left after loading (more than the minimum extent size), extent trimming happens and this free space is returned to the tablespace (e.g. an 8M extent trimmed down to 7M)
     • External fragmentation: the free space is not contiguous, but can potentially be reused if smaller extents are requested
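The two extent management variants discussed above could be created like this; a minimal sketch with illustrative tablespace and datafile names:

```sql
-- Uniform: every extent is exactly 8M; unused space stays
-- inside the extents (internal fragmentation)
CREATE TABLESPACE etl_uniform
  DATAFILE 'etl_uniform01.dbf' SIZE 1G
  EXTENT MANAGEMENT LOCAL UNIFORM SIZE 8M;

-- Autoallocate: the system picks 64K/1M/8M/64M extents and
-- can trim the last extent after a direct path load
CREATE TABLESPACE etl_auto
  DATAFILE 'etl_auto01.dbf' SIZE 1G
  EXTENT MANAGEMENT LOCAL AUTOALLOCATE;
```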
 16. High Water Mark (HWM) Loading
     • The server process has exclusive access to the segment (table or partition) and can insert into extents above the HWM
     • After the commit the HWM is moved and the new data becomes visible
     • Used for serial loads, or for parallel loads with PKEY distribution
 17. Temp Segment Merge (TSM) Loading
     • Each PX server is assigned to and populates its own temporary segment
     • The last extents can be trimmed
     • The temp segments reside in the same tablespace and are merged into the target table on commit by manipulating the extent map
     • Very scalable, but at least one extent per PX server
     • Fragmentation is possible because of the trimming
     • In 12c rarely used when creating partitioned tables
 19. High Water Mark Brokering (HWMB)
     • Multiple PX servers may insert into the same extent above the HWM, so access has to be "brokered"
     • The brokering is implemented via the HV enqueue
     • Results in fewer extents
     • But it is less scalable
     • Good for loading non-partitioned tables or single partitions
 20. High Water Mark Brokering (HWMB) in RAC
     • Scalability can become an issue with a high DOP, especially in a RAC environment, where the PX servers of all instances compete for the same HV enqueue
 21. Hybrid TSM/HWMB
     • New in 12.1
     • Each temporary segment has its own HV enqueue, which in a RAC environment is used only by the local PX servers
     • Fewer extents
     • Improved scalability
 22. Data Loading Distribution
 23. Data Loading Distribution
     • Example: join two equipartitioned tables T_SRC2 and T_SRC3
     • Hash-partitioned, 64 partitions, 32 million rows

     INSERT /*+ append parallel */ INTO t_tgt_join t0
            (OWNER, OBJECT_TYPE, OBJECT_NAME, LVL, FILLER)
     SELECT t1.OWNER, t2.OBJECT_TYPE, t2.OBJECT_NAME, t1.LVL, t1.filler
       FROM t_src3 t1
       JOIN t_src2 t2 ON (    t1.OWNER = t2.OWNER
                          AND t1.OBJECT_NAME = t2.OBJECT_NAME
                          AND t1.OBJECT_TYPE = t2.OBJECT_TYPE
                          AND t1.lvl = t2.lvl);
 24. Data Loading Distribution
     • An example of joining two tables in parallel
     • Which PX servers actually load the result table?
     • The same ones that do the join? Or another PX set, so that the data has to be redistributed again?
     • This is where the data loading distribution matters
     (Diagram: one PX set reads T1 and T2 and redistributes, a second PX set joins them; which set performs the load?)
 25. Data Loading Distribution
     • Since 11.2 the hint PQ_DISTRIBUTE can be used to control the load distribution:
       • NONE – no redistribution; the load is performed by the same PX servers that produce the rows
       • PARTITION – distribution based on the partitioning of the target table
       • RANDOM – round-robin distribution, useful for highly skewed data
       • RANDOM_LOCAL – round-robin among the PX servers on the same RAC instance
 26. Data Loading Distribution - PARTITION

     INSERT /*+ append parallel pq_distribute(t0 partition) */ INTO t_tgt_join t0
     SELECT /*+ pq_distribute(t2 none none) */ t1…, t2…
       FROM t_src3 t1 JOIN t_src2 t2 ON ( ... );

     ---------------------------------------------------------------
     | Id | Operation                           | Name     | TQ    |
     ---------------------------------------------------------------
     |  0 | INSERT STATEMENT                    |          |       |
     |  1 |  PX COORDINATOR                     |          |       |
     |  2 |   PX SEND QC (RANDOM)               | :TQ10001 | Q1,01 |
     |  3 |    LOAD AS SELECT (HIGH WATER MARK) |          | Q1,01 |
     |  4 |     OPTIMIZER STATISTICS GATHERING  |          | Q1,01 |
     |  5 |      PX RECEIVE                     |          | Q1,01 |
     |  6 |       PX SEND PARTITION (KEY)       | :TQ10000 | Q1,00 |
     |  7 |        PX PARTITION HASH ALL        |          | Q1,00 |
     |* 8 |         HASH JOIN                   |          | Q1,00 |
     |  9 |          TABLE ACCESS FULL          | T_SRC2   | Q1,00 |
     | 10 |          TABLE ACCESS FULL          | T_SRC3   | Q1,00 |
     ---------------------------------------------------------------
 27. Data Loading Distribution - NONE

     INSERT /*+ append parallel pq_distribute(t0 none) */ INTO t_tgt_join t0
     SELECT /*+ pq_distribute(t2 none none) */ t1…, t2…
       FROM t_src3 t1 JOIN t_src2 t2 ON ( ... );

     ----------------------------------------------------------------------
     | Id | Operation                                    | Name     | TQ    |
     ----------------------------------------------------------------------
     |  0 | INSERT STATEMENT                             |          |       |
     |  1 |  PX COORDINATOR                              |          |       |
     |  2 |   PX SEND QC (RANDOM)                        | :TQ10000 | Q1,00 |
     |  3 |    LOAD AS SELECT (HIGH WATER MARK BROKERED) |          | Q1,00 |
     |  4 |     OPTIMIZER STATISTICS GATHERING           |          | Q1,00 |
     |  5 |      PX PARTITION HASH ALL                   |          | Q1,00 |
     |* 6 |       HASH JOIN                              |          | Q1,00 |
     |  7 |        TABLE ACCESS FULL                     | T_SRC2   | Q1,00 |
     |  8 |        TABLE ACCESS FULL                     | T_SRC3   | Q1,00 |
     ----------------------------------------------------------------------
 28. Data Loading Distribution - RANDOM

     INSERT /*+ append parallel pq_distribute(t0 random) */ INTO t_tgt_join t0
     SELECT /*+ pq_distribute(t2 none none) */ t1…, t2…
       FROM t_src3 t1 JOIN t_src2 t2 ON ( ... );

     ----------------------------------------------------------------------
     | Id | Operation                                    | Name     | TQ    |
     ----------------------------------------------------------------------
     |  0 | INSERT STATEMENT                             |          |       |
     |  1 |  PX COORDINATOR                              |          |       |
     |  2 |   PX SEND QC (RANDOM)                        | :TQ10001 | Q1,01 |
     |  3 |    LOAD AS SELECT (HIGH WATER MARK BROKERED) |          | Q1,01 |
     |  4 |     OPTIMIZER STATISTICS GATHERING           |          | Q1,01 |
     |  5 |      PX RECEIVE                              |          | Q1,01 |
     |  6 |       PX SEND ROUND-ROBIN                    | :TQ10000 | Q1,00 |
     |  7 |        PX PARTITION HASH ALL                 |          | Q1,00 |
     |* 8 |         HASH JOIN                            |          | Q1,00 |
     |  9 |          TABLE ACCESS FULL                   | T_SRC2   | Q1,00 |
     | 10 |          TABLE ACCESS FULL                   | T_SRC3   | Q1,00 |
     ----------------------------------------------------------------------
 29. Data Loading Distribution - RANDOM

     INSERT /*+ append parallel pq_distribute(t0 random) */ INTO t_tgt_join t0
     SELECT t1…, t2…
       FROM t_src3 t1 JOIN t_src2 t2 ON ( ... );

     ----------------------------------------------------------------------
     | Id | Operation                                    | Name     | TQ    |
     ----------------------------------------------------------------------
     |  0 | INSERT STATEMENT                             |          |       |
     |  1 |  PX COORDINATOR                              |          |       |
     |  2 |   PX SEND QC (RANDOM)                        | :TQ10003 | Q1,03 |
     |  3 |    LOAD AS SELECT (HIGH WATER MARK BROKERED) |          | Q1,03 |
     |  4 |     OPTIMIZER STATISTICS GATHERING           |          | Q1,03 |
     |  5 |      PX RECEIVE                              |          | Q1,03 |
     |  6 |       PX SEND ROUND-ROBIN                    | :TQ10002 | Q1,02 |
     |* 7 |        HASH JOIN BUFFERED                    |          | Q1,02 |
     |  8 |         PART JOIN FILTER CREATE              | :BF0000  | Q1,02 |
     |  9 |          PX RECEIVE                          |          | Q1,02 |
     | 10 |           PX SEND HYBRID HASH                | :TQ10000 | Q1,00 |
     | 11 |            STATISTICS COLLECTOR              |          | Q1,00 |
     | 12 |             PX BLOCK ITERATOR                |          | Q1,00 |
     |*13 |              TABLE ACCESS FULL               | T_SRC2   | Q1,00 |
     | 14 |         PX RECEIVE                           |          | Q1,02 |
     | 15 |          PX SEND HYBRID HASH                 | :TQ10001 | Q1,01 |
     | 16 |           PX BLOCK ITERATOR                  |          | Q1,01 |
     |*17 |            TABLE ACCESS FULL                 | T_SRC3   | Q1,01 |
     ----------------------------------------------------------------------
 30. Data Loading Distribution – No PWJ, No Redistribution

     INSERT /*+ append parallel pq_distribute(t0 none) */ INTO t_tgt_join t0
     SELECT t1…, t2…
       FROM t_src3 t1 JOIN t_src2 t2 ON ( ... );

     ----------------------------------------------------------------------
     | Id | Operation                                    | Name     | TQ    |
     ----------------------------------------------------------------------
     |  0 | INSERT STATEMENT                             |          |       |
     |  1 |  PX COORDINATOR                              |          |       |
     |  2 |   PX SEND QC (RANDOM)                        | :TQ10002 | Q1,02 |
     |  3 |    LOAD AS SELECT (HIGH WATER MARK BROKERED) |          | Q1,02 |
     |  4 |     OPTIMIZER STATISTICS GATHERING           |          | Q1,02 |
     |* 5 |      HASH JOIN                               |          | Q1,02 |
     |  6 |       PART JOIN FILTER CREATE                | :BF0000  | Q1,02 |
     |  7 |       PX RECEIVE                             |          | Q1,02 |
     |  8 |        PX SEND HYBRID HASH                   | :TQ10000 | Q1,00 |
     |  9 |         STATISTICS COLLECTOR                 |          | Q1,00 |
     | 10 |          PX BLOCK ITERATOR                   |          | Q1,00 |
     |*11 |           TABLE ACCESS FULL                  | T_SRC2   | Q1,00 |
     | 12 |       PX RECEIVE                             |          | Q1,02 |
     | 13 |        PX SEND HYBRID HASH                   | :TQ10001 | Q1,01 |
     | 14 |         PX BLOCK ITERATOR                    |          | Q1,01 |
     |*15 |          TABLE ACCESS FULL                   | T_SRC3   | Q1,01 |
     ----------------------------------------------------------------------
 31. Data Loading Distribution
     • But in the presence of an index, the hint is ignored!
     • Even if the index is unusable
     • The redistribution is needed again and causes a buffered hash join
     • High Water Mark (HWM) loading because of the exclusive access to the segment

     CREATE BITMAP INDEX t_idx_tgt ON t_tgt_join (OWNER) LOCAL PARALLEL;
     INSERT /*+ append parallel pq_distribute(t0 none) */ ...

     -----------------------------------------------------------------
     | Id | Operation                           | Name       | TQ    |
     -----------------------------------------------------------------
     |  0 | INSERT STATEMENT                    |            |       |
     |  1 |  PX COORDINATOR                     |            |       |
     |  2 |   PX SEND QC (RANDOM)               | :TQ10004   | Q1,04 |
     |  3 |    INDEX MAINTENANCE                | T_TGT_JOIN | Q1,04 |
     |  4 |     PX RECEIVE                      |            | Q1,04 |
     |  5 |      PX SEND RANGE                  | :TQ10003   | Q1,03 |
     |  6 |       LOAD AS SELECT (HIGH WATER MARK) |         | Q1,03 |
     |  7 |        OPTIMIZER STATISTICS GATHERING  |         | Q1,03 |
     |  8 |         PX RECEIVE                  |            | Q1,03 |
     |  9 |          PX SEND PARTITION (KEY)    | :TQ10002   | Q1,02 |
     |*10 |           HASH JOIN BUFFERED        |            | Q1,02 |
     | 11 |            PART JOIN FILTER CREATE  | :BF0000    | Q1,02 |
     | 12 |            PX RECEIVE               |            | Q1,02 |
     | 13 |             PX SEND HYBRID HASH     | :TQ10000   | Q1,00 |
     | 14 |              STATISTICS COLLECTOR   |            | Q1,00 |
     | 15 |               PX BLOCK ITERATOR     |            | Q1,00 |
     |*16 |                TABLE ACCESS FULL    | T_SRC2     | Q1,00 |
     | 17 |            PX RECEIVE               |            | Q1,02 |
     | 18 |             PX SEND HYBRID HASH     | :TQ10001   | Q1,01 |
     | 19 |              PX BLOCK ITERATOR      |            | Q1,01 |
     |*20 |               TABLE ACCESS FULL     | T_SRC3     | Q1,01 |
     -----------------------------------------------------------------
 32. Differences with MERGE
 33. Space Management with PDML and MERGE?
     • Extents after the first delta load (~3%) with MERGE vs. INSERT

     MERGE:
     SQL> MERGE /*+ append parallel */
       2  INTO t_tgt_join t0
       3  USING ( SELECT ...

     ----------------------------------------
     | Id | Operation                       |
     ----------------------------------------
     |  0 | MERGE STATEMENT                 |
     |  1 |  PX COORDINATOR                 |
     |  2 |   PX SEND QC (RANDOM)           |
     |  3 |    MERGE                        |
     |  4 |     PX RECEIVE                  |

     SEGMENT_NAME     BLOCKS       CNT
     ------------ ---------- ---------
     T_TGT_JOIN            8      2113
     ... 13 rows ...
     T_TGT_JOIN          128      4713
     ... 20 rows ...
     T_TGT_JOIN         1024        34
     36 rows selected.       1154 new extents!

     INSERT:
     SQL> INSERT /*+ append parallel */
       2  INTO t_tgt_join t0
       3  SELECT ...

     -----------------------------------------------------
     | Id | Operation                                    |
     -----------------------------------------------------
     |  0 | INSERT STATEMENT                             |
     |  1 |  PX COORDINATOR                              |
     |  2 |   PX SEND QC (RANDOM)                        |
     |  3 |    LOAD AS SELECT (HIGH WATER MARK BROKERED) |
     |  4 |     OPTIMIZER STATISTICS GATHERING           |

     SEGMENT_NAME     BLOCKS       CNT
     ------------ ---------- ---------
     T_TGT_JOIN            8      1024
     T_TGT_JOIN          128      4248
     ... 6 rows ...
     T_TGT_JOIN         1024       139
     9 rows selected.        60 new extents!
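A query of the following shape could produce the extent-size histograms shown above; a sketch against the DBA_EXTENTS dictionary view, with the segment name taken from the example:

```sql
-- Count extents per extent size (in blocks) for the target table;
-- run before and after the load to see how many extents were added
SELECT segment_name, blocks, COUNT(*) AS cnt
  FROM dba_extents
 WHERE segment_name = 'T_TGT_JOIN'
 GROUP BY segment_name, blocks
 ORDER BY blocks;
```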
 34. MERGE
     • Basically, if PDML is turned on in the session and for the particular statement, MERGE parallelizes both the INSERT and the UPDATE part
     • But there are some differences:
       • No space management decoration is reported in the execution plan
       • Even worse, it always seems to run as Temp Segment Merge
       • Significantly more extents are created
       • Many of them are trimmed
       • Every load operation starts again with many 64K extents
     • It may be worth providing INITIAL and NEXT even for an autoallocate tablespace
     • Avoid MERGE if you don't really need it (for example, if you materialize intermediate results anyway, as the ODI SCD Type 2 Knowledge Module does, you can update and insert in two separate parallel operations)
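The last bullet can be sketched as follows; a minimal, illustrative replacement of a MERGE by two parallel operations, assuming the delta is already materialized in a stage table (all table and column names are hypothetical):

```sql
ALTER SESSION ENABLE PARALLEL DML;

-- Step 1: parallel update of the rows that already exist in the target
UPDATE /*+ parallel */ t_tgt_join t
   SET (filler) = (SELECT s.filler FROM t_stage s WHERE s.id = t.id)
 WHERE EXISTS (SELECT 1 FROM t_stage s WHERE s.id = t.id);
COMMIT;

-- Step 2: parallel direct path insert of the new rows,
-- which gets the LOAD AS SELECT space management decoration
INSERT /*+ append parallel */ INTO t_tgt_join
SELECT *
  FROM t_stage s
 WHERE NOT EXISTS (SELECT 1 FROM t_tgt_join t WHERE t.id = s.id);
COMMIT;
```

The commits between the steps are required because of the PDML transaction restrictions discussed earlier, so this pattern only fits loads that do not need both steps in one transaction.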
 35. Summary
     • Don't overuse PDML; turn it on only selectively, where it makes sense
     • Be careful and double-check that your statements are really doing PDML
     • Oracle reports the space management strategy for LOAD AS SELECT operations in execution plans from 12.1.0.2 on, but not for MERGE operations
     • A bloated extent map will have a negative effect on parallel queries
     • In 12c Oracle introduced Hybrid TSM/HWMB, which increases scalability while keeping the number of extents small
     • Don't create indexes on tables used for partition exchange; they can significantly influence the execution plan, and bitmap indexes will even disable PDML!
     • For the most critical loading processes, check the data distribution, which you can influence with the PQ_DISTRIBUTE hint
     • If using MERGE for critical ETL, check the space management behavior
 36. Links
     • Oracle Documentation, VLDB Guide: About Parallel DML Operations
     • Nigel Bayliss: Space Management with PDML
     • Randolf Geist: Understanding Parallel Execution - Part 1 and Part 2
     • Randolf Geist: Hash Join Buffered
     • Timur Akhmadeev: PQ_DISTRIBUTE Enhancement
     • Jonathan Lewis: Autoallocate and PX
