- Properly using parallel DML (PDML) for ETL can improve performance by leveraging multiple CPUs/cores.
- To enable PDML, it must be enabled at the system, session, or statement level. Additional steps may be needed to ensure the optimizer chooses a parallel plan.
- Considerations for using PDML include available parallel servers, restrictions like triggers or foreign keys, and implications on transactions.
- Oracle has different methods for data loading in PDML like HWM, TSM, and HWMB that impact extent allocation and fragmentation.
- The PQ_DISTRIBUTE hint controls how rows are distributed among parallel servers during the load to optimize performance and scalability.
2. About me
• Working at Trivadis, Düsseldorf
• Focusing on Oracle:
• Data Warehousing
• Application Development
• Application Performance
• Course instructor "Oracle New Features for Developers"
@Andrej_SQL blog.sqlora.com
4. Parallel Processing in Oracle DB
• Parallel Query
  • SELECT
• Parallel DDL
  • CTAS
  • CREATE INDEX
  • ALTER TABLE MOVE
  • …
• Parallel DML
  • Parallel IAS (INSERT AS SELECT)
  • Parallel MERGE
  • Parallel UPDATE
  • Parallel DELETE
6. How to enable PDML
• Parallel Query and Parallel DDL are enabled by default
• Parallel DML has to be enabled first at system or session level:
ALTER SESSION ENABLE PARALLEL DML;
• In 12c it is also possible with a hint at statement level:
INSERT /*+ enable_parallel_dml parallel append */
INTO sales
SELECT /*+ parallel */ * FROM sales_v;
• Issue with the hint: hard parse on every execution, caution with plan stability
• But enabling PDML doesn't yet mean a parallel execution plan will be used
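To see what the current session is set to, the PDML_STATUS column of v$session can be queried. This is a minimal sketch and assumes the user has been granted SELECT on v$session:

```sql
-- Shows whether parallel query, parallel DDL and parallel DML
-- are DISABLED, ENABLED or FORCED in the current session.
SELECT pq_status, pddl_status, pdml_status
FROM   v$session
WHERE  sid = SYS_CONTEXT('USERENV', 'SID');
```

PDML_STATUS should change from DISABLED to ENABLED after ALTER SESSION ENABLE PARALLEL DML. It says nothing yet about whether a particular statement gets a parallel plan.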
7. How do I know PDML was used?
• Check the position of the DML operation, e.g. LOAD AS SELECT, with respect to the query coordinator (PX COORDINATOR)
• Check the note
• Check v$pq_sesstat
No PDML: LOAD AS SELECT is above PX COORDINATOR, and the note gives the reason:
----------------------------------------------
Operation                          | Name
----------------------------------------------
INSERT STATEMENT                   |
 LOAD AS SELECT                    | T1
  PX COORDINATOR                   |
   PX SEND QC (RANDOM)             | :TQ1000
    OPTIMIZER STATISTICS GATHERING |
     PX BLOCK ITERATOR             |
      TABLE ACCESS FULL            | T2
----------------------------------------------
Note
- PDML disabled because object is not decorated with parallel clause
PDML: LOAD AS SELECT is below PX COORDINATOR:
-----------------------------------------------
Operation                           | Name
-----------------------------------------------
INSERT STATEMENT                    |
 PX COORDINATOR                     |
  PX SEND QC (RANDOM)               | :TQ1000
   LOAD AS SELECT (HYBRID TSM/HWMB) | T1
    OPTIMIZER STATISTICS GATHERING  |
     PX BLOCK ITERATOR              |
      TABLE ACCESS FULL             | T2
-----------------------------------------------
SELECT * FROM v$pq_sesstat WHERE statistic like 'DML%';
STATISTIC                      LAST_QUERY SESSION_TOTAL     CON_ID
------------------------------ ---------- ------------- ----------
DML Parallelized                        1             3          0
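Plans like the ones above can be produced with EXPLAIN PLAN and DBMS_XPLAN. The sketch below reuses the table names T1 and T2 from the plans; the hints and the 'BASIC +NOTE' format are one possible choice, not necessarily what was used for the slides:

```sql
ALTER SESSION ENABLE PARALLEL DML;

-- Explain the load without executing it
EXPLAIN PLAN FOR
INSERT /*+ parallel append */ INTO t1
SELECT * FROM t2;

-- BASIC keeps the output compact, +NOTE adds the note section
-- that explains why PDML was disabled, if it was.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY(format => 'BASIC +NOTE'));
```

After actually running the INSERT, the v$pq_sesstat query shown above has to be issued in the same session, because the view reports session-level statistics.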
8. How to ensure that PDML is used
• Statement level or object level PARALLEL hint in INSERT:
INSERT /*+ parallel */ INTO t_copy t SELECT * FROM t_src;
INSERT /*+ parallel(t) */ INTO t_copy t SELECT * FROM t_src;
• Forcing PDML in a session:
ALTER SESSION FORCE PARALLEL DML;
• Auto DOP:
ALTER SESSION SET parallel_degree_policy = AUTO;
• Parallel clause object decoration:
CREATE TABLE t_copy (…) PARALLEL;
ALTER TABLE t_copy PARALLEL;
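Whether a table is decorated with a parallel clause can be checked in the data dictionary. A small sketch, using the T_COPY table from the statements above:

```sql
-- DEGREE is a padded string: '1' means no parallel decoration,
-- 'DEFAULT' is what PARALLEL without an explicit degree produces.
SELECT table_name, TRIM(degree) AS degree
FROM   user_tables
WHERE  table_name = 'T_COPY';

-- Remove the decoration again if it was set only for a test
ALTER TABLE t_copy NOPARALLEL;
```

Keep in mind that the decoration also affects queries on the table, not only the load.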
9. How to ensure that PDML is used (2)
• Refer to the table "Parallelization Priority Order" in the Oracle documentation (VLDB Guide)
• But test your ETL scenario!
• In case of doubt, statement level hints have the highest priority
10. Restrictions preventing PDML
• No PDML on tables with triggers
• No PDML with enabled foreign keys. Use RELY foreign key constraints instead: they are valuable for the CBO but not disruptive for ETL (RELY DISABLE NOVALIDATE). Exception: reference partitioning!
• Not enough parallel servers available
• Parallel DML is not supported on a table with bitmap indexes if the table is not partitioned.
IMPORTANT: For Partition Exchange Loading (PEL), don't create any indexes on the temporary table before loading it!
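The restrictions above can be checked before a load. The sketch below uses hypothetical tables SALES (child) and CUSTOMERS (parent) and a hypothetical constraint name; it also assumes the parent's primary key is already in RELY state, which Oracle requires for a RELY foreign key:

```sql
-- Foreign key that informs the optimizer but is never enforced
ALTER TABLE sales ADD CONSTRAINT fk_sales_customers
  FOREIGN KEY (cust_id) REFERENCES customers (cust_id)
  RELY DISABLE NOVALIDATE;

-- Triggers on the target table
SELECT trigger_name, status
FROM   user_triggers
WHERE  table_name = 'SALES';

-- Foreign keys and their state (enabled ones prevent PDML)
SELECT constraint_name, status, validated, rely
FROM   user_constraints
WHERE  table_name = 'SALES'
AND    constraint_type = 'R';

-- Bitmap indexes on the target table
SELECT index_name
FROM   user_indexes
WHERE  table_name = 'SALES'
AND    index_type = 'BITMAP';
```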
11. Restrictions preventing PDML (2)
• Distributed transactions, DML on remote DB.
• Documentation 12.2 states:
• Indeed, this seems to work but doesn’t really make sense because DB link is always serial
SQL> insert /*+ enable_parallel_dml parallel */
into t_sdoc
select v.* from V_SDOC@remote_db V;
2929218 rows created.
SQL> select * from v$pq_sesstat where statistic like 'DML%';
STATISTIC           LAST_QUERY SESSION_TOTAL     CON_ID
------------------- ---------- ------------- ----------
DML Parallelized             1             5          0
1 row selected.
-------------------------------------------------------
| Id | Operation                           | Name     |
-------------------------------------------------------
|  0 | INSERT STATEMENT                    |          |
|  1 |  PX COORDINATOR                     |          |
|  2 |   PX SEND QC (RANDOM)               | :TQ10001 |
|  3 |    LOAD AS SELECT (HYBRID TSM/HWMB) |          |
|  4 |     OPTIMIZER STATISTICS GATHERING  |          |
|  5 |      PX RECEIVE                     |          |
|  6 |       PX SEND ROUND-ROBIN           | :TQ10000 |
|  7 |        REMOTE                       | V_SDOC   |
-------------------------------------------------------
12. Implications of PDML
• The PX coordinator and each PX server work in their own transactions
• The coordinator then uses a two-phase commit
• Hence, the user transaction is in a special mode
• The results of parallel modifications cannot be seen in the same transaction
• Complex ETL processes relying on transaction integrity could be a problem: PDML cannot be used for intermediate steps.
• A serial direct path INSERT raises the same error (ORA-12838), though, so you cannot use it as a reliable check of PDML being used
SQL> select count(*) from t_sdoc
Error at line 0
ORA-12838: cannot read/modify an object after modifying it in parallel
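The behaviour can be reproduced with a few statements. This is a sketch only: T_SDOC is the table from the slide, while T_SDOC_STAGE is a hypothetical source table with the same structure:

```sql
ALTER SESSION ENABLE PARALLEL DML;

INSERT /*+ parallel append */ INTO t_sdoc
SELECT * FROM t_sdoc_stage;   -- hypothetical source table

-- Still inside the loading transaction:
-- this query is expected to fail with ORA-12838
SELECT COUNT(*) FROM t_sdoc;

COMMIT;

-- After the commit the loaded rows can be read as usual
SELECT COUNT(*) FROM t_sdoc;
```

For an ETL step that has to read its own intermediate result, this means either committing in between or loading that step without PDML and direct path.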
14. Space Management with PDML
• Multiple concurrent transactions are modifying the same object
• What should be considered when doing a parallel direct path insert?
• Can this lead to excessive extent allocation or tablespace fragmentation?
• It is helpful to have an idea of what happens behind the scenes.
• Fortunately, Oracle 12c makes more information visible
--------------------------------------------------------------
| Id | Operation                           | Name            |
--------------------------------------------------------------
|  0 | INSERT STATEMENT                    |                 |
|  1 |  PX COORDINATOR                     |                 |
|  2 |   PX SEND QC (RANDOM)               | :TQ10000        |
|  3 |    LOAD AS SELECT (HYBRID TSM/HWMB) | T_COPY_PARALLEL |
|  4 |     OPTIMIZER STATISTICS GATHERING  |                 |
|  5 |      PX BLOCK ITERATOR              |                 |
|  6 |       TABLE ACCESS FULL             | T_SRC           |
--------------------------------------------------------------
15. Uniform vs. System-Allocated Extents
• Tablespace with uniform extent size
• The unused space is inside the extent
• Internal fragmentation
• Full table scans will scan this free space too
• This free space can be used by conventional inserts
• But a PDML insert (direct path) starts to fill a new extent every time
[Diagram: Table1 in Uniform_TBS. All extents are equally sized; unused space is "inside" the extents.]
16. Uniform vs. System-Allocated Extents
• Autoallocate
• Extent sizes: 64K, 1M, 8M, 64M (8K block size)
• If free space is left after loading (> min extent), extent trimming happens and this free space is returned to the tablespace
• External fragmentation: the free space is not contiguous and can potentially be reused if smaller extents are requested
[Diagram: Table1 in Autoallocate_TBS. Different extent sizes (1M, 7M, 8M, 64M); extents can be trimmed.]
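To compare the two tablespace types in a test environment, something like the following could be used. The tablespace names follow the diagrams, the datafile names and sizes are placeholders, and the segment name is taken from the plan shown earlier:

```sql
-- Uniform extents: every extent is 8M, unused space stays inside the extent
CREATE TABLESPACE uniform_tbs
  DATAFILE 'uniform_tbs01.dbf' SIZE 1G
  EXTENT MANAGEMENT LOCAL UNIFORM SIZE 8M;

-- System-allocated extents: Oracle chooses the sizes and may trim extents
CREATE TABLESPACE autoallocate_tbs
  DATAFILE 'autoallocate_tbs01.dbf' SIZE 1G
  EXTENT MANAGEMENT LOCAL AUTOALLOCATE;

-- After a load: how many extents of which size does the segment have?
SELECT bytes / 1024 AS extent_kb, COUNT(*) AS extents
FROM   user_extents
WHERE  segment_name = 'T_COPY_PARALLEL'
GROUP  BY bytes
ORDER  BY bytes;
```

Extent sizes that do not match the usual autoallocate steps are a sign of trimming.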
17. High Water Mark Loading (HWM)
• The server process has exclusive access to the segment (table or partition) and can insert into extents above the HWM
• After commit the HWM is moved and the new data becomes visible
• Serial or parallel load with PKEY distribution
[Diagram: one server process loading Table1 in TBS.]
18. Temp Segment Merge (TSM) Loading
• Each PX server is assigned its own temporary segment and populates it
• Last extents can be trimmed
• Temp segments reside in the same tablespace and are merged into the target table by manipulating the extent map on commit
• Very scalable, but at least one extent per PX server
• Fragmentation possible because of trimming
• In 12c rarely used when creating partitioned tables
[Diagram: two PX slaves, each loading its own temp segment in TBS next to Table1.]
20. High Water Mark Brokering (HWMB)
• Multiple PX servers may insert into the same extent above the HWM, which then has to be "brokered"
• The brokering is implemented via the HV enqueue
• Results in fewer extents
• But less scalable
• Good for loading non-partitioned tables or single partitions
[Diagram: two PX slaves loading Table1 in TBS, coordinated by the HV enqueue.]
21. High Water Mark Brokering (HWMB)
• Scalability can become an issue with high DOP, especially in a RAC environment
[Diagram: PX slaves on RAC Instance 1 and RAC Instance 2 loading Table1 in TBS through a single HV enqueue.]
22. Hybrid TSM/HWMB
• New in 12.1
• Each temporary segment has its own HV enqueue, which in case of RAC is only used by local PX servers
• Fewer extents
• Improved scalability
[Diagram: on RAC Instance 1 and RAC Instance 2, PX slaves load a temp segment through that segment's own HV enqueue; the temp segments belong to Table1 in TBS.]
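Whether brokering is becoming a bottleneck can be estimated from the waits on the HV enqueue. A rough sketch, assuming the wait event is reported under a name starting with 'enq: HV' and that the user can read v$system_event:

```sql
-- Instance-wide waits on the HV enqueue since instance startup.
-- Compare the values before and after a load; on RAC, gv$system_event
-- shows the same figures per instance.
SELECT event,
       total_waits,
       ROUND(time_waited_micro / 1e6, 1) AS seconds_waited
FROM   v$system_event
WHERE  event LIKE 'enq: HV%';
```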
24. Data Loading Distribution
• Example:
• Join two equipartitioned tables T_SRC2 and T_SRC3
• Hash-partitioned, 64 partitions
• 32 million rows
INSERT /*+ append parallel */
INTO t_tgt_join t0 (OWNER, OBJECT_TYPE, OBJECT_NAME, LVL, FILLER)
SELECT t1.OWNER, t2.OBJECT_TYPE, t2.OBJECT_NAME, t1.LVL, t1.filler
FROM t_src3 t1 JOIN t_src2 t2
ON ( t1.OWNER = t2.OWNER AND t1.OBJECT_NAME = t2.OBJECT_NAME
AND t1.OBJECT_TYPE = t2.OBJECT_TYPE AND t1.lvl = t2.lvl);
25. Data Loading Distribution
• An example of joining two tables in parallel
• Which PX servers are actually loading the result table?
• The same ones that are doing the join?
• Another PX set? Should the data then be redistributed again?
• This is where data loading distribution matters
[Diagram: tables T1 and T2 and PX servers P001 to P004; one PX set reads T1, T2 and redistributes, another PX set joins T1, T2; the PX set that loads the result table is marked "?".]
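One way to see which PX servers handled how many rows is v$pq_tqstat. The sketch below has to run in the same session, directly after the parallel statement, because the view only keeps the statistics of the last parallel operation of the session:

```sql
-- One row per table queue and PX process: producers send rows,
-- consumers receive them. Uneven NUM_ROWS values within one
-- table queue point to a skewed distribution.
SELECT dfo_number, tq_id, server_type, process, num_rows, bytes
FROM   v$pq_tqstat
ORDER  BY dfo_number, tq_id, server_type DESC, process;
```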
26. Data Loading Distribution
• Since 11.2 the hint PQ_DISTRIBUTE can be used to control load distribution
• NONE – no distribution, load is performed by the same PX-Servers
• PARTITION – distribution based on partitioning of target table
• RANDOM – round-robin distribution, useful for highly skewed data
• RANDOM_LOCAL – round-robin for PX servers on the same RAC instance
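Applied to the join example from the earlier slide, the hint could look as follows. This is a sketch: NONE is picked only for illustration, and which option is best depends on the partitioning of the target table and on the skew of the data:

```sql
-- pq_distribute(t0 none): the PX servers that perform the join also
-- load T_TGT_JOIN, with no additional redistribution before the load.
INSERT /*+ append parallel pq_distribute(t0 none) */
INTO t_tgt_join t0 (OWNER, OBJECT_TYPE, OBJECT_NAME, LVL, FILLER)
SELECT t1.OWNER, t2.OBJECT_TYPE, t2.OBJECT_NAME, t1.LVL, t1.filler
FROM   t_src3 t1 JOIN t_src2 t2
ON (   t1.OWNER = t2.OWNER AND t1.OBJECT_NAME = t2.OBJECT_NAME
   AND t1.OBJECT_TYPE = t2.OBJECT_TYPE AND t1.lvl = t2.lvl);
```

Replacing none with partition, random or random_local selects the other distribution methods listed above; the resulting execution plan shows whether an extra PX SEND step appears below LOAD AS SELECT.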
35. MERGE
• Basically, if PDML is enabled in the session and for the particular statement, MERGE will parallelize both the INSERT and UPDATE operations
• But there are some differences:
• No space management decoration is reported in the execution plan
• Even worse, it always seems to run as Temp Segment Merge.
• Significantly more extents are created
• Many of them are trimmed
• Every load operation starts again with many 64K extents
• It may be worth providing INITIAL and NEXT even for an autoallocate tablespace
• Avoid MERGE if you don't really need it (for example, if you materialize temporary results anyway, as the ODI SCD Type 2 Knowledge Module does, you could then update and insert in two parallel operations).
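Splitting a MERGE into two parallel statements could look like the sketch below. The tables T_TGT (target) and T_DELTA (materialized changes) with columns ID and FILLER are hypothetical, and ID is assumed to be unique in T_DELTA:

```sql
ALTER SESSION ENABLE PARALLEL DML;

-- Step 1: update the rows that already exist in the target
UPDATE /*+ parallel(t) */ t_tgt t
SET    t.filler = (SELECT s.filler FROM t_delta s WHERE s.id = t.id)
WHERE  EXISTS (SELECT 1 FROM t_delta s WHERE s.id = t.id);

-- The commit is needed: the target cannot be read or modified again
-- in the same transaction after a parallel modification (ORA-12838).
COMMIT;

-- Step 2: direct path insert of the rows that are new
INSERT /*+ append parallel(t) */ INTO t_tgt t (id, filler)
SELECT s.id, s.filler
FROM   t_delta s
WHERE  NOT EXISTS (SELECT 1 FROM t_tgt x WHERE x.id = s.id);

COMMIT;
```

The price is that the two steps are no longer one atomic transaction, so a failure between them has to be handled by the ETL process itself.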
36. Summary
• Don't overuse PDML. Turn it on selectively, only where it makes sense
• Be careful and double-check that your statements are really doing PDML
• Oracle reports the space management strategy for LOAD AS SELECT operations in execution plans from 12.1.0.2, but not for MERGE operations
• A bloated extent map has a negative effect on parallel queries
• Oracle 12c introduced Hybrid TSM/HWMB, which increases scalability but keeps the number of extents small
• Don't create indexes on tables for partition exchange before loading them; they can significantly influence the execution plan. Bitmap indexes will even disable PDML!
• For the most critical loading processes, check the data distribution, which you can influence with the PQ_DISTRIBUTE hint
• If using MERGE for critical ETL, check the space management behavior
37. Links
• Oracle Documentation, VLDB Guide, About Parallel DML Operations
• Nigel Bayliss, Space Management with PDML
• Randolf Geist, Understanding Parallel Execution - Part 1 and Part 2
• Randolf Geist, Hash Join Buffered
• Timur Akhmadeev, PQ_DISTRIBUTE Enhancement
• Jonathan Lewis, Autoallocate and PX