Online Statistics Gathering for ETL

Online Statistics Gathering for Bulk Loads (the official name of the feature) was introduced in Oracle 12.1. The idea is to gather optimizer statistics "on the fly" during direct path loads. Sounds good for ETL? In certain scenarios it makes sense, but even then there are many points to consider before it becomes a reliable part of your ETL processes. When exactly does it work, and when does it not? Do you prevent it yourself? What are the documented and undocumented cases and the known bugs? Which statistics are gathered and which are not? What has to be considered with partitioned tables? Is interval partitioning a special case?

  1. Online Statistics Gathering for ETL. Andrej Pashchenko, @Andrej_SQL, doag2018
  2. We help to generate added value from data (20.11.2018)
  3. With over 650 specialists and IT experts in your region.
       – 16 Trivadis branches and more than 650 employees
       – Experience from more than 1,900 projects per year at over 800 customers
       – 250 Service Level Agreements
       – Over 4,000 training participants
       – Research and development budget: CHF 5.0 million
       – Financially self-supporting and sustainably profitable
  4. About Me
       – Working at Trivadis GmbH, Düsseldorf
       – Focus on Oracle: Data Warehousing, Application Development, Application Performance
       – Course instructor for "Oracle 12c New Features for Developers" and "Beyond SQL and PL/SQL"
       – Blog: http://blog.sqlora.com
  5. Online Statistics Gathering for Bulk Loads
       – New feature of Oracle 12c for direct path writes:
           – Create Table As Select (CTAS)
           – Bulk inserts into an empty segment
       – Table statistics and basic column statistics are collected "on the fly" (piggyback)
       – No additional table scan is required
  6. What ETL scenarios can benefit?
     [Diagram: data warehouse layers Staging Area, Cleansing Area, Core and Marts, plus Metadata; each layer is marked with a question mark]
  7. Control, Preconditions, Scope
  8. Switch ON/OFF, selective enable/disable
       – The behaviour is controlled by the parameter _optimizer_gather_stats_on_load; the default is TRUE
       – Hints: (NO_)GATHER_OPTIMIZER_STATISTICS
       – These hints are only meaningful to override the parameter setting for a particular SQL statement
       – They do not enforce gathering if the preconditions (empty table, direct path) are not met

         ALTER SYSTEM | SESSION SET "_optimizer_gather_stats_on_load" = TRUE | FALSE;

         INSERT /*+ APPEND GATHER_OPTIMIZER_STATISTICS */ INTO ...
  9. How do I know it works?
       – There is a new step in the explain plan, but it is not sufficient to judge: it only means that the feature is turned on and will be considered
       – Also check LAST_ANALYZED for table statistics and NOTES for column statistics
       – Also reflected in (ALL|DBA|USER)_TAB_STATS_HISTORY

         SQL> SELECT table_name, num_rows,
                     last_analyzed
              FROM   user_tab_statistics
              WHERE  table_name = 'T2';

         TABLE   NUM_ROWS LAST_ANALY
         ----- ---------- ----------
         T2         10000 09-OCT-18

         SQL> SELECT column_name, num_distinct,
                     num_buckets, sample_size, notes
              FROM   user_tab_col_statistics
              WHERE  table_name = 'T2';

         COLUM NUM_DIST NUM_B SAMPLE_S NOTES
         ----- -------- ----- -------- -------------
         N        10000     1    10000 STATS_ON_LOAD
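The history view mentioned on this slide can be queried in the same way. This is only a sketch, assuming the table T2 from the slide exists in the current schema and has just been loaded via direct path; the ordering is there purely for readability.

```sql
-- Each statistics update on T2 should leave a row with its timestamp
-- in the statistics history of the current schema.
SELECT table_name,
       partition_name,
       stats_update_time
FROM   user_tab_stats_history
WHERE  table_name = 'T2'
ORDER  BY stats_update_time DESC;
```

A STATS_UPDATE_TIME close to the load time, together with NOTES = 'STATS_ON_LOAD' in the column statistics, is a reasonable indication that the statistics came from the load itself rather than from a later DBMS_STATS call.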
  10. How do I know it works?
       – The Oracle documentation suggests looking at the INSERTS column of USER_TAB_MODIFICATIONS
       – If the query returns a row indicating the number of rows loaded, then the most recent bulk load did not gather statistics automatically

         INSERT /*+ APPEND */ ...

         SQL> begin
                dbms_stats.FLUSH_DATABASE_MONITORING_INFO;
              end;
              /

         PL/SQL procedure successfully completed.

         SQL> SELECT inserts
              FROM   user_tab_modifications
              WHERE  table_name = 'T';

            INSERTS
         ----------
              10000
  11. Direct Path Insert (I)
       – DB blocks are written to disk by the user's server process (not DBWR), bypassing the buffer cache
       – CTAS always performs a direct path load
       – Use the APPEND hint for INSERT AS SELECT (IAS)
       – Also possible: INSERT with the APPEND_VALUES hint from a FORALL statement in PL/SQL

         INSERT /*+ append */ INTO ... SELECT

       – Hints are evil?! Not this one! It is a non-optimizer hint and the official and only possible way to tell the database to do a direct path insert
  12. Direct Path Insert (II) – Parallel DML
       – If an INSERT is done in parallel (parallel DML), the default mode is direct path
       – Enable parallel DML for your session, or with a hint just for your SQL statement:

         ALTER SESSION ENABLE PARALLEL DML;
         or
         INSERT /*+ enable_parallel_dml */ INTO ...

       – Specify the PARALLEL clause for the table or use the PARALLEL hint in your INSERT:

         ALTER TABLE tx2 PARALLEL 4;
         or
         INSERT /*+ parallel */ INTO tx2 ...

       – Parallel DML is not possible in some cases, so do not rely on it and always specify the APPEND hint as well:

         INSERT /*+ append enable_parallel_dml parallel */ INTO ...
  13. Direct Path Insert (III)
       – There are some cases where the APPEND hint is ignored (e.g. triggers, enabled foreign keys)
       – Reliable FK constraints are valuable for the CBO but not disruptive for ETL (RELY DISABLE NOVALIDATE)
       – Triggers: redesign or drop
       – How to prove it?
           – LOAD AS SELECT in the explain plan
           – You cannot read the table in the same transaction after a direct path insert:
             ORA-12838: cannot read/modify an object after modifying it in parallel

         ------------------------------------------
         |  0 | INSERT STATEMENT                 |
         |  1 |  LOAD AS SELECT                  |
         |  2 |   OPTIMIZER STATISTICS GATHERING |
  14. Direct Path and MERGE?

         SQL> MERGE /*+ append */
              INTO tx2 USING tx ON (tx.n = tx2.n)
              WHEN NOT MATCHED THEN INSERT (N) VALUES (tx.n);

         10,000 rows merged.

         ------------------------------------------
         | Id | Operation                 | Name |
         ------------------------------------------
         |  0 | MERGE STATEMENT           |      |
         |  1 |  MERGE                    | TX2  |
         |  2 |   VIEW                    |      |
         |* 3 |    HASH JOIN RIGHT OUTER  |      |
         |  4 |     TABLE ACCESS FULL     | TX2  |
         |  5 |     TABLE ACCESS FULL     | TX   |
         ------------------------------------------

         SQL> SELECT count(*) FROM tx2;
         ORA-12838: cannot read/modify an object after modifying it in parallel

         SQL> INSERT /*+ append */ INTO tx2
              SELECT n FROM tx
              WHERE  n NOT IN (SELECT n FROM tx2);

         10,000 rows inserted.

         -------------------------------------------------
         | Id | Operation                        | Name |
         -------------------------------------------------
         |  0 | INSERT STATEMENT                 |      |
         |  1 |  LOAD AS SELECT                  | TX2  |
         |  2 |   OPTIMIZER STATISTICS GATHERING |      |
         |* 3 |    HASH JOIN RIGHT ANTI NA       |      |
         |  4 |     TABLE ACCESS FULL            | TX2  |
         |  5 |     TABLE ACCESS FULL            | TX   |
         -------------------------------------------------

       Direct path write has happened
  15. Empty Segment
       – What exactly does "empty" mean?
       – The rules apply to partitions as well

         Case                       Online statistics?
         -------------------------  ------------------
         New table                  YES
         DELETE all rows            NO
         INSERT and ROLLBACK        NO
         TRUNCATE (DROP STORAGE)    YES
         TRUNCATE REUSE STORAGE     NO
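The typical stage-load pattern from this table can be sketched as follows. The names stage_orders and src_orders are hypothetical; the sketch assumes both tables exist with compatible columns, and that none of the restrictions covered later apply.

```sql
-- TRUNCATE without REUSE STORAGE drops the storage (the default),
-- so the segment counts as empty for the following direct path load.
TRUNCATE TABLE stage_orders;

-- Direct path load into the empty segment; statistics are expected
-- to be gathered as part of the load.
INSERT /*+ append */ INTO stage_orders
SELECT * FROM src_orders;

COMMIT;

-- Verify after the load: LAST_ANALYZED should be close to the load time.
SELECT table_name, num_rows, last_analyzed
FROM   user_tab_statistics
WHERE  table_name = 'STAGE_ORDERS';
```

Replacing the TRUNCATE with DELETE, or with TRUNCATE ... REUSE STORAGE, should make the same load fall into the NO rows of the table.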
  16. What kind of statistics are gathered?
       – Table statistics
       – Base column statistics, including virtual columns and column groups (12.2)
       – No histograms: they are not often used in the Stage and Cleansing areas but can be important in the Core and Mart areas
       – No index statistics: they must be gathered separately
       – Interactions with the default statistics maintenance job can be dangerous:
         [Diagram: Data, TRUNCATE, Empty, DBMS_STATS, BULK LOAD, Data; the index statistics still reflect the empty state]
         Wrong index stats may confuse the CBO!
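One way to fill in what the load leaves out is sketched below. The table stage_orders and the index stage_orders_ix are hypothetical names. As far as the Oracle documentation describes it, options => 'GATHER AUTO' adds missing index statistics and histograms without re-gathering the base column statistics; the explicit index call is an assumption about how to handle index statistics that exist but are outdated.

```sql
BEGIN
  -- Complete the statistics after a bulk load: missing index statistics
  -- and histograms, without repeating the base column statistics.
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname => USER,
    tabname => 'STAGE_ORDERS',
    options => 'GATHER AUTO');

  -- Refresh a single index explicitly, e.g. when its statistics were
  -- gathered while the table was empty and no longer match the data.
  DBMS_STATS.GATHER_INDEX_STATS(
    ownname => USER,
    indname => 'STAGE_ORDERS_IX');
END;
/
```

Whether the extra pass pays off depends on the layer: for Stage and Cleansing tables without indexes or skewed columns it may not be needed at all.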
  17. Restrictions and Pitfalls
  18. Restrictions and Pitfalls (I)
       – Restrictions are documented: check your version's documentation
       – Even if you are on 12.1, also check the 12.2 documentation. It has better explanations, and some restrictions were lifted in 12.2 and backported to 12.1, so check MOS and test it!
       – Some examples:
           – IOTs, nested tables
           – Statistics are locked or the PUBLISH preference is set to FALSE
           – Multitable insert (INSERT ALL/FIRST)
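Two of these restrictions can be checked before the load. A minimal sketch, assuming a hypothetical target table stage_orders in the current schema:

```sql
-- Locked statistics: STATTYPE_LOCKED is NULL when nothing is locked.
SELECT table_name, stattype_locked
FROM   user_tab_statistics
WHERE  table_name  = 'STAGE_ORDERS'
AND    object_type = 'TABLE';

-- PUBLISH preference as it applies to this table.
SELECT DBMS_STATS.GET_PREFS('PUBLISH', USER, 'STAGE_ORDERS') AS publish_pref
FROM   dual;
```

If the first query reports a lock or the second returns FALSE, the restriction above applies and statistics should not be expected from the load.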
  19. Restrictions and Pitfalls (II)
       – Statistics are gathered only if all skipped columns have a default value (relevant, for example, when extending tables without changing the ETL process at the same time)
       – Virtual columns are listed as a restriction in the 12.1 documentation, but not in 12.2. In fact it also works in 12.1.0.2 (backport?)
       – The presence of extended statistics prevents online statistics gathering (bug 18425876). This is fixed in 12.2, but it also works in 12.1.0.2 (backport?)
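The default-value rule can be tried out with a small test case. This is a sketch with a hypothetical table t_target; the expectation stated in the comments follows the rule on the slide and should be confirmed on your own version.

```sql
-- Hypothetical target table: LOAD_TS stands for a column that was added
-- later, with a default value, without touching the ETL process.
CREATE TABLE t_target (
  n       NUMBER,
  load_ts DATE DEFAULT SYSDATE
);

-- The column list skips LOAD_TS. Because the skipped column has a default,
-- the load should still qualify for online statistics gathering.
INSERT /*+ append */ INTO t_target (n)
SELECT level FROM dual CONNECT BY level <= 1000;

COMMIT;

-- NOTES shows whether the column statistics came from the load.
SELECT column_name, num_distinct, notes
FROM   user_tab_col_statistics
WHERE  table_name = 'T_TARGET';
```

Repeating the test with LOAD_TS declared without a default value shows the opposite case.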
  20. Partitioned Tables
  21. Partitioned Tables (simple case)
       – Insert into an empty partitioned table: only global table statistics are gathered
       – No online statistics gathering for subsequent inserts (the table is not empty anymore), even if inserting into an empty partition

         INSERT /*+ append */ INTO t_part SELECT ...

         [Diagram: table T_PART with partitions P1 and P2]
  22. Partitioned Tables (using extended syntax)
       – Insert into an empty partition, explicitly specifying the partition name: only partition-level statistics are gathered
       – Online statistics gathering is also active for subsequent inserts into partitions that are still empty

         INSERT /*+ append */ INTO t_part PARTITION (P1) SELECT ...
         INSERT /*+ append */ INTO t_part PARTITION (P2) SELECT ...

         [Diagram: table T_PART with partitions P1 and P2]
  23. Partitioned Tables (using extended syntax + incremental)
       – Insert into an empty partition, explicitly specifying the partition name, with the INCREMENTAL preference set to TRUE: partition-level statistics and the synopsis for global statistics are gathered
       – Online statistics gathering is also active for subsequent inserts into partitions that are still empty

         exec dbms_stats.set_table_prefs(null,'T_PART','INCREMENTAL','TRUE')

         INSERT /*+ append */ INTO t_part PARTITION (P1)
         INSERT /*+ append */ INTO t_part PARTITION (P2)

         [Diagram: table T_PART with partitions P1 and P2]
  24. Partitioned Tables (using extended syntax + incremental)
       – Insert into an empty partitioned table without partition-extended syntax (no partition name), with the INCREMENTAL preference set to TRUE: NO online statistics gathering at all!
       – This is a documented restriction

         exec dbms_stats.set_table_prefs(null,'T_PART','INCREMENTAL','TRUE')

         INSERT /*+ append */ INTO t_part
         INSERT /*+ append */ INTO t_part

         [Diagram: table T_PART with partitions P1 and P2]
  25. Partitioned Tables

         Loading into an empty partition   INCREMENTAL=TRUE                   INCREMENTAL=FALSE
         -------------------------------   --------------------------------   ---------------------
         Partition name specified          partition-level stats + synopsis   partition-level stats
         No partition name specified       No stats at all!                   global stats
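Which cell of this matrix a particular load ended up in can be checked per partition. A small sketch, assuming the table T_PART from the slides exists in the current schema:

```sql
-- One row for the table (PARTITION_NAME is NULL) and one per partition.
-- NUM_ROWS and LAST_ANALYZED show at which level statistics exist.
SELECT object_type,
       partition_name,
       num_rows,
       last_analyzed,
       global_stats
FROM   user_tab_statistics
WHERE  table_name = 'T_PART'
ORDER  BY partition_position;
```

After a load with an explicit partition name, only the row of that partition should show a fresh LAST_ANALYZED; after a load without one (and INCREMENTAL=FALSE), only the table-level row should.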
  26. Partitioned Tables – Interval Partitioning
       – Partitions are automatically created as data arrives
       – No online statistics in 12.1, fixed in 12.2
       – Partition names are system-generated and are not known in advance
       – No way for online gathering of partition-level stats? Actually there is one, using the extended syntax with values:

         INSERT /*+ append */ INTO t_part PARTITION FOR (DATE '2018-11-20')

       – But the value must be hard-coded (no bind variables), so dynamic SQL is needed
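The dynamic SQL mentioned on this slide could look roughly like the following PL/SQL sketch. The source table t_source and its column load_date are hypothetical; only T_PART and the PARTITION FOR syntax come from the slide.

```sql
DECLARE
  -- The day to load; in a real ETL job this would be a parameter.
  l_day DATE := DATE '2018-11-20';
BEGIN
  -- The PARTITION FOR value has to be a literal, so it is concatenated
  -- into the statement text. The filter in the SELECT can still use a bind.
  EXECUTE IMMEDIATE
    'INSERT /*+ append */ INTO t_part PARTITION FOR (DATE ''' ||
    TO_CHAR(l_day, 'YYYY-MM-DD') ||
    ''') SELECT * FROM t_source WHERE load_date = :d'
    USING l_day;

  COMMIT;
END;
/
```

Formatting the literal with TO_CHAR from a DATE variable, rather than concatenating arbitrary input, keeps the generated statement predictable.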
  27. Partitioned Tables – Partition Exchange
     [Diagram: INSERT /*+ append */ INTO t_exchange loads the table T_EXCHANGE, which is then exchanged with a partition of table T_PART (partitions P1, P2, P3)]
  28. Partitioned Tables – Partition Exchange
       – Since 12.1, synopses can be gathered on non-partitioned tables
       – This also happens with online statistics gathering

         CREATE TABLE t_exchange FOR EXCHANGE WITH TABLE t_part;

         BEGIN
           dbms_stats.set_table_prefs(null,'t_exchange','INCREMENTAL','TRUE');
           dbms_stats.set_table_prefs(null,'t_exchange','INCREMENTAL_LEVEL','TABLE');
         END;
         /

         INSERT /*+ append */ INTO t_exchange ...
         -- Online statistics including synopsis are gathered

         ALTER TABLE T_PART EXCHANGE PARTITION P3 WITH TABLE t_exchange;
         -- Partition statistics and synopsis are exchanged
  29. Autonomous Data Warehouse Cloud Service (ADWC) – What's new?
  30. Autonomous Data Warehouse Cloud – What's new?
       – Statistics are gathered automatically
       – Unlike in 12c, this also works for non-empty tables and for histograms
       – Two new undocumented parameters:
           – _optimizer_gather_stats_on_load_all (default: TRUE)
           – _optimizer_gather_stats_on_load_hist (default: TRUE)

         -------------------------------------------------------
         | Id | Operation                          | Name     |
         -------------------------------------------------------
         |  0 | INSERT STATEMENT                   |          |
         |  1 |  LOAD AS SELECT                    | TARGET   |
         |  2 |   PX COORDINATOR                   |          |
         |  3 |    PX SEND QC (RANDOM)             | :TQ10000 |
         |  4 |     OPTIMIZER STATISTICS GATHERING |          |
         |  5 |      PX BLOCK ITERATOR             |          |
         |  6 |       TABLE ACCESS STORAGE FULL    | SOURCE   |
         -------------------------------------------------------
  31. Summary
       – Very useful feature when loading in combination with TRUNCATE (Stage, Cleansing)
       – Becomes better with every release since 12.1
       – Can save time and resources during ETL runs
       – Becomes more complicated with partitioned tables
       – Requires dynamic SQL for efficient loading into partitioned tables
       – No support for MERGE /*+ append */
       – Needs to be checked and tested with your load scenario and your ETL tool
       – Test it!
  32. Links
       – http://blog.sqlora.com/en/tag/osg/
       – https://blogs.oracle.com/optimizer/efficient-statistics-maintenance-for-partitioned-tables-using-incremental-statistics-part-2
       – https://danischnider.wordpress.com/2018/07/11/gathering-statistics-in-the-autonomous-data-warehouse-cloud/
  33. Trivadis @ DOAG 2018, #opencompany
       – Booth: 3rd floor, next to the escalator
       – We share our know-how! Just come by: live presentations and a document archive
       – T-shirts, a contest and much more
       – We look forward to your visit
