
SQL Pattern Matching – should I start using it?

Introduced in Oracle Database 12c, the new MATCH_RECOGNIZE clause allows pattern matching across rows and is often associated with Big Data, complex event processing, etc. Should SQL developers who are not (yet) faced with such tasks ignore it? No way! The feature is powerful enough to simplify many day-to-day tasks and to solve them in a new, simple and efficient way. The new syntax is introduced using common examples such as finding gaps, merging temporal intervals or grouping on fuzzy criteria. Because it offers a more straightforward approach to well-known problems, the new functionality deserves a place in every developer’s toolbox.


  1. 12c SQL Pattern Matching – when will I use it? Andrej Pashchenko, Senior Consultant, Trivadis GmbH. Offices: BASEL, BERN, BRUGG, DÜSSELDORF, FRANKFURT A.M., FREIBURG I.BR., GENF, HAMBURG, KOPENHAGEN, LAUSANNE, MÜNCHEN, STUTTGART, WIEN, ZÜRICH
  2. Our company (12c SQL Pattern Matching – when will I use it?, 19.11.2015). Trivadis is a leader in IT consulting, system integration, solution engineering and the provision of IT services, focusing on - and - technologies in Switzerland, Germany, Austria and Denmark. Trivadis delivers its services from the strategic business areas: Trivadis Services takes over the corresponding operation of your IT systems. OPERATION
  3. With more than 600 IT and subject-matter experts on site with you. 14 Trivadis offices (KOPENHAGEN, MÜNCHEN, LAUSANNE, BERN, ZÜRICH, BRUGG, GENF, HAMBURG, DÜSSELDORF, FRANKFURT, STUTTGART, FREIBURG, BASEL, WIEN) with more than 600 employees. More than 200 service level agreements. More than 4,000 training participants. Research and development budget: CHF 5.0 million. Financially independent and sustainably profitable. Experience from more than 1,900 projects per year with more than 800 customers.
  4. About me: Senior Consultant at Trivadis GmbH, Düsseldorf. Focus on Oracle: application development, application performance, data warehousing. 22 years of IT experience, 16 of them with Oracle Database. Course instructor for "Oracle 12c New Features für Entwickler" and "Beyond SQL and PL/SQL". Blog: http://blog.sqlora.com
  5. Agenda
      1. Introduction
      2. Find consecutive ranges and gaps
      3. Trouble Ticket roundtrip
      4. Grouping on fuzzy criteria
      5. Merge temporal intervals
  6. Introduction
  7. Introduction: Analytic functions, analytic functions enhancements, SQL Model clause, LISTAGG, NTH_VALUE, PIVOT/UNPIVOT clause, Pattern Matching, Top-N
  8. Introduction
      – Oracle Database 12c supports SQL pattern matching with the new MATCH_RECOGNIZE clause
      – It matches patterns in a sequence of rows
      – It has nothing to do with string patterns (PL/SQL REGEXP_... functions)
      – It is a clause, not a function, and follows the table name in the FROM clause
      – Patterns are expressed with regular expression syntax over pattern variables
      – Pattern variables are defined as SQL expressions
  9. Introduction: syntax overview
      MATCH_RECOGNIZE (
        [ PARTITION BY <cols> ]
        [ ORDER BY <cols> ]
        [ MEASURES <cols> ]
        [ ONE ROW PER MATCH | ALL ROWS PER MATCH ]
        [ AFTER MATCH SKIP <option> ]
        PATTERN ( <row pattern> )
        [ SUBSET <subset list> ]
        DEFINE <definition list>
      )
  10. Introduction. Example: find the mappings in the ETL logging table that became steadily faster over a period of four days. Output: start and end dates of the period, elapsed time at the beginning and at the end of the period, and average elapsed time.
  11. Introduction
      SELECT etl_date, mapping_name, elapsed FROM dwh_etl_runs;
      ETL_DATE   MAPPING_NAME          ELAPSED
      ...
      04-NOV-14  MAP_STG_S_ORDER_ITEM  +000000 00:14:54.42738
      05-NOV-14  MAP_STG_S_ORDER       +000000 00:10:13.44989
      05-NOV-14  MAP_STG_S_ORDER_ITEM  +000000 00:15:06.24587
      05-NOV-14  MAP_STG_S_ASSET       +000000 00:14:15.22855
      06-NOV-14  MAP_STG_S_ASSET       +000000 00:14:00.49513
      06-NOV-14  MAP_STG_S_ORDER       +000000 00:11:05.07337
      06-NOV-14  MAP_STG_S_ORDER_ITEM  +000000 00:10:12.67410
      07-NOV-14  MAP_STG_S_ORDER_ITEM  +000000 00:19:29.64314
      07-NOV-14  MAP_STG_S_ORDER       +000000 00:14:59.80953
      07-NOV-14  MAP_STG_S_ASSET       +000000 00:13:33.80789
      08-NOV-14  MAP_STG_S_ASSET       +000000 00:10:14.65652
      08-NOV-14  MAP_STG_S_ORDER       +000000 00:13:30.77744
      08-NOV-14  MAP_STG_S_ORDER_ITEM  +000000 00:17:15.11789
      ...
  12. Introduction
      SELECT *
      FROM   dwh_etl_runs
      MATCH_RECOGNIZE (
        PARTITION BY mapping_name
        ORDER BY etl_date
        MEASURES FIRST(etl_date) AS start_date
        ,        LAST(etl_date)  AS end_date
        ,        FIRST(elapsed)  AS first_elapsed
        ,        LAST(elapsed)   AS last_elapsed
        ,        AVG(elapsed)    AS avg_elapsed
        PATTERN (STRT DOWN{3})
        DEFINE DOWN AS elapsed < PREV(elapsed)
      )
      Notes on the query:
      – PARTITION BY and ORDER BY work as for analytic functions
      – MEASURES defines the columns that are accessible in the main query
      – PATTERN defines the search pattern as a regular expression over boolean pattern variables
      – DEFINE defines the pattern variables
      – Navigation operators: PREV, NEXT (physical offset); FIRST, LAST (logical offset)
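To make the meaning of PATTERN (STRT DOWN{3}) more tangible, the following is a minimal Python sketch of what the query looks for inside a single mapping's partition. The function name `find_speedups`, the use of seconds for the elapsed time and the sample values in the usage note are illustrative assumptions, not part of the slides. The sketch resumes the search after the last matched row, which mirrors the default AFTER MATCH SKIP PAST LAST ROW behaviour.

```python
from statistics import mean


def find_speedups(runs, down_count=3):
    """Emulate PATTERN (STRT DOWN{3}) for one partition (hypothetical helper).

    runs: list of (etl_date, elapsed_seconds) tuples, already ordered by etl_date.
    Returns one dict per match, comparable to ONE ROW PER MATCH.
    """
    matches = []
    i = 0
    while i + down_count < len(runs):
        window = runs[i:i + down_count + 1]  # STRT plus down_count DOWN rows
        elapsed = [e for _, e in window]
        # DOWN: elapsed < PREV(elapsed) has to hold for every row after STRT
        if all(cur < prev for prev, cur in zip(elapsed, elapsed[1:])):
            matches.append({
                "start_date": window[0][0],
                "end_date": window[-1][0],
                "first_elapsed": elapsed[0],
                "last_elapsed": elapsed[-1],
                "avg_elapsed": mean(elapsed),
            })
            i += down_count + 1  # continue after the last matched row
        else:
            i += 1  # no match starting here, try the next row as STRT
    return matches
```

For example, `find_speedups([("d1", 900), ("d2", 850), ("d3", 800), ("d4", 700), ("d5", 750)])` reports a single period from `d1` to `d4`.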
  13. Introduction. PATTERN uses a subset of the Perl syntax for regular expressions:
      – *        0 or more iterations
      – +        1 or more iterations
      – ?        0 or 1 iterations
      – {n}      n iterations (n > 0)
      – {n,}     n or more iterations (n >= 0)
      – {n,m}    between n and m (inclusive) iterations (0 <= n <= m, 0 < m)
      – {,m}     between 0 and m (inclusive) iterations (m > 0)
      – ( )      grouping
      – |        alternation
      – {- … -}  exclusion
      – ^        before the first row in the partition
      – $        after the last row in the partition
      – ?        appended to a quantifier: "reluctant" instead of "greedy" matching
      – …
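Since the row pattern is a regular expression over pattern variables rather than over characters, an analogy with ordinary regular expressions may help. The Python sketch below labels every row with one letter and then applies the `re` module to the label string. The labels `S`, `D` and `U` and the helper names are assumptions made for this illustration. The analogy is only rough: MATCH_RECOGNIZE evaluates the DEFINE conditions while matching, so pre-labelling works only when a row's label does not depend on the match itself.

```python
import re


def label_rows(values):
    """Label the first row 'S', then 'D' if a value is lower than its predecessor, else 'U'."""
    labels = ["S"]
    for prev, cur in zip(values, values[1:]):
        labels.append("D" if cur < prev else "U")
    return "".join(labels)


def find_runs(values, pattern):
    """Return (first_index, last_index) for each non-overlapping match of pattern over the labels."""
    labels = label_rows(values)
    return [(m.start(), m.end() - 1) for m in re.finditer(pattern, labels)]


values = [10, 9, 8, 7, 12, 11, 13]    # labels: S D D D U D U

exact = find_runs(values, r".D{3}")    # like STRT DOWN{3}
greedy = find_runs(values, r".D+")     # like STRT DOWN+
reluctant = find_runs(values, r".D+?")  # like STRT DOWN+?
```

The dot plays the role of an undefined pattern variable such as STRT, which matches any row. With the greedy quantifier the first match covers rows 0 to 3, while the reluctant form stops as early as possible and splits the same rows into shorter matches.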
  14. Introduction. Patterns are everywhere:
      – Financial, Telcos, Retail, Traffic, Automotive, Transport / Logistics
      – Fraud Detection, Quality of Service, Trouble Ticketing, Price Trends, Buying Patterns, Stock Market, Money Laundering, Sensor Data, Network Activity, Advertising Campaigns, Sessionization, Frequent Flyer Programs, Process Chain, CRM
  15. Introduction. SQL had no efficient way to handle such questions. Pre-12c solutions:
      – self-joins, subqueries, (NOT) IN, (NOT) EXISTS
      – switching to PL/SQL: "do it yourself", often with multiple SQL queries
      – moving some logic into pipelined functions and integrating them into the main query
      – analytic (window) functions, with these limitations:
          – ORA-30483: window functions are not allowed here
          – cannot be used in the WHERE clause
          – cannot be nested
          – cannot access the output of analytic functions in other rows
          – often lead to nested queries, self-joins, etc.
  17. Find consecutive ranges and gaps
  18. Find Consecutive Ranges / Gaps (mr_consecutive.sql)
      – SLA, QoS: find the longest period without outage
      – Table T_GAPS: find consecutive ranges in the values of column ID
      – Output: start and end ID of each consecutive range
      ID: 1 2 3 5 6 10 11 12 14 20 21 …
      Start of Range  End of Range
      1               3
      5               6
      10              12
  19. Find Consecutive Ranges / Gaps: pre-12c solution using analytic functions (mr_consecutive.sql)
      ID: 1 2 3 5 6 10 11 12 14 20 21 …
      WITH groups_marked AS (
        SELECT id
        ,      CASE
                 WHEN id != LAG(id, 1, id) OVER (ORDER BY id) + 1 THEN 1
                 ELSE 0
               END new_grp
        FROM   t_gaps
      )
      , sum_grp AS (
        SELECT id
        ,      SUM(new_grp) OVER (ORDER BY id) grp_sum
        FROM   groups_marked
      )
      SELECT MIN(id) start_of_range
      ,      MAX(id) end_of_range
      FROM   sum_grp
      GROUP  BY grp_sum
      ORDER  BY grp_sum;
  20. Find Consecutive Ranges / Gaps: "Tabibitosan" method* (mr_consecutive.sql)
      * https://community.oracle.com/message/3991177#3991177
      ID: 1 2 3 5 6 10 11 12 14 20 21 …
      SELECT MIN(id) start_of_range
      ,      MAX(id) end_of_range
      FROM  (SELECT id
             ,      id - ROW_NUMBER() OVER (ORDER BY id) distance
             FROM   t_gaps)
      GROUP  BY distance
      ORDER  BY distance;
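The Tabibitosan query relies on one observation: inside a run of consecutive IDs, the difference between the ID and its row number stays constant, and it jumps at every gap. The Python sketch below replays that idea on the sample IDs from the slide. The function name `consecutive_ranges` is an assumption for this illustration, and the sketch presumes distinct IDs.

```python
from itertools import groupby


def consecutive_ranges(ids):
    """Group distinct ids into consecutive ranges using the id - row_number trick."""
    ordered = sorted(ids)
    # id - ROW_NUMBER() is constant within a run of consecutive ids
    keyed = [(value - position, value)
             for position, value in enumerate(ordered, start=1)]
    ranges = []
    for _, group in groupby(keyed, key=lambda pair: pair[0]):  # GROUP BY distance
        members = [value for _, value in group]
        ranges.append((members[0], members[-1]))  # MIN(id), MAX(id)
    return ranges


sample_ids = [1, 2, 3, 5, 6, 10, 11, 12, 14, 20, 21]
ranges = consecutive_ranges(sample_ids)
```

For the sample IDs the distances are 0, 0, 0, 1, 1, 4, 4, 4, 5, 10, 10, so five groups and therefore five ranges come out.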
  21. Find Consecutive Ranges / Gaps: 12c solution with MATCH_RECOGNIZE (mr_consecutive.sql)
      ID: 1 2 3 5 6 10 11 12 14 20 21 …
      SELECT *
      FROM   t_gaps
      MATCH_RECOGNIZE (
        ORDER BY id
        MEASURES FIRST(id) start_of_range
        ,        LAST(id)  end_of_range
        ,        COUNT(*)  cnt
        ONE ROW PER MATCH
        PATTERN (strt cont*)
        DEFINE cont AS id = PREV(id) + 1
      );
  22. Find Consecutive Ranges / Gaps (mr_gaps.sql)
      – Table T_GAPS, numeric column ID with gaps
      – Find the gaps in the values of column ID
      – Output: start and end ID of each gap
      ID: 1 2 3 5 6 10 11 12 14 20 21 …
      Start of Gap  End of Gap
      4             4
      7             9
      13            13
      15            19
  23. Find Consecutive Ranges / Gaps (mr_gaps.sql)
      ID: 1 2 3 5 6 10 11 12 14 20 21 …
      Solution with analytic functions:
      SELECT start_of_gap, end_of_gap
      FROM  (SELECT id + 1 start_of_gap
             ,      LEAD(id) OVER (ORDER BY id) - 1 end_of_gap
             ,      CASE
                      WHEN id + 1 != LEAD(id) OVER (ORDER BY id) THEN 1
                      ELSE 0
                    END is_gap
             FROM   t_gaps)
      WHERE  is_gap = 1;
      "Tabibitosan" method*:
      SELECT MAX(id) + 1 start_of_gap
      ,      LEAD(MIN(id)) OVER (ORDER BY distance) - 1 end_of_gap
      FROM  (SELECT id
             ,      id - ROW_NUMBER() OVER (ORDER BY id) distance
             FROM   t_gaps)
      GROUP  BY distance;
      * https://community.oracle.com/message/3991177#3991177
  24. Find Consecutive Ranges / Gaps: 12c solution with MATCH_RECOGNIZE (mr_gaps.sql)
      ID: 1 2 3 5 6 10 11 12 14 20 21 …
      SELECT *
      FROM   t_gaps
      MATCH_RECOGNIZE (
        ORDER BY id
        MEASURES PREV(gap.id) + 1 start_of_gap
        ,        gap.id - 1       end_of_gap
        ONE ROW PER MATCH
        AFTER MATCH SKIP TO LAST gap
        PATTERN (strt gap)
        DEFINE gap AS id != PREV(id) + 1
      );
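The gap search boils down to comparing each ID with its predecessor: every row whose ID is not the previous ID plus one closes a gap that starts right after the previous ID. A minimal Python sketch of that rule, assuming distinct IDs and using the hypothetical helper name `find_gaps`, reproduces the expected output of the task slide:

```python
def find_gaps(ids):
    """Return (start_of_gap, end_of_gap) for every hole in a list of distinct ids."""
    ordered = sorted(ids)
    gaps = []
    for prev, cur in zip(ordered, ordered[1:]):
        if cur != prev + 1:  # same test as: id != PREV(id) + 1
            gaps.append((prev + 1, cur - 1))
    return gaps


sample_ids = [1, 2, 3, 5, 6, 10, 11, 12, 14, 20, 21]
gaps = find_gaps(sample_ids)
```

Note that the isolated ID 14 borders two gaps, 13 to 13 and 15 to 19, and both have to be reported separately.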
  26. Trouble Ticket roundtrip
  27. Trouble Ticket Roundtrip
      Task: find the tickets that went back to the same assignee again
      Assignees: SCOTT, ADAMS, KING
      ID  Assignee  Date
      1   SCOTT     01.02.2015
      1   SCOTT     02.02.2015
      1   ADAMS     03.02.2015
      1   SCOTT     04.02.2015
      2   ADAMS     01.02.2015
      2   ADAMS     02.02.2015
      2   SCOTT     03.02.2015
      3   KING      01.02.2015
      3   ADAMS     02.02.2015
      3   ADAMS     03.02.2015
      3   KING      04.02.2015
      3   ADAMS     05.02.2015
      4   KING      01.02.2015
      4   ADAMS     02.02.2015
      4   SCOTT     03.02.2015
      4   KING      05.02.2015
  28. Trouble Ticket Roundtrip: pre-12c solution using self-joins (mr_trouble_ticket.sql)
      SELECT DISTINCT t1.ticket_id
      ,      t1.assignee    AS first_assignee
      ,      t3.change_date AS last_change
      FROM   trouble_ticket t1
      ,      trouble_ticket t2
      ,      trouble_ticket t3
      WHERE  t1.ticket_id = t2.ticket_id
      AND    t1.assignee != t2.assignee
      AND    t2.change_date > t1.change_date
      AND    t3.assignee = t1.assignee
      AND    t3.ticket_id = t1.ticket_id
      AND    t3.change_date > t2.change_date
      ORDER  BY ticket_id
  29. Trouble Ticket Roundtrip: 12c solution using the MATCH_RECOGNIZE clause (mr_trouble_ticket.sql)
      New:
      – Row pattern skip to (AFTER MATCH SKIP): where to start over after a match is found?
      – Matching overlapping patterns
      SELECT *
      FROM   trouble_ticket
      MATCH_RECOGNIZE (
        PARTITION BY ticket_id
        ORDER BY change_date
        MEASURES strt.assignee          AS first_assignee
        ,        LAST(same.change_date) AS letzte_bearbeitung
        AFTER MATCH SKIP TO FIRST another
        PATTERN (strt another+ same+)
        DEFINE same    AS same.assignee = strt.assignee
        ,      another AS another.assignee != strt.assignee
      );
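The roundtrip pattern (strt another+ same+) can be traced by hand with a short Python sketch. It tries every row of a ticket as a possible start, which has the same effect as AFTER MATCH SKIP TO FIRST another here, because the first `another` row is always the row right after the start. The helper name `find_roundtrips` and the ISO date strings are assumptions made for this illustration; the rows are the sample data from the task slide.

```python
from itertools import groupby
from operator import itemgetter


def find_roundtrips(rows):
    """rows: (ticket_id, assignee, change_date) tuples; dates as sortable ISO strings."""
    results = []
    ordered = sorted(rows, key=itemgetter(0, 2))  # PARTITION BY ticket_id ORDER BY change_date
    for ticket_id, group in groupby(ordered, key=itemgetter(0)):
        history = list(group)
        for start in range(len(history)):
            first_assignee = history[start][1]
            pos = start + 1
            # another+: one or more rows handled by somebody else
            while pos < len(history) and history[pos][1] != first_assignee:
                pos += 1
            if pos == start + 1:
                continue  # no 'another' row directly after the start row
            # same+: one or more rows back with the first assignee
            end = pos
            while end < len(history) and history[end][1] == first_assignee:
                end += 1
            if end > pos:
                results.append((ticket_id, first_assignee, history[end - 1][2]))
    return results


tickets = [
    (1, "SCOTT", "2015-02-01"), (1, "SCOTT", "2015-02-02"),
    (1, "ADAMS", "2015-02-03"), (1, "SCOTT", "2015-02-04"),
    (2, "ADAMS", "2015-02-01"), (2, "ADAMS", "2015-02-02"),
    (2, "SCOTT", "2015-02-03"),
    (3, "KING", "2015-02-01"), (3, "ADAMS", "2015-02-02"),
    (3, "ADAMS", "2015-02-03"), (3, "KING", "2015-02-04"),
    (3, "ADAMS", "2015-02-05"),
    (4, "KING", "2015-02-01"), (4, "ADAMS", "2015-02-02"),
    (4, "SCOTT", "2015-02-03"), (4, "KING", "2015-02-05"),
]
roundtrips = find_roundtrips(tickets)
```

Ticket 3 shows why overlapping matches matter: KING gets the ticket back on 04.02.2015, and ADAMS gets it back on 05.02.2015, so the ticket appears twice in the result.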
  31. Grouping on fuzzy criteria
  32. Grouping over fuzzy criteria
      – "Sessionization": group rows together where the gap between consecutive timestamps does not exceed a defined limit
          ... PATTERN (STRT SESS+)
              DEFINE SESS AS SESS.ins_date - PREV(SESS.ins_date) <= 10/24/60
      – Group rows together that lie within a defined interval relative to the first row; otherwise start the next group (https://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:13946369553642#3478381500346951056)
          ... PATTERN (A+)
              DEFINE A AS ins_date < FIRST(ins_date) + 6/24
      – Group over running totals: split the data into groups of a defined capacity
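The first two grouping rules differ only in the reference point: sessionization compares each row with the previous row, while the fixed-interval rule compares it with the first row of the current group. The Python sketch below contrasts the two, using the 10-minute and 6-hour limits from the DEFINE clauses. The function names are hypothetical, and unlike PATTERN (STRT SESS+), which needs at least two rows per match, the sketch also keeps isolated rows as single-row sessions.

```python
from datetime import datetime, timedelta


def sessionize(timestamps, max_gap=timedelta(minutes=10)):
    """Start a new session whenever the gap to the previous row exceeds max_gap."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= max_gap:
            sessions[-1].append(ts)  # close enough to the previous row
        else:
            sessions.append([ts])
    return sessions


def fixed_windows(timestamps, width=timedelta(hours=6)):
    """Start a new group whenever a row is not within width of the group's first row."""
    groups = []
    for ts in sorted(timestamps):
        if groups and ts < groups[-1][0] + width:
            groups[-1].append(ts)  # still inside the window opened by the first row
        else:
            groups.append([ts])
    return groups


t0 = datetime(2015, 11, 19, 8, 0)
stamps = [t0,
          t0 + timedelta(minutes=5),
          t0 + timedelta(minutes=14),
          t0 + timedelta(minutes=30),
          t0 + timedelta(hours=7)]
session_sizes = [len(s) for s in sessionize(stamps)]
window_sizes = [len(g) for g in fixed_windows(stamps)]
```

With these sample timestamps the row at 08:30 starts a new session, because it is 16 minutes after its predecessor, but it still belongs to the first 6-hour window.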
  33. Grouping over fuzzy criteria. Example schema SH (Sales History). Task: split the data into groups of fixed capacity. Fit all customers, ordered by age, into groups such that the total sales in every group stay below $200,000.
  34. Grouping over fuzzy criteria: 12c solution with the MATCH_RECOGNIZE clause (mr_group_running_total.sql)
      WITH q AS (
        SELECT c.cust_id
        ,      c.cust_year_of_birth
        ,      SUM(s.amount_sold) cust_amount_sold
        FROM   customers c
        JOIN   sales s ON s.cust_id = c.cust_id
        GROUP  BY c.cust_id, c.cust_year_of_birth
      )
      SELECT *
      FROM   q
      MATCH_RECOGNIZE (
        ORDER BY cust_year_of_birth
        MEASURES MATCH_NUMBER()              gruppe
        ,        SUM(cust_amount_sold)       running_sum
        ,        FINAL SUM(cust_amount_sold) final_sum
        ALL ROWS PER MATCH
        PATTERN (gr*)
        DEFINE gr AS SUM(cust_amount_sold) <= 200000
      );
      Notes on the query:
      – ALL ROWS PER MATCH: we need all matches
      – An aggregate function is used in the pattern variable's condition
      – MATCH_NUMBER() returns the match number
      – Aggregates in MEASURES: running vs. final
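The running-total rule can be read as a greedy fill: keep adding rows to the current group while the running sum stays within the capacity, and open a new group as soon as the next row would exceed it. The Python sketch below follows that reading with the 200000 limit from the DEFINE clause. The function name `capacity_groups` and the sample amounts are assumptions, and one detail differs from the SQL: a single amount larger than the capacity gets its own group here, whereas the pattern (gr*) would produce an empty match for such a row.

```python
def capacity_groups(amounts, capacity=200_000):
    """Split amounts (already in the desired order) into groups whose sum stays <= capacity."""
    groups = []
    current, total = [], 0
    for amount in amounts:
        # Close the current group if this row would push the running sum over the limit
        if current and total + amount > capacity:
            groups.append(current)
            current, total = [], 0
        current.append(amount)
        total += amount
    if current:
        groups.append(current)
    return groups


sample_amounts = [120_000, 50_000, 40_000, 90_000, 200_000, 10_000]
groups = capacity_groups(sample_amounts)
group_totals = [sum(g) for g in groups]  # comparable to final_sum per match
```

The position of a group in the returned list corresponds to MATCH_NUMBER(), and the sum of each group corresponds to the FINAL aggregate, while the running aggregate would be the partial sum up to each row.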
  36. Merge temporal intervals
  37. Merge temporal intervals. Temporal version of the SCOTT schema: the data in EMP, DEPT and JOB have temporal validity (VALID_FROM - VALID_TO).
  38. Merge temporal intervals. Task: query the data for one employee by joining four tables with respect to temporal validity.
  39. Merge temporal intervals
      WITH joined AS (
        SELECT e.empno,
               g.valid_from,
               LEAST(
                 e.valid_to,
                 d.valid_to,
                 j.valid_to,
                 NVL(m.valid_to, e.valid_to),
                 LEAD(g.valid_from - 1, 1, e.valid_to) OVER (
                   PARTITION BY e.empno ORDER BY g.valid_from
                 )
               ) AS valid_to,
               e.ename, j.job, e.mgr, m.ename AS mgr_ename,
               e.hiredate, e.sal, e.comm, e.deptno, d.dname
        FROM   empv e
        INNER JOIN (
                 SELECT valid_from FROM empv
                 UNION
                 SELECT valid_from FROM deptv
                 UNION
                 SELECT valid_from FROM jobv
                 UNION
                 SELECT valid_to + 1 FROM empv  WHERE valid_to != DATE '9999-12-31'
                 UNION
                 SELECT valid_to + 1 FROM deptv WHERE valid_to != DATE '9999-12-31'
                 UNION
                 SELECT valid_to + 1 FROM jobv  WHERE valid_to != DATE '9999-12-31'
               ) g
          ON g.valid_from BETWEEN e.valid_from AND e.valid_to
        INNER JOIN deptv d
          ON d.deptno = e.deptno
         AND g.valid_from BETWEEN d.valid_from AND d.valid_to
        INNER JOIN jobv j
          ON j.jobno = e.jobno
         AND g.valid_from BETWEEN j.valid_from AND j.valid_to
        LEFT JOIN empv m
          ON m.empno = e.mgr
         AND g.valid_from BETWEEN m.valid_from AND m.valid_to
      )
      ...
      Source: Philipp Salvisberg: http://www.salvis.com/blog/2012/12/28/joining-temporal-intervals-part-2/
  40. Merge temporal intervals
      ...
      SELECT empno, valid_from, valid_to, ename, job, mgr, mgr_ename,
             hiredate, sal, comm, deptno, dname
      FROM   joined
      MATCH_RECOGNIZE (
        PARTITION BY empno, ename, job, mgr, mgr_ename,
                     hiredate, sal, comm, deptno, dname
        ORDER BY valid_from
        MEASURES FIRST(valid_from) valid_from
        ,        LAST(valid_to)    valid_to
        PATTERN (strt nxt*)
        DEFINE nxt AS valid_from = PREV(valid_to) + 1
      )
      WHERE  empno = 7788;
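Within one partition, that is for rows whose attribute values are all identical, the pattern (strt nxt*) simply glues together intervals that follow each other without a hole: a row continues the current interval when its valid_from is exactly one day after the previous valid_to. A minimal Python sketch of this merge step, with the hypothetical helper name `merge_adjacent` and invented sample dates, could look like this:

```python
from datetime import date, timedelta

ONE_DAY = timedelta(days=1)


def merge_adjacent(intervals):
    """Merge (valid_from, valid_to) date pairs that directly follow each other.

    All intervals are assumed to carry identical attribute values,
    which is what the PARTITION BY clause guarantees in the query.
    """
    merged = []
    for valid_from, valid_to in sorted(intervals):
        if (merged
                and merged[-1][1] != date.max  # an open end (9999-12-31) has no next day
                and valid_from == merged[-1][1] + ONE_DAY):
            # nxt: valid_from = PREV(valid_to) + 1, so extend the current interval
            merged[-1] = (merged[-1][0], valid_to)
        else:
            merged.append((valid_from, valid_to))  # strt: open a new interval
    return merged


sample = [
    (date(2015, 1, 1), date(2015, 1, 31)),
    (date(2015, 2, 1), date(2015, 2, 28)),
    (date(2015, 4, 1), date(2015, 4, 30)),
]
merged = merge_adjacent(sample)
```

The January and February rows are merged into one interval, while the April row stays separate because March is missing.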
  41. Conclusion
      – Very powerful feature
      – Significantly simplifies a lot of queries (self-joins, semi-joins, anti-joins, nested queries), mostly with a performance benefit
      – Has been a proposal for ANSI SQL since 2007
      – Requires thinking in patterns
      – Complicated syntax (at first sight)
      – But in many cases the code reads like the requirement in "plain English"
  42. Further information
      – Database Data Warehousing Guide, SQL for Pattern Matching: http://docs.oracle.com/database/121/DWHSG/pattern.htm#DWHSG8956
      – Stewart Ashton's Blog: https://stewashton.wordpress.com
      – Oracle Whitepaper, Patterns everywhere - Find them Fast!: http://www.oracle.com/ocom/groups/public/@otn/documents/webcontent/1965433.pdf
  43. Trivadis at DOAG 2015: level 3, right next to the escalator. We look forward to your visit. Because with Trivadis you always win.
