Presentation of Common Table Expressions (CTE), recursive or not , a new feature in MySQL 8.0; slides written by Guilhem Bichot, developer of the feature, and presented by him at the Percona Live Conference in Dublin on 2017-09-26.
4 CTEs, all with a single column called txt
Cte1: One row: This
Cte2: One row: This is a
Cte3: Three rows
Cte4:
<numéro>
4 CTEs, all with a single column called txt
Cte1: One row: This
Cte2: One row: This is a
Cte3: Three rows
Cte4:
<numéro>
notes
<numéro>
0:40/8:20
We will look closer at one of the queries that benefits from DS-MRR.
This query finds the suppliers that provides most revenue.
The query uses a view to find the revenue from each supplier in a given quarter.
And based on that he finds the ones with the maximum revenue.
We have an index on the shipdate column. Hence, index range scan will be used here.
View which has a range criteria applicable for MRR
View used both in join and sub-query
<numéro>
0:20/8:40
In other words, we will materialize the view twice using a range scan on the Lineitem table before we join the rows, matching the subquery, with the supplier table.
<numéro>
Let us start with data access for single-table queries.
Ref access is an access method for reading all rows with a given key value using an index. In otherwords, Index look-up
The top example shows a query where we are looking up a customer by the customer key which is the primary key of the customer table.
EXPLAIN shows that the access type is const. This means that we will be reading from a unique index, and max one row will be returned.
It is called const because such look-ups will be done during the optimization of the query. Hence, the column values will be constant during execution.
The possible_keys column of EXPLAIN shows the candidate indexes, and the key column show which index is actually selected.
In this case, the primary index is the only candidate.
In the bottom example, we want to find all orders for a given date.
EXPLAIN shows that the access type is ref. This means that we will be reading from a non-unique index or a prefix of an index
Hence, multiple rows may be returned. The rows columns shows that the estimated number of rows is 6272 in this case.
<numéro>
0:40/8:20
We will look closer at one of the queries that benefits from DS-MRR.
This query finds the suppliers that provides most revenue.
The query uses a view to find the revenue from each supplier in a given quarter.
And based on that he finds the ones with the maximum revenue.
We have an index on the shipdate column. Hence, index range scan will be used here.
View which has a range criteria applicable for MRR
View used both in join and sub-query
<numéro>
Let us start with data access for single-table queries.
Ref access is an access method for reading all rows with a given key value using an index. In otherwords, Index look-up
The top example shows a query where we are looking up a customer by the customer key which is the primary key of the customer table.
EXPLAIN shows that the access type is const. This means that we will be reading from a unique index, and max one row will be returned.
It is called const because such look-ups will be done during the optimization of the query. Hence, the column values will be constant during execution.
The possible_keys column of EXPLAIN shows the candidate indexes, and the key column show which index is actually selected.
In this case, the primary index is the only candidate.
In the bottom example, we want to find all orders for a given date.
EXPLAIN shows that the access type is ref. This means that we will be reading from a non-unique index or a prefix of an index
Hence, multiple rows may be returned. The rows columns shows that the estimated number of rows is 6272 in this case.
<numéro>
0:20/8:40
In other words, we will materialize the view twice using a range scan on the Lineitem table before we join the rows, matching the subquery, with the supplier table.
<numéro>
<numéro>
If non-recursive CTE means ease of use and better performance, recursive CTE actually allows developers to easily achieve something is not easily achievable by other means.
The parenthesized "(SELECT ... UNION ALL SELECT ...)" has two query blocks. The first is the "non-recursive" (also known as "anchor" or "seed" in some other products), which is what one "starts with".
The second is the "recursive" one, which explains how the DBMS navigates from the seed's result. The "recursive" one references the CTE. The "non-recursive" doesn't.
Execution does like this: - non-recursive query block is evaluated, result goes into internal tmp table "cte" - if no rows, exit
- (A): recursive query block is evaluated over cte's lastly produced rows, and it produces new rows which go into "cte" (appended to previous rows: UNION ALL) - if the last step didn't produce new rows, exit - goto (A)
This algorithm is good to traverse a hierarchy: the non-recursive query block returns the top parent, recursive query block's first run returns children of top parent, second run returns children of those children, etc, until no more offspring.
<numéro>
If non-recursive CTE means ease of use and better performance, recursive CTE actually allows developers to easily achieve something is not easily achievable by other means.
The parenthesized "(SELECT ... UNION ALL SELECT ...)" has two query blocks. The first is the "non-recursive" (also known as "anchor" or "seed" in some other products), which is what one "starts with".
The second is the "recursive" one, which explains how the DBMS navigates from the seed's result. The "recursive" one references the CTE. The "non-recursive" doesn't.
Execution does like this: - non-recursive query block is evaluated, result goes into internal tmp table "cte" - if no rows, exit
- (A): recursive query block is evaluated over cte's lastly produced rows, and it produces new rows which go into "cte" (appended to previous rows: UNION ALL) - if the last step didn't produce new rows, exit - goto (A)
This algorithm is good to traverse a hierarchy: the non-recursive query block returns the top parent, recursive query block's first run returns children of top parent, second run returns children of those children, etc, until no more offspring.
<numéro>
<numéro>
If non-recursive CTE means ease of use and better performance, recursive CTE actually allows developers to easily achieve something is not easily achievable by other means.
The parenthesized "(SELECT ... UNION ALL SELECT ...)" has two query blocks. The first is the "non-recursive" (also known as "anchor" or "seed" in some other products), which is what one "starts with".
The second is the "recursive" one, which explains how the DBMS navigates from the seed's result. The "recursive" one references the CTE. The "non-recursive" doesn't.
Execution does like this: - non-recursive query block is evaluated, result goes into internal tmp table "cte" - if no rows, exit
- (A): recursive query block is evaluated over cte's lastly produced rows, and it produces new rows which go into "cte" (appended to previous rows: UNION ALL) - if the last step didn't produce new rows, exit - goto (A)
This algorithm is good to traverse a hierarchy: the non-recursive query block returns the top parent, recursive query block's first run returns children of top parent, second run returns children of those children, etc, until no more offspring.
<numéro>
If non-recursive CTE means ease of use and better performance, recursive CTE actually allows developers to easily achieve something is not easily achievable by other means.
The parenthesized "(SELECT ... UNION ALL SELECT ...)" has two query blocks. The first is the "non-recursive" (also known as "anchor" or "seed" in some other products), which is what one "starts with".
The second is the "recursive" one, which explains how the DBMS navigates from the seed's result. The "recursive" one references the CTE. The "non-recursive" doesn't.
Execution does like this: - non-recursive query block is evaluated, result goes into internal tmp table "cte" - if no rows, exit
- (A): recursive query block is evaluated over cte's lastly produced rows, and it produces new rows which go into "cte" (appended to previous rows: UNION ALL) - if the last step didn't produce new rows, exit - goto (A)
This algorithm is good to traverse a hierarchy: the non-recursive query block returns the top parent, recursive query block's first run returns children of top parent, second run returns children of those children, etc, until no more offspring.
<numéro>
Let us start with a simple example, print numbers from 1 to 10
with recursive cte as
( select 1 as a
union all
select 1+a from cte where a<10)
select * from cte; prints 1 to 10.
<numéro>
Let us start with a simple example, print numbers from 1 to 10
with recursive cte as
( select 1 as a
union all
select 1+a from cte where a<10)
select * from cte; prints 1 to 10.
<numéro>
Note that this restriction is only for the recursive select, not the non-recursive/seed/anchor SELECT
Of course, it can additionally reference other tables than the CTE and join them with the CTE.
<numéro>
Note that this restriction is only for the recursive select, not the non-recursive/seed/anchor SELECT
Of course, it can additionally reference other tables than the CTE and join them with the CTE.
<numéro>
Note that this restriction is only for the recursive select, not the non-recursive/seed/anchor SELECT
Of course, it can additionally reference other tables than the CTE and join them with the CTE.
<numéro>
If non-recursive CTE means ease of use and better performance, recursive CTE actually allows developers to easily achieve something is not easily achievable by other means.
The parenthesized "(SELECT ... UNION ALL SELECT ...)" has two query blocks. The first is the "non-recursive" (also known as "anchor" or "seed" in some other products), which is what one "starts with".
The second is the "recursive" one, which explains how the DBMS navigates from the seed's result. The "recursive" one references the CTE. The "non-recursive" doesn't.
Execution does like this: - non-recursive query block is evaluated, result goes into internal tmp table "cte" - if no rows, exit
- (A): recursive query block is evaluated over cte's lastly produced rows, and it produces new rows which go into "cte" (appended to previous rows: UNION ALL) - if the last step didn't produce new rows, exit - goto (A)
This algorithm is good to traverse a hierarchy: the non-recursive query block returns the top parent, recursive query block's first run returns children of top parent, second run returns children of those children, etc, until no more offspring.
<numéro>
Let us start with a simple example, print numbers from 1 to 10
with recursive cte as
( select 1 as a
union all
select 1+a from cte where a<10)
select * from cte; prints 1 to 10.
<numéro>
Let us start with a simple example, print numbers from 1 to 10
with recursive cte as
( select 1 as a
union all
select 1+a from cte where a<10)
select * from cte; prints 1 to 10.
<numéro>
CREATE TABLE EMPLOYEES (
ID INT PRIMARY KEY,
NAME VARCHAR(100),
MANAGER_ID INT,
INDEX (MANAGER_ID),
FOREIGN KEY (MANAGER_ID) REFERENCES EMPLOYEES(ID) );
INSERT INTO EMPLOYEES VALUES
(333, "Yasmina", NULL), # Yasmina is the CEO (as her manager_id is NULL)
(198, "John", 333), # John is 198 and reports to 333 (=Yasmina)
(692, "Tarek", 333),
(29, "Pedro", 198),
(4610, "Sarah", 29),
(72, "Pierre", 29),
(123, "Adil", 692);
# Give me the org chart with the management chain for every employee (= the path from CEO to employee):
WITH RECURSIVE EMPLOYEES_EXTENDED (ID, NAME, PATH) AS (
SELECT ID, NAME, CAST(ID AS CHAR(200))
FROM EMPLOYEES
WHERE MANAGER_ID IS NULL
UNION ALL
SELECT S.ID, S.NAME, CONCAT(M.PATH, ",", S.ID)
FROM EMPLOYEES_EXTENDED M JOIN EMPLOYEES S ON M.ID=S.MANAGER_ID )
SELECT * FROM EMPLOYEES_EXTENDED ORDER BY PATH;
ID NAME PATH
Yasmina 333
198 John 333,198
29 Pedro 333,198,29
4610 Sarah 333,198,29,4610 # Sarah->Pedro->John->Yasmina
72 Pierre 333,198,29,72
692 Tarek 333,692
123 Adil 333,692,123
<numéro>
CREATE TABLE EMPLOYEES (
ID INT PRIMARY KEY,
NAME VARCHAR(100),
MANAGER_ID INT,
INDEX (MANAGER_ID),
FOREIGN KEY (MANAGER_ID) REFERENCES EMPLOYEES(ID) );
INSERT INTO EMPLOYEES VALUES
(333, "Yasmina", NULL), # Yasmina is the CEO (as her manager_id is NULL)
(198, "John", 333), # John is 198 and reports to 333 (=Yasmina)
(692, "Tarek", 333),
(29, "Pedro", 198),
(4610, "Sarah", 29),
(72, "Pierre", 29),
(123, "Adil", 692);
# Give me the org chart with the management chain for every employee (= the path from CEO to employee):
WITH RECURSIVE EMPLOYEES_EXTENDED (ID, NAME, PATH) AS (
SELECT ID, NAME, CAST(ID AS CHAR(200))
FROM EMPLOYEES
WHERE MANAGER_ID IS NULL
UNION ALL
SELECT S.ID, S.NAME, CONCAT(M.PATH, ",", S.ID)
FROM EMPLOYEES_EXTENDED M JOIN EMPLOYEES S ON M.ID=S.MANAGER_ID )
SELECT * FROM EMPLOYEES_EXTENDED ORDER BY PATH;
ID NAME PATH
Yasmina 333
198 John 333,198
29 Pedro 333,198,29
4610 Sarah 333,198,29,4610 # Sarah->Pedro->John->Yasmina
72 Pierre 333,198,29,72
692 Tarek 333,692
123 Adil 333,692,123
<numéro>
CREATE TABLE EMPLOYEES (
ID INT PRIMARY KEY,
NAME VARCHAR(100),
MANAGER_ID INT,
INDEX (MANAGER_ID),
FOREIGN KEY (MANAGER_ID) REFERENCES EMPLOYEES(ID) );
INSERT INTO EMPLOYEES VALUES
(333, "Yasmina", NULL), # Yasmina is the CEO (as her manager_id is NULL)
(198, "John", 333), # John is 198 and reports to 333 (=Yasmina)
(692, "Tarek", 333),
(29, "Pedro", 198),
(4610, "Sarah", 29),
(72, "Pierre", 29),
(123, "Adil", 692);
# Give me the org chart with the management chain for every employee (= the path from CEO to employee):
WITH RECURSIVE EMPLOYEES_EXTENDED (ID, NAME, PATH) AS (
SELECT ID, NAME, CAST(ID AS CHAR(200))
FROM EMPLOYEES
WHERE MANAGER_ID IS NULL
UNION ALL
SELECT S.ID, S.NAME, CONCAT(M.PATH, ",", S.ID)
FROM EMPLOYEES_EXTENDED M JOIN EMPLOYEES S ON M.ID=S.MANAGER_ID )
SELECT * FROM EMPLOYEES_EXTENDED ORDER BY PATH;
ID NAME PATH
Yasmina 333
198 John 333,198
29 Pedro 333,198,29
4610 Sarah 333,198,29,4610 # Sarah->Pedro->John->Yasmina
72 Pierre 333,198,29,72
692 Tarek 333,692
123 Adil 333,692,123
<numéro>
First rows defines the column types
CREATE TABLE EMPLOYEES (
ID INT PRIMARY KEY,
NAME VARCHAR(100),
MANAGER_ID INT,
INDEX (MANAGER_ID),
FOREIGN KEY (MANAGER_ID) REFERENCES EMPLOYEES(ID) );
INSERT INTO EMPLOYEES VALUES
(333, "Yasmina", NULL), # Yasmina is the CEO (as her manager_id is NULL)
(198, "John", 333), # John is 198 and reports to 333 (=Yasmina)
(692, "Tarek", 333),
(29, "Pedro", 198),
(4610, "Sarah", 29),
(72, "Pierre", 29),
(123, "Adil", 692);
# Give me the org chart with the management chain for every employee (= the path from CEO to employee):
WITH RECURSIVE EMPLOYEES_EXTENDED (ID, NAME, PATH) AS (
SELECT ID, NAME, CAST(ID AS CHAR(200))
FROM EMPLOYEES
WHERE MANAGER_ID IS NULL
UNION ALL
SELECT S.ID, S.NAME, CONCAT(M.PATH, ",", S.ID)
FROM EMPLOYEES_EXTENDED M JOIN EMPLOYEES S ON M.ID=S.MANAGER_ID )
SELECT * FROM EMPLOYEES_EXTENDED ORDER BY PATH;
ID NAME PATH
Yasmina 333
198 John 333,198
29 Pedro 333,198,29
4610 Sarah 333,198,29,4610 # Sarah->Pedro->John->Yasmina
72 Pierre 333,198,29,72
692 Tarek 333,692
123 Adil 333,692,123
<numéro>
Note: First row determines type
CREATE TABLE EMPLOYEES (
ID INT PRIMARY KEY,
NAME VARCHAR(100),
MANAGER_ID INT,
INDEX (MANAGER_ID),
FOREIGN KEY (MANAGER_ID) REFERENCES EMPLOYEES(ID) );
INSERT INTO EMPLOYEES VALUES
(333, "Yasmina", NULL), # Yasmina is the CEO (as her manager_id is NULL)
(198, "John", 333), # John is 198 and reports to 333 (=Yasmina)
(692, "Tarek", 333),
(29, "Pedro", 198),
(4610, "Sarah", 29),
(72, "Pierre", 29),
(123, "Adil", 692);
# Give me the org chart with the management chain for every employee (= the path from CEO to employee):
WITH RECURSIVE EMPLOYEES_EXTENDED (ID, NAME, PATH) AS (
SELECT ID, NAME, CAST(ID AS CHAR(200))
FROM EMPLOYEES
WHERE MANAGER_ID IS NULL
UNION ALL
SELECT S.ID, S.NAME, CONCAT(M.PATH, ",", S.ID)
FROM EMPLOYEES_EXTENDED M JOIN EMPLOYEES S ON M.ID=S.MANAGER_ID )
SELECT * FROM EMPLOYEES_EXTENDED ORDER BY PATH;
ID NAME PATH
Yasmina 333
198 John 333,198
29 Pedro 333,198,29
4610 Sarah 333,198,29,4610 # Sarah->Pedro->John->Yasmina
72 Pierre 333,198,29,72
692 Tarek 333,692
123 Adil 333,692,123
<numéro>
Note: First row determines type
CREATE TABLE EMPLOYEES (
ID INT PRIMARY KEY,
NAME VARCHAR(100),
MANAGER_ID INT,
INDEX (MANAGER_ID),
FOREIGN KEY (MANAGER_ID) REFERENCES EMPLOYEES(ID) );
INSERT INTO EMPLOYEES VALUES
(333, "Yasmina", NULL), # Yasmina is the CEO (as her manager_id is NULL)
(198, "John", 333), # John is 198 and reports to 333 (=Yasmina)
(692, "Tarek", 333),
(29, "Pedro", 198),
(4610, "Sarah", 29),
(72, "Pierre", 29),
(123, "Adil", 692);
# Give me the org chart with the management chain for every employee (= the path from CEO to employee):
WITH RECURSIVE EMPLOYEES_EXTENDED (ID, NAME, PATH) AS (
SELECT ID, NAME, CAST(ID AS CHAR(200))
FROM EMPLOYEES
WHERE MANAGER_ID IS NULL
UNION ALL
SELECT S.ID, S.NAME, CONCAT(M.PATH, ",", S.ID)
FROM EMPLOYEES_EXTENDED M JOIN EMPLOYEES S ON M.ID=S.MANAGER_ID )
SELECT * FROM EMPLOYEES_EXTENDED ORDER BY PATH;
ID NAME PATH
Yasmina 333
198 John 333,198
29 Pedro 333,198,29
4610 Sarah 333,198,29,4610 # Sarah->Pedro->John->Yasmina
72 Pierre 333,198,29,72
692 Tarek 333,692
123 Adil 333,692,123
<numéro>
DISTINCT is not included in the 8.0.0 labs release. Can be used to compute transitive closures when there are cycles since existing rows will not be added, and recursion will terminate when no more rows are added.
<numéro>
Note that this restriction is only for the recursive select, not the non-recursive/seed/anchor SELECT
Of course, it can additionally reference other tables than the CTE and join them with the CTE.
<numéro>
Note that this restriction is only for the recursive select, not the non-recursive/seed/anchor SELECT
Of course, it can additionally reference other tables than the CTE and join them with the CTE.
<numéro>
That concludes this presentation.
For more information, check my blog, the optimizer team blog and the MySQL server team blog.
There are also MySQL forums where you can ask questions related to Optimization of MySQL queries.
We monitor these forums and will try to answer the question you may have.
<numéro>
This is a Safe Harbor Front slide, one of two Safe Harbor Statement slides included in this template.
One of the Safe Harbor slides must be used if your presentation covers material affected by Oracle’s Revenue Recognition Policy
To learn more about this policy, e-mail: Revrec-americasiebc_us@oracle.com
For internal communication, Safe Harbor Statements are not required. However, there is an applicable disclaimer (Exhibit E) that should be used, found in the Oracle Revenue Recognition Policy for Future Product Communications. Copy and paste this link into a web browser, to find out more information.
http://my.oracle.com/site/fin/gfo/GlobalProcesses/cnt452504.pdf
For all external communications such as press release, roadmaps, PowerPoint presentations, Safe Harbor Statements are required. You can refer to the link mentioned above to find out additional information/disclaimers required depending on your audience.
<numéro>