Playing with CONNECT
Federico Razzoli
$ whoami
Hi, I’m Federico Razzoli from Vettabase Ltd
Database consultant, open source supporter,
long time MariaDB and MySQL user
● vettabase.com
● Federico-Razzoli.com
What is CONNECT?
What is a Storage Engine?
● MariaDB knows nothing about…
○ Writing / reading data
○ Writing / reading indexes
○ Caching data and indexes
○ Transactions
○ …
● These functionalities are implemented in special plugins called
storage engines
● InnoDB is the default storage engine
Some storage engines do strange things...
● BLACKHOLE
● SEQUENCE
● SPIDER
● CSV
What is a Storage Engine?
The list can vary depending on MariaDB version and distribution
MariaDB [(none)]> SELECT * FROM information_schema.ENGINES
WHERE ENGINE = 'CONNECT' \G
*************************** 1. row ***************************
ENGINE: CONNECT
SUPPORT: YES
COMMENT: Management of External Data (SQL/NOSQL/MED), including Rest query
results
TRANSACTIONS: NO
XA: NO
SAVEPOINTS: NO
1 row in set (0.000 sec)
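If the query above returns no row, CONNECT may simply not be loaded on that server. A minimal sketch of loading it, assuming the plugin library is present under its usual name ha_connect (some distributions ship it as a separate package, which would have to be installed first):

```sql
-- Load the CONNECT plugin; assumes the ha_connect library is available
-- in the server's plugin directory.
INSTALL SONAME 'ha_connect';

-- Verify that the engine is now listed.
SELECT ENGINE, SUPPORT
FROM information_schema.ENGINES
WHERE ENGINE = 'CONNECT';
```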
CONNECT
● CONNECT is designed for MED (Management of External Data)
● It connects MariaDB to data stored in another form
● Depending on the TABLE_TYPE, it can do many things:
○ Use data from remote DBMSs
○ Use data from files in various formats
○ Special data sources
○ Data transformation
File-Based Tables
Inward / Outward
CREATE TABLE file_table (
a INT
, b INT
)
ENGINE = CONNECT
, TABLE_TYPE = CSV
, FILE_NAME = 'data.csv'
;
● A file-based table can be Inward or Outward
● If FILE_NAME is specified the table is Outward
● Outward tables are assumed to be “holy”
ALTER TABLE on File-Based tables
ALTER TABLE file_table DROP COLUMN a;
● If the table is Outward:
○ A column disappears from the table;
○ But the underlying file remains unchanged.
● If the table is Inward, the underlying file is modified
Inward Tables
CREATE TABLE csv_data ( ... ) ENGINE = CONNECT, TABLE_TYPE = CSV;
● The CSV file will be located in the database directory
● In this case, the file name will be csv_data.csv
● To know the exact name:
○ SHOW WARNINGS;
○ Regexp to get the filename: \s(\w+)$
Import + Modify + Export
● An interesting use case for CONNECT is:
○ Receive data in a certain understood format
○ Make some changes that are easier in SQL
■ SELECT column_list FROM table
■ SELECT a, AVG(b) FROM table GROUP BY a
○ Export the data in the same format
Import + Modify + Export
● This can be done:
○ Create an Outward table
○ CREATE TABLE exported_data SELECT …
○ Copy the table elsewhere and DROP it
● Or, for more complex transformations:
○ Create an Outward table
○ CREATE TABLE intermediate_data … ENGINE InnoDB;
○ Add indexes as needed
○ Make some data transformation
○ CREATE TABLE exported_data
ENGINE=CONNECT, TABLE_TYPE=CSV, SEP_CHAR='\t', HEADER=1
SELECT * FROM intermediate_data
○ Copy the table elsewhere and DROP it
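The simpler of the two workflows above could look like the sketch below. The file path, table names and column names are hypothetical, and the received file is assumed to be a comma-separated CSV with a header line.

```sql
-- Outward table: reads the received file in place (hypothetical path).
CREATE TABLE received_data (
    a INT NOT NULL
  , b INT NOT NULL
)
  ENGINE = CONNECT
  , TABLE_TYPE = CSV
  , FILE_NAME = '/var/shared/received.csv'
  , HEADER = 1
;

-- Inward table: the query result is written to a new CSV file
-- in the database directory.
CREATE TABLE exported_data
  ENGINE = CONNECT
  , TABLE_TYPE = CSV
  , HEADER = 1
  SELECT a, AVG(b) AS avg_b
  FROM received_data
  GROUP BY a
;
```

SHOW WARNINGS after the second statement should reveal the name of the generated file, which can then be copied elsewhere before the table is dropped.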
Exporting data
ALTER TABLE numbers
ENGINE = CONNECT
, TABLE_TYPE = CSV
, SEP_CHAR = '\t'
;
● This is the most efficient way to transform a table that you don’t need anymore
● But the file will be created in the MariaDB datadir: you cannot specify a
different path for Inward tables
Let’s try reading Apache logs
Sample
● A small sample of vettabase.com Apache error log
● IPs are scrambled
40.88.21.225 - - [07/Sep/2020:17:11:22 +0100] "GET / HTTP/1.1" 302 -
"http://vettabase.com/" "Mozilla/5.0 (compatible;
DuckDuckGo-Favicons-Bot/1.0; +http://duckduckgo.com)"
198.100.126.179 - - [07/Sep/2020:17:14:34 +0100] "GET /admin/ HTTP/1.1"
404 - "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0)
Gecko/20100101 Firefox/62.0"
120.26.50.46 - - [07/Sep/2020:18:33:24 +0100] "HEAD /caiyuan/login.php
HTTP/1.1" 404 - "-" "-"
120.27.51.66 - - [07/Sep/2020:18:33:24 +0100] "HEAD /guanli/login.php
HTTP/1.1" 404 - "-" "-"
120.27.51.66 - - [07/Sep/2020:18:33:24 +0100] "HEAD /admin/login.php
HTTP/1.1" 404 - "-" "-"
Mmmm...
● A precise machine-readable format is used
● But it’s a bit irregular - a bit less machine readable than CSV or JSON
● We’ll have to define a way to parse the data we need
● We only care about some columns
40.88.21.225 - - [07/Sep/2020:17:11:22 +0100] "GET / HTTP/1.1" 302 -
"http://vettabase.com/" "Mozilla/5.0 (compatible;
DuckDuckGo-Favicons-Bot/1.0; +http://duckduckgo.com)"
● ip: 40.88.21.225
● time: 07/Sep/2020:17:11:22
● timezone: +0100
● request_type: GET
● path: /
● protocol: HTTP/1.1
● http_response_code: 302
CREATE OR REPLACE TABLE web_log (
ip VARCHAR(15) NOT NULL FIELD_FORMAT = '%n%s%n - - '
, time VARCHAR(100) NOT NULL FIELD_FORMAT = '%n%s%n'
, timezone VARCHAR(50) NOT NULL FIELD_FORMAT = ' %n%s%n "'
, request_type VARCHAR(5) NOT NULL FIELD_FORMAT = '%n%s%n'
, path VARCHAR(200) NOT NULL FIELD_FORMAT = ' %n%s%n'
, protocol VARCHAR(10) NOT NULL FIELD_FORMAT = ' %n%s%n '
, http_response_code SMALLINT UNSIGNED NOT NULL FIELD_FORMAT =
'%n%d%n'
)
ENGINE = CONNECT
, TABLE_TYPE = 'FMT'
, FILE_NAME= '/var/shared/apache.log'
;
-- If you forget to specify the full path
Warning (Code 1105): Open(rb) error 2 on
/usr/local/mariadb/data/./test/apache.log: No such file or directory
-- If a FIELD_FORMAT is not correct or the file has lines in an
-- inconsistent format
ERROR 1296 (HY000): Got error 122 'Bad format line 1 field 3 of web_log'
from CONNECT
MariaDB [test]> SELECT * FROM web_log LIMIT 1 \G
*************************** 1. row ***************************
ip: 114.119.159.128
time: [07/Sep/2020:13:17:13
timezone: +0100]
request_type: GET
path: /robots.txt
protocol: HTTP/1.1"
http_response_code: 404
CREATE OR REPLACE TABLE web_log (
...
, time VARCHAR(100)
GENERATED ALWAYS AS (SUBSTRING(raw_time FROM 2))
, timezone VARCHAR(5)
GENERATED ALWAYS AS (SUBSTRING(raw_timezone FROM 1 FOR
CHAR_LENGTH(raw_timezone) - 1))
, protocol VARCHAR(10)
GENERATED ALWAYS AS (SUBSTRING(raw_protocol FROM 1 FOR
CHAR_LENGTH(raw_protocol) - 1))
)
...
;
MariaDB [test]> SELECT ip, time, timezone, protocol FROM web_log LIMIT 1 \G
*************************** 1. row ***************************
ip: 114.119.159.128
time: 07/Sep/2020:13:17:13
timezone: +0100
protocol: HTTP/1.1
1 row in set (0.002 sec)
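Once the brackets are stripped, the time column is still a string. A possible next step, not shown in the original, is to convert it to a DATETIME with STR_TO_DATE() so that it sorts and compares properly. The format string below is an assumption based on the sample lines above (English month abbreviations); the timezone offset is ignored.

```sql
SELECT
    ip
  , STR_TO_DATE(`time`, '%d/%b/%Y:%H:%i:%s') AS requested_at
  , path
FROM web_log
ORDER BY requested_at
LIMIT 5;
```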
Let’s do some analyses
Analyses on a log
MariaDB [test]> SELECT http_response_code, COUNT(*) FROM web_log GROUP BY
http_response_code;
+--------------------+----------+
| http_response_code | COUNT(*) |
+--------------------+----------+
| 302 | 11 |
| 404 | 61 |
+--------------------+----------+
MariaDB [test]> SELECT request_type, COUNT(*) FROM web_log GROUP BY http_response_code;
+--------------+----------+
| request_type | COUNT(*) |
+--------------+----------+
| GET | 11 |
| GET | 61 |
+--------------+----------+
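The second query selects request_type but groups by http_response_code, so each row pairs the count for one response code with a request type picked from that group. To count requests per request type instead, a query along these lines could be used (a sketch; its output is not part of the original):

```sql
SELECT request_type, COUNT(*) AS requests
FROM web_log
GROUP BY request_type;
```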
PIVOTing a table
● Doing some analyses on logs is cool
● We’d also like to pivot a table, but MariaDB doesn’t support the PIVOT syntax
● Instead, we can use CONNECT’s PIVOT table type
CONNECT user
● CONNECT tables that transform data from other tables need to establish a
connection to MariaDB and run queries
● In order to do that, they need to use an account
● It is a good practice (and default) to have a mysql@localhost account
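A sketch of how such an account could be prepared. The unix_socket plugin and the SELECT grant on test.web_log are assumptions for illustration (the test database name is taken from the prompts in this deck); adapt them to the tables that CONNECT needs to read.

```sql
-- Create the account only if it is missing; with unix_socket
-- authentication no password has to be stored anywhere.
CREATE USER IF NOT EXISTS mysql@localhost IDENTIFIED VIA unix_socket;

-- Let the account read the source table used by the transformation.
GRANT SELECT ON test.web_log TO mysql@localhost;
```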
Creating a PIVOT table
● Note that the table definition contains CONNECT’s user
● SHOW CREATE TABLE shows this info
● This is why it is best to use the unix_socket authentication plugin: no
password ends up in the table definition
CREATE OR REPLACE TABLE requests_by_response_and_type
ENGINE = CONNECT,
TABLE_TYPE = PIVOT,
TABNAME = 'web_log',
OPTION_LIST = 'user=mysql,host=localhost,PivotCol=request_type,Function=count'
;
Reading our PIVOT table
MariaDB [test]> SELECT * FROM requests_by_response_and_type ;
+---------------+-----------------------+--------------+------------------+--------------+-----+------+
| ip            | raw_time              | raw_timezone | path             | raw_protocol | GET | HEAD |
+---------------+-----------------------+--------------+------------------+--------------+-----+------+
| 112.124.0.114 | [07/Sep/2020:16:44:25 | +0100]       | /dede/login.php  | HTTP/1.1"    |   0 |    1 |
| 112.124.0.114 | [07/Sep/2020:16:44:25 | +0100]       | /dedea/login.php | HTTP/1.1"    |   0 |    1 |
...
Reading our PIVOT table
MariaDB [test]> SELECT request_type, `GET`, `HEAD` FROM
requests_by_response_and_type ;
ERROR 1054 (42S22): Unknown column 'request_type' in 'field list'
PIVOTing a query
OPTION_LIST = 'user=mysql,host=localhost',
SrcDef = 'SELECT request_type, COUNT(*) FROM web_log
GROUP BY request_type'
MariaDB [test]> SELECT * FROM requests_by_response_and_type ;
+-----+------+
| GET | HEAD |
+-----+------+
| 23 | 49 |
+-----+------+
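For comparison, a similar single-row result can be produced in plain SQL with conditional aggregation, at the cost of listing every request type by hand. This is a sketch of an alternative technique, not a description of what CONNECT does internally:

```sql
SELECT
    SUM(request_type = 'GET')  AS `GET`
  , SUM(request_type = 'HEAD') AS `HEAD`
FROM web_log;
```

The PIVOT table type derives the column list from the data itself, which is its main advantage over this form.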
Other transformations?
● OCCUR unpivots columns
● XCOL turns lists into multiple rows
● TBL lets you treat a set of tables as a single table
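As an illustration of the TBL type, a table spanning two log tables might be declared as below. The names web_log_2020_09 and web_log_2020_10 are hypothetical and are assumed to have the same columns as web_log; only the columns listed here are exposed.

```sql
CREATE TABLE all_web_logs (
    ip VARCHAR(15) NOT NULL
  , path VARCHAR(200) NOT NULL
  , http_response_code SMALLINT UNSIGNED NOT NULL
)
  ENGINE = CONNECT
  , TABLE_TYPE = TBL
  , TABLE_LIST = 'web_log_2020_09,web_log_2020_10'
;
```

Like PIVOT, a TBL table reads the underlying tables through a connection, so the account considerations discussed for PIVOT tables are likely to apply here too.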
What did we leave out?
● Almost everything! We’ve just played a bit with an Apache log!
● Other file formats (JSON, XML, HTML tables, ini, fixed-length, …)
● Compressed files
● More magic with custom formats
● Connections via MySQL format, ODBC, JDBC, MongoDB
● Querying remote REST APIs
● ...and more
Final hints:
● Build proper indexes where possible
● Increase connect_work_size if your files are big
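A minimal sketch of the second hint. The value is an arbitrary example, and the variable is assumed to be settable at session level on your version:

```sql
-- Check the current value (in bytes).
SHOW VARIABLES LIKE 'connect_work_size';

-- Raise it to 256 MiB for the current session only.
SET SESSION connect_work_size = 268435456;
```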
Thanks for attending!
Question time :-)

DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 

Recently uploaded (20)

Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 

Playing with the CONNECT storage engine

  • 2. $ whoami
    Hi, I’m Federico Razzoli from Vettabase Ltd
    Database consultant, open source supporter, long-time MariaDB and MySQL user
    ● vettabase.com
    ● Federico-Razzoli.com
  • 4. What is a Storage Engine?
    ● MariaDB knows nothing about…
      ○ Writing / reading data
      ○ Writing / reading indexes
      ○ Caching data and indexes
      ○ Transactions
      ○ …
    ● These functionalities are implemented in special plugins called storage engines
    ● InnoDB is the default storage engine
  • 5. Some storage engines do strange things...
    ● BLACKHOLE
    ● SEQUENCE
    ● SPIDER
    ● CSV
  • 6. What is a Storage Engine?
    The list can vary depending on MariaDB version and distribution
    MariaDB [(none)]> SELECT * FROM information_schema.ENGINES
        WHERE ENGINE = 'CONNECT' \G
    *************************** 1. row ***************************
          ENGINE: CONNECT
         SUPPORT: YES
         COMMENT: Management of External Data (SQL/NOSQL/MED), including Rest query results
    TRANSACTIONS: NO
              XA: NO
      SAVEPOINTS: NO
    1 row in set (0.000 sec)
  • 7. CONNECT
    ● CONNECT is designed for MED (Management of External Data)
    ● It connects MariaDB to data stored in another form
    ● Depending on the TABLE_TYPE, it can do many things:
      ○ Use data from remote DBMSs
      ○ Use data from files in various formats
      ○ Special data sources
      ○ Data transformation
  • 9. Inward / Outward
    CREATE TABLE file_table (
        a INT
      , b INT
    )
      ENGINE = CONNECT
    , TABLE_TYPE = CSV
    , FILE_NAME = 'data.csv'
    ;
    ● A file-based table can be Inward or Outward
    ● If FILE_NAME is specified, the table is Outward
    ● Outward tables are assumed to be “holy”
  • 10. ALTER TABLE on File-Based tables
    ALTER TABLE file_table DROP COLUMN a;
    ● If the table is Outward:
      ○ A column disappears from the table;
      ○ But the underlying file remains unchanged.
    ● If the table is Inward, the underlying file is modified
  • 11. Inward Tables
    CREATE TABLE csv_data ( ... ) ENGINE = CONNECT, TABLE_TYPE = CSV;
    ● The CSV file will be located in the database directory
    ● In this case, the file name will be csv_data.csv
    ● To know the exact name:
      ○ SHOW WARNINGS;
      ○ Regexp to get the filename: \s(\w+)$
  • 12. Import + Modify + Export
    ● An interesting use case for CONNECT is:
      ○ Receive data in a certain understood format
      ○ Make some changes that are easier in SQL
        ■ SELECT column_list FROM table
        ■ SELECT a, AVG(b) FROM table GROUP BY a
      ○ Export the data in the same format
  • 13. Import + Modify + Export
    ● This can be done as follows:
      ○ Create an Outward table
      ○ CREATE TABLE exported_data SELECT …
      ○ Copy the table elsewhere and DROP it
    ● Or, for more complex transformations:
      ○ Create an Outward table
      ○ CREATE TABLE intermediate_data … ENGINE InnoDB;
      ○ Add indexes as needed
      ○ Make some data transformation
      ○ CREATE TABLE exported_data ENGINE=CONNECT, TABLE_TYPE=CSV, SEP_CHAR='\t', HEADER=1 SELECT * FROM intermediate_data
      ○ Copy the table elsewhere and DROP it
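    For readers who want to see the import + modify + export flow outside the database, here is a minimal Python sketch of the same idea. It is only an illustration, not how CONNECT works internally: the sample data, the column name avg_b and the in-memory buffers are assumptions made for the example. It reads tab-separated data with a header, applies the equivalent of SELECT a, AVG(b) ... GROUP BY a, and writes the result back in the same tab-separated format.

```python
import csv
import io
from collections import defaultdict

# Hypothetical incoming data: tab-separated, with a header line.
incoming = "a\tb\n1\t10\n1\t20\n2\t5\n"

# "Import": read the rows and group the b values by a.
groups = defaultdict(list)
for row in csv.DictReader(io.StringIO(incoming), delimiter="\t"):
    groups[int(row["a"])].append(int(row["b"]))

# "Modify" + "Export": write one averaged row per group, same format.
out = io.StringIO()
writer = csv.writer(out, delimiter="\t", lineterminator="\n")
writer.writerow(["a", "avg_b"])
for a in sorted(groups):
    writer.writerow([a, sum(groups[a]) / len(groups[a])])

exported = out.getvalue()
print(exported)
```

    With CONNECT, the grouping step is plain SQL and the reading and writing are handled by the table definitions.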
  • 14. Exporting data
    ALTER TABLE numbers
        ENGINE = CONNECT
      , TABLE_TYPE = CSV
      , SEP_CHAR = '\t'
    ;
    ● This is the most efficient way to transform a table that you don’t need anymore
    ● But the file will be created in the MariaDB datadir; you cannot specify a different path for Inward tables
  • 15. Let’s try reading Apache logs
  • 16. Sample
    ● A small sample of the vettabase.com Apache access log
    ● IPs are scrambled
  • 17.
    40.88.21.225 - - [07/Sep/2020:17:11:22 +0100] "GET / HTTP/1.1" 302 - "http://vettabase.com/" "Mozilla/5.0 (compatible; DuckDuckGo-Favicons-Bot/1.0; +http://duckduckgo.com)"
    198.100.126.179 - - [07/Sep/2020:17:14:34 +0100] "GET /admin/ HTTP/1.1" 404 - "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:62.0) Gecko/20100101 Firefox/62.0"
    120.26.50.46 - - [07/Sep/2020:18:33:24 +0100] "HEAD /caiyuan/login.php HTTP/1.1" 404 - "-" "-"
    120.27.51.66 - - [07/Sep/2020:18:33:24 +0100] "HEAD /guanli/login.php HTTP/1.1" 404 - "-" "-"
    120.27.51.66 - - [07/Sep/2020:18:33:24 +0100] "HEAD /admin/login.php HTTP/1.1" 404 - "-" "-"
  • 18. Mmmm...
    ● A precise machine-readable format is used
    ● But it’s a bit irregular - a bit less machine readable than CSV or JSON
    ● We’ll have to define a way to parse the data we need
    ● We only care about some columns
  • 19.
    40.88.21.225 - - [07/Sep/2020:17:11:22 +0100] "GET / HTTP/1.1" 302 - "http://vettabase.com/" "Mozilla/5.0 (compatible; DuckDuckGo-Favicons-Bot/1.0; +http://duckduckgo.com)"
    ● ip: 40.88.21.225
    ● time: 07/Sep/2020:17:11:22
    ● timezone: +0100
    ● request_type: GET
    ● path: /
    ● protocol: HTTP/1.1
    ● http_response_code: 302
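    The field breakdown above can be sketched in Python with a regular expression. This is only an illustration of which part of the line ends up in which field; the pattern and the helper name parse_line are assumptions made for the example, and CONNECT itself uses its own FIELD_FORMAT strings rather than regular expressions.

```python
import re

# Hypothetical pattern mirroring the seven fields listed on the slide.
# It is NOT what CONNECT runs internally.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) - - '
    r'\[(?P<time>\S+) (?P<timezone>[+-]\d{4})\] '
    r'"(?P<request_type>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<http_response_code>\d{3})'
)


def parse_line(line):
    """Return the seven fields as a dict, or None if the line does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["http_response_code"] = int(fields["http_response_code"])
    return fields


sample = (
    '40.88.21.225 - - [07/Sep/2020:17:11:22 +0100] "GET / HTTP/1.1" 302 - '
    '"http://vettabase.com/" "Mozilla/5.0 (compatible; '
    'DuckDuckGo-Favicons-Bot/1.0; +http://duckduckgo.com)"'
)
record = parse_line(sample)
print(record)
```

    Everything after the response code (referrer, user agent) is simply ignored, which matches the "we only care about some columns" approach.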
  • 20.
    CREATE OR REPLACE TABLE web_log (
        ip VARCHAR(15) NOT NULL
            FIELD_FORMAT = '%n%s%n - - '
      , time VARCHAR(100) NOT NULL
            FIELD_FORMAT = '%n%s%n'
      , timezone VARCHAR(50) NOT NULL
            FIELD_FORMAT = ' %n%s%n "'
      , request_type VARCHAR(5) NOT NULL
            FIELD_FORMAT = '%n%s%n'
      , path VARCHAR(200) NOT NULL
            FIELD_FORMAT = ' %n%s%n'
      , protocol VARCHAR(10) NOT NULL
            FIELD_FORMAT = ' %n%s%n '
      , http_response_code SMALLINT UNSIGNED NOT NULL
            FIELD_FORMAT = '%n%d%n'
    )
        ENGINE = CONNECT
      , TABLE_TYPE = 'FMT'
      , FILE_NAME = '/var/shared/apache.log'
    ;
  • 21.
    -- If you forget to specify the full path
    Warning (Code 1105): Open(rb) error 2 on /usr/local/mariadb/data/./test/apache.log: No such file or directory
    -- If a FIELD_FORMAT is not correct or the file has lines in an
    -- inconsistent format
    ERROR 1296 (HY000): Got error 122 'Bad format line 1 field 3 of web_log' from CONNECT
  • 22.
    MariaDB [test]> SELECT * FROM web_log LIMIT 1 \G
    *************************** 1. row ***************************
                    ip: 114.119.159.128
                  time: [07/Sep/2020:13:17:13
              timezone: +0100]
          request_type: GET
                  path: /robots.txt
              protocol: HTTP/1.1"
    http_response_code: 404
  • 23.
    CREATE OR REPLACE TABLE web_log (
        ...
      , time VARCHAR(100) GENERATED ALWAYS AS
            (SUBSTRING(raw_time FROM 2))
      , timezone VARCHAR(5) GENERATED ALWAYS AS
            (SUBSTRING(raw_timezone FROM 1 FOR CHAR_LENGTH(raw_timezone) - 1))
      , protocol VARCHAR(10) GENERATED ALWAYS AS
            (SUBSTRING(raw_protocol FROM 1 FOR CHAR_LENGTH(raw_protocol) - 1))
    )
        ...
    ;
  • 24.
    MariaDB [test]> SELECT ip, time, timezone, protocol FROM web_log LIMIT 1 \G
    *************************** 1. row ***************************
          ip: 114.119.159.128
        time: 07/Sep/2020:13:17:13
    timezone: +0100
    protocol: HTTP/1.1
    1 row in set (0.002 sec)
  • 25. Let’s do some analyses
  • 26. Analyses on a log
    MariaDB [test]> SELECT http_response_code, COUNT(*) FROM web_log GROUP BY http_response_code;
    +--------------------+----------+
    | http_response_code | COUNT(*) |
    +--------------------+----------+
    |                302 |       11 |
    |                404 |       61 |
    +--------------------+----------+
    MariaDB [test]> SELECT request_type, COUNT(*) FROM web_log GROUP BY request_type;
    +--------------+----------+
    | request_type | COUNT(*) |
    +--------------+----------+
    | GET          |       23 |
    | HEAD         |       49 |
    +--------------+----------+
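    As a rough illustration of what these GROUP BY queries compute, here is a small Python sketch that counts by response code and by request type. The input is an assumption made for the example: it uses only the five sample lines from slide 17, already reduced to (request_type, http_response_code) pairs, so the counts differ from the full log shown above.

```python
from collections import Counter

# (request_type, http_response_code) pairs taken from the five sample
# lines on slide 17; the full log has 72 lines, so totals differ.
requests = [
    ("GET", 302),
    ("GET", 404),
    ("HEAD", 404),
    ("HEAD", 404),
    ("HEAD", 404),
]

# Equivalent of: SELECT http_response_code, COUNT(*) ... GROUP BY http_response_code
by_code = Counter(code for _, code in requests)

# Equivalent of: SELECT request_type, COUNT(*) ... GROUP BY request_type
by_type = Counter(request_type for request_type, _ in requests)

print(dict(by_code))
print(dict(by_type))
```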
  • 27. PIVOTing a table
    ● Doing some analyses on logs is cool
    ● But we’d like to pivot a table, and MariaDB doesn’t support the PIVOT syntax
    ● But we can use CONNECT’s PIVOT table type
  • 28. CONNECT user
    ● CONNECT tables that transform data from other tables need to establish a connection to MariaDB and run queries
    ● In order to do that, they need to use an account
    ● It is a good practice (and default) to have a mysql@localhost account
  • 29. Creating a PIVOT table
    ● Note that the table definition contains CONNECT’s user
    ● SHOW CREATE TABLE shows this info
    ● This is why it is best to use the unix_socket authentication plugin
    CREATE OR REPLACE TABLE requests_by_response_and_type
        ENGINE = CONNECT,
        TABLE_TYPE = PIVOT,
        TABNAME = 'web_log',
        OPTION_LIST = 'user=mysql,host=localhost,PivotCol=request_type,Function=count'
    ;
  • 30. Reading our PIVOT table
    MariaDB [test]> SELECT * FROM requests_by_response_and_type ;
    +-----------------+-----------------------+--------------+------------------+--------------+-----+------+
    | ip              | raw_time              | raw_timezone | path             | raw_protocol | GET | HEAD |
    +-----------------+-----------------------+--------------+------------------+--------------+-----+------+
    | 112.124.0.114   | [07/Sep/2020:16:44:25 | +0100]       | /dede/login.php  | HTTP/1.1"    |   0 |    1 |
    | 112.124.0.114   | [07/Sep/2020:16:44:25 | +0100]       | /dedea/login.php | HTTP/1.1"    |   0 |    1 |
    ...
  • 31. Reading our PIVOT table
    MariaDB [test]> SELECT request_type, `GET`, `HEAD` FROM requests_by_response_and_type ;
    ERROR 1054 (42S22): Unknown column 'request_type' in 'field list'
  • 32. PIVOTing a query
    OPTION_LIST = 'user=mysql,host=localhost',
    SrcDef = 'SELECT request_type, COUNT(*) FROM web_log GROUP BY request_type'
    MariaDB [test]> SELECT * FROM requests_by_response_and_type ;
    +-----+------+
    | GET | HEAD |
    +-----+------+
    |  23 |   49 |
    +-----+------+
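    Conceptually, pivoting the grouped query turns each request_type value into a column. A minimal Python sketch of that idea follows; the function name pivot is an assumption made for the example, and the input rows are the counts shown on the slide. It only illustrates the shape of the result, not CONNECT's implementation.

```python
def pivot(rows):
    """Turn (key, value) rows into a single row whose columns are the keys."""
    result = {}
    for key, value in rows:
        # Values sharing a key are summed, so duplicate keys collapse.
        result[key] = result.get(key, 0) + value
    return result


# Result of: SELECT request_type, COUNT(*) FROM web_log GROUP BY request_type
grouped = [("GET", 23), ("HEAD", 49)]

pivoted = pivot(grouped)
print(pivoted)
```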
  • 33. Other transformations?
    ● OCCUR unpivots columns
    ● XCOL turns lists into multiple rows
    ● TBL allows treating a set of tables as a single table
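    To make the first two transformations more concrete, here is a simplified Python sketch of what "unpivot columns" and "turn lists into multiple rows" mean. The function names, parameters and sample rows are all hypothetical, and details such as how CONNECT treats empty or zero values are deliberately left out.

```python
def xcol(rows, list_column, sep=","):
    """Emit one row per item of a separator-delimited list column."""
    for row in rows:
        for item in row[list_column].split(sep):
            yield {**row, list_column: item.strip()}


def occur(rows, columns, occur_col, rank_col):
    """Unpivot: emit one row per listed column, keeping the other columns."""
    for row in rows:
        rest = {k: v for k, v in row.items() if k not in columns}
        for col in columns:
            yield {**rest, rank_col: col, occur_col: row[col]}


# Hypothetical sample data, not taken from the slides.
families = [{"name": "Ann", "children": "Bob, Cy"}]
pets = [{"name": "Ann", "dog": 2, "cat": 1}]

children_rows = list(xcol(families, "children"))
pet_rows = list(occur(pets, ["dog", "cat"], occur_col="number", rank_col="pet"))
print(children_rows)
print(pet_rows)
```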
  • 34. What did we leave out?
    ● Almost everything! We’ve just played a bit with an Apache log!
    ● Other file formats (JSON, XML, HTML tables, ini, fixed-length, …)
    ● Compressed files
    ● More magic with custom formats
    ● Connections via MySQL, ODBC, JDBC, MongoDB
    ● Querying remote REST APIs
    ● ...and more
    Final hints:
    ● Build proper indexes where possible
    ● Increase connect_work_size if your files are big