HighLoad++ 2017
Зал «Кейптаун», 8 ноября, 16:00
Тезисы:
http://www.highload.ru/2017/abstracts/3115.html
During this session we will cover the last development in ProxySQL to support regular expressions (RE2 and PCRE) and how we can use this strong technique in correlation with ProxySQL's query rules to anonymize live data quickly and transparently. We will explain the mechanism and how to generate these rules quickly. We show live demo with all challenges we got from the Community and we finish the session by an interactive brainstorm testing queries from the audience.
2. Who we are
René Cannaò
Founder of ProxySQL
MySQL SRE at Dropbox
thanks to:
Frédéric Descamps
MySQL Community Manager
3. Other Sessions
273. ProxySQL, MaxScale, MySQL Router and other database traffic
managers / Petr Zaitsev (Percona)
155. ProxySQL Use Case Scenario / Alkin Tezuysal (Percona)
4. Agenda
● Database overview
● What is ProxySQL
● Features overview
● Data masking
● Rules
● Masking rules
● Obfuscation with mysqldump
● Examples
7. Main motivations
empower the DBAs
Improves manageability
understand and improve performance
High performance and High Availability
create a proxy layer to shield the database
8. Database as a Service (layered)
APPLICATIONS
DATABASES + MANAGER(s)
DAAS – REVERSE PROXY
12. ProxySQL Features (short list)
High Availability and Scalability
seamless failover
firewall
query throttling
query timeout
query mirroring
runtime reconfiguration
Scheduler
Support for Galera/PXC and
Group Replication
on-the-fly rewrite of queries
caching reads outside the database
connection pooling and multiplexing
complex query routing and r/w split
load balancing
real time statistics
monitoring
Data masking
Multiple instances on same ports
Native Clustering
14. Data Masking
Data masking or data obfuscation is the process of hiding original
data with random characters or data.
The main reason for applying masking to a data field is to protect
data that is classified as personal identifiable data, personal
sensitive data or commercially sensitive data, however the data
must remain usable for the purposes of undertaking valid test cycles
15. Why using ProxySQL as data masking
solution?
Open Source & Free like in beer
Other solutions are expensive or not working
Not worse than the other solutions as currently none is perfect
The best solution would be to have this feature implemented in the
server just after the handler API
17. Query Rewrite
Dynamically rewrite queries sent by the application/client
without the client being aware
on the fly
using ProxySQL query rules
rules defined using regular expressions, s/match/replace/
18. The concept
We use Regular Expressions to modify the clients’ SQL statement
and replace the column(s) we want to hide by some characters or
generate fake data.
We will split our solution in two different solutions:
● Provide access to the database to developers
● Generate dump to populate a database to share
Only the defined users, in our example we use a developer, will
have his statements modified.
19. The concept (2)
We will also create two categories :
•data masking
•data obfuscating
20. Data Masking
Here we will just mask with a generic character the full value of the
column or part of it:
21. Data Obfuscation
Here we will just replace the value of the column with random
characters of the same type, we create fake data
22. Access
INSERT INTO mysql_users
(username, password, active, default_hostgroup)
VALUES ('devel','devel',1,1);
INSERT INTO mysql_users
(username, password, active, default_hostgroup)
VALUES ('backup','dumpme',1,1);
Create a user for masking:
Create a user for backups:
23. Rules
Avoid SELECT *
for the developer, we need to create some rules to block any
SELECT * variant on the table
if the column is part of many tables, we need to do so for each
of them
24. Rules (2)
Mask or obfuscate the field
when the field is selected in the columns we need:
● to replace the column by showing the first 2 characters and a
certain amount of X s or generate a random string
● keep the column name
● for mysqldump we need to allow SELECT * but mask and/or
obfuscate sensible values
27. rule_id: 158
active: 1
username: devel
schemaname: employees
flagIN: 0
match_pattern: ((?)(`?w+`?.)?salary()?)([ ,n])
negate_match_pattern: 0
re_modifiers: CASELESS,GLOBAL
flagOUT: NULL
replace_pattern: 1CONCAT( floor(rand() * 50000) + 10000,'')3
salary4
Rule #2 - obfuscating
Let's imagine we want to provide fake number for `salaries`.`salary` column.
We could instead of the previous rule use this one
30. rule_id: 5
active: 1
username: devel
schemaname: employees
match_pattern: ^SELECTs+*.*FROM.*employees
re_modifiers: caseless,global
error_msg: Query not allowed due to sensitive
information, please contact dba@acme.com
apply: 0
Rule #5
31. rule_id: 6
active: 1
username: devel
schemaname: employees
match_pattern: ^SELECTs+employees.*.*FROM.*employees
re_modifiers: caseless,global
error_msg: Query not allowed due to sensitive
information, please contact dba@acme.com
apply: 0
Rule #6
32. rule_id: 7
active: 1
username: devel
schemaname: employees
match_pattern: ^SELECTs+(w+).*.*FROM.*employeess+(ass+)?(1)
re_modifiers: caseless,global
error_msg: Query not allowed due to sensitive
information, please contact dba@acme.com
apply: 0
Rule #6
33. Rules for mysqldump
To provide a dump that might be used by developers, Q/A or
support, we need to:
● generate valid data
● obfuscate sensitive information
● rewrite SQL statements issued by mysqldump
● only for tables and columns with sensitive data
36. Limitions
● better support in proxySQL >= 1.4.x
○ RE2 an PCRE regexes
● all fields with the same name will be masked whatever the
name of the table is in the same schema
● the regexps can always be not sufficient
● block any query not matching whitelisted SQL statements
● the dump via ProxySQL solution seems to be the best
37. Make it easy
This is not really easy isn´t it ?
You can use this small bash script
(https://github.com/lefred/maskit) to generate them:
# ./maskit.sh -c first_name -t employees -d employees
column: first_name
table: employees
schema: employees
let's add the rules...
39. Examples (2)
More difficult:
select emp_no, concat(first_name), last_name from
employees;
select emp_no, first_name, first_name from
employees.employees
select emp_no, `first_name` from employees;
select emp_no, first_name
-> from employees; (*)
40. Examples (3)
More difficult:
select t1.first_name from employees.employees as t1;
select emp_no, first_name as fred from employees;
select emp_no, first_name rene from employees;
select emp_no, first_name `as` from employees;
select first_name as `as`, last_name from employees;
select `t1`.`first_name` from employees.employees as t1;
41. Examples (4)
More difficult:
select first_name fred, last_name from employees;
select emp_no, first_name /* first_name */ from
employees.employees;
/* */ select last_name, first_name from employees;
select CUSTOMERS.* from myapp.CUSTOMERS;
select a.* from employees.employees a;`