This document discusses ways that MySQL could provide more detailed instrumentation and troubleshooting information. It begins by noting that while modern versions expose a lot of information online (on the running server), errors are usually noticed in log files after the context is lost. It then lists specific bug reports requesting additional instrumentation for lock waits and deadlocks, table handles, replication, connections, temporary tables, tracing, and thread states. The document argues that more visibility into internal operations would help identify the causes of problems.
What you wanted to know about MySQL, but could not find using internal instrumentation only
1. What you wanted to know about MySQL
but could not find using internal instrumentation only
February 3, 2017
Sveta Smirnova
2. ∙ MySQL Support engineer
∙ Author of
∙ MySQL Troubleshooting
∙ JSON UDF functions
∙ FILTER clause for MySQL
∙ Speaker
∙ Percona Live, OOW, Fosdem,
DevConf, HighLoad...
Sveta Smirnova
7. ∙ In modern versions we have a lot of online information
∙ However, users usually notice an error in the log files, when the context is already gone
∙ This is partially solved by modern monitoring tools (PMM), which can save historical statistics
∙ But not about everything
Historical Data
8. ∙ In the Audit log it is easy to find the record of the query which failed with this error
<AUDIT_RECORD
NAME="Query"
RECORD="2_2017-01-12T20:40:36"
TIMESTAMP="2017-01-12T20:41:32 UTC"
COMMAND_CLASS="update"
CONNECTION_ID="3"
STATUS=" 1205"
SQLTEXT="update t1 set f1=f1-1"
USER="root[root] @ localhost [127.0.0.1]"
HOST="localhost"
OS_USER=
IP="127.0.0.1"
DB="test"
/>
Lock wait timeout
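A minimal Python sketch of how such failures could be pulled out of an audit log after the fact. It assumes the records are wrapped in a single `<AUDIT>` root element and are well-formed XML, so the sample below writes `OS_USER=""` and adds a second, made-up record for contrast; the function name `failed_with` is hypothetical. Error code 1205 is the lock wait timeout error.

```python
import xml.etree.ElementTree as ET

LOCK_WAIT_TIMEOUT = 1205  # ER_LOCK_WAIT_TIMEOUT

# Sample adapted from the slide; the <AUDIT> wrapper, OS_USER="" and the
# second record are assumptions added to make the sample parseable.
SAMPLE = """<AUDIT>
<AUDIT_RECORD
  NAME="Query"
  RECORD="2_2017-01-12T20:40:36"
  TIMESTAMP="2017-01-12T20:41:32 UTC"
  COMMAND_CLASS="update"
  CONNECTION_ID="3"
  STATUS=" 1205"
  SQLTEXT="update t1 set f1=f1-1"
  USER="root[root] @ localhost [127.0.0.1]"
  HOST="localhost"
  OS_USER=""
  IP="127.0.0.1"
  DB="test"
/>
<AUDIT_RECORD NAME="Query" TIMESTAMP="2017-01-12T20:41:40 UTC"
  CONNECTION_ID="4" STATUS="0" SQLTEXT="select 1"/>
</AUDIT>"""


def failed_with(xml_text, errno):
    """Return (timestamp, connection id, SQL text) for records with this status."""
    hits = []
    for rec in ET.fromstring(xml_text).iter("AUDIT_RECORD"):
        status = rec.get("STATUS", "").strip()  # the slide shows a leading space
        if status.isdigit() and int(status) == errno:
            hits.append((rec.get("TIMESTAMP"),
                         rec.get("CONNECTION_ID"),
                         rec.get("SQLTEXT")))
    return hits


for ts, conn, sql in failed_with(SAMPLE, LOCK_WAIT_TIMEOUT):
    print(ts, "connection", conn, "failed:", sql)
```

This only finds the victim of the timeout. Nothing in the record identifies the connection that held the lock.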
13. ∙ In the Audit log it is easy to find the record of the query which failed with this error
∙ But where is the query which holds the lock?
∙ It is hard to find even online
∙ Especially if you have thousands of running threads!
∙ Multiple-statement transactions make it worse
∙ However, the server has all the information needed to print all queries of the locking transaction
∙ MySQL Bug #84563
Lock wait timeout
14. ∙ First transaction
------------------------
LATEST DETECTED DEADLOCK
------------------------
2017-01-19 13:03:42 7f37fc636700
*** (1) TRANSACTION:
TRANSACTION 1298, ACTIVE 3 sec starting index read
...
DELETE FROM t WHERE i = 1
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 314 n bits 72 index `GEN_CLUST_INDEX` of table `test`.`t` trx id 1298 lock_mode X waiting
...
What exactly caused the deadlock?
15. ∙ First transaction
∙ Second transaction
*** (2) TRANSACTION:
TRANSACTION 1297, ACTIVE 7 sec starting index read
...
DELETE FROM t WHERE i = 1
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 0 page no 314 n bits 72 index `GEN_CLUST_INDEX`
...
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 314 n bits 72 index `GEN_CLUST_INDEX`
...
*** WE ROLL BACK TRANSACTION (1)
What exactly caused the deadlock?
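A rough Python sketch of what can be extracted from this report. The sample is condensed from the two slides above with the elided lines dropped, and the function name `summarize` is hypothetical. It splits the report into per-transaction sections and reports the last statement shown for each transaction plus the rollback victim.

```python
import re

# Condensed from the slides; lines elided there ("...") are simply omitted.
DEADLOCK = """\
*** (1) TRANSACTION:
TRANSACTION 1298, ACTIVE 3 sec starting index read
DELETE FROM t WHERE i = 1
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 314 n bits 72 index `GEN_CLUST_INDEX` of table `test`.`t` trx id 1298 lock_mode X waiting
*** (2) TRANSACTION:
TRANSACTION 1297, ACTIVE 7 sec starting index read
DELETE FROM t WHERE i = 1
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 0 page no 314 n bits 72 index `GEN_CLUST_INDEX`
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 0 page no 314 n bits 72 index `GEN_CLUST_INDEX`
*** WE ROLL BACK TRANSACTION (1)
"""

HEADER = re.compile(r"^\*\*\* \((\d+)\) (.+):$")
VICTIM = re.compile(r"^\*\*\* WE ROLL BACK TRANSACTION \((\d+)\)$")


def summarize(report):
    """Group report lines by (transaction number, section title)."""
    sections, victim, current = {}, None, None
    for line in report.splitlines():
        header = HEADER.match(line)
        rollback = VICTIM.match(line)
        if header:
            current = (int(header.group(1)), header.group(2))
            sections[current] = []
        elif rollback:
            victim, current = int(rollback.group(1)), None
        elif current is not None:
            sections[current].append(line)
    return sections, victim


sections, victim = summarize(DEADLOCK)
for number in (1, 2):
    print(number, "last statement:", sections[(number, "TRANSACTION")][-1])
print("rolled back:", victim)
```

Both transactions show the same `DELETE` as their last statement. The statement that originally took the lock held by transaction 2 does not appear anywhere in the report.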
20. ∙ First transaction
∙ Second transaction
∙ Which query held the lock?
∙ SELECT * FROM t WHERE i = 1 LOCK IN SHARE MODE;
∙ How would we know?
∙ refman/.../innodb-deadlock-example.html
∙ Bug #84607
What exactly caused the deadlock?
21. ∙ Performance Schema
∙ Bug #71364 Please provide warning text information into P_S
∙ Bug #61030 Make an I_S table of client error codes
∙ Bug #58058 please add instrumentation to track error counts on a server
Some past requests
22. ∙ Performance Schema
∙ General logging
∙ Bug #70796 Error messages and warnings for sql-mode behaviours need more verbosity
∙ Bug #64190 Log failed queries in a separate log
∙ Bug #60884 Enable logging of all errors to the error log
∙ Bug #34137 Additional logging of the server shutdown process
Some past requests
23. ∙ Which kind of query can produce this output?
∙ t is an InnoDB table
mysql> select * from table_handles where object_name='t'\G
*************************** 1. row ***************************
OBJECT_TYPE: TABLE
OBJECT_SCHEMA: test
OBJECT_NAME: t
OBJECT_INSTANCE_BEGIN: 140108477034256
OWNER_THREAD_ID: 23
OWNER_EVENT_ID: 3788
INTERNAL_LOCK: NULL
EXTERNAL_LOCK: READ EXTERNAL
1 row in set (0,00 sec)
Table_handles
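Since this output is only available online, one way to keep it for later is to save the vertical (`\G`) output and parse it afterwards. The following is a small Python sketch under that assumption; `parse_vertical` is a hypothetical helper, and it treats the literal text `NULL` as a missing value, which cannot be distinguished from a string that happens to contain "NULL".

```python
import re

VERTICAL = """\
*************************** 1. row ***************************
OBJECT_TYPE: TABLE
OBJECT_SCHEMA: test
OBJECT_NAME: t
OBJECT_INSTANCE_BEGIN: 140108477034256
OWNER_THREAD_ID: 23
OWNER_EVENT_ID: 3788
INTERNAL_LOCK: NULL
EXTERNAL_LOCK: READ EXTERNAL
"""

ROW_MARKER = re.compile(r"^\*+ \d+\. row \*+$")


def parse_vertical(text):
    """Turn mysql client vertical output into a list of dicts, one per row."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if ROW_MARKER.match(line):
            rows.append({})
        elif rows and ": " in line:
            key, value = line.split(": ", 1)
            rows[-1][key] = None if value == "NULL" else value
    return rows


rows = parse_vertical(VERTICAL)
for row in rows:
    print(row["OBJECT_SCHEMA"], row["OBJECT_NAME"],
          "thread", row["OWNER_THREAD_ID"], "lock", row["EXTERNAL_LOCK"])
```

The parsed row still says only `READ EXTERNAL`. It does not say which kind of statement took the lock.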
27. ∙ Which kind of query can produce this output?
∙ lock table t read;
∙ select * from t [lock in share mode];
∙ select * from t where i [=,in,<,>] ...
∙ But not select * from t where unique_key = ... !
Table_handles
28. ∙ Which kind of query can produce this output?
∙ t is an InnoDB table
mysql> select * from table_handles where object_name='t'\G
*************************** 1. row ***************************
OBJECT_TYPE: TABLE
OBJECT_SCHEMA: test
OBJECT_NAME: t
OBJECT_INSTANCE_BEGIN: 140108477034256
OWNER_THREAD_ID: 23
OWNER_EVENT_ID: 4379
INTERNAL_LOCK: NULL
EXTERNAL_LOCK: WRITE EXTERNAL
1 row in set (0,00 sec)
Table_handles
29. ∙ Which kind of query can produce this output?
∙ lock table t write;
∙ Manual says: "The table lock used at the storage engine level. The value is one of READ EXTERNAL or WRITE EXTERNAL."
∙ Is this a storage engine level operation?
Table_handles
31. ∙ Which kind of query can produce this output?
∙ lock table t write;
∙ select * from t for update;
∙ update t set i=i+sleep(i) where i [=,in,<,>] ...
Table_handles
33. ∙ Which kind of query can produce this output?
∙ Bug #84609
∙ Bug #84610
Table_handles
36. ∙ In the past we had only one troubleshooting tool
∙ SHOW SLAVE STATUS
∙ Today Performance Schema supports replication
∙ But it still misses
∙ Bug #81249 SLAVE_NET_TIMEOUT TO P_S FOR SLAVE THREAD VARIABLES
∙ Bug #78918 Metric for successful slave reconnects
∙ Bug #77605 Add more information to SQL thread-related P_S tables
Replication
37. ∙ In the past we had only one troubleshooting tool
∙ Today Performance Schema supports replication
∙ But it still misses
∙ Bug #76828 Slave details on a master
∙ Bug #74809 Stats per binlog event type
∙ Bug #72826 Support for joining replication_execute_status_by_%
∙ Bug #70951 Threads shutdown info
Replication
38. ∙ What does this output mean?
2017-01-20T21:44:52.301177Z 5 [Note] Aborted connection 5 to db: 'test' user: 'root' host: 'localhost' (Got timeout reading communication packets)
Connection errors
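For reference, a minimal Python sketch that splits such a line into its fields. The regular expression is written against the single line shown on the slide and is an assumption about the format, not a description of every error-log variant; `parse_aborted` is a hypothetical name.

```python
import re

# Pattern modelled on the one log line from the slide (an assumption).
ABORTED = re.compile(
    r"^(?P<ts>\S+) (?P<thread>\d+) \[Note\] Aborted connection (?P<conn>\d+) "
    r"to db: '(?P<db>[^']*)' user: '(?P<user>[^']*)' "
    r"host: '(?P<host>[^']*)' \((?P<reason>.*)\)$"
)

LINE = ("2017-01-20T21:44:52.301177Z 5 [Note] Aborted connection 5 to db: 'test' "
        "user: 'root' host: 'localhost' (Got timeout reading communication packets)")


def parse_aborted(line):
    """Return the fields of an 'Aborted connection' note, or None if no match."""
    match = ABORTED.match(line)
    return match.groupdict() if match else None


info = parse_aborted(LINE)
print(info["ts"], info["user"] + "@" + info["host"], "->", info["reason"])
```

The text in parentheses is the only hint the line carries. Parsing it cannot tell which timeout actually fired.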
42. ∙ What does this output mean?
∙ Timeout while the connection was being established?
∙ Connection was aborted because interactive_timeout/wait_timeout passed?
∙ Something else?
∙ Bug #51219, Bug #28836, Bug #78843, Bug #84612
Connection errors
43. ∙ Bug #77888 max_used_connection per user/account missing in P_S/sys
∙ Bug #77581 Collect DNS timing information into Performance_Schema
∙ Bug #76403 COUNT_ABORTED_CLIENT_ERRORS to P_S.host_cache
∙ Bug #72219 First and last connection timestamps to P_S.users table
Other connection requests
44. ∙ Bug #71305 PERFORMANCE_SCHEMA.THREADS table, add a PORT column
∙ Bug #71186 P_S.host_cache does not collect connections aborted entries
∙ Bug #69880 Track and expose connection creation timestamp
Other connection requests
45. ∙ Bug #69725 P_S.socket_instances doesn't include named pipe or shared memory connections
∙ Bug #45817 Please add SHOW command for inc_host_errors(max_connect_errors)
∙ Bug #21565 More verbose connection log
Other connection requests
46. ∙ One more output
mysql> flush status;
Query OK, 0 rows affected (0,00 sec)
mysql> select ...
600048 rows in set (1 min 17,26 sec)
mysql> show status like 'Created_tmp%';
+-------------------------+-------+
| Variable_name           | Value |
+-------------------------+-------+
| Created_tmp_disk_tables | 2     |
| Created_tmp_files       | 6     |
| Created_tmp_tables      | 3     |
+-------------------------+-------+
3 rows in set (0,00 sec)
Temporary tables
49. ∙ One more output
∙ Were the tables created simultaneously?
∙ What is their size?
∙ Solution: watch lsof
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
mysqld 8697 sveta 70u REG 0,43 11765657 43001204 /tmp/mysqld.1/MYSeEOHe (deleted)
mysqld 8697 sveta 71u REG 0,43 11765657 43001205 /tmp/mysqld.1/MYVwF8Od (deleted)
Temporary tables
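The lsof workaround can be scripted. Below is a small Python sketch that reads output in the layout shown on the slide and adds up the sizes of deleted files under the temporary directory. The function name `deleted_temp_files` is hypothetical, and the sketch assumes the `SIZE/OFF` column holds a file size for these regular files.

```python
LSOF = """\
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
mysqld 8697 sveta 70u REG 0,43 11765657 43001204 /tmp/mysqld.1/MYSeEOHe (deleted)
mysqld 8697 sveta 71u REG 0,43 11765657 43001205 /tmp/mysqld.1/MYVwF8Od (deleted)
"""


def deleted_temp_files(lsof_text, tmpdir):
    """Return (path, size in bytes) for deleted-but-open files under tmpdir."""
    files = []
    for line in lsof_text.splitlines()[1:]:  # skip the header line
        parts = line.split(None, 8)          # NAME may contain spaces
        if len(parts) < 9:
            continue
        size, name = parts[6], parts[8]
        if (name.startswith(tmpdir) and name.endswith("(deleted)")
                and size.isdigit()):
            files.append((name[:-len("(deleted)")].rstrip(), int(size)))
    return files


files = deleted_temp_files(LSOF, "/tmp/mysqld.1/")
print(len(files), "temporary files open at once,",
      sum(size for _, size in files), "bytes in total")
```

Sampling this repeatedly while the query runs shows whether the files exist at the same time and how large they grow, which the `Created_tmp%` counters cannot tell.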
51. ∙ One more output
∙ Were the tables created simultaneously?
∙ What is their size?
∙ Solution: watch lsof
∙ Bug #74484
∙ Bug #84613
Temporary tables
58. ∙ I_S.OPTIMIZER_TRACE is a good addition for the Optimizer
∙ But what about other parts of the server?
∙ Runtime
∙ Parser
∙ Binary logging
∙ InnoDB
∙ Bug #84620
Trace
59. ∙ General
∙ Bug #83626 Collect per column usage data in performance_schema
∙ Bug #71755 Provide per partition summary information in PERFORMANCE_SCHEMA
∙ Bug #81020 performance_schema: Please add optimizer usage statistics
∙ Bug #55171 How much sort_buffer_size are actually used?
More tracing requests
60. ∙ General
∙ InnoDB
∙ Bug #81611 Add P_S metrics to collect compressed page bytes vs other types written to relog
∙ Bug #78448 Provide better metrics on innodb_sort_buffer_size usage
∙ Bug #71698 Add instrumentation for the doublewrite buffer and undo segments
More tracing requests
63. ∙ SHOW PROCESSLIST has multiple states
∙ Some of them are clear
∙ But what do these mean?
∙ System lock
∙ statistics
∙ freeing items
∙ Sending data
∙ cleaning up
∙ closing tables
∙ end
Vague stages
66. ∙ SHOW PROCESSLIST has multiple states
∙ Some of them are clear
∙ But what do these mean?
∙ Bug #57544
∙ Bug #72083
∙ Bug #84615
Vague stages