In-memory Database and MySQL Cluster
1. In-memory Database & MySQL Cluster
Grandis He (grandis.he@gmail.com)
http://www.linkedin.com/in/grandis
Any questions or comments? Just let me know
Xiongwei He (Grandis)
2010-10-29 1
grandis.he@gmail.com
2. Personal Introduction
• Before immigrating to Australia
– Led the Zero Downtime Upgrade feature for Alcatel-Lucent Subscriber Data Management (MySQL Application of the Year 2009 Award)
– Led, for 7 years, the Super Distributed Home Location Register (SDHLR) development team, which used Oracle TimesTen
3. In-memory Database Development
• Before 2000
– Vendor DIY (vendors built their own in-memory stores)
• NO SQL support and limited search options
• Hard to manage
– Alternative choice: Berkeley DB (key-value pairs)
• After 2000 – Independent vendors
– SQL/ODBC/JDBC support
– Easy to manage
• Now – Major database vendors (except Microsoft) offer in-memory options, either acquired or developed in-house
• Market value of in-memory databases: when SAP acquired Sybase, one major reason cited in SAP's press release was Sybase's in-memory database
4. Different ways to be in-memory
• In-memory only database (also called diskless)
– Data in Memory only
• In-memory cache to database
– Data is synchronized to a backend database, which writes it to disk
• In-memory database
– Data is held in memory and also written to disk
Note: Only products with SQL support are covered
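As a rough illustration of the three modes, the Python sketch below models only where a committed write eventually lands. It is a simplified model, not any product's behavior; the names `Mode`, `write_path` and `survives_full_restart` are hypothetical.

```python
from enum import Enum


class Mode(Enum):
    DISKLESS = "in-memory only database"
    CACHE = "in-memory cache to database"
    IMDB = "in-memory database"


def write_path(mode: Mode) -> list:
    """Where a committed write eventually lands (simplified, hypothetical model)."""
    if mode is Mode.DISKLESS:
        return ["memory"]
    if mode is Mode.CACHE:
        # The cache synchronizes to a backend database, which persists to disk
        return ["memory", "backend database", "disk"]
    return ["memory", "disk"]


def survives_full_restart(mode: Mode) -> bool:
    # Data survives a restart of every node only if some copy reaches disk
    return "disk" in write_path(mode)


for mode in Mode:
    print(f"{mode.value}: {' -> '.join(write_path(mode))}")
```

The diskless mode is the only one where a full restart empties the database, which is why it suits the session and auto-generated data listed for it.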
5. In-memory Only Database (Diskless)
• Mainstream products
– Oracle TimesTen
– MySQL Cluster
– IBM SolidDB
– Sybase ASE IMDB
• Java open-source databases
– HyperSQL (HSQLDB)
– Apache Derby
– H2
• Typical usage
– Session management
– Automatically generated data, such as smartphone GPS locations or the base station locations of mobile phones
6. In-memory Cache to Database
• Main products
– IBM SolidDB Universal Cache
• Supports DB2, Informix, Oracle, Sybase and Microsoft SQL Server
– Oracle In-Memory Cache (renamed from TimesTen Cache)
• Supports Oracle only
• Advantage: No changes to existing applications, while some applications gain real-time speed
• Cost: Cache License + Database License
• Possibly the motivation for Oracle buying TimesTen and IBM buying SolidDB
7. In-memory Database
• Abbreviation: IMDB
• Another name: Main Memory Database (MMDB)
• Today it is close to a disk-based database in operational convenience, while holding data in memory
• Real-time database access speed (vendors typically advertise it as 10 times faster)
• Main products
– Oracle TimesTen
– MySQL Cluster
– IBM SolidDB
8. In-memory Database Features
• The FULL database is in memory, so queries do not trigger disk I/O
• ACID, but with non-durable commits for fast performance (some databases also provide a durable option, some do NOT)
• Low-level API for database access besides JDBC/ODBC
• Low-latency, very fast database access
– Latency in microseconds, or 2-5 milliseconds for distributed products such as MySQL Cluster
– Among SELECT/UPDATE/INSERT, SELECT is fastest, then UPDATE, then INSERT (INSERT latency might be up to 10 times that of SELECT)
• High Throughput
• High Availability (HA) Support
9. HA – Data Safety for 2 Nodes (Shared Nothing)
– 2-Safe Durable (for disk-based databases): a transaction waits until it is committed to disk on both Node1 and Node2
– 2-Safe Visible (for in-memory databases): a transaction waits until it is committed (in memory, non-durably) on both Node1 and Node2
– 2-Safe Received: for a transaction issued on Node1, Node1 commits after Node2 confirms that it has received the replication log
– 1-Safe: Node1 commits without waiting for log replication to Node2
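The four safety levels differ only in which events Node1 waits for before it reports a commit. The sketch below models that; the event names and the microsecond timestamps are hypothetical, chosen only to show the ordering, not measured from any product.

```python
from enum import Enum


class Safety(Enum):
    TWO_SAFE_DURABLE = "2-Safe Durable"
    TWO_SAFE_VISIBLE = "2-Safe Visible"
    TWO_SAFE_RECEIVED = "2-Safe Received"
    ONE_SAFE = "1-Safe"


# Events Node1 waits for before it reports the commit (simplified model).
WAIT_FOR = {
    Safety.TWO_SAFE_DURABLE: ("node1_on_disk", "node2_on_disk"),
    Safety.TWO_SAFE_VISIBLE: ("node1_in_memory", "node2_in_memory"),
    Safety.TWO_SAFE_RECEIVED: ("node1_in_memory", "node2_log_received"),
    Safety.ONE_SAFE: ("node1_in_memory",),
}


def commit_latency_us(level: Safety, event_times_us: dict) -> int:
    """The commit returns once the slowest awaited event has happened."""
    return max(event_times_us[event] for event in WAIT_FOR[level])


# Hypothetical timestamps (microseconds) for one transaction issued on Node1
timeline = {
    "node1_in_memory": 10,
    "node2_log_received": 60,
    "node2_in_memory": 80,
    "node1_on_disk": 2000,
    "node2_on_disk": 2100,
}

for level in Safety:
    print(f"{level.value}: commit returns after {commit_latency_us(level, timeline)} us")
```

Moving down the list trades data safety for latency: with 1-Safe, a Node1 failure right after the commit can lose a transaction whose log never reached Node2.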
10. HA Term – Sync vs Safe
Sync/Async | 2-Safe/1-Safe | Other Term
Sync | 2-Safe Durable, 2-Safe Visible | 2-Phase Commit
Semi-Sync | 2-Safe Received | Return Receipt Replication
Async | 1-Safe |
11. HA – Database node redundancy
• Two-node redundancy modes
– Shared Disk (used by disk-based databases; the concern is disk array quality)
• Active/Standby (by 3rd-party clusterware)
• Active/Active (supported by database vendors: Oracle RAC, Sybase ASE Cluster, DB2 pureScale)
– Shared Nothing (used by in-memory databases)
• Database Active/Standby (less used)
• Database Active/Standby Read-only, also called Writer/Reader
• Database Active/Active
– 2-way replication with conflict resolution
– A 3rd server: an extremely reliable NTP server for conflict resolution
– 2-phase commit
• Three-node/four-node redundancy modes: a mix of the above technologies
• Switchover behavior for Active/Standby: the standby becomes active and the active becomes standby (automatically or manually)
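For Active/Active with 2-way replication, one common conflict-resolution technique is timestamp-based last-writer-wins, which is why a highly reliable NTP source matters: both sites must agree on which write is newer. This sketch assumes each replicated row version carries an NTP-synchronized timestamp and an origin-site name; `RowVersion` and `resolve` are hypothetical names, not a vendor API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowVersion:
    value: str
    timestamp_ms: int  # assumed to come from NTP-synchronized clocks
    origin: str        # site that wrote this version, e.g. "siteA"


def resolve(local: RowVersion, incoming: RowVersion) -> RowVersion:
    """Last-writer-wins: keep the version with the later timestamp.

    Ties are broken by origin name so that both sites, applying the same
    rule to the same pair of versions, converge on the same result.
    """
    if (incoming.timestamp_ms, incoming.origin) > (local.timestamp_ms, local.origin):
        return incoming
    return local


a = RowVersion("balance=10", 1_000, "siteA")
b = RowVersion("balance=12", 1_005, "siteB")
print(resolve(a, b).value, resolve(b, a).value)  # both sites pick siteB's write
```

If the clocks drift by more than the gap between two conflicting writes, the "wrong" write wins silently; a reliable NTP server reduces that risk but cannot remove it.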
12. MySQL Cluster vs Oracle TimesTen vs IBM SolidDB
Feature | MySQL Cluster | Oracle TimesTen | IBM SolidDB
Latest version | 7.1 | 11g Release 2 | 6.5
Shared memory access | No (distributed) | Yes – direct driver connection | Yes – shared memory access
Latency | 2-5 ms (distributed) | Tens to hundreds of microseconds | Tens to hundreds of microseconds
Throughput (database, NOT application) | Tens of thousands to hundreds of thousands | Tens of thousands to hundreds of thousands | Tens of thousands to hundreds of thousands
Durable option | No | Yes | Yes
HA support – replication | 2-Safe Visible for NDB nodes; 1-Safe for replication between MySQL Clusters | 2-Safe Durable, 2-Safe Visible, 2-Safe Received, 1-Safe | 2-Safe Durable, 2-Safe Visible, 2-Safe Received, 1-Safe
Node redundancy | Active/Active for NDB nodes; Active/Active or Active/Standby Read-only for 2 MySQL Clusters | Active/Active, Active/Standby Read-only | Active/Standby Read-only
Transaction isolation | Read Committed | Serializable, Read Committed | Repeatable Read (primary node only for HSB), Read Committed
13. MySQL Cluster vs Oracle TimesTen vs IBM SolidDB (continued)
Feature | MySQL Cluster | Oracle TimesTen | IBM SolidDB
Disk field support | Yes, with indexes in memory | No (alternative solution: Oracle Cache + Oracle Database, where indexes are NOT in memory) | Not at field level, but a whole table can be on disk, where indexes are NOT in memory
Scalability | 256 nodes at max, 48 data nodes at max | Limited by the machine | Limited by the machine
Diskless option | Yes | Yes | Yes
Change notification (asynchronous) | NDB Event Notification | XLA (Transaction Log API) | Transaction Log Reader
Trigger | Yes (only on MySQL Server nodes; not fired by NDB API operations) | No | Yes
Stored procedure | Yes (only on MySQL Server nodes; cannot be called through the NDB API) | Yes | Yes
Friendly interface | MySQL Server interface; JDBC, ODBC; NDB API (MySQL Cluster only) | Oracle-friendly (OCI, Pro*C); JDBC and ODBC; XA and JTA (DTP support); TTClasses (TimesTen only, C++) | SA API (low-level API); Light Client; ODBC, JDBC (JTA)
14. MySQL Cluster
• 2003: MySQL AB acquired Alzato (an Ericsson venture)
• Another name: NDB Cluster
• Separate versioning: MySQL Cluster version numbers differ from MySQL Server version numbers
• Cost: cheap license compared to TimesTen
15. MySQL Cluster Features
• Low cost – uses commodity hardware without a disk array
• High reliability
– Shared nothing within a single cluster (easier to maintain than a shared disk array with mirroring)
– Geo-redundancy supported by cluster-level replication
• High performance/frequency (especially with the NDB API)
• Distributed access for applications
• Low latency: 2-5 ms
• Disk field support: addresses memory limits when the application needs to support big fields (a major advantage over other IMDBs)
16. MySQL Cluster Architecture
[Figure: MySQL Cluster architecture. MySQL applications connect through MySQL Servers; NDB API applications connect through the NDB native API. Both reach data nodes Data 1 to Data N (paired as 1/2, 3/4, up to N-1/N), alongside a Management (MGM) Node. A second cluster with its own MySQL Servers and data nodes is also shown. Legend: 1. 2-phase commit between data nodes; 2. replication between MySQL Clusters; 3. standard MySQL Server interface.]
17. MySQL Cluster Nodes
• Management Node (MGM node): manages the cluster (runs ndb_mgmd)
– Multiple MGM nodes are supported
– Why an MGM node: it monitors the system, and its log helps with database startup after a shutdown
– MGM API: can be used to develop third-party monitoring software, such as an SNMP agent that notifies the SNMP manager for fault management (raising alarms when a node is in an abnormal state)
• Data Node – Core of NDB Cluster
• SQL Node
– MySQL Server Node
– NDB API Node
18. Node Groups, Replica and Partition
• Data Node (NDB node): a node running ndbd or ndbmtd (the multithreaded version) that stores a replica
– Each data node in a node group can handle traffic
– No conflicts in 2-phase commit, since different nodes handle different data
• Replica: a copy of a cluster partition. The number of replicas equals the number of data nodes per node group
• Node Group: a node group consists of 1-4 data nodes storing the same set of data for reliability. One cluster can have multiple node groups (also called data groups)
– Number of NDB nodes = number of node groups × number of replicas
• Partition: done automatically by KEY or LINEAR KEY, or defined by the user. Partitioning distributes data automatically across node groups
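To make these relationships concrete, here is a Python sketch of key-based partitioning and node groups. It assumes that consecutive data node IDs form a node group and that there is one partition per node group, and it uses CRC-32 as a stand-in hash; MySQL Cluster's real hash function and partition-to-node assignment differ. The LINEAR KEY function follows the powers-of-two algorithm MySQL documents for linear hashing. All function names are hypothetical.

```python
import zlib


def _hash(key: bytes) -> int:
    # Stand-in hash (CRC-32); NDB's own internal hash function is different.
    return zlib.crc32(key)


def key_partition(key: bytes, num_partitions: int) -> int:
    """KEY-style partitioning: hash of the key modulo the partition count."""
    return _hash(key) % num_partitions


def linear_key_partition(key: bytes, num_partitions: int) -> int:
    """LINEAR KEY-style partitioning, using MySQL's powers-of-two algorithm."""
    v = 1
    while v < num_partitions:  # V = smallest power of 2 >= num_partitions
        v *= 2
    n = _hash(key) & (v - 1)
    while n >= num_partitions:
        v //= 2
        n &= v - 1
    return n


def make_node_groups(data_node_ids: list, replicas: int) -> list:
    """Group consecutive data nodes; each group holds `replicas` copies of its data."""
    if len(data_node_ids) % replicas:
        raise ValueError("data node count must be a multiple of the replica count")
    return [data_node_ids[i:i + replicas] for i in range(0, len(data_node_ids), replicas)]


def nodes_for_key(key: bytes, groups: list) -> list:
    """Hypothetical layout: one partition per node group."""
    return groups[key_partition(key, len(groups))]


groups = make_node_groups([1, 2, 3, 4, 5, 6, 7, 8], replicas=2)
print(groups)  # [[1, 2], [3, 4], [5, 6], [7, 8]]
print(nodes_for_key(b"sub-42", groups))
```

With LINEAR KEY, changing the partition count moves fewer rows than plain modulo hashing, at the cost of a less even spread when the count is not a power of two.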
19. MySQL Cluster Replication
• Replication latency is a little longer than with TimesTen/SolidDB due to the distributed architecture
• 2-way replication is supported, but my personal suggestions are:
– Use 2-way replication when updates to cluster 1 and cluster 2 use different keys (for example, odd keys to cluster 1, even keys to cluster 2)
– Use only 1-way replication for most applications
• Conflict resolution is NOT easy for complex scenarios
• Latency due to distributed architecture
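A minimal sketch of the odd/even rule, assuming integer keys; `home_cluster` and `split_updates` are hypothetical application-side helpers, not part of MySQL Cluster.

```python
def home_cluster(key: int) -> str:
    """Route writes by key parity: odd keys to cluster 1, even keys to cluster 2."""
    return "cluster1" if key % 2 else "cluster2"


def split_updates(keys) -> dict:
    routed = {"cluster1": set(), "cluster2": set()}
    for key in keys:
        routed[home_cluster(key)].add(key)
    return routed


routed = split_updates([1, 2, 3, 4, 7, 10])
# No key is updated on both clusters, so replication in either direction
# never has to resolve a conflicting update to the same row.
print(routed["cluster1"].isdisjoint(routed["cluster2"]))  # True
```

The guarantee holds only while both clusters are up; if one cluster fails and its keys are redirected to the other, the two sides could again update the same rows once replication resumes.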
20. Commit, GCP, LCP
• Commit: the transaction is committed to all replicas (but only in memory until the next GCP)
• Global Checkpoint (GCP): a GCP occurs every few seconds; transactions on all nodes are synchronized and the redo log is flushed to disk
• Local Checkpoint (LCP): a checkpoint specific to a single node. An LCP saves all of a node's data to disk, so it usually occurs only every few minutes
21. Commit, GCP, LCP
• NDB GCP ~= commit in a disk-based database (in terms of data safety)
• NDB LCP ~= checkpoint in a disk-based database
• LCP performance (full database flush)
– NOT as good as checkpoints in disk-based databases or TimesTen, which flush dirty pages only
– Mitigation: the distributed architecture reduces disk I/O on each single data node
22. GCP and LCP for NDB Recovery
• NDB recovery:
– Load the LCP
– Replay the redo log up to the last GCP
• Why GCP needs global synchronization: to keep the whole cluster's data consistent for recovery
– Committed transactions that are still only in memory (since the last GCP) are lost if the database crashes
• Mitigation for data safety: use multiple replicas, one of the “internal driving factors” for a distributed architecture with multiple-replica support
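The recovery order (LCP first, then the redo log up to the last completed GCP) can be sketched as below. This is a deliberately simplified model that assumes the LCP is a consistent snapshot taken at a known global checkpoint ID (GCI) and that redo records arrive in GCI order; real NDB LCPs and redo handling are more involved. `RedoRecord` and `recover` are hypothetical names. The point it shows is that changes belonging to an incomplete GCP are lost.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class RedoRecord:
    gci: int  # global checkpoint ID the committed change belongs to
    key: str
    value: str


def recover(lcp_snapshot: Dict[str, str], lcp_gci: int,
            redo_log: Iterable[RedoRecord], last_complete_gci: int) -> Dict[str, str]:
    """Simplified recovery: load the LCP, then replay redo up to the last GCP."""
    state = dict(lcp_snapshot)  # 1. load the LCP (copy, so the snapshot is untouched)
    for rec in redo_log:        # 2. replay changes newer than the LCP ...
        if lcp_gci < rec.gci <= last_complete_gci:  # ... but only up to the last GCP
            state[rec.key] = rec.value
    return state


snapshot = {"a": "1"}  # LCP taken at GCI 10
log = [RedoRecord(11, "a", "2"), RedoRecord(12, "b", "x"), RedoRecord(13, "c", "y")]
print(recover(snapshot, 10, log, 12))  # {'a': '2', 'b': 'x'}; GCP 13 never completed, so "c" is lost
```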
23. Why at least gigabit networking is needed, and why latency is higher than TimesTen
• Assume 2 replicas
– NDB1/NDB2 – paired data nodes in one data group
– Transaction Coordinator (TC): the NDB node to which the SQL node is connected
• An update needs 10 messages for a MySQL app and 8 for an NDB API app; a read takes 5 messages for a MySQL app and 3 for an NDB API app
• Update for a MySQL app
1. MySQL App → MySQL Server: update statement
2. MySQL Server → TC: update statement
3. TC → NDB1: prepare message
4. NDB1 → NDB2: prepare message
5. NDB2 → TC: prepare result
6. TC → NDB2: commit message
7. NDB2 commits and sends an acknowledgment → NDB1
8. NDB1 commits and sends an acknowledgment → TC
9. TC → MySQL Server: result
10. MySQL Server → MySQL App: result
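Writing the update sequence down as data makes the message count, and why every extra network hop matters, easy to check. The per-hop latency of 100 µs below is a hypothetical figure, and the model ignores processing time on each node; the function names are illustrative only.

```python
# Update path for a MySQL application, as listed in the steps above (2 replicas).
UPDATE_VIA_MYSQL = [
    ("MySQL App", "MySQL Server", "update statement"),
    ("MySQL Server", "TC", "update statement"),
    ("TC", "NDB1", "prepare"),
    ("NDB1", "NDB2", "prepare"),
    ("NDB2", "TC", "prepare result"),
    ("TC", "NDB2", "commit"),
    ("NDB2", "NDB1", "commit acknowledgment"),
    ("NDB1", "TC", "commit acknowledgment"),
    ("TC", "MySQL Server", "result"),
    ("MySQL Server", "MySQL App", "result"),
]


def update_via_ndb_api() -> list:
    """An NDB API application talks to the TC directly, so the two hops
    between the application and the MySQL Server disappear."""
    def rename(node: str) -> str:
        return "NDB API App" if node == "MySQL Server" else node
    return [(rename(src), rename(dst), msg) for src, dst, msg in UPDATE_VIA_MYSQL[1:-1]]


def estimated_latency_us(path: list, per_hop_us: float) -> float:
    # The messages are sequential, so per-hop network latency adds up.
    return len(path) * per_hop_us


print(len(UPDATE_VIA_MYSQL), len(update_via_ndb_api()))  # 10 8
print(estimated_latency_us(UPDATE_VIA_MYSQL, 100))       # 1000
```

Because the hops are strictly sequential, network latency multiplies directly into commit latency, which is the argument for fast (at least gigabit) interconnects.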
24. Programming Interface
• Distributed by nature: applications use multiple MySQL nodes and NDB API nodes
– Why: transactions are controlled by the NDB data nodes
• The choices
– Standard MySQL clients
– NDB API (the best way to get high performance)
• Single-table operations
• Cannot use triggers, but can use NDB events
25. Programming Interface
• Java Interface
– MySQL JDBC: use Connector/J 5.1.7+ for load-balancing support
– ClusterJ – a Java interface based on the NDB API
– ClusterJPA – an OpenJPA implementation that uses JDBC for complex queries and ClusterJ for single-table operations
• LDAP interface support (based on the NDB API)
– Impressive performance
– Data store for OpenLDAP and OpenDS
26. Your Options
• Use the NDB API or an NDB API-based interface (maximum performance, at several times the development cost)
• Use the MySQL interface (best development efficiency)
• LDAP (depending on your application type)
27. System Architecture Input
• Performance/reliability requirements
– System volume/throughput/latency
– Disk mirrored or NOT
– Redundancy model
• Node redundancy (example: N+K (K=1 or 2) for SQL nodes; tolerate 1 or 2 data nodes down in different data groups)
• TCP/IP redundancy (Ethernet ports + WAN/LAN network redundancy)
– CPU budget for busy hours (related to the redundancy mode)
• Memory/disk per subscriber (or per order)
– Memory usage per subscriber (or per order)
– Disk usage per subscriber (or per order), or disk/memory ratio
• Disk I/O performance/behavior (MySQL Cluster – LCP)
• Replication
– WAN Budget for Geo Redundancy
– Replication Daemon Throughput
– Replication Latency
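Two of these inputs, memory per subscriber and N+K node redundancy, turn into simple sizing arithmetic. The sketch below is a back-of-the-envelope model: every input number is hypothetical, the `overhead` factor is an assumed stand-in for index and row overhead, and data is assumed to be spread evenly across node groups.

```python
import math


def memory_per_data_node_gib(subscribers: int, bytes_per_subscriber: int,
                             replicas: int, data_nodes: int,
                             overhead: float = 1.0) -> float:
    """Each node group holds 1/G of the data and every node in a group holds
    a full copy of that share, so per-node data = total * replicas / data_nodes."""
    total_bytes = subscribers * bytes_per_subscriber * replicas * overhead
    return total_bytes / data_nodes / 1024 ** 3


def sql_nodes_needed(busy_hour_tps: float, tps_per_node: float, k: int) -> int:
    """N+K redundancy: N nodes carry the busy-hour load, K spares cover failures."""
    n = math.ceil(busy_hour_tps / tps_per_node)
    return n + k


# Hypothetical inputs: 10 million subscribers at 2 KiB each, 2 replicas, 4 data nodes
print(memory_per_data_node_gib(10_000_000, 2048, replicas=2, data_nodes=4))  # 9.5367431640625
print(sql_nodes_needed(50_000, 12_000, k=1))  # 6
```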
28. Your Adjustment
• Key hardware indicators
– CPU/Memory/Disk (speed/volume)
– Switch/Router
• Hardware/WAN adjustment
– CPU/Memory/Disk
– Number of NDB nodes
– Network configuration (LAN)
– Router and WAN bandwidth requirements
• Software adjustment
– Data model design adjustment
– Database tuning
– Move “INSERT/DELETE” operations to non-busy hours if possible
– Local caching of service/configuration data from small tables, to reduce database access (MySQL Cluster: keep the cache fresh using NDB events)
– Others, such as optimized software system patterns/design
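A minimal sketch of the local-caching idea for small service/configuration tables. It assumes the application receives insert/update/delete change events (in MySQL Cluster these would come from NDB events); `LocalTableCache`, `load_all` and `on_change` are hypothetical names and not part of the NDB API.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple


class LocalTableCache:
    """Local in-process copy of a small service/configuration table (sketch)."""

    def __init__(self, load_all: Callable[[], Iterable[Tuple[str, str]]]):
        # `load_all` is a hypothetical function that reads the whole small table once
        self._rows: Dict[str, Optional[str]] = dict(load_all())

    def get(self, key: str) -> Optional[str]:
        # Served from local memory: no round trip to the data nodes
        return self._rows.get(key)

    def on_change(self, op: str, key: str, value: Optional[str] = None) -> None:
        """Apply one change event (e.g. delivered via NDB events) to the local copy."""
        if op in ("INSERT", "UPDATE"):
            self._rows[key] = value
        elif op == "DELETE":
            self._rows.pop(key, None)
        else:
            raise ValueError(f"unknown operation: {op}")


cache = LocalTableCache(lambda: [("max_sessions", "100"), ("region", "AU")])
cache.on_change("UPDATE", "max_sessions", "200")
cache.on_change("DELETE", "region")
print(cache.get("max_sessions"), cache.get("region"))  # 200 None
```

This only pays off for small, rarely changing tables: reads become local, while the event stream keeps the copy close to the database state.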
29. Hardware Environment
• Use VMware for testing and even for functionality demos to customers
• Use non-ATCA blades (IBM/HP/Oracle/Dell) or ATCA for performance tests
– 1Gb/10Gb Ethernet on the backplane
– NDB nodes: multiple CPUs are not necessary, but SAS disks are needed if INSERT/UPDATE/DELETE make up a significant share of the transactions
– MySQL Cluster nodes for replication: for fast replication, do not turn on Intel Hyper-Threading or use Sun CMT CPUs
30. Upgrade
• MySQL Cluster without Cluster-Level
Replication
• MySQL Cluster with Cluster-Level Replication
31. Single MySQL Cluster Upgrade
• 4 data groups (2 replicas), 2 MGM nodes
– Stop the front-end application
– Back up the cluster data
– Split the nodes into 2 clusters
• Old cluster, cluster 1: MGM1, NDB1, NDB3, NDB5, NDB7
• New cluster, cluster 2: MGM2, NDB2, NDB4, NDB6, NDB8
– Upgrade NDB and the schema of cluster 2
– Connect the front-end application to cluster 2
– Add the cluster 1 nodes back into cluster 2
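The split works because each node group contributes one node to each half, so each half still holds a full copy of the data. A sketch, assuming 2 replicas and the node naming used above; `split_for_upgrade` and `holds_full_data_set` are hypothetical helpers.

```python
def split_for_upgrade(node_groups: list) -> tuple:
    """Take one replica from every node group for each half (2 replicas assumed)."""
    if any(len(group) != 2 for group in node_groups):
        raise ValueError("this sketch assumes exactly 2 replicas per node group")
    old_cluster = [group[0] for group in node_groups]
    new_cluster = [group[1] for group in node_groups]
    return old_cluster, new_cluster


def holds_full_data_set(cluster: list, node_groups: list) -> bool:
    # A half has all the data if it keeps at least one node from every node group
    return all(any(node in cluster for node in group) for group in node_groups)


groups = [["NDB1", "NDB2"], ["NDB3", "NDB4"], ["NDB5", "NDB6"], ["NDB7", "NDB8"]]
old, new = split_for_upgrade(groups)
print(old, new)
print(holds_full_data_set(new, groups))  # True
```

The same check also shows the cost: during the upgrade window each half holds only one replica per node group, so cluster 2 runs without node-level redundancy until the cluster 1 nodes rejoin.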
32. MySQL Cluster Upgrade with
replication support
• Database version: NDB 7.1
– NDB 7.1 Feature: attribute promotion/demotion support for replication
– NDB 7.0 feature: default values for newly added columns in tables
• Environment: Cluster 1 (Active) and Cluster 2 (Standby Read-only)
• Enterprise upgrade procedure
– Upgrade cluster 2 (NO service interruption)
• Upgrade the NDB version and apply schema changes
• Cluster 1 → Cluster 2 replication verification
– Switch cluster 2 to be active
• Cluster 2 → Cluster 1 replication verification
• If the new version does NOT work, switch back to cluster 1
– Upgrade cluster 1 (upgrade the NDB version and apply application schema changes)
• Carrier upgrade requirement – be able to downgrade even after both clusters are upgraded
33. Limitation of MySQL Cluster
• NO Foreign Key Constraints
• Transaction limitations
– NO savepoints
– READ COMMITTED isolation level only
• NO durable commits
– Mitigation: increase the number of replicas to 3 or 4 if you need extreme reliability
34. Take Care with In-memory Databases (Beyond MySQL Cluster)
• Possibly slower startup than a disk-based database (impacting MTTR in the single-cluster case)
– All data must be loaded into memory
– For MySQL Cluster: the more data groups, the shorter the startup
• Features are not yet as complete as in disk-based databases, but are improving
• Ensure queries/updates use a key
– Do not run “select count(*) from $table_name” on live machines unless it has been tested; get the row count from the database data dictionary instead
– If the application is designed for high performance/frequency, a non-key search in a transaction (for example, caused by a bug) can keep the database busy, hang the application, and trigger an outage