Without transactional tables, global indexes in Phoenix can easily get out of sync with their data tables. Transactional tables require a separate transaction manager, come with some restrictions and performance penalties, and are still in beta. This technical talk lays out a design for strongly consistent global indexes that does not need an external transaction manager. In addition to strongly consistent indexing, the proposed design aims for minimal impact on read performance, minimal code changes, and significant operational simplification by eliminating index rebuilds. Our implementation of the design and the initial performance testing have been very promising with respect to these goals.
In Phoenix, global indexing is implemented using a separate table for each secondary index of a table. Updating a table with one or more global indexes requires updating multiple table regions, which are likely distributed over multiple region servers. Translating a single-table update into a multi-table write poses consistency issues, because Phoenix does not provide a reliable multi-table update capability without transactional tables.
2. Outline
● Background
● What is new for mutable global indexes
● What is new for immutable global indexes
● Correctness of the new approach
● Performance implications
3. Terminology
● Global - Indexed data is stored in a separate physical table from the base
table
● Immutable - Once data is written to the base table (and automatically
persisted to the index), no indexed column in a row will ever change (though it
may be deleted or age out due to a TTL setting)
● Mutable - Data can be freely changed.
● Mutation - Upserts and Deletes
4. Background - Global Mutable Indexes
[Diagram: the application server (Application, Phoenix Client, HBase Client) sends a batch of upsert/delete mutations to the region server for a data table region (Indexer, Region, WAL, HFile) and on to the region server for an index table region (Region, WAL, HFile). Arrows are numbered 1 to 4.]
5. Background - Global Immutable Indexes
[Diagram: the application server (Application, Phoenix Client, HBase Client) sends a batch of upsert/delete mutations to the region server for a data table region (Region, WAL, HFile) and to the region servers for an index table region (Region, WAL, HFile).]
6. Global Indexes Can Get Out-of-Sync Easily!
MUTABLE Global Indexes
1. Indexer goes through the data table mutations and prepares the corresponding mutations for the index tables
2. Applies the mutations to the data table
3. Applies the mutations to the index tables. These writes are likely to be remote, since index table regions are likely to be on other region servers, so they are prone to failure from RPC timeouts, network problems, region server failures, etc.
Indexer for IMMUTABLE Global Indexes
1. Mutations are prepared on the client side
2. Data table and index table mutations are sent to region servers in parallel
3. There is no deterministic order in which the mutations are applied, so the index and the data table can get out of sync.
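The mutable failure mode can be reproduced on a toy in-memory model. This is only a sketch: the dictionaries standing in for the data and index tables, the function `legacy_upsert`, and the `index_write_fails` switch are all hypothetical names introduced for illustration, not Phoenix APIs.

```python
class IndexWriteFailed(Exception):
    """Stands in for an RPC timeout or a region server failure."""


def legacy_upsert(data, index, pk, c1, c3, index_write_fails=False):
    """Old mutable write path: data table first, then the index table."""
    old = data.get(pk)
    data[pk] = {"C1": c1, "C3": c3}            # data table write succeeds
    if index_write_fails:
        raise IndexWriteFailed("index region unreachable")
    if old is not None:
        index.pop((old["C1"], pk), None)       # drop the previous index row
    index[(c1, pk)] = {"C3": c3}               # index table write


data, index = {}, {}
legacy_upsert(data, index, pk=1, c1="A", c3="Y")
try:
    legacy_upsert(data, index, pk=1, c1="B", c3="Y", index_write_fails=True)
except IndexWriteFailed:
    pass

# The data table now says C1 = "B", but the index still only knows ("A", 1).
print(data[1]["C1"], sorted(index))
```

A query through the index for C1 = "B" finds nothing, and a query for C1 = "A" returns a row that no longer exists in that form. Nothing in the index row itself reveals that it is stale.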
7. Consistent Global Index Design Objectives
● Global indexes should always be in sync with their data tables
● Consistency should not result in significant performance or latency impact
● Redesign should not require rewriting of existing Phoenix modules
● Consistent indexes should result in operational simplification by eliminating
index rebuilds
Phoenix JIRAs (PHOENIX-5156 and PHOENIX-5211)
8. Observations
● An index table row can always be reconstructed from the corresponding data
table row
● In HBase, writes are fast, so we can add an extra write phase without severely impacting write performance
● Distributed two-phase commit protocols, i.e., transactions, are known to be expensive. Existing solutions are in beta.
9. New Design
● VERIFIED column on Index rows
● Reordered operations
● Extra write phase
10. Design Change for Mutable Global Indexes
Current Design
Write Path
● Update the data table
● Update the index tables (and hope for the best)
Read Path
● Read the index rows (and
assume they are all good)
New Design
Write Path
● Update the index table rows with unverified status
● Update the data table
● Update the index table rows with verified status
Read Path
● Read the index rows and check their VERIFIED flag
● If a row is unverified, reconstruct the row from the
data table
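The new write and read paths can be sketched on a toy in-memory model. Everything here is an assumption made for illustration: the dictionaries standing in for tables, the names `upsert` and `read_index`, the `fail_at` failure switch, and the choice to drop an unverified index row when no matching data row exists (the slide only says the row is reconstructed from the data table).

```python
class WriteFailed(Exception):
    pass


def upsert(data, index, pk, c1, c3, fail_at=None):
    """Three-phase write. fail_at in {1, 2, 3} injects a failure in that phase."""
    old = data.get(pk)
    moved = old is not None and old["C1"] != c1
    # Phase 1: write index rows with unverified status.
    if fail_at == 1:
        raise WriteFailed("phase 1")
    if moved:
        index.setdefault((old["C1"], pk), {"C3": old["C3"]})["verified"] = False
    index[(c1, pk)] = {"C3": c3, "verified": False}
    # Phase 2: write the data table row.
    if fail_at == 2:
        raise WriteFailed("phase 2")
    data[pk] = {"C1": c1, "C3": c3}
    # Phase 3: mark the new index row verified and delete the old one.
    if fail_at == 3:
        return                       # not reported; the row stays unverified
    if moved:
        index.pop((old["C1"], pk), None)
    index[(c1, pk)]["verified"] = True


def read_index(data, index, c1):
    """Scan index rows for c1 and repair unverified rows from the data table."""
    results = []
    for key in sorted(k for k in index if k[0] == c1):
        row, pk = index[key], key[1]
        if not row["verified"]:
            base = data.get(pk)
            if base is None or base["C1"] != c1:
                del index[key]       # assumption: drop rows with no data row
                continue
            row.update(C3=base["C3"], verified=True)
        results.append((pk, row["C3"]))
    return results


data, index = {}, {}
upsert(data, index, 1, "A", "Y")
try:
    upsert(data, index, 1, "B", "Y", fail_at=2)   # data table write fails
except WriteFailed:
    pass
stale = read_index(data, index, "B")       # unverified row with no data behind it
repaired = read_index(data, index, "A")    # unverified row rebuilt from the data table
print(stale, repaired)
```

In this model the failed update leaves two unverified index rows behind, and the next read through either key settles them against the data table, so no rebuild is needed.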
11. Design Change for Immutable Global Indexes
Current Design
Write Path
● Update the data table and the index tables in parallel (and hope for the best)
Read Path
● Read the index rows (and assume
they are all good)
New Design (same as
Mutable)
Write Path
● Update the index table rows with unverified status
● Update the data table
● Update the index table rows with verified status
Read Path
● Read the index rows and check their VERIFIED flag
● If a row is unverified, reconstruct the row from
the data table
12. Global Mutable Indexes - Mutate
[Diagram: the application server (Application, Phoenix Client, HBase Client) sends a batch of upsert/delete mutations to the region server for a data table region (Indexer, Region, WAL, HFile), which writes to two region servers for index table regions (each with Region, WAL, HFile). Steps are numbered 0 to 9.]
13. Global Mutable Indexes Batch Example - Update
Data Table:
Pk C1 C2 C3
1 A X Y
Index (on C1, include C3):
Pk C3
A, 1 Y
Update C1 from A to B
1. Index tables are updated in parallel
Update - Put {{A, 1}, VERIFIED=false}
Insert - Put {{B, 1}, VERIFIED=false}
2. Data table write
3. Index tables set to verified/deleted
Delete {A, 1} ---> The delete is done in the third phase because, if it were done in the first phase and a later phase failed, we could not recover without a rebuild.
Put {{B, 1}, VERIFIED = true}
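The reason for deferring the delete of {A, 1} can be shown on the same example row. This is a sketch under stated assumptions: tables are plain dictionaries, and `update_c1`, `has_index_row`, `fresh`, and the two boolean switches are hypothetical names used only to compare an early delete with a late one when the data table write fails.

```python
def update_c1(data, index, pk, new_c1, delete_old_in_phase1, data_write_fails):
    old_c1 = data[pk]["C1"]
    # Phase 1: either delete the old index row early or only mark it unverified.
    if delete_old_in_phase1:
        del index[(old_c1, pk)]
    else:
        index[(old_c1, pk)]["VERIFIED"] = False
    index[(new_c1, pk)] = {"C3": data[pk]["C3"], "VERIFIED": False}
    # Phase 2: data table write.
    if data_write_fails:
        return False
    data[pk]["C1"] = new_c1
    # Phase 3: delete the old index row, verify the new one.
    index.pop((old_c1, pk), None)
    index[(new_c1, pk)]["VERIFIED"] = True
    return True


def has_index_row(data, index, pk):
    """True if the current data row is reachable through the index."""
    return (data[pk]["C1"], pk) in index


def fresh():
    data = {1: {"C1": "A", "C2": "X", "C3": "Y"}}
    index = {("A", 1): {"C3": "Y", "VERIFIED": True}}
    return data, index


data, index = fresh()
update_c1(data, index, 1, "B", delete_old_in_phase1=True, data_write_fails=True)
early = has_index_row(data, index, 1)

data, index = fresh()
update_c1(data, index, 1, "B", delete_old_in_phase1=False, data_write_fails=True)
late = has_index_row(data, index, 1)

print(early, late)
```

With the early delete, the data row still has C1 = A but no index row points at it, and a read that starts from the index has nothing to repair. With the late delete, the unverified {A, 1} row survives the failure and can be re-verified on read.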
14. Global Mutable Indexes Batch Example - Delete
Data Table:
Pk C1 C2 C3
1 A X Y
Index (on C1, has C3):
Pk C3
A, 1 Y
Delete row with Pk = 1:
1. Index tables are updated in parallel
Update - Put {{A, 1}, VERIFIED=false}
2. Delete data table row
Delete {1}
3. Delete index table row
Delete {A, 1}
15. Global Immutable Indexes - Mutate
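[Diagram: the application server (Application, Phoenix Client, HBase Client) sends a batch of upsert/delete mutations to the region server for a data table region (Region, WAL, HFile) and to the region servers for an index table region (Region, WAL, HFile). Steps are numbered 1 to 3; steps 1 and 3 go to the index table regions and step 2 to the data table region.]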
16. Global Mutable & Immutable Indexes - Read
[Diagram: the application server (Application, Phoenix Client, HBase Client) turns a Select into a Scan against the region servers for an index table region (Scan Region Observer, Global Index Checker, Region, WAL, HFile). The region server for a data table region (Ungrouped Aggregate Region Observer, Indexer, Region, WAL, HFile) takes part in rebuilding index rows. Steps are numbered 0 to 7.]
17. Correctness - Without concurrent updates
● VERIFIED = true => index update happened after data table update
● VERIFIED = false => data is read from data table
● Missing index row cases: not possible, because
○ The index table is always updated before the data table, in strict order, so having a row in the data table implies that the index table update has been attempted.
○ If the index update fails, the data table update is not attempted. It is therefore not possible to have a data table row without the corresponding index row because of index update failures.
○ Since an index row is deleted only after the corresponding data table row is deleted, there cannot be a missing index row because of data row deletes.
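The argument can be checked mechanically on a small model. The following is a sketch, not a proof about Phoenix itself: it assumes a single row, strictly sequential operations, and phases that fail as a whole. The name `mutate` and the encodings (`data` maps pk to its C1 value, `index` maps (C1, pk) to the VERIFIED flag) are hypothetical.

```python
import itertools


def mutate(data, index, op, pk, c1=None, fail_at=None):
    """Apply one upsert or delete in three phases, stopping at fail_at."""
    old = data.get(pk)
    old_key = (old, pk) if old is not None else None
    new_key = (c1, pk) if op == "upsert" else None
    # Phase 1: write the affected index rows as unverified.
    if fail_at == 1:
        return
    for key in (old_key, new_key):
        if key is not None:
            index[key] = False
    # Phase 2: update the data table.
    if fail_at == 2:
        return
    if op == "upsert":
        data[pk] = c1
    else:
        data.pop(pk, None)
    # Phase 3: delete the old index row, verify the new one.
    if fail_at == 3:
        return
    if old_key is not None and old_key != new_key:
        index.pop(old_key, None)
    if new_key is not None:
        index[new_key] = True


OPS = [("upsert", "A"), ("upsert", "B"), ("delete", None)]
FAILS = [None, 1, 2, 3]
steps = list(itertools.product(OPS, FAILS))

checked = 0
for seq in itertools.product(steps, repeat=3):
    data, index = {}, {}
    for (op, c1), fail_at in seq:
        mutate(data, index, op, 1, c1, fail_at)
        # No missing index rows: every data row has its index row.
        assert all((value, pk) in index for pk, value in data.items())
        # A verified index row always matches the data table.
        assert all(data.get(pk) == value
                   for (value, pk), verified in index.items() if verified)
    checked += 1
print(checked)
```

The loop tries every sequence of three operations with every failure point and both invariants hold after each step. Stale rows do remain in the index, but only with VERIFIED = false, which is exactly the case the read path handles.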
18. Correctness - With concurrent updates
● Detect concurrent updates and do not proceed with Phase 3
● Read-repair reconstructs index from the data table
19. Upgrade
● No schema change since the VERIFIED column is an existing empty column.
● It is advised to rebuild indexes after adopting PHOENIX-5156 to make sure that the index is consistent for both old and new data.
20. Performance
Preliminary results:
● 25% increase in write latency
● No noticeable increase in read latency
Test Env:
● Data table with two indexes.
● 200K large rows on data table.
● 10 node AWS cluster
○ 4-core nodes, 2.3 GHz, 10 GB disk, 32 GB memory VMs
Notes for slide 12 (Global Mutable Indexes - Mutate), keyed to the step numbers in the diagram:
0. SQL upsert/delete operations are committed and translated to HBase operations
1. The preBatchMutate hook of the Index coprocessor on one of these region servers acquires the locks for the rows in its batch
2. Concurrent mutations are identified through timestamp and row key
3. The latest row value is read and index mutations are prepared (see next slide)
4. Release locks
5. Update indexes in parallel with VERIFIED=false -> if this fails, return failure
6. Lock data table rows
7. Update data table -> if this fails, return failure and roll back
8. Release locks
9. Update index table with VERIFIED=true -> if this fails, do not fail the operation
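The step sequence, including the rule that a concurrent update skips the final phase, can be sketched as follows. This is an illustration only: the notes say concurrency is identified through timestamp and row key, whereas this sketch swaps in a simpler in-flight counter per row. `DataRegion`, `prepare`, `commit`, and `in_flight` are hypothetical names, and the interleaving is driven by hand rather than by threads.

```python
import threading


class DataRegion:
    """Toy stand-in for a data table region with per-row locks."""

    def __init__(self):
        self.rows = {}        # pk -> {"C1": ..., "C3": ...}
        self.in_flight = {}   # pk -> mutations between step 1 and step 8
        self._locks = {}

    def lock_for(self, pk):
        return self._locks.setdefault(pk, threading.Lock())


def prepare(region, index, pk, new_row):
    """Steps 1 to 5: lock, detect concurrency, read, unlock, write unverified rows."""
    with region.lock_for(pk):                             # steps 1 and 4
        concurrent = region.in_flight.get(pk, 0) > 0      # step 2 (assumed check)
        region.in_flight[pk] = region.in_flight.get(pk, 0) + 1
        old = region.rows.get(pk)                         # step 3
    if old is not None and old["C1"] != new_row["C1"]:    # step 5
        index.setdefault((old["C1"], pk), {"C3": old["C3"]})["VERIFIED"] = False
    index[(new_row["C1"], pk)] = {"C3": new_row["C3"], "VERIFIED": False}
    return {"pk": pk, "old": old, "new": new_row, "concurrent": concurrent}


def commit(region, index, ctx):
    """Steps 6 to 9: lock, write the data row, unlock, then verify index rows."""
    pk, old, new = ctx["pk"], ctx["old"], ctx["new"]
    with region.lock_for(pk):                             # steps 6 and 8
        region.rows[pk] = dict(new)                       # step 7
        region.in_flight[pk] -= 1
        concurrent = ctx["concurrent"] or region.in_flight[pk] > 0
    if concurrent:
        return False      # skip step 9; rows stay unverified for read repair
    if old is not None and old["C1"] != new["C1"]:        # step 9
        index.pop((old["C1"], pk), None)
    index[(new["C1"], pk)]["VERIFIED"] = True
    return True


region, index = DataRegion(), {}
solo = commit(region, index, prepare(region, index, 1, {"C1": "A", "C3": "Y"}))

first = prepare(region, index, 1, {"C1": "B", "C3": "Y"})
second = prepare(region, index, 1, {"C1": "C", "C3": "Y"})   # overlaps with first
outcome = (commit(region, index, first), commit(region, index, second))
print(solo, outcome)
```

The lone update completes all phases. The two overlapping updates both skip step 9, so every index row for the key is left unverified, and the row matching the final data table value ("C") is still present for a later read to verify.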
Notes for slide 15 (Global Immutable Indexes - Mutate):
1. In parallel, update indexes with VERIFIED=false
2. Update data table
3. Update indexes with VERIFIED=true
Notes for slide 16 (Global Mutable & Immutable Indexes - Read), keyed to the step numbers in the diagram:
0. The SQL select operation is converted to an HBase scan. The region scanner for this scan is wrapped by a scanner implemented by the GlobalIndexChecker coprocessor in the postScannerOpen hook.
1. A scan region observer calls the next operation on the GlobalIndexChecker scanner to scan rows one by one
2. If VERIFIED = true, return the index row
3. If VERIFIED = false, rebuild the index row using UngroupedAggregateRegionObserver
4. Index build (same as before)
5. Index build (same as before): reads the data row to prepare batches
6. Set VERIFIED = true
7. Return the row in the scan
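The scanner-wrapping idea in these notes can be sketched as a generator around an index scan. This is a loose model and not the GlobalIndexChecker code: `checked_scan`, the dictionary tables, and the `stats` counter are hypothetical, and the real rebuild goes through a region observer rather than a direct lookup. Skipping an unverified row that has no matching data row is also an assumption.

```python
def checked_scan(index, data_table, stats):
    """Wrap an index scan and yield only verified or repaired rows."""
    for key in sorted(index):
        c1, pk = key
        row = index[key]
        if row["VERIFIED"]:
            yield key, row                       # step 2: fast path
            continue
        stats["data_reads"] += 1
        base = data_table.get(pk)                # step 5: read the data row
        if base is None or base["C1"] != c1:
            continue                             # assumption: stale row is skipped
        repaired = {"C3": base["C3"], "VERIFIED": True}
        index[key] = repaired                    # step 6: set VERIFIED = true
        yield key, repaired                      # step 7: return row in scan


data_table = {1: {"C1": "A", "C3": "Y"}, 2: {"C1": "B", "C3": "Z"}}
index = {
    ("A", 1): {"C3": "Y", "VERIFIED": True},
    ("B", 2): {"C3": "old", "VERIFIED": False},  # third write phase never ran
    ("C", 2): {"C3": "Z", "VERIFIED": False},    # data table write never happened
}
stats = {"data_reads": 0}
rows = list(checked_scan(index, data_table, stats))
print(rows, stats)
```

Verified rows never touch the data table in this sketch. Only the two unverified rows cost a data table read, which fits the goal of keeping the impact on read performance small when most rows are verified.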