STRONGLY CONSISTENT GLOBAL INDEXES for Nontransactional Tables
Designed by: Kadir Ozdemir
Presenter: Gokcen Iskender
Outline
● Background
● What is new for mutable global indexes
● What is new for immutable global indexes
● Correctness of the new approach
● Performance implications
Terminology
● Global - Indexed data is stored in a separate physical table from the base
table
● Immutable - Once data is written to the base table (and automatically
persisted to the index), no indexed column in a row will ever change (though it
may be deleted or age out due to a TTL setting)
● Mutable - Data can be freely changed.
● Mutation - Upserts and Deletes
Background - Global Mutable Indexes
[Diagram. Components shown: Application Server (Application, Phoenix Client, HBase Client) sending an Upsert / Delete as a Batch of Mutations; Region Server for a data table region (Region, WAL, HFile, Indexer); Region Server for an index table region (Region, WAL, HFile). Steps are numbered 1 to 4.]
Background - Global Immutable Indexes
[Diagram. Components shown: Application Server (Application, Phoenix Client, HBase Client) sending an Upsert / Delete as a Batch of Mutations; Region Server for a data table region (Region, WAL, HFile); Region Servers for an index table region (Region, WAL, HFile).]
Global Indexes Can Get Out-of-Sync Easily!
MUTABLE Global Indexes
1. The Indexer goes through the data table mutations and prepares the corresponding mutations for the index tables.
2. It applies the mutations to the data table.
3. It applies the mutations to the index tables. These writes are usually remote, because index table regions are likely to be on other region servers, so they can fail due to RPC timeouts, network problems, region server failures, etc.
Indexer for IMMUTABLE Global Indexes
1. Mutations are prepared on the client side.
2. Data table and index table mutations are sent to region servers in parallel.
3. There is no deterministic order in which the mutations are applied, so the index and the data table can get out of sync.
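The mutable failure mode above can be illustrated with a toy model. This is only a sketch: the tables are plain Python dicts, `old_write` is a hypothetical helper (not a Phoenix or HBase API), and the `index_write_fails` flag stands in for an RPC timeout or region server failure.

```python
# Toy model of the old mutable write path: data table first, index table second.
# Everything here is illustrative; none of these names exist in Phoenix.

def old_write(data, index, pk, c1, c3, index_write_fails=False):
    data[pk] = {"C1": c1, "C3": c3}      # data table write succeeds
    if index_write_fails:                # remote index write may fail
        raise TimeoutError("index region server did not respond")
    index[(c1, pk)] = {"C3": c3}         # index row key is (indexed column, pk)

data, index = {}, {}
try:
    old_write(data, index, pk=1, c1="A", c3="Y", index_write_fails=True)
except TimeoutError:
    pass

# Data rows that have no matching index row: a query served from the
# index would silently miss them.
missing = [pk for pk, row in data.items() if (row["C1"], pk) not in index]
print(missing)
```

The data row survives while its index row was never written, which is the out-of-sync state the rest of the deck sets out to eliminate.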
Consistent Global Index Design Objectives
● Global indexes should always be in sync with their data tables
● Consistency should not result in significant performance or latency impact
● Redesign should not require rewriting of existing Phoenix modules
● Consistent indexes should result in operational simplification by eliminating
index rebuilds
Phoenix JIRAs (PHOENIX-5156 and PHOENIX-5211)
Observations
● An index table row can always be reconstructed from the corresponding data
table row
● In HBase, writes are fast, so we can add an extra write phase without severely
impacting write performance
● Distributed two-phase commit protocols, i.e., transactions, are known to be
expensive. Existing solutions are in Beta.
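The first observation is what makes read-time repair possible. As a minimal sketch, assume an index on column C1 that includes C3 (the layout used in the deck's batch examples); `index_row_for` is a hypothetical helper, not a Phoenix function.

```python
# Rebuild an index row purely from a data table row.
# Assumed layout: index row key = (C1, pk), covered column = C3.

def index_row_for(pk, data_row):
    key = (data_row["C1"], pk)
    value = {"C3": data_row["C3"]}
    return key, value

key, value = index_row_for(1, {"C1": "A", "C2": "X", "C3": "Y"})
print(key, value)
```

Because the index row is a pure function of the data row, the data table can always serve as the source of truth when an index row is in doubt.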
New Design
● VERIFIED column on Index rows
● Reordered operations
● Extra write phase
Design Change for Mutable Global Indexes
Current Design
Write Path
● Update the data table
● Update the index tables (and
hope for the best)
Read Path
● Read the index rows (and
assume they are all good)
New Design
Write Path
● Update the index table rows with unverified status
● Update the data table
● Update the index table rows with verified status
Read Path
● Read the index rows and check their VERIFIED flag
● If a row is unverified, reconstruct the row from the
data table
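The new write and read paths can be sketched together. This is a toy model under stated assumptions: tables are dicts, `write_row` and `read_by_c1` are hypothetical names, a crash is simulated with `stop_after`, and dropping an unverified index row that has no matching data row is this sketch's interpretation of "reconstruct the row from the data table".

```python
def write_row(data, index, pk, c1, c3, stop_after=3):
    """Insert a new row in three phases; stop_after < 3 simulates a crash."""
    key = (c1, pk)
    index[key] = {"C3": c3, "VERIFIED": False}   # phase 1: unverified index row
    if stop_after < 2:
        return
    data[pk] = {"C1": c1, "C3": c3}              # phase 2: data table row
    if stop_after < 3:
        return
    index[key]["VERIFIED"] = True                # phase 3: mark verified


def read_by_c1(data, index, c1):
    """Scan index rows for c1, repairing unverified rows from the data table."""
    results = []
    for key in sorted(k for k in index if k[0] == c1):
        row = index[key]
        if not row["VERIFIED"]:
            base = data.get(key[1])
            if base is None or base["C1"] != c1:
                del index[key]                   # no matching data row: discard
                continue
            row = {"C3": base["C3"], "VERIFIED": True}
            index[key] = row                     # rebuilt from the data row
        results.append((key[1], row["C3"]))
    return results


# Crash after phase 1: the data write never happened, so nothing is returned.
data1, index1 = {}, {}
write_row(data1, index1, 1, "A", "Y", stop_after=1)
after_phase1_crash = read_by_c1(data1, index1, "A")

# Crash after phase 2: the data row exists, so the reader repairs the index row.
data2, index2 = {}, {}
write_row(data2, index2, 1, "A", "Y", stop_after=2)
after_phase2_crash = read_by_c1(data2, index2, "A")

print(after_phase1_crash, after_phase2_crash)
```

In both crash cases the reader returns exactly what the data table contains, which is the consistency property the design aims for.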
Design Change for Immutable Global Indexes
Current Design
Write Path
● Update the data table and the index
tables in parallel (and hope for the
best)
Read Path
● Read the index rows (and assume
they are all good)
New Design (same as Mutable)
Write Path
● Update the index table rows with unverified
status
● Update the data table
● Update the index table rows with verified status
Read Path
● Read the index rows and check their VERIFIED flag
● If a row is unverified, reconstruct the row from
the data table
Global Mutable Indexes - Mutate
[Diagram. Components shown: Application Server (Application, Phoenix Client, HBase Client) sending an Upsert / Delete as a Batch of Mutations; Region Server for a data table region (Region, WAL, HFile, Indexer); two Region Servers for index table regions (each with Region, WAL, HFile). Steps are numbered 0 to 9.]
Global Mutable Indexes Batch Example - Update
Data Table:
Pk  C1  C2  C3
1   A   X   Y
Index (on C1, include C3):
Pk    C3
A, 1  Y
Update C1 from A to B:
1. Index tables are updated in parallel
   Update - Put {{A, 1}, VERIFIED = false}
   Insert - Put {{B, 1}, VERIFIED = false}
2. Data table write
3. Index tables set to verified/deleted
   Delete {A, 1} (the delete is deferred to the third phase because, if the old index row were deleted in the first phase and a later phase failed, we could not recover without a rebuild)
   Put {{B, 1}, VERIFIED = true}
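The update example can be replayed as a small simulation. This is a sketch with dict-based tables; `update_c1` is a hypothetical helper that records the VERIFIED flags after phase 1 and after phase 3.

```python
# Starting state from the example: one data row and its verified index row.
data = {1: {"C1": "A", "C2": "X", "C3": "Y"}}
index = {("A", 1): {"C3": "Y", "VERIFIED": True}}


def flags(index):
    """Return {index key: VERIFIED flag} for easy comparison."""
    return {key: row["VERIFIED"] for key, row in index.items()}


def update_c1(data, index, pk, new_c1):
    old = data[pk]
    old_key, new_key = (old["C1"], pk), (new_c1, pk)

    # Phase 1: mark the old index row unverified, add the new one unverified.
    index[old_key]["VERIFIED"] = False
    index[new_key] = {"C3": old["C3"], "VERIFIED": False}
    after_phase1 = flags(index)

    # Phase 2: write the data table row.
    data[pk] = {**old, "C1": new_c1}

    # Phase 3: delete the old index row and verify the new one.
    del index[old_key]
    index[new_key]["VERIFIED"] = True
    return after_phase1, flags(index)


after_phase1, after_phase3 = update_c1(data, index, pk=1, new_c1="B")
print(after_phase1)
print(after_phase3)
```

Between phase 1 and phase 3 both index rows exist but neither is trusted, so a reader arriving in that window falls back to the data table for either key.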
Global Mutable Indexes Batch Example - Delete
Data Table:
Pk  C1  C2  C3
1   A   X   Y
Index (on C1, include C3):
Pk    C3
A, 1  Y
Delete row with Pk = 1:
1. Index tables are updated in parallel
   Update - Put {{A, 1}, VERIFIED = false}
2. Delete data table row
   Delete {1}
3. Delete index table row
   Delete {A, 1}
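The delete example can be sketched the same way, including the case where the third phase never runs. The helpers `delete_row` and `visible` are hypothetical, and treating an unverified index row with no data row as invisible is an assumption of this sketch.

```python
def delete_row(data, index, pk, fail_before_phase3=False):
    key = (data[pk]["C1"], pk)
    index[key]["VERIFIED"] = False    # phase 1: mark index row unverified
    del data[pk]                      # phase 2: delete the data table row
    if fail_before_phase3:
        return key                    # simulated crash: index row left behind
    del index[key]                    # phase 3: delete the index row
    return key


def visible(data, index, key):
    """Would a reader return this index row?"""
    row = index.get(key)
    if row is None:
        return False
    if row["VERIFIED"]:
        return True
    base = data.get(key[1])           # unverified: consult the data table
    return base is not None and base["C1"] == key[0]


def fresh_tables():
    return ({1: {"C1": "A", "C2": "X", "C3": "Y"}},
            {("A", 1): {"C3": "Y", "VERIFIED": True}})


data_ok, index_ok = fresh_tables()
delete_row(data_ok, index_ok, 1)

data_crash, index_crash = fresh_tables()
key = delete_row(data_crash, index_crash, 1, fail_before_phase3=True)
leftover_visible = visible(data_crash, index_crash, key)
print(index_ok, index_crash, leftover_visible)
```

A crash before phase 3 leaves a stale index row, but because it is unverified and has no data row behind it, readers never return it.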
Global Immutable Indexes - Mutate
[Diagram. Components shown: Application Server (Application, Phoenix Client, HBase Client) sending an Upsert / Delete as a Batch of Mutations; Region Server for a data table region (Region, WAL, HFile); Region Servers for an index table region (Region, WAL, HFile). Steps are numbered 1 to 3.]
Global Mutable & Immutable Indexes - Read
[Diagram. Components shown: Application Server (Application, Phoenix Client, HBase Client) issuing a Select as a Scan; Region Server for a data table region (Region, WAL, HFile, Ungrouped Aggregate Region Observer, Indexer); Region Servers for an index table region (Region, WAL, HFile, Scan Region Observer, Global Index Checker). Steps are numbered 0 to 7.]
Correctness - Without concurrent updates
● VERIFIED = true => the index update happened after the data table update
● VERIFIED = false => the data is read from the data table
● Missing index rows are not possible, because:
○ The index table is always updated before the data table, in strict order, so
having the row in the data table implies that the index table update has
been attempted.
○ If the index update fails, the data table update is not attempted, so
an index update failure cannot leave a data table row without its
corresponding index row.
○ An index row is deleted only after the corresponding data table row
is deleted, so data row deletes cannot cause a missing index row.
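The argument above can be checked by brute force on a toy model: stop a three-phase insert after every possible phase and confirm that no data row ever lacks an index row. This is a sketch with hypothetical helper names; the index maps a key to its VERIFIED flag.

```python
def insert(data, index, pk, c1, phases_completed):
    """Run only the first `phases_completed` phases of a three-phase insert."""
    if phases_completed >= 1:
        index[(c1, pk)] = False       # phase 1: unverified index row
    if phases_completed >= 2:
        data[pk] = c1                 # phase 2: data table row
    if phases_completed >= 3:
        index[(c1, pk)] = True        # phase 3: mark verified


def missing_index_rows(data, index):
    return [pk for pk, c1 in data.items() if (c1, pk) not in index]


def verified_without_data(data, index):
    return [key for key, verified in index.items()
            if verified and data.get(key[1]) != key[0]]


report = {}
for phases_completed in range(4):     # 0 = failed before phase 1
    data, index = {}, {}
    insert(data, index, 1, "A", phases_completed)
    report[phases_completed] = (missing_index_rows(data, index),
                                verified_without_data(data, index))
print(report)
```

Every failure point leaves either nothing, an unverified index row (repairable or discardable at read time), or a fully consistent pair.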
Correctness - With concurrent updates
● Detect the concurrent update and do not proceed with Phase 3
● Read-repair reconstructs index from the data table
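The two bullets can be illustrated with two interleaved writers on the same row. The slides do not say how concurrency is detected, so the per-row set of in-flight writers used here is purely a hypothetical mechanism for the sketch, and all names are invented for illustration.

```python
active = {}          # pk -> names of writers between phase 1 and phase 3 (hypothetical)
conflicted = set()   # writers that overlapped with another writer on the same row


class Writer:
    def __init__(self, name, data, index, pk, c1):
        self.name, self.data, self.index = name, data, index
        self.pk, self.c1 = pk, c1

    def phase1(self):
        others = active.setdefault(self.pk, set())
        if others:                         # someone else is mid-write on this row
            conflicted.update(others)
            conflicted.add(self.name)
        others.add(self.name)
        self.index[(self.c1, self.pk)] = False   # unverified index row

    def phase2(self):
        self.data[self.pk] = self.c1

    def phase3(self):
        active[self.pk].discard(self.name)
        if self.name in conflicted:
            return False                   # skip: leave the index row unverified
        self.index[(self.c1, self.pk)] = True
        return True


def read_repair(data, index):
    """Resolve unverified index rows against the data table."""
    for c1, pk in list(index):
        if not index[(c1, pk)]:
            if data.get(pk) == c1:
                index[(c1, pk)] = True
            else:
                del index[(c1, pk)]


data, index = {}, {}
w1 = Writer("w1", data, index, pk=1, c1="A")
w2 = Writer("w2", data, index, pk=1, c1="B")
w1.phase1(); w2.phase1()
w1.phase2(); w2.phase2()                   # data table ends up with C1 = B
phase3_ran = [w1.phase3(), w2.phase3()]
before_repair = dict(index)
read_repair(data, index)
print(phase3_ran, before_repair, index)
```

Had w1 run phase 3, it would have marked (A, 1) verified while the data table holds B; skipping phase 3 leaves both rows unverified so that the data table decides the outcome.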
Upgrade
● No schema change since the VERIFIED column is an existing empty column.
● It is advised to rebuild indexes once PHOENIX-5156 is in place, so that the
index is consistent for both old and new data.
Performance
Preliminary results:
● About a 25% increase in write latency
● No noticeable increase in read latency
Test Env:
● Data table with two indexes.
● 200K large rows on data table.
● 10 node AWS cluster
○ 4-core nodes, 2.3 GHz, 10 GB disk, 32 GB memory VMs
Resources
Phoenix Secondary Indexing:
http://phoenix.apache.org/secondary_indexing.html
PHOENIX-5018, PHOENIX-5190, PHOENIX-5156, PHOENIX-5211
Design doc:
https://docs.google.com/document/d/1Vsf23GCT0_CK4q8g_xaXyE_4Dw3aH71BfZypEy3T9iQ/edit?usp=sharing
kozdemir@salesforce.com
Thank You!

More Related Content

More from DataWorks Summit

Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...DataWorks Summit
 
Applying Noisy Knowledge Graphs to Real Problems
Applying Noisy Knowledge Graphs to Real ProblemsApplying Noisy Knowledge Graphs to Real Problems
Applying Noisy Knowledge Graphs to Real ProblemsDataWorks Summit
 
Open Source, Open Data: Driving Innovation in Smart Cities
Open Source, Open Data: Driving Innovation in Smart CitiesOpen Source, Open Data: Driving Innovation in Smart Cities
Open Source, Open Data: Driving Innovation in Smart CitiesDataWorks Summit
 
Data Protection in Hybrid Enterprise Data Lake Environment
Data Protection in Hybrid Enterprise Data Lake EnvironmentData Protection in Hybrid Enterprise Data Lake Environment
Data Protection in Hybrid Enterprise Data Lake EnvironmentDataWorks Summit
 
Big Data Technologies in Support of a Medical School Data Science Institute
Big Data Technologies in Support of a Medical School Data Science InstituteBig Data Technologies in Support of a Medical School Data Science Institute
Big Data Technologies in Support of a Medical School Data Science InstituteDataWorks Summit
 
Hadoop Storage in the Cloud Native Era
Hadoop Storage in the Cloud Native EraHadoop Storage in the Cloud Native Era
Hadoop Storage in the Cloud Native EraDataWorks Summit
 
Free Servers to Build Big Data System on: Bing’s Approach
Free Servers to Build Big Data System on: Bing’s ApproachFree Servers to Build Big Data System on: Bing’s Approach
Free Servers to Build Big Data System on: Bing’s ApproachDataWorks Summit
 
IoFMT – Internet of Fleet Management Things
IoFMT – Internet of Fleet Management ThingsIoFMT – Internet of Fleet Management Things
IoFMT – Internet of Fleet Management ThingsDataWorks Summit
 

More from DataWorks Summit (20)

Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
Transforming and Scaling Large Scale Data Analytics: Moving to a Cloud-based ...
 
Applying Noisy Knowledge Graphs to Real Problems
Applying Noisy Knowledge Graphs to Real ProblemsApplying Noisy Knowledge Graphs to Real Problems
Applying Noisy Knowledge Graphs to Real Problems
 
Open Source, Open Data: Driving Innovation in Smart Cities
Open Source, Open Data: Driving Innovation in Smart CitiesOpen Source, Open Data: Driving Innovation in Smart Cities
Open Source, Open Data: Driving Innovation in Smart Cities
 
Data Protection in Hybrid Enterprise Data Lake Environment
Data Protection in Hybrid Enterprise Data Lake EnvironmentData Protection in Hybrid Enterprise Data Lake Environment
Data Protection in Hybrid Enterprise Data Lake Environment
 
Big Data Technologies in Support of a Medical School Data Science Institute
Big Data Technologies in Support of a Medical School Data Science InstituteBig Data Technologies in Support of a Medical School Data Science Institute
Big Data Technologies in Support of a Medical School Data Science Institute
 
Hadoop Storage in the Cloud Native Era
Hadoop Storage in the Cloud Native EraHadoop Storage in the Cloud Native Era
Hadoop Storage in the Cloud Native Era
 
Free Servers to Build Big Data System on: Bing’s Approach
Free Servers to Build Big Data System on: Bing’s ApproachFree Servers to Build Big Data System on: Bing’s Approach
Free Servers to Build Big Data System on: Bing’s Approach
 
IoFMT – Internet of Fleet Management Things
IoFMT – Internet of Fleet Management ThingsIoFMT – Internet of Fleet Management Things
IoFMT – Internet of Fleet Management Things
 

Recently uploaded

2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch TuesdayIvanti
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPathCommunity
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterMydbops
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.Curtis Poe
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditSkynet Technologies
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality AssuranceInflectra
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 

Recently uploaded (20)

2024 April Patch Tuesday
2024 April Patch Tuesday2024 April Patch Tuesday
2024 April Patch Tuesday
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
UiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to HeroUiPath Community: Communication Mining from Zero to Hero
UiPath Community: Communication Mining from Zero to Hero
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Scale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL RouterScale your database traffic with Read & Write split using MySQL Router
Scale your database traffic with Read & Write split using MySQL Router
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Manual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance AuditManual 508 Accessibility Compliance Audit
Manual 508 Accessibility Compliance Audit
 
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance[Webinar] SpiraTest - Setting New Standards in Quality Assurance
[Webinar] SpiraTest - Setting New Standards in Quality Assurance
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 

Strongly Consistent Global Indexes for Phoenix

  • 1. STRONGLY CONSISTENT GLOBAL INDEXES for Nontransactional Tables Designed by: Kadir Ozdemir Presenter: Gokcen Iskender
  • 2. Outline ● Background ● What is new for mutable global indexes ● What is new for immutable global indexes ● Correctness of the new approach ● Performance implications
  • 3. Terminology ● Global - Indexed data is stored in a separate physical table from the base table ● Immutable - Once data is written to the base table (and automatically persisted to the index), no indexed column in a row will ever change (though it may be deleted or age out due to a TTL setting) ● Mutable - Data can be freely changed. ● Mutation - Upserts and Deletes
  • 4. Background - Global Mutable Indexes Application Server Application Phoenix Client HBase Client Upsert / Delete Batch of Mutations Region WAL Region Server (for a data table region) 1 HFile Indexer Region WAL Region Server (for an index table region) 4 HFile 2 3 3
  • 5. Background - Global Immutable Indexes Application Server Application Phoenix Client HBase Client Upsert / Delete Batch of Mutations Region WAL Region Server (for a data table region) Region Servers (for an index table region) HFile Region WAL HFile
  • 6. Global Indexes Can Get Out-of-Sync Easily! MUTABLE Global Indexes 1. Indexer goes through data table mutations and prepares corresponding mutations for index tables 1. Applies mutations to data table 1. Applies mutations on index table. --> These are likely to be done remotely as index table regions are likely to be on other region servers. Likely to fail due to RPC timeout, network, region server failures, etc Indexer for IMMUTABLE Global Indexes 1. Mutations are prepared on the client side 1. Data table and Index table mutations are sent to region servers in parallel 1. There is no deterministic order in which mutations are applied. Index and table can get out of sync.
  • 7. Consistent Global Index Design Objectives ● Global indexes should be always in sync with their data tables ● Consistency should not result in significant performance or latency impact ● Redesign should not require rewriting of existing Phoenix modules ● Consistent indexes should result in operational simplification by eliminating index rebuilds Phoenix JIRAs (PHOENIX-5156 and PHOENIX-5211)
  • 8. Observations ● An index table row can always be reconstructed from the corresponding data table row ● In HBase writes are fast -- we can add extra write phase without severely impacting write performance ● Distributed two-phase commit protocols, i.e., transactions, are known to be expensive. Existing solutions are in Beta.
  • 9. New Design ● VERIFIED column on Index rows ● Reordered operations ● Extra write phase
  • 10. Design Change for Mutable Global Indexes Current Design Write Path ● Update the data table ● Update the index tables (and wish for the best) Read Path ● Read the index rows (and assume they are all good) New Design Write Path ● Update the index table rows with unverified status ● Update the data table ● Update the index table rows with verified status Read Path ● Read the index rows and check their verify flag ● If a row is unverified, reconstruct the row from the data table
  • 11. Design Change for Immutable Global Indexes Current Design Write Path ● Update the data table and the index tables in parallel (and wish for the best) Read Path ● Read the index rows (and assume they are all good) New Design (same as Mutable) Write Path ● Update the index tables rows with unverified status ● Update the data table ● Update the index table rows with verified status Read Path ● Read the index rows and check their verify flag ● If a row is unverified, reconstruct the row from the data table
  • 12. Global Mutable Indexes - Mutate Application Server Application Phoenix Client HBase Client Upsert / Delete Batch of Mutations Region WAL Region Server (for a data table region) 0 3 HFile Indexer 1, 2, 4, 6, 8 5, 9 Region Server (for a index table region) Region WAL HFile Region Server (for a index table region) Region WAL HFile 5, 9 7
  • 13. Global Mutable Indexes Batch Example - Update Data Table: Pk C1 C2 C3 1 A X Y Index (on C1, include C3): Pk C3 A, 1 Y Update C1 from A to B 1. Index tables are updated in parallel Update - Put {{A, 1}, VERIFIED=false} Insert - Put {{B, 1}, VERIFIED=false} 1. Data table write 2. Index tables set to verified/deleted Delete {A, 1} ---> Delete is done in third phase so that if it fails in first phase we can't recover without rebuild. Put {{B, 1}, VERIFIED = true}
  • 14. Global Mutable Indexes Batch Example - Delete
    Data Table:
      Pk    C1  C2  C3
      1     A   X   Y
    Index (on C1, has C3):
      Pk    C3
      A, 1  Y
    Delete row with Pk = 1:
    1. Index tables are updated in parallel
       Update - Put {{A, 1}, VERIFIED=false}
    2. Delete data table row
       Delete {1}
    3. Delete index table row
       Delete {A, 1}
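The update and delete batch examples can be reproduced by a small planner that lists the mutations for each phase. This is a sketch under assumptions: `plan_phases` and its tuple-based mutation format are hypothetical, and included columns such as C3 are left out of the index mutations for brevity.

```python
def plan_phases(old_row, new_row, indexed="C1"):
    """Return (phase1, phase2, phase3) mutation lists for one data table row.

    old_row / new_row are dicts with a "Pk" key, or None when the row
    does not exist before / after the mutation.
    """
    def index_key(row):
        return (row[indexed], row["Pk"])

    phase1, phase2, phase3 = [], [], []
    old_key = index_key(old_row) if old_row else None
    new_key = index_key(new_row) if new_row else None

    # Old index row: unverified in phase 1, deleted only in phase 3.
    if old_key and old_key != new_key:
        phase1.append(("put", old_key, {"VERIFIED": False}))
        phase3.append(("delete", old_key))
    # New index row: unverified in phase 1, verified in phase 3.
    if new_key:
        phase1.append(("put", new_key, {"VERIFIED": False}))
        phase3.append(("put", new_key, {"VERIFIED": True}))

    # Phase 2 is the data table mutation itself.
    if new_row:
        phase2.append(("put", new_row["Pk"], new_row))
    else:
        phase2.append(("delete", old_row["Pk"]))
    return phase1, phase2, phase3


row = {"Pk": 1, "C1": "A", "C2": "X", "C3": "Y"}
update_plan = plan_phases(row, dict(row, C1="B"))  # update C1 from A to B
delete_plan = plan_phases(row, None)               # delete row with Pk = 1
```

For the update, phase 1 holds the two unverified puts for {A, 1} and {B, 1}, and phase 3 holds the delete of {A, 1} followed by the verified put of {B, 1}, matching the slides.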
  • 15. Global Immutable Indexes - Mutate [Diagram: the Application Server (Application, Phoenix Client, HBase Client) sends a batch of upsert/delete mutations to the region server for a data table region and to the region servers for the index table regions (each with Region, WAL, HFile). Arrows are numbered 1 to 3.]
  • 16. Global Mutable & Immutable Indexes - Read [Diagram: the Application Server (Application, Phoenix Client, HBase Client) issues a Select that becomes a Scan against the region servers for the index table regions (Scan Region Observer, Global Index Checker) and the region server for a data table region (Ungrouped Aggregate Region Observer, Indexer), each with Region, WAL, HFile. Arrows are numbered 0 to 7.]
  • 17. Correctness - Without concurrent updates
    ● VERIFIED = true => the index update happened after the data table update
    ● VERIFIED = false => the data is read from the data table
    ● Missing index row cases: not possible, because
      ○ The index table is always updated before the data table, in strict order, so having the row in the data table implies that the index table update has been attempted.
      ○ If the index update fails, the data table update is not attempted. Therefore, index update failures cannot leave a data table row without the corresponding index row.
      ○ Since an index row is deleted only after the corresponding data table row is deleted, data row deletes cannot leave a missing index row.
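The argument above can be checked by injecting a failure before each phase of a toy three-phase insert and comparing what a reader sees through the index with the data table. This is a sketch only: `Crash`, `upsert`, `visible_pks`, and the dict-based tables are hypothetical stand-ins, not Phoenix APIs.

```python
class Crash(Exception):
    """Simulated failure between phases."""


def upsert(data, index, pk, value, fail_before=None):
    """Three-phase insert that can be made to fail before phase 1, 2, or 3."""
    steps = [
        lambda: index.__setitem__((value, pk), False),  # phase 1: unverified index row
        lambda: data.__setitem__(pk, value),            # phase 2: data table row
        lambda: index.__setitem__((value, pk), True),   # phase 3: verified index row
    ]
    for number, step in enumerate(steps, start=1):
        if fail_before == number:
            raise Crash(f"failed before phase {number}")
        step()


def visible_pks(data, index):
    """Rows seen through the index: verified rows, plus unverified rows confirmed by the data table."""
    return sorted(pk for (value, pk), verified in index.items()
                  if verified or data.get(pk) == value)


for fail_before in (1, 2, 3, None):
    data, index = {}, {}
    try:
        upsert(data, index, 1, "A", fail_before)
    except Crash:
        pass
    # Wherever the failure hits, the index view matches the data table.
    assert visible_pks(data, index) == sorted(data)
```

A failure before phase 2 leaves at most an unverified index row with no data row, which readers ignore; a failure before phase 3 leaves an unverified index row that the data table confirms.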
  • 18. Correctness - With concurrent updates ● Detect concurrent updates and do not proceed with Phase 3 ● Read-repair reconstructs the index rows from the data table
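One way to picture "detect and skip Phase 3" is simple bookkeeping of writers that overlap on the same row key. This is purely illustrative and assumes a mechanism the slides do not specify: the editor's notes say concurrent mutations are identified through timestamp and row key, whereas this sketch substitutes hypothetical in-flight sets (`in_flight`, `overlapped`) and hypothetical functions `begin_write` and `finish_write`.

```python
in_flight = {}      # pk -> set of writer ids between phase 1 and phase 3 (assumption)
overlapped = set()  # writer ids that shared a row key with another in-flight writer


def begin_write(writer, pk):
    """Called at the start of phase 1; flags every writer that overlaps on this row key."""
    others = in_flight.setdefault(pk, set())
    if others:
        overlapped.add(writer)
        overlapped.update(others)
    others.add(writer)


def finish_write(writer, pk):
    """Called after phase 2; returns True if phase 3 may run, False if it must be skipped."""
    in_flight[pk].discard(writer)
    return writer not in overlapped


begin_write("w1", 1)  # w1 starts writing row 1
begin_write("w2", 1)  # w2 starts on the same row before w1 finishes
begin_write("w3", 2)  # w3 writes a different row
```

Writers that skip phase 3 leave their index rows unverified, so the next read of those rows falls back to the data table, which is where read-repair takes over.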
  • 19. Upgrade ● No schema change, since the VERIFIED column is an existing empty column. ● It is advised to rebuild indexes after PHOENIX-5156 to make sure that the index is always consistent for both old and new data.
  • 20. Performance
    Preliminary results:
    ● 25% increase in write latency
    ● No noticeable increase in read latency
    Test Env:
    ● Data table with two indexes
    ● 200K large rows in the data table
    ● 10-node AWS cluster
      ○ 4-core nodes, 2.3 GHz, 10 GB disk, 32 GB memory VMs
  • 21. Resources
    Phoenix Secondary Indexing: http://phoenix.apache.org/secondary_indexing.html
    PHOENIX-5018, PHOENIX-5190, PHOENIX-5156, PHOENIX-5211
    Design doc: https://docs.google.com/document/d/1Vsf23GCT0_CK4q8g_xaXyE_4Dw3aH71BfZypEy3T9iQ/edit?usp=sharing
    kozdemir@salesforce.com

Editor's Notes

  1. High-level comparison
  2. 0. SQL upsert/delete operations are committed and translated to HBase operations
     1. The preBatchMutate hook of the Index coprocessor on one of these region servers acquires the locks for the rows in its batch
     2. Concurrent mutations are identified through timestamp and row key
     3. The latest row value is read and index mutations are prepared (see next slide)
     4. Release locks
     5. Update indexes in parallel with Verified=false -> If this fails, return failure
     6. Lock data table rows
     7. Update the data table -> If this fails, return failure and roll back
     8. Release locks
     9. Update the index table with Verified=true -> If this fails, do not fail the operation
  3. In parallel, update indexes with Verified=false
     Update data table
     Update indexes with Verified=true
  4. 0. The SQL select operation is converted to an HBase scan. The region scanner for this scan operation is wrapped by a scanner implemented by the GlobalIndexChecker coprocessor in the postScannerOpen hook.
     1. A scan region observer starts calling the next operation on the GlobalIndexChecker scanner to scan rows one by one
     2. If VERIFIED = true, returns the index row
     3. If VERIFIED = false, rebuilds the index row using UngroupedAggregateRegionObserver
     4. Index build (same as before)
     5. Index build (same as before): reads the data row to prepare batches
     6. Set VERIFIED = true
     7. Return row in scan
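The read steps in the last note amount to wrapping the raw index scan with a checker. The generator below is a minimal sketch of that wrapping, assuming in-memory stand-ins: `checked_scan`, the `repairs` list, and the dict-based data table are hypothetical and do not reflect the GlobalIndexChecker API. Dropping an unverified row that has no matching data row is also an assumption of this sketch.

```python
def checked_scan(index_rows, data_table, repairs):
    """Wrap a raw index scan; yield only rows that are verified or repairable.

    index_rows: iterable of ((indexed_value, pk), verified) pairs
    data_table: dict pk -> indexed value
    repairs:    list collecting index keys that were re-marked VERIFIED = true
    """
    for (value, pk), verified in index_rows:
        if verified:
            yield (value, pk)            # step 2: a verified row is returned as-is
        elif data_table.get(pk) == value:
            repairs.append((value, pk))  # steps 3-6: rebuilt from the data row, set verified
            yield (value, pk)            # step 7: the repaired row is returned
        # Otherwise the data table has no matching row, so nothing is returned.


raw_rows = [(("A", 1), True), (("B", 2), False), (("C", 3), False)]
data = {1: "A", 2: "B"}
repaired = []
result = list(checked_scan(raw_rows, data, repaired))
```

Here the row for pk 2 is repaired and returned, while the row for pk 3 has no data table counterpart and is filtered out of the scan.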