Postgres-XC: Write-Scalable
    PostgreSQL Cluster


              Mason Sharp

           August 7th, 2012

  CC License: Attribution-NonCommercial-ShareAlike
Content Attribution
• Koichi Suzuki
• Michael Paquier
• Ashutosh Bapat
• Pavan Deolasee
• Mason Sharp
• ...?

Aug 7, 2012                         2
Who am I
    ●   Mason Sharp
    ●   Co-organizer of NYC PUG
    ●   Co-founder of StormDB
    ●   Previously worked at EnterpriseDB
    ●   Original architect of Stado (GridSQL)
    ●   One of the original architects of Postgres-XC




PostgreSQL User Groups




        San Francisco: 616 Members
        New York: 502 Members
        Tokyo: 2000? Members
        New: Philadelphia, Los Angeles

NYC PUG Meetup Membership




NYC PUG Speakers
    ●   Recent speakers include
         ●    Bruce Momjian
         ●    Greg Smith
         ●    Greg Stark
         ●    Joe Conway
         ●    Joachim Wieland




NYC PUG Speakers
                We want you!




Postgres-XC Talk
●   Background
●   Postgres-XC Introduction & Usage
●   Postgres-XC Components
●   Postgres-XC Details




Background




Data Tier Scaling
    ●   Up versus Out
         ●    More memory, more cores
    ●   Read-only Replicated Slaves
    ●   Caching
         ●    Memcached
    ●   Sharding
    ●   NoSQL
    ●   NewSQL




XC Origins




        Koichi Suzuki, NTT Data            Mason Sharp



PostgreSQL-Related Clustering Projects
    ●   pgpool-II
         ●    Read replicated slaves
    ●   PL/Proxy
         ●    Used by Skype, meetme (myYearbook)
         ●    All access is over a stored function
    ●   Postgres-R, PostgresForest
    ●   Stado (GridSQL)
         ●    Parallel Query
         ●    Not write-scalable
    Can we make it write-scalable?



Postgres-XC Introduction




Overview
    ●   PostgreSQL-based database cluster
         ●    Same API to Apps as PostgreSQL
               –   Same drivers
         ●    Currently based upon PG 9.1. Soon: 9.2.
    ●   Symmetric Multi-headed Cluster
         ●    No master, no slave
               –   Not just PostgreSQL replication.
               –   Application can read/write to any coordinator server
         ●    Consistent database view for all transactions
               –   Full ACID properties for all transactions in the cluster
    ●   Scales for both writes and reads

Postgres-XC Cluster
Diagram: applications can connect to any PG-XC server and get the same database view and service.
Each PG-XC server runs a Coordinator and a Data Node. PG-XC servers are added as needed,
communicate with each other, and share a Global Transaction Manager (GTM).
Read/Write Scalability
              Chart: DBT-1 throughput scalability
Consistency
Is XC right for you?
    ●   I need write scalability
    ●   I like ACID
    ●   I like SQL
    ●   I don't want to rewrite my existing SQL
        applications
    ●   I want to leverage the PostgreSQL community
        for all of their contrib modules



Why XC may not be right for you
    ●   I need MPP parallel query capability
         ● Parallel query in XC is limited
         ● Try Stado: www.stado.us


    ●   I need a solution with built-in HA
    ●   I need massive scale and have loose
        consistency requirements
    ●   I would rather use a NoSQL solution so I can
        put it on my resume


Postgres-XC Components




Coordinator Overview
    ●   Based on PostgreSQL 9.1 (9.2 soon)
    ●   Accepts connections from clients
    ●   Parses and plans requests
    ●   Interacts with the Global Transaction Manager
    ●   Uses a pooler for Data Node connections
    ●   Sends XIDs and snapshots down to the Data Nodes
    ●   Collects results and returns them to the client
    ●   Uses two-phase commit if necessary
Data Node Overview
    ●   Based on PostgreSQL 9.1 (9.2 soon)
    ●   Where user-created data is actually stored
    ●   Coordinators (not clients) connect to Data Nodes
    ●   Accepts XIDs and snapshots from the Coordinator
    ●   The rest is fairly similar to vanilla PostgreSQL
Global Transaction Manager



Diagram: the GTM supplies XIDs, snapshots, timestamps, and sequence values to the cluster nodes.
Summary
   ●   Coordinator
        ●     Visible to apps
        ●     SQL analysis, planning, execution
        ●     Connection pooling
   ●   Datanode (or simply “NODE”)
        ●     Actual database store
        ●     Local SQL execution
   ●   Coordinator and Datanode: the Postgres-XC core, based upon vanilla PostgreSQL
        ●     Share the same binary
        ●     May want to colocate them
   ●   GTM (Global Transaction Manager)
        ●     Provides a consistent database view for transactions
               –   GXID (Global Transaction ID)
               –   Snapshot (List of active transactions)
               –   Other global values such as SEQUENCE
   ●   GTM Proxy: groups server-local transaction requests for performance
   ●   GTM and GTM Proxy are different binaries
Data Distribution


              Distribution Strategies




Distributing the data
 ●   Replicated table
      ●   Each row in the table is replicated to the datanodes
      ●   Statement based replication
 ●   Distributed table
      ●   Each row of the table is stored on one datanode,
          decided by one of the following strategies
              –   Hash
              –   Round Robin
              –   Modulo
              –   Range and user-defined function (future)
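The following is a minimal Python sketch of how the three available strategies could pick a datanode for a row. It rests on two assumptions. First, the node names `dn1`, `dn2`, `dn3` are hypothetical. Second, `zlib.crc32` stands in for whatever hash function Postgres-XC actually uses. Only the overall idea comes from the slide: hash and modulo are functions of the distribution column value, and round robin is not.

```python
import zlib
from itertools import count

NODES = ["dn1", "dn2", "dn3"]  # hypothetical datanode names


def hash_node(value, nodes=NODES):
    """Hash: hash the distribution column value, then map it onto a node.
    zlib.crc32 stands in for Postgres-XC's own hash function (assumption)."""
    h = zlib.crc32(str(value).encode("utf-8"))
    return nodes[h % len(nodes)]


def modulo_node(value, nodes=NODES):
    """Modulo: integer distribution column value modulo the number of nodes."""
    return nodes[value % len(nodes)]


class RoundRobin:
    """Round robin: rows are dealt to nodes in turn, whatever their values."""

    def __init__(self, nodes=NODES):
        self.nodes = nodes
        self._counter = count()

    def next_node(self):
        return self.nodes[next(self._counter) % len(self.nodes)]


rr = RoundRobin()
print([modulo_node(v) for v in (1, 2, 3, 10)])  # ['dn2', 'dn3', 'dn1', 'dn2']
print([rr.next_node() for _ in range(4)])       # ['dn1', 'dn2', 'dn3', 'dn1']
```

Hash and modulo placement depend only on the column value. A query with an equality condition on the distribution column can therefore, in principle, be routed to a single datanode. Round robin placement offers no such shortcut.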

Table Distribution and Replication

 ●   Each table can be distributed or replicated
      ●   Strategy based on usage
              –   Transaction tables → Distributed
              –   Static lookup tables → Replicated
              –   Distribute parent and child tables together
      ●   Join pushdown when possible
      ●   WHERE clause pushdown
      ●   Simple parallel aggregates



Defining Tables
 ●   Table Distribution/Replication
      ●   CREATE TABLE tab (…) DISTRIBUTE BY
            HASH(col) | MODULO(col) | ROUND ROBIN | REPLICATION




Replicated Tables
Diagram: writes go to every datanode, so each one holds the full table
((val, val2) rows (1, 2), (2, 10), (3, 4)). A read is served by a single datanode.
Distributed Tables
Diagram: each write goes to a single datanode. A read queries every datanode, and a
combiner merges the results. Datanode 1 holds (val, val2) rows (1, 2), (2, 10), (3, 4);
datanode 2 holds (11, 21), (21, 101), (31, 41); datanode 3 holds (10, 20), (20, 100), (30, 40).
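The two diagrams can be sketched with in-memory lists standing in for three datanodes. This is a toy model under stated assumptions:

- there is no real networking;
- distributed writes pick their target by modulo on `val`;
- the combiner's sort exists only to make the output deterministic.

It illustrates where writes land and how a read on a distributed table is assembled.

```python
def make_cluster(names=("dn1", "dn2", "dn3")):
    """In-memory stand-in for datanodes: node name -> list of (val, val2) rows."""
    return {name: [] for name in names}


def insert_replicated(cluster, row):
    # Replicated table: the same write is applied on every datanode.
    for rows in cluster.values():
        rows.append(row)


def read_replicated(cluster, node):
    # Any single datanode holds the whole table, so one node answers the read.
    return list(cluster[node])


def insert_distributed(cluster, row):
    # Distributed table: exactly one datanode stores the row; here the target
    # is chosen by modulo on the first column (an assumption for the sketch).
    names = sorted(cluster)
    cluster[names[row[0] % len(names)]].append(row)


def read_distributed(cluster):
    # Every datanode is read; the combiner merges the partial results.
    combined = []
    for rows in cluster.values():
        combined.extend(rows)
    return sorted(combined)


rep, dist = make_cluster(), make_cluster()
for row in [(1, 2), (2, 10), (3, 4)]:
    insert_replicated(rep, row)
    insert_distributed(dist, row)
print(read_replicated(rep, "dn3"))            # [(1, 2), (2, 10), (3, 4)]
print({n: len(r) for n, r in dist.items()})   # {'dn1': 1, 'dn2': 1, 'dn3': 1}
print(read_distributed(dist))                 # [(1, 2), (2, 10), (3, 4)]
```

The trade-off is visible even in the toy model. Replicated writes cost one write per node but reads touch one node. Distributed writes touch one node but reads must visit all of them.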
Join Pushdown
 ●   Hash/Modulo distributed with Hash/Modulo distributed
      ●   Inner join with an equality condition on the distribution column,
          with the same data type and the same distribution strategy
 ●   Round Robin with Hash/Modulo distributed or Round Robin
      ●   No pushdown
 ●   Replicated with Hash/Modulo distributed or Round Robin
      ●   Inner join, if the replicated table's distribution list is a superset
          of the distributed table's distribution list
 ●   Replicated with Replicated
      ●   All kinds of joins
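The pushdown rules above can be encoded as a small decision function. This is a hypothetical sketch, not Postgres-XC's planner code. Its assumptions:

- `Dist` and `can_push_down` are invented for illustration;
- "distribution list" is read as the set of datanodes a table lives on;
- two hash/modulo tables must also have identical node lists, which the slide does not state.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Dist:
    kind: str                       # "hash", "modulo", "roundrobin" or "replicated"
    nodes: Tuple[str, ...]          # datanodes the table lives on (assumed meaning)
    column: Optional[str] = None    # distribution column (hash/modulo only)
    col_type: Optional[str] = None  # data type of that column


def can_push_down(left: Dist, right: Dist, join_type: str,
                  equi_cols: Sequence[Tuple[str, str]] = ()) -> bool:
    """True if the join can run on the datanodes, per the slide's rules.

    equi_cols lists (left column, right column) pairs joined with '='."""
    if left.kind == "replicated" and right.kind == "replicated":
        return True                   # all kinds of joins
    if join_type != "inner":
        return False                  # every other case needs an inner join
    if "replicated" in (left.kind, right.kind):
        rep, dist = (left, right) if left.kind == "replicated" else (right, left)
        return set(rep.nodes) >= set(dist.nodes)  # superset of the distribution list
    if "roundrobin" in (left.kind, right.kind):
        return False                  # round robin rows could be on any node
    # Both hash or modulo: same strategy, same column type, equality on the
    # distribution columns. Requiring identical node lists is an assumption.
    return (left.kind == right.kind
            and left.col_type == right.col_type
            and left.nodes == right.nodes
            and (left.column, right.column) in equi_cols)


orders = Dist("hash", ("dn1", "dn2"), "cust_id", "int4")
customers = Dist("hash", ("dn1", "dn2"), "id", "int4")
print(can_push_down(orders, customers, "inner", [("cust_id", "id")]))  # True
print(can_push_down(orders, customers, "left", [("cust_id", "id")]))   # False
```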
Constraints
 ●   XC does not support global constraints, i.e. constraints across datanodes
 ●   Constraints within a datanode are supported
 ●   Replicated tables
      ●   Unique, primary key constraints: supported
      ●   Foreign key constraints: supported if the referenced table is also
          replicated on the same nodes
 ●   Hash/Modulo distributed tables
      ●   Unique, primary key constraints: supported if the primary OR unique key
          is the distribution key
      ●   Foreign key constraints: supported if the referenced table is replicated
          on the same nodes, OR it is distributed by its primary key in the same
          manner and on the same nodes
 ●   Round Robin tables
      ●   Unique, primary key constraints: not supported
      ●   Foreign key constraints: supported if the referenced table is replicated
          on the same nodes
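The constraint rules can likewise be read as two predicates. The `Table` type and both functions are hypothetical. Two interpretations are assumptions:

- "distributed in the same manner" means parent and child use the same strategy on the same nodes, the parent by its primary key and the child by the foreign key column;
- a unique key "is the distribution key" only when it is exactly that one column.

```python
from typing import NamedTuple, Optional, Tuple


class Table(NamedTuple):
    kind: str                          # "hash", "modulo", "roundrobin" or "replicated"
    nodes: Tuple[str, ...]             # datanodes holding the table
    dist_col: Optional[str] = None     # distribution column (hash/modulo only)
    primary_key: Tuple[str, ...] = ()


def unique_supported(t: Table, key: Tuple[str, ...]) -> bool:
    """Can a unique / primary key on `key` be enforced node-locally?"""
    if t.kind == "replicated":
        return True
    if t.kind in ("hash", "modulo"):
        return key == (t.dist_col,)    # key must be the distribution key
    return False                       # round robin: equal keys may sit on different nodes


def fk_supported(child: Table, parent: Table, fk_cols: Tuple[str, ...]) -> bool:
    """Can child.fk_cols REFERENCES parent be checked node-locally?"""
    same_nodes = set(child.nodes) == set(parent.nodes)
    if parent.kind == "replicated":
        return same_nodes              # every parent row is on each child node
    if child.kind in ("hash", "modulo"):
        return (same_nodes and parent.kind == child.kind
                and parent.primary_key == (parent.dist_col,)
                and fk_cols == (child.dist_col,))
    return False


customers = Table("hash", ("dn1", "dn2"), "id", ("id",))
orders = Table("hash", ("dn1", "dn2"), "cust_id", ("order_id",))
print(unique_supported(orders, ("order_id",)))        # False
print(fk_supported(orders, customers, ("cust_id",)))  # True
```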
Demo




Transaction Management


          Why MVCC is Important for Consistency
               Global Transaction Manager



Multi-version Concurrency Control
          (MVCC) (quick overview)
 ●   Readers do not block writers
 ●   Writers do not block readers
 ●   Transaction Ids (XIDs)
      ●   Every transaction gets an ID
 ●   Snapshots contain a list of running XIDs




Multi-version Concurrency Control
          (MVCC) (quickly discussed)
     Example:
T1 Begin...
T2            Begin; INSERT...; Commit
T3                Begin...
T4                                          Begin; SELECT


 ●   T4's snapshot contains T1 and T3
      ●   T2 already committed
      ●   It can see T2's changes, but not T1's or T3's
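The visibility rule behind this example can be sketched as follows. This is a simplified model (an assumption, not PostgreSQL's full visibility rules). A snapshot records the XIDs that were running when it was taken, plus `xmax`, the first XID not yet assigned. A change is visible only if its transaction committed, was not running at snapshot time, and is older than `xmax`. A transaction's view of its own changes is not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    xmax: int                         # first XID not yet assigned when taken
    running: frozenset = frozenset()  # XIDs still in progress when taken

    def sees(self, xid: int, committed: set) -> bool:
        """Is a change made by transaction `xid` visible to this snapshot?"""
        if xid >= self.xmax or xid in self.running:
            return False          # started later, or still running at snapshot time
        return xid in committed   # finished earlier: visible only if it committed


# The slide's example: T1 and T3 are still running, T2 has committed,
# and T4 (XID 4) takes its snapshot.
committed = {2}
t4 = Snapshot(xmax=5, running=frozenset({1, 3}))
print([xid for xid in (1, 2, 3) if t4.sees(xid, committed)])  # [2]
```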

Multi-version Concurrency Control
          (MVCC) on 2 Independent Nodes
     Example:
T1 Begin...
T2            Begin; INSERT..;   Commit;
T3               Begin...
T4                          Begin; SELECT

 ●   Node 1: T2 commits, then T4 runs its SELECT
 ●   Node 2: T4 runs its SELECT, then T2 commits
 ●   T4's SELECT statement returns inconsistent data
      ●   It includes T2's data from Node 1, but not from Node 2
      ●   The C in ACID fails

Global Transaction Manager
          (GTM)
   ●   Provides Global Transaction Consistency
Diagram: the GTM supplies XIDs, snapshots, timestamps, and sequence values to the cluster nodes.
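The two-node failure, and how one globally issued snapshot avoids it, can be sketched with a toy GTM. Everything here is hypothetical:

- the `GTM` class and the `(xmax, running)` tuple format;
- the rule that the GTM keeps a transaction in its running set until its commit has completed on every node it wrote to;
- aborts are not modelled.

```python
class GTM:
    """Toy global transaction manager handing out XIDs and snapshots."""

    def __init__(self):
        self.next_xid = 1
        self.running = set()

    def begin(self):
        xid = self.next_xid
        self.next_xid += 1
        self.running.add(xid)
        return xid

    def commit(self, xid):
        # Assumed to be called once the commit is complete on every writer node.
        self.running.discard(xid)

    def snapshot(self):
        # (xmax, running XIDs); the same snapshot is sent to every datanode.
        return self.next_xid, frozenset(self.running)


def visible(xid, snap):
    xmax, running = snap
    return xid < xmax and xid not in running


def rows_seen(snaps, node_rows):
    """snaps: node -> snapshot; node_rows: node -> [(inserting XID, value)]."""
    return {node: [v for xid, v in rows if visible(xid, snaps[node])]
            for node, rows in node_rows.items()}


gtm = GTM()
t1, t2, t3 = gtm.begin(), gtm.begin(), gtm.begin()
node_rows = {"node1": [(t2, "a")], "node2": [(t2, "b")]}

# Without a GTM: each node builds its own snapshot; T2 is committed on node 1 only.
local = {"node1": (4, frozenset({t1, t3})), "node2": (4, frozenset({t1, t2, t3}))}
print(rows_seen(local, node_rows))   # {'node1': ['a'], 'node2': []}  inconsistent

# With a GTM: one snapshot for both nodes, so T2 is seen everywhere or nowhere.
snap = gtm.snapshot()
print(rows_seen({"node1": snap, "node2": snap}, node_rows))  # {'node1': [], 'node2': []}
gtm.commit(t2)
snap = gtm.snapshot()
print(rows_seen({"node1": snap, "node2": snap}, node_rows))  # {'node1': ['a'], 'node2': ['b']}
```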
Transaction Management
●   2PC is used to guarantee transactional consistency
    across nodes
    ●   When more than one node is involved, OR
    ●   When the application issues explicit 2PC transactions
●   Only the nodes where writes have happened
    participate in 2PC
●   In PostgreSQL, 2PC cannot be used if temporary
    tables are involved. The same restriction applies in
    Postgres-XC
●   When a single Coordinator command needs multiple
    Datanode commands, we wrap those in a transaction
    block
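The commit rules above can be sketched as follows. The `DataNode` class and `finish_transaction` are hypothetical. `can_prepare=False` stands in for any node that refuses to prepare, for instance one whose transaction used temporary tables. Explicit, application-issued 2PC is not modelled.

```python
class DataNode:
    """Toy datanode; can_prepare=False simulates a failed PREPARE."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare
        self.state = "active"

    def prepare(self):
        self.state = "prepared" if self.can_prepare else "aborted"
        return self.can_prepare

    def commit(self):
        self.state = "committed"

    def abort(self):
        self.state = "aborted"


def finish_transaction(touched, written):
    """touched: every node the transaction used; written: the nodes it wrote to."""
    for node in touched:
        if node not in written:
            node.commit()          # read-only nodes do not take part in 2PC
    if len(written) <= 1:
        for node in written:
            node.commit()          # one writer: an ordinary one-phase commit
        return "committed"
    if all(node.prepare() for node in written):  # phase 1 stops at the first refusal
        for node in written:
            node.commit()          # phase 2: everyone prepared, so commit everywhere
        return "committed"
    for node in written:
        node.abort()               # some node failed to prepare: roll back all writers
    return "aborted"


a, b, c = DataNode("a"), DataNode("b"), DataNode("c")
print(finish_transaction([a, b, c], [a, b]))  # committed
```

A single writing node commits in one phase. With several writers, one refusal in phase 1 rolls back every writer, so no node commits on its own.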
Postgres-XC Considerations




Can GTM be a Performance Bottleneck?
 • Depending on implementation
      – Current implementation: Coordinator backends call the GTM through a
        client library (over the network or a domain socket); the GTM main thread
        creates and terminates GTM threads, which lock the shared snapshot data
      – Applicable up to five PG-XC servers (DBT-1)
      – Large snapshot size and number
      – Too many interactions between GTM and Coordinators


July 12th, 2012                                                                                                    42
Can GTM be a Performance Bottleneck?
Proxy Implementation
 • Diagram: Coordinator backends call a local GTM Proxy over a Unix domain socket;
   the proxy forwards requests to the GTM over the network or a domain socket.
   Components shown: GTM main and worker threads, shared snapshot data (locked),
   snapshot handler, server scanner, protocol handler, proxy main and proxy threads,
   command/response backend handlers, and connection assignment.
 • Request/response grouping
 • Single representative snapshot applied to multiple transactions


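Request grouping in the proxy can be illustrated with a counting stand-in for the GTM. The classes are hypothetical and synchronous; a real proxy is multi-threaded. The point is that many queued snapshot requests from local backends cost one round trip to the GTM. Every waiting backend then receives the same representative snapshot.

```python
class CountingGTM:
    """Stand-in for the GTM that counts the snapshot requests it serves."""

    def __init__(self, running):
        self.running = set(running)   # hypothetical in-progress XIDs
        self.calls = 0

    def snapshot(self):
        self.calls += 1
        return frozenset(self.running)


class GTMProxy:
    """Queues snapshot requests from local Coordinator backends and forwards
    one request per batch; every waiting backend gets the same snapshot."""

    def __init__(self, gtm):
        self.gtm = gtm
        self.pending = []

    def request_snapshot(self, backend_id):
        self.pending.append(backend_id)

    def flush(self):
        if not self.pending:
            return {}
        snap = self.gtm.snapshot()    # one round trip for the whole batch
        replies = {backend: snap for backend in self.pending}
        self.pending.clear()
        return replies


gtm = CountingGTM(running={7, 9})
proxy = GTMProxy(gtm)
for backend in range(100):
    proxy.request_snapshot(backend)
replies = proxy.flush()
print(len(replies), gtm.calls)  # 100 1
```

The design trades a little latency for much less GTM traffic. A backend may wait briefly for its batch to be flushed.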
Can GTM be a SPOF?
• Implement GTM Standby
     – GTM Master → GTM Standby: checkpoint of the next starting point
       (GXID and Sequence)
     – The Standby can take over from the master without referring to
       GTM master information




Parallel Query
 ●   OK for simple queries
     ●   Also when all joins can be pushed down
              –   Star schema with replicated dimensions
 ●   Even aggregates
     ●   SELECT SUM(col1) FROM tab1
 ●   Performs poorly if a cross-node join is needed
     ●   Data on one node needs to join with data on another
     ●   All the data is shipped to the Coordinator for joining
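For a distributed table, a simple aggregate such as `SUM(col1)` parallelizes when each datanode computes a partial result and the Coordinator combines the partials. This is a minimal sketch that assumes this partial/combine split; the slide does not spell out XC's exact mechanism. The sample rows are the `val` column from the distributed-table diagram.

```python
def partial_sum_count(rows, col):
    """Runs on each datanode: SUM and COUNT of non-NULL values in one column."""
    values = [row[col] for row in rows if row[col] is not None]
    return sum(values), len(values)


def combine(partials):
    """Runs on the Coordinator: merge per-node (sum, count) pairs.

    Returns (SUM, COUNT, AVG); as in SQL, SUM and AVG over zero rows are None."""
    total = sum(s for s, _ in partials)
    n = sum(c for _, c in partials)
    if n == 0:
        return None, 0, None
    return total, n, total / n


# (val, val2) rows per datanode, as in the distributed-table diagram.
nodes = {
    "dn1": [(1, 2), (2, 10), (3, 4)],
    "dn2": [(11, 21), (21, 101), (31, 41)],
    "dn3": [(10, 20), (20, 100), (30, 40)],
}
partials = [partial_sum_count(rows, 0) for rows in nodes.values()]
print(combine(partials))  # SUM(val) = 129, COUNT = 9, AVG = 129 / 9
```

AVG is combined from (sum, count) pairs rather than by averaging per-node averages. Averaging the averages would be wrong whenever nodes hold different numbers of rows.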



High Availability
 ●   GTM-standby provides basic HA
 ●   No native HA for nodes
      ●   Use HA middleware such as Pacemaker
 ●   Each data node should be configured with
     synchronous replication




Status



              Settings and options



Present Status
 ●   Project/Developer site
      ●   http://postgres-xc.sourceforge.net/
      ●   http://sourceforge.net/projects/postgres-xc/
 ●   Version 1.0 available
      ●   Base PostgreSQL version: 9.1
      ●   Soon, PostgreSQL 9.2!
              –   Group commit: even more write scalability
              –   “Index-only Scans”
 ●   Get Involved
      ●   Even as just a tester
Easy way of trying it out?
 ●   www.stormdb.com
      ●   Not Postgres-XC, but similar
      ●   Nothing to install, cloud hosted
      ●   Free beta




Thank You


              mason@stormdb.com
              Twitter: mason_db




19. Cloud Native Computing - Kubernetes - Bratislava - Databases in K8s world19. Cloud Native Computing - Kubernetes - Bratislava - Databases in K8s world
19. Cloud Native Computing - Kubernetes - Bratislava - Databases in K8s worldDávid Kőszeghy
 
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...javier ramirez
 
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022StreamNative
 
Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive huguk
 
kranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High loadkranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High loadKrivoy Rog IT Community
 
MySQL X protocol - Talking to MySQL Directly over the Wire
MySQL X protocol - Talking to MySQL Directly over the WireMySQL X protocol - Talking to MySQL Directly over the Wire
MySQL X protocol - Talking to MySQL Directly over the WireSimon J Mudd
 
Linuxtag.ceph.talk
Linuxtag.ceph.talkLinuxtag.ceph.talk
Linuxtag.ceph.talkUdo Seidel
 
DevEx | there’s no place like k3s
DevEx | there’s no place like k3sDevEx | there’s no place like k3s
DevEx | there’s no place like k3sHaggai Philip Zagury
 
Blackray @ SAPO CodeBits 2009
Blackray @ SAPO CodeBits 2009Blackray @ SAPO CodeBits 2009
Blackray @ SAPO CodeBits 2009fschupp
 
High performance json- postgre sql vs. mongodb
High performance json- postgre sql vs. mongodbHigh performance json- postgre sql vs. mongodb
High performance json- postgre sql vs. mongodbWei Shan Ang
 
PUGS Meetup Presentation - 11062015
PUGS Meetup Presentation - 11062015PUGS Meetup Presentation - 11062015
PUGS Meetup Presentation - 11062015Wei Shan Ang
 

Similar to Postgres-XC Write Scalable PostgreSQL Cluster (20)

JDG 7 & Spark Integration
JDG 7 & Spark IntegrationJDG 7 & Spark Integration
JDG 7 & Spark Integration
 
Clusters With Glusterfs
Clusters With GlusterfsClusters With Glusterfs
Clusters With Glusterfs
 
Geospatial web services using little-known GDAL features and modern Perl midd...
Geospatial web services using little-known GDAL features and modern Perl midd...Geospatial web services using little-known GDAL features and modern Perl midd...
Geospatial web services using little-known GDAL features and modern Perl midd...
 
The Accidental DBA
The Accidental DBAThe Accidental DBA
The Accidental DBA
 
Elephant Roads: PostgreSQL Patches and Variants
Elephant Roads: PostgreSQL Patches and VariantsElephant Roads: PostgreSQL Patches and Variants
Elephant Roads: PostgreSQL Patches and Variants
 
Elephant Roads: a tour of Postgres forks
Elephant Roads: a tour of Postgres forksElephant Roads: a tour of Postgres forks
Elephant Roads: a tour of Postgres forks
 
NoSQL solutions
NoSQL solutionsNoSQL solutions
NoSQL solutions
 
19. Cloud Native Computing - Kubernetes - Bratislava - Databases in K8s world
19. Cloud Native Computing - Kubernetes - Bratislava - Databases in K8s world19. Cloud Native Computing - Kubernetes - Bratislava - Databases in K8s world
19. Cloud Native Computing - Kubernetes - Bratislava - Databases in K8s world
 
NoSQL Databases
NoSQL DatabasesNoSQL Databases
NoSQL Databases
 
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
Como creamos QuestDB Cloud, un SaaS basado en Kubernetes alrededor de QuestDB...
 
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
Towards a ZooKeeper-less Pulsar, etcd, etcd, etcd. - Pulsar Summit SF 2022
 
Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive
 
kranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High loadkranonit S06E01 Игорь Цинько: High load
kranonit S06E01 Игорь Цинько: High load
 
MySQL X protocol - Talking to MySQL Directly over the Wire
MySQL X protocol - Talking to MySQL Directly over the WireMySQL X protocol - Talking to MySQL Directly over the Wire
MySQL X protocol - Talking to MySQL Directly over the Wire
 
Linuxtag.ceph.talk
Linuxtag.ceph.talkLinuxtag.ceph.talk
Linuxtag.ceph.talk
 
DevEx | there’s no place like k3s
DevEx | there’s no place like k3sDevEx | there’s no place like k3s
DevEx | there’s no place like k3s
 
Blackray @ SAPO CodeBits 2009
Blackray @ SAPO CodeBits 2009Blackray @ SAPO CodeBits 2009
Blackray @ SAPO CodeBits 2009
 
High performance json- postgre sql vs. mongodb
High performance json- postgre sql vs. mongodbHigh performance json- postgre sql vs. mongodb
High performance json- postgre sql vs. mongodb
 
Node.js scaling in highload
Node.js scaling in highloadNode.js scaling in highload
Node.js scaling in highload
 
PUGS Meetup Presentation - 11062015
PUGS Meetup Presentation - 11062015PUGS Meetup Presentation - 11062015
PUGS Meetup Presentation - 11062015
 

Recently uploaded

AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):comworks
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 

Recently uploaded (20)

AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):CloudStudio User manual (basic edition):
CloudStudio User manual (basic edition):
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 

Postgres-XC Write Scalable PostgreSQL Cluster

  • 1. Postgres-XC: Write-Scalable PostgreSQL Cluster Mason Sharp August 7th, 2012 CC License: Attribution-NonCommercial-ShareAlike
  • 2. Content Attribution • Koichi Suzuki • Michael Paquier • Ashutosh Bapat • Pavan Deolasee • Mason Sharp • ...?
  • 3. Who am I ● Mason Sharp ● Co-organizer of NYC PUG ● Co-founder of StormDB ● Previously worked at EnterpriseDB ● Original architect of Stado (GridSQL) ● One of the original architects of Postgres-XC
  • 4. PostgreSQL User Groups ● San Francisco: 616 members ● New York: 502 members ● Tokyo: 2000? members ● New: Philadelphia, Los Angeles
  • 5. NYC PUG Meetup Membership
  • 6. NYC PUG Speakers ● Recent speakers include: Bruce Momjian, Greg Smith, Greg Stark, Joe Conway, Joachim Wieland
  • 7. NYC PUG Speakers: We want you!
  • 8. Postgres-XC Talk ● Background ● Postgres-XC Introduction & Usage ● Postgres-XC Components ● Postgres-XC Details
  • 9. Background
  • 10. Data Tier Scaling ● Up versus Out – More memory, more cores ● Read-only Replicated Slaves ● Caching – Memcached ● Sharding ● NoSQL ● NewSQL
  • 11. XC Origins ● Koichi Suzuki, NTT Data ● Mason Sharp
  • 12. PostgreSQL-Related Clustering Projects ● pgpool-II – Read replicated slaves ● PL/Proxy – Used by Skype, meetme (myYearbook) – All access is over a stored function ● Postgres-R, PostgresForest ● Stado (GridSQL) – Parallel Query ● Not write-scalable ● Can we make it write-scalable?
  • 13. Postgres-XC Introduction
  • 14. Overview ● PostgreSQL-based database cluster ● Same API to apps as PostgreSQL – Same drivers ● Currently based on PG 9.1; 9.2 soon ● Symmetric multi-headed cluster ● No master, no slave – Not just PostgreSQL replication – Applications can read and write through any Coordinator ● Consistent database view for all transactions – Full ACID properties for every transaction in the cluster ● Scales for both writes and reads
  • 15. Postgres-XC Cluster [diagram: any number of PG-XC servers, each running a Coordinator and a Data Node, communicating with one another and with the Global Transaction Manager (GTM)] ● Applications can connect to any server and get the same database view and service ● Add PG-XC servers as needed
  • 16. Read/Write Scalability [chart: DBT-1 throughput scalability]
  • 17. Consistency
  • 18. Is XC right for you? ● I need write scalability ● I like ACID ● I like SQL ● I don't want to rewrite my existing SQL applications ● I want to leverage the PostgreSQL community and all of its contrib modules
  • 19. Why XC may not be right for you ● I need MPP parallel query capability – Parallel query in XC is limited – Try Stado: www.stado.us ● I need a solution with built-in HA ● I need massive scale and have loose consistency requirements ● I would rather use a NoSQL solution so I can put it on my resume
  • 20. Postgres-XC Components
  • 22. Coordinator Overview ● Based on PostgreSQL 9.1 (9.2 soon) ● Accepts connections from clients ● Parses and plans requests ● Interacts with the Global Transaction Manager ● Uses a pooler for Data Node connections ● Sends XIDs and snapshots down to Data Nodes ● Collects results and returns them to the client ● Uses two-phase commit if necessary
  • 23. Data Node Overview ● Based on PostgreSQL 9.1 (9.2 soon) ● Where user-created data is actually stored ● Coordinators (not clients) connect to Data Nodes ● Accepts XIDs and snapshots from the Coordinator ● The rest is fairly similar to vanilla PostgreSQL
  • 24. Global Transaction Manager [diagram: the GTM supplies XIDs, snapshots, timestamps and sequence values to the cluster nodes]
  • 25. Summary ● Coordinator – Visible to apps – SQL analysis, planning, execution – Connection pooling ● Datanode (or simply "NODE") – Actual database store – Local SQL execution ● Coordinator and Datanode form the Postgres-XC core: both are based on vanilla PostgreSQL, share the same binary, and may be worth colocating ● GTM (Global Transaction Manager), a separate binary – Provides a consistent database view to transactions – GXID (Global Transaction ID) – Snapshot (list of active transactions) – Other global values such as SEQUENCE ● GTM Proxy groups server-local transaction requests for performance
  • 26. Data Distribution: Distribution Strategies
  • 27. Distributing the data ● Replicated table – Each row in the table is replicated to the datanodes – Statement-based replication ● Distributed table – Each row of the table is stored on one datanode, decided by one of the following strategies – Hash – Round Robin – Modulo – Range and user-defined function (future)
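A minimal Python sketch of how a row could be mapped to a datanode under each distributed strategy. It assumes three datanodes with the hypothetical names `dn1` to `dn3` and uses `zlib.crc32` as a stand-in for Postgres-XC's internal hash function, which the slides do not describe. It shows why the strategy matters. With HASH or MODULO, the target node is a function of the distribution column, so a Coordinator can route `WHERE col = value` to a single node. ROUND ROBIN placement depends only on insertion order.

```python
import itertools
import zlib
from typing import Iterator, Sequence


def hash_node(value: str, nodes: Sequence[str]) -> str:
    """HASH: hash the distribution column value, then map the hash onto a node.

    zlib.crc32 is only a stand-in for Postgres-XC's own hash function.
    """
    return nodes[zlib.crc32(value.encode("utf-8")) % len(nodes)]


def modulo_node(value: int, nodes: Sequence[str]) -> str:
    """MODULO: the integer column value itself, modulo the node count, picks the node."""
    return nodes[value % len(nodes)]


def round_robin(nodes: Sequence[str]) -> Iterator[str]:
    """ROUND ROBIN: successive inserts cycle through the nodes, ignoring row content."""
    return itertools.cycle(nodes)


nodes = ["dn1", "dn2", "dn3"]                 # hypothetical datanode names
print(modulo_node(7, nodes))                  # 7 % 3 == 1, so "dn2"
print(hash_node("alice", nodes) in nodes)     # True: always one of the nodes
placer = round_robin(nodes)
print([next(placer) for _ in range(4)])       # ['dn1', 'dn2', 'dn3', 'dn1']
```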
  • 28. Table Distribution and Replication ● Each table can be distributed or replicated ● Strategy based on usage – Transaction tables → Distributed – Static lookup tables → Replicated – Distribute parent and child tables together ● Join pushdown when possible ● WHERE clause pushdown ● Simple parallel aggregates
  • 29. Defining Tables ● Table Distribution/Replication ● CREATE TABLE tab (…) DISTRIBUTE BY HASH(col) | MODULO(col) | ROUNDROBIN | REPLICATION
  • 30. Replicated Tables [diagram: every datanode holds an identical copy of the table (columns val, val2; rows (1, 2), (2, 10), (3, 4)); a write is applied on every datanode, while a read is served by a single datanode]
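The routing in the replicated-table diagram can be sketched as a toy Python model. The class name, the node names and the random choice of read node are assumptions for illustration, because the slides do not say how Postgres-XC picks the node for a read. Every write touches every datanode and every read touches just one. That trade-off is why slide 28 suggests replicating static lookup tables.

```python
import random
from dataclasses import dataclass, field


@dataclass
class ReplicatedTable:
    """Toy model of a replicated table: every datanode keeps a full copy."""

    nodes: list
    copies: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        for node in self.nodes:
            self.copies.setdefault(node, [])

    def insert(self, row: tuple) -> list:
        """A write is applied on every datanode; returns the nodes it touched."""
        for node in self.nodes:
            self.copies[node].append(row)
        return list(self.nodes)

    def select_all(self, rng: random.Random) -> tuple:
        """A read is answered by one datanode, since all copies are identical."""
        node = rng.choice(self.nodes)
        return node, list(self.copies[node])


table = ReplicatedTable(["dn1", "dn2", "dn3"])   # hypothetical node names
print(table.insert((1, 2)))                      # ['dn1', 'dn2', 'dn3']
table.insert((2, 10))
node, rows = table.select_all(random.Random(0))
print(rows)                                      # [(1, 2), (2, 10)] from whichever node
```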
  • 31. Distributed Tables [diagram: each datanode holds a different subset of the rows (columns val, val2), e.g. (1, 2), (2, 10), (3, 4) on one node, (11, 21), (21, 101), (31, 41) on a second and (10, 20), (20, 100), (30, 40) on a third; a write goes to a single datanode, while a read is sent to every datanode and a combiner on the Coordinator merges the results]
  • 32. Join Pushdown (when a join between two tables can be pushed down to the datanodes, by each table's distribution strategy; a table's "distribution list" is the list of nodes it is stored on)
    – Hash/Modulo × Hash/Modulo: inner join with an equality condition on the distribution columns, which must have the same data type and the same distribution strategy
    – Hash/Modulo × Round Robin: no
    – Hash/Modulo × Replicated: inner join, if the replicated table's distribution list is a superset of the distributed table's distribution list
    – Round Robin × Round Robin: no
    – Round Robin × Replicated: inner join, if the replicated table's distribution list is a superset of the distributed table's distribution list
    – Replicated × Replicated: all kinds of joins
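The join pushdown matrix can be encoded as a small Python decision function. It is a sketch of the slide's rules, not Postgres-XC's planner code. The names `Strategy`, `TableDist` and `can_push_down` are hypothetical. One condition is an explicit assumption that the slide does not state: two Hash/Modulo tables must also sit on the same node list, because otherwise equal keys could hash to different nodes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Strategy(Enum):
    HASH = "hash"
    MODULO = "modulo"
    ROUNDROBIN = "roundrobin"
    REPLICATED = "replicated"


@dataclass(frozen=True)
class TableDist:
    strategy: Strategy
    nodes: frozenset                  # the table's distribution list (its nodes)
    dist_type: Optional[str] = None   # data type of the distribution column, if any


def can_push_down(a: TableDist, b: TableDist, join_type: str,
                  equijoin_on_dist_cols: bool) -> bool:
    """Return True if joining a and b can run entirely on the datanodes."""
    kinds = {a.strategy, b.strategy}
    if kinds == {Strategy.REPLICATED}:
        return True                                   # all kinds of joins
    if Strategy.REPLICATED in kinds:
        rep, other = (a, b) if a.strategy is Strategy.REPLICATED else (b, a)
        return join_type == "inner" and rep.nodes >= other.nodes
    if Strategy.ROUNDROBIN in kinds:
        return False                                  # no pushdown at all
    # Both tables are HASH or MODULO distributed.
    return (join_type == "inner"
            and equijoin_on_dist_cols
            and a.strategy is b.strategy
            and a.dist_type == b.dist_type
            and a.nodes == b.nodes)                   # assumption, not on the slide


two = frozenset({"dn1", "dn2"})
orders = TableDist(Strategy.HASH, two, "int4")
items = TableDist(Strategy.HASH, two, "int4")
countries = TableDist(Strategy.REPLICATED, frozenset({"dn1", "dn2", "dn3"}))
events = TableDist(Strategy.ROUNDROBIN, two)
print(can_push_down(orders, items, "inner", True))       # True
print(can_push_down(orders, events, "inner", True))      # False
print(can_push_down(events, countries, "inner", False))  # True
print(can_push_down(events, countries, "left", False))   # False
```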
  • 33. Constraints ● XC does not support global constraints, i.e. constraints across datanodes ● Constraints within a datanode are supported
    – Replicated: unique and primary key constraints supported; foreign key constraints supported if the referenced table is also replicated on the same nodes
    – Hash/Modulo distributed: unique and primary key constraints supported if the primary or unique key is the distribution key; foreign key constraints supported if the referenced table is replicated on the same nodes, OR is distributed by its primary key in the same manner and on the same nodes
    – Round Robin: unique and primary key constraints not supported; foreign key constraints supported if the referenced table is replicated on the same nodes
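Below is a rough Python checker for the constraints table. It assumes only node-local checking is available, as the slide states. The names are hypothetical. Two rules are loosely generalized and labeled as assumptions. First, "the key is the distribution key" is read as "the key includes the distribution column", since equal keys then always land on the same node. Second, "distributed in the same manner" is read as requiring the foreign key column to be the child's distribution column.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Table:
    strategy: str                     # "replicated", "hash", "modulo" or "roundrobin"
    nodes: frozenset
    dist_column: Optional[str] = None


def unique_supported(t: Table, key: Sequence[str]) -> bool:
    """Can a UNIQUE or PRIMARY KEY on `key` be enforced by each datanode alone?"""
    if t.strategy == "replicated":
        return True                   # every node holds every row
    if t.strategy in ("hash", "modulo"):
        # Assumption: equal keys imply the same node when the key includes
        # the distribution column.
        return t.dist_column in key
    return False                      # round robin: duplicates may land on different nodes


def fk_supported(child: Table, fk_cols: Sequence[str],
                 parent: Table, parent_pk: Sequence[str]) -> bool:
    """Can a FOREIGN KEY from child(fk_cols) to parent(parent_pk) be checked locally?"""
    if parent.strategy == "replicated" and parent.nodes == child.nodes:
        return True                   # every referenced row exists on every node
    # Hash/Modulo child: parent distributed by its primary key in the same manner
    # and on the same nodes. Assumption: the FK column must also be the child's
    # distribution column, so matching rows are co-located.
    return (child.strategy in ("hash", "modulo")
            and parent.strategy == child.strategy
            and parent.nodes == child.nodes
            and list(parent_pk) == [parent.dist_column]
            and list(fk_cols) == [child.dist_column])


two = frozenset({"dn1", "dn2"})
customers = Table("hash", two, "customer_id")
orders = Table("hash", two, "customer_id")
countries = Table("replicated", two)
events = Table("roundrobin", two)
print(unique_supported(orders, ["customer_id", "order_id"]))            # True
print(fk_supported(orders, ["customer_id"], customers, ["customer_id"]))  # True
```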
  • 34. Demo
  • 35. Transaction Management ● Why MVCC is Important for Consistency ● Global Transaction Manager
  • 36. Multi-version Concurrency Control (MVCC) (quick overview) ● Readers do not block writers ● Writers do not block readers ● Transaction IDs (XIDs) – Every transaction gets an ID – Snapshots contain a list of running XIDs
  • 37. Multi-version Concurrency Control (MVCC) (quickly discussed) Example: T1 Begin... T2 Begin; INSERT...; Commit; T3 Begin... T4 Begin; SELECT ● T4's snapshot contains T1 and T3 ● T2 has already committed ● T4 can see T2's changes, but not T1's or T3's
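A simplified Python model of the slide 37 example follows, showing how a snapshot that lists running XIDs decides visibility. It only approximates PostgreSQL's real rule. It leaves out xmin, subtransactions, hint bits and a transaction's view of its own changes. `Snapshot` and `TransactionManager` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    xmax: int             # first XID not yet assigned when the snapshot was taken
    xip: frozenset        # XIDs still in progress when the snapshot was taken

    def sees(self, writer_xid: int, committed: set) -> bool:
        """Is a row written by writer_xid visible to the snapshot's owner?"""
        return (writer_xid < self.xmax
                and writer_xid not in self.xip
                and writer_xid in committed)


@dataclass
class TransactionManager:
    next_xid: int = 1
    running: set = field(default_factory=set)
    committed: set = field(default_factory=set)

    def begin(self) -> int:
        xid = self.next_xid            # every transaction gets an ID
        self.next_xid += 1
        self.running.add(xid)
        return xid

    def commit(self, xid: int) -> None:
        self.running.discard(xid)
        self.committed.add(xid)

    def snapshot(self, me: int) -> Snapshot:
        # The snapshot lists the other transactions that are still running.
        return Snapshot(self.next_xid, frozenset(self.running - {me}))


tm = TransactionManager()
t1 = tm.begin()                   # T1 Begin...
t2 = tm.begin()                   # T2 Begin; INSERT...
tm.commit(t2)                     # ...; Commit
t3 = tm.begin()                   # T3 Begin...
t4 = tm.begin()                   # T4 Begin; SELECT
snap = tm.snapshot(t4)
print(sorted(snap.xip))                  # [1, 3]
print(snap.sees(t2, tm.committed))       # True
print(snap.sees(t1, tm.committed))       # False
```

The snapshot is fixed when T4 takes it. T1 stays invisible to T4 even if T1 commits afterwards.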
  • 38. Multi-version Concurrency Control (MVCC) on 2 Independent Nodes Example: T1 Begin... T2 Begin; INSERT...; Commit; T3 Begin... T4 Begin; SELECT ● Node 1: T2 commits, then T4 SELECTs ● Node 2: T4 SELECTs, then T2 commits ● T4's SELECT statement returns inconsistent data – It includes T2's data from Node 1, but not from Node 2 ● The C in ACID fails
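The Python sketch below replays the slide's ordering on two toy datanodes. Each node judges visibility only from its own local commit state, with no shared snapshot. The `Node` class and the row values are hypothetical. T4 ends up reading exactly half of T2's writes.

```python
class Node:
    """A datanode that judges visibility using only its own local commit state."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.rows = []                # (writer transaction, value)
        self.committed = set()

    def insert(self, txn: str, value: str) -> None:
        self.rows.append((txn, value))

    def commit(self, txn: str) -> None:
        self.committed.add(txn)

    def select(self) -> list:
        # Local snapshot: whatever this node has seen committed so far.
        return [value for txn, value in self.rows if txn in self.committed]


node1, node2 = Node("node1"), Node("node2")
node1.insert("T2", "row on node 1")
node2.insert("T2", "row on node 2")

node1.commit("T2")                # Node 1: T2 commits, then T4 SELECTs
seen = node1.select()
seen += node2.select()            # Node 2: T4 SELECTs, then T2 commits
node2.commit("T2")

print(seen)                       # ['row on node 1']: only half of T2 is visible
```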
  • 39. Global Transaction Manager (GTM) ● Provides global transaction consistency [diagram: the GTM supplies XIDs, snapshots, timestamps and sequence values to the cluster nodes]
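Here is a toy Python GTM that fixes the two-node anomaly. It hands out global XIDs (GXIDs), global snapshots and sequence values, and every datanode checks visibility against the GTM's snapshot. The key assumption, stated in a comment, is that the Coordinator reports the commit to the GTM only after every datanode has committed. Timestamps are omitted. All class names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlobalSnapshot:
    xmax: int                 # first GXID not yet assigned
    xip: frozenset            # GXIDs still running cluster-wide

    def sees(self, gxid: int, locally_committed: set) -> bool:
        return gxid < self.xmax and gxid not in self.xip and gxid in locally_committed


class GTM:
    """Toy GTM: the single source of GXIDs, snapshots and sequence values."""

    def __init__(self) -> None:
        self._next_gxid = 1
        self._running = set()
        self._sequences = {}

    def begin(self) -> int:
        gxid = self._next_gxid
        self._next_gxid += 1
        self._running.add(gxid)
        return gxid

    def commit(self, gxid: int) -> None:
        # Assumed to be reported once every datanode has committed its part.
        self._running.discard(gxid)

    def snapshot(self) -> GlobalSnapshot:
        return GlobalSnapshot(self._next_gxid, frozenset(self._running))

    def nextval(self, name: str) -> int:
        self._sequences[name] = self._sequences.get(name, 0) + 1
        return self._sequences[name]


class DataNode:
    def __init__(self) -> None:
        self.rows = []            # (gxid, value)
        self.committed = set()    # GXIDs committed on this node

    def select(self, snap: GlobalSnapshot) -> list:
        return [v for gxid, v in self.rows if snap.sees(gxid, self.committed)]


gtm, node1, node2 = GTM(), DataNode(), DataNode()
t2 = gtm.begin()
node1.rows.append((t2, "a"))
node2.rows.append((t2, "b"))
node1.committed.add(t2)               # Node 1 commits T2 first

t4 = gtm.begin()                      # T4 begins and takes a global snapshot
snap = gtm.snapshot()                 # T2 is still running according to the GTM
print(node1.select(snap) + node2.select(snap))    # []: neither half of T2

node2.committed.add(t2)
gtm.commit(t2)
snap = gtm.snapshot()
print(node1.select(snap) + node2.select(snap))    # ['a', 'b']
```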
  • 40. Transaction Management ● 2PC is used to guarantee transactional consistency across nodes – When more than one node is involved, OR – When there are explicit 2PC transactions ● Only the nodes where write activity has happened participate in 2PC ● In PostgreSQL, 2PC cannot be applied if temporary tables are involved; the same restriction applies in Postgres-XC ● When a single Coordinator command needs multiple datanode commands, they are wrapped in a transaction block
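The Coordinator-side Python sketch below covers the write-only participation rule. Nodes that only read are simply committed. A single writer uses an ordinary commit. Two or more writers go through PREPARE, then COMMIT PREPARED, or a rollback if any node fails to prepare. The method names only mirror PostgreSQL's `PREPARE TRANSACTION`, `COMMIT PREPARED` and `ROLLBACK PREPARED` commands. Explicit 2PC transactions and the temporary-table restriction are not modeled, and the transaction ids are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    name: str
    can_prepare: bool = True                  # set False to simulate a failed PREPARE
    log: list = field(default_factory=list)

    def prepare(self, gid: str) -> bool:      # like PREPARE TRANSACTION 'gid'
        self.log.append("prepare")
        return self.can_prepare

    def commit_prepared(self, gid: str) -> None:   # like COMMIT PREPARED 'gid'
        self.log.append("commit prepared")

    def commit(self) -> None:                 # ordinary one-phase COMMIT
        self.log.append("commit")

    def rollback(self, gid: str) -> None:     # ROLLBACK PREPARED 'gid' or plain ROLLBACK
        self.log.append("rollback")


def finish_transaction(gid: str, nodes: list, wrote: set) -> bool:
    """Commit a transaction that touched `nodes`; `wrote` names the nodes that wrote."""
    writers = [n for n in nodes if n.name in wrote]
    readers = [n for n in nodes if n.name not in wrote]
    for n in readers:
        n.commit()                            # read-only nodes stay out of 2PC
    if len(writers) <= 1:
        for n in writers:
            n.commit()                        # a single writer needs no 2PC
        return True
    votes = [n.prepare(gid) for n in writers]     # phase 1: every writer prepares
    if not all(votes):
        for n in writers:
            n.rollback(gid)                   # any failed prepare aborts everywhere
        return False
    for n in writers:
        n.commit_prepared(gid)                # phase 2: all prepared, so commit all
    return True


a, b, c = Participant("dn1"), Participant("dn2"), Participant("dn3")
print(finish_transaction("xc_1", [a, b, c], {"dn1", "dn2"}))   # True
print(a.log, c.log)          # ['prepare', 'commit prepared'] ['commit']
```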
  • 41. Postgres-XC Considerations
  • 42. Can GTM be a Performance Bottleneck? • Depends on the implementation – Current implementation [diagram: each Coordinator backend uses the GTM client library to reach its own GTM thread, which the GTM main thread creates and terminates; the threads share snapshot data under a lock] – Applicable up to five PG-XC servers (DBT-1) – Large snapshot size and number – Too many interactions between GTM and Coordinators July 12th, 2012
  • 43. Can GTM be a Performance Bottleneck? Proxy Implementation [diagram: Coordinator backends talk over Unix domain sockets to a GTM Proxy, whose threads forward requests over the network to the GTM's worker threads; the proxy and GTM main threads create and terminate the worker threads and assign connections] • Request/response grouping • Single representative snapshot applied to multiple transactions
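Request grouping in the GTM Proxy can be sketched in Python as a small batcher. It collects GXID requests from local Coordinator backends, sends them to the GTM in one round trip and hands the whole group one representative snapshot. This is a sketch under assumptions: the message format, class names and backend names are hypothetical. Sharing one snapshot is conservative, because the grouped transactions simply see one another as still running.

```python
class BatchingGTM:
    """Stand-in GTM that counts the round trips it receives."""

    def __init__(self) -> None:
        self.next_gxid = 1
        self.running = set()
        self.round_trips = 0

    def begin_many(self, count: int):
        """One message assigns `count` GXIDs and returns one snapshot for all of them."""
        self.round_trips += 1
        gxids = list(range(self.next_gxid, self.next_gxid + count))
        self.next_gxid += count
        self.running.update(gxids)
        snapshot = (self.next_gxid, frozenset(self.running))   # (xmax, running GXIDs)
        return gxids, snapshot


class GTMProxy:
    """Collects requests from local Coordinator backends and forwards them as one."""

    def __init__(self, gtm: BatchingGTM) -> None:
        self.gtm = gtm
        self.pending = []

    def request_begin(self, backend: str) -> None:
        self.pending.append(backend)

    def flush(self) -> dict:
        if not self.pending:
            return {}
        gxids, snapshot = self.gtm.begin_many(len(self.pending))
        # Every backend in the group gets its own GXID but the same snapshot.
        replies = {b: (g, snapshot) for b, g in zip(self.pending, gxids)}
        self.pending.clear()
        return replies


gtm = BatchingGTM()
proxy = GTMProxy(gtm)
for backend in ["backend-1", "backend-2", "backend-3"]:
    proxy.request_begin(backend)
replies = proxy.flush()
print(gtm.round_trips)                                   # 1 instead of 3
print([replies[b][0] for b in sorted(replies)])          # [1, 2, 3]
```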
  • 44. Can GTM be a SPOF? • Implement GTM Standby [diagram: the GTM Master checkpoints the next starting point (GXID and Sequence) to the GTM Standby] • The standby can take over from the master without referring to GTM master information
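The Python sketch below shows one plausible reading of "checkpoint next starting point". Before handing out a block of GXIDs or sequence values, the master tells the standby a starting point past the end of that block. A promoted standby resumes there without asking the failed master, at the cost of skipping some unused values. The block size `RESERVE` and all names are hypothetical, and the real GTM protocol may differ.

```python
class GTMStandby:
    def __init__(self) -> None:
        self.start_points = {}

    def receive_checkpoint(self, start_points: dict) -> None:
        self.start_points = dict(start_points)

    def promote(self) -> "GTMMaster":
        # Resume from the checkpointed starting points; the failed master is not consulted.
        return GTMMaster(counters=self.start_points)


class GTMMaster:
    RESERVE = 100        # hypothetical number of values covered by one checkpoint

    def __init__(self, counters=None, standby=None) -> None:
        # Next value to hand out for the GXID and for each sequence.
        self.counters = dict(counters or {"gxid": 1})
        self.standby = standby
        self.limits = {}
        self._checkpoint()

    def _checkpoint(self) -> None:
        # The starting point lies beyond anything handed out before the next checkpoint.
        self.limits = {name: value + self.RESERVE for name, value in self.counters.items()}
        if self.standby is not None:
            self.standby.receive_checkpoint(self.limits)

    def next_value(self, name: str) -> int:
        value = self.counters.setdefault(name, 1)
        if value >= self.limits.get(name, value):
            self._checkpoint()          # reserve a fresh block before crossing the limit
        self.counters[name] = value + 1
        return value


standby = GTMStandby()
master = GTMMaster(standby=standby)
issued = [master.next_value("gxid") for _ in range(150)]
new_master = standby.promote()            # the master fails; the standby takes over
print(new_master.next_value("gxid"))      # 201: above every GXID the old master issued
```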
  • 45. Parallel Query ● OK for simple queries ● Also when all joins can be pushed down – Star schema with replicated dimensions ● Even aggregates – SELECT SUM(col1) FROM tab1 ● Performs poorly if a cross-node join is needed – Data on one node needs to join with data on another – All data is shipped to the Coordinator for joining
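Here is a Python sketch of the "simple parallel aggregates" idea. Each datanode computes a partial SUM and COUNT over its local rows, and the Coordinator merges those small results. The function names and sample data are hypothetical. Postgres-XC's actual aggregate internals are not shown in the slides. AVG must be rebuilt from the combined SUM and COUNT. Averaging the per-node averages here would give (2 + 10 + 4.5) / 3 = 5.5 instead of 25 / 6.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partial:
    total: int
    count: int


def datanode_partial(local_values: list) -> Partial:
    """Runs on each datanode: SUM(col1) and COUNT(col1) over local rows only."""
    return Partial(sum(local_values), len(local_values))


def coordinator_combine(partials: list) -> tuple:
    """Runs on the Coordinator: merge the small per-node results."""
    total = sum(p.total for p in partials)
    count = sum(p.count for p in partials)
    avg = total / count if count else None     # AVG = combined SUM / combined COUNT
    return total, count, avg


nodes = [[1, 2, 3], [10], [4, 5]]              # col1 values stored on three datanodes
partials = [datanode_partial(values) for values in nodes]
print(coordinator_combine(partials))           # total 25, count 6, average 25 / 6
```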
  • 46. High Availability ● GTM-standby provides basic HA ● No native HA for nodes ● Use HA middleware such as Pacemaker ● Each data node should be configured with synchronous replication
  • 47. Status Settings and options
  • 48. Present Status ● Project/Developer site ● http://postgres-xc.sourceforge.net/ ● http://sourceforge.net/projects/postgres-xc/ ● Version 1.0 available ● Base PostgreSQL version: 9.1 ● Soon, PostgreSQL 9.2! – Group commit: even more write scalability – “Index-only Scans” ● Get Involved ● Even as just a tester
  • 49. Easy way of trying it out? ● www.stormdb.com ● Not Postgres-XC, but similar ● Nothing to install, cloud hosted ● Free beta
  • 50. Thank You mason@stormdb.com Twitter: mason_db