© 2015 Pivotal Software, Inc. All rights reserved.
Greenplum Database Open Source
December 2015
Forward Looking Statements
This presentation contains “forward-looking statements” as defined under the Federal Securities Laws. Actual results could differ materially from those projected in the forward-looking statements as a result of certain risk factors, including but not limited to: (i) adverse changes in general economic or market conditions; (ii) delays or reductions in information technology spending; (iii) the relative and varying rates of product price and component cost declines and the volume and mixture of product and services revenues; (iv) competitive factors, including but not limited to pricing pressures and new product introductions; (v) component and product quality and availability; (vi) fluctuations in VMware, Inc.’s operating results and risks associated with trading of VMware stock; (vii) the transition to new products, the uncertainty of customer acceptance of new product offerings and rapid technological and market change; (viii) risks associated with managing the growth of our business, including risks associated with acquisitions and investments and the challenges and costs of integration, restructuring and achieving anticipated synergies; (ix) the ability to attract and retain highly qualified employees; (x) insufficient, excess or obsolete inventory; (xi) fluctuating currency exchange rates; (xii) threats and other disruptions to our secure data centers and networks; (xiii) our ability to protect our proprietary technology; (xiv) war or acts of terrorism; and (xv) other one-time events and other important factors disclosed previously and from time to time in the filings of EMC Corporation, the parent company of Pivotal, with the U.S. Securities and Exchange Commission. EMC and Pivotal disclaim any obligation to update any such forward-looking statements after the date of this release.
Safe Harbor
“Any information regarding pre-release of Pivotal offerings, future updates or other planned modifications is subject to ongoing evaluation by Pivotal and therefore subject to change. This information is provided without warranty of any kind, express or implied. Customers who purchase Pivotal offerings should make their purchase decision based upon features that are currently available. Pivotal has no obligation to update forward looking information in this presentation.”
Greenplum Database Mission & Strategy
• Relational database system for big data
• Mission-critical, system-of-record product with supporting tools and ecosystem
• Fully open source with a global community of developers and users
• Implements the world’s leading database research across all components
– Optimizer, Query Execution
– Transaction Processing, Database Storage, Compression, High Availability
– Embedded Programming Languages (Python, R, Java, etc.)
– In-database analytics across domains (e.g. Geospatial, Text, Machine Learning, Mathematics)
• Performance tuned for multiple workload profiles
– Analytics, long-running queries, short-running queries, mixed workloads
• Built for large industrial deployments
– Financial, Government, Telecom, Retail, Manufacturing, Oil & Gas, etc.
Greenplum Open Source
• An ambitious project
– 10 years in the making
– Investment of hundreds of millions of dollars
– Potential to define a new market and disrupt traditional EDW vendors
• www.greenplum.org
– GitHub code
– Mailing lists / community engagement
– Global project with external contributors
• Pivotal Greenplum
– Enterprise software distribution & release management
– Pivotal expertise
– 24-hour global support
PostgreSQL Compatibility Roadmap
• Strategically backport key features from PostgreSQL to Greenplum: JSONB, UUID, variadic functions, default function arguments, etc.
• Consistently backport patches from older PostgreSQL releases to Greenplum …
MPP Shared Nothing Architecture
• Master Host and Standby Master Host: the master accepts SQL from clients and coordinates work with the Segment Hosts
• Segment Hosts (node1, node2, node3 … nodeN), each running one or more Segment Instances
• Segment Hosts have their own CPU, disk and memory (shared nothing)
• Segment Instances process queries in parallel: performance through segment instance parallelism
• A high-speed interconnect provides continuous pipelining of data processing
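A minimal sketch of the master/segment division of labour, assuming a toy in-process model: each "segment" owns its rows (shared nothing), computes a partial aggregate in parallel, and the "master" combines the partials. `Segment` and `master_avg` are hypothetical names; real segment instances are separate database processes on separate hosts that talk over the interconnect.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Hypothetical segment instance: owns its rows and shares nothing."""
    seg_id: int
    rows: list = field(default_factory=list)

    def partial_aggregate(self) -> tuple[int, int]:
        # Each segment scans only its local data and returns (count, sum).
        return len(self.rows), sum(self.rows)


def master_avg(segments: list[Segment]) -> float:
    """Dispatch the same step to every segment in parallel, then combine."""
    with ThreadPoolExecutor(max_workers=len(segments)) as pool:
        partials = list(pool.map(Segment.partial_aggregate, segments))
    total_count = sum(count for count, _ in partials)
    total_sum = sum(subtotal for _, subtotal in partials)
    return total_sum / total_count


segments = [Segment(0, [1, 2, 3]), Segment(1, [4, 5]), Segment(2, [6])]
print(master_avg(segments))  # 3.5
```

Averages from different segments cannot simply be averaged again, which is why each segment returns a (count, sum) pair; the master does only the cheap final combine.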
Master Host
• Accepts client connections and incoming user requests, and performs authentication
• Runs the Master Segment: Catalog, Parser, Query Optimizer, Dispatcher, Query Executor and Distributed TM
• The Parser enforces syntax and semantics and produces a parse tree
Pivotal Query Optimizer
• Consumes the parse tree and produces the query plan
• The query execution plan describes how the query is executed
• Each Segment Instance on a Segment Host has its own Local TM, Query Executor, Catalog and Local Storage; the master and the segments communicate over the interconnect
Query Dispatcher
• Responsible for communicating the query plan to the segments
• Allocates the cluster resources required to perform the job, and accumulates and presents the final results
Query Executor
• Responsible for executing the steps in the plan (e.g. open a file, iterate over tuples)
• Communicates its intermediate results to other executor processes
Distributed Transaction Management
• Segments have their own commit and replay logs and decide when to commit or abort their own transactions
• The DTM resides on the master and coordinates the commit and abort actions of the segments
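A minimal sketch of the coordination the DTM performs, assuming a classic two-phase commit (every segment prepares, then all commit or all abort). The slide does not spell out the protocol, and `LocalTM` / `DistributedTM` are hypothetical stand-ins for the segment-local and master-side transaction managers.

```python
from enum import Enum


class State(Enum):
    ACTIVE = "active"
    PREPARED = "prepared"
    COMMITTED = "committed"
    ABORTED = "aborted"


class LocalTM:
    """Hypothetical segment-local transaction manager."""

    def __init__(self, seg_id: int, can_commit: bool = True):
        self.seg_id = seg_id
        self.can_commit = can_commit  # False simulates a local failure
        self.state = State.ACTIVE
        self.log: list[str] = []      # stands in for the segment's commit log

    def prepare(self) -> bool:
        # Phase 1: the segment decides locally whether it can commit.
        if self.can_commit:
            self.state = State.PREPARED
            self.log.append("PREPARE")
        return self.can_commit

    def commit(self) -> None:
        self.state = State.COMMITTED
        self.log.append("COMMIT")

    def abort(self) -> None:
        self.state = State.ABORTED
        self.log.append("ABORT")


class DistributedTM:
    """Hypothetical master-side coordinator."""

    def __init__(self, segments: list[LocalTM]):
        self.segments = segments

    def commit(self) -> bool:
        # Phase 1: collect a vote from every segment.
        votes = [seg.prepare() for seg in self.segments]
        # Phase 2: commit everywhere only if every vote was yes.
        if all(votes):
            for seg in self.segments:
                seg.commit()
            return True
        for seg in self.segments:
            seg.abort()
        return False


print(DistributedTM([LocalTM(0), LocalTM(1)]).commit())                    # True
print(DistributedTM([LocalTM(0), LocalTM(1, can_commit=False)]).commit())  # False
```

The point of the design is that no segment commits until all of them have prepared, so a failure on one segment cannot leave the transaction half-applied across the cluster.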
GPDB High Availability
• Master Host mirroring
– Warm Standby Master Host
▪ Replica of the Master Host system catalogs
– Eliminates a single point of failure
– Synchronization process between the Master Host and the Standby Master Host
▪ Uses PostgreSQL WAL replication
• Segment mirroring
– Creates a mirror segment for every primary segment
▪ Uses a custom file block replication process
– If a primary segment becomes unavailable, the system automatically fails over to its mirror
CREATE TABLE: Define the Storage Model
• Heap tables versus Append Optimized (AO) tables
• Row-oriented storage versus column-oriented storage
• Compression
– Table-level compression applied to the entire table
– Column-level compression applied to a specific column with columnar storage
– zlib compression levels, with optional run-length encoding (RLE)
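A minimal sketch of why run-length encoding pays off on columnar data, assuming a sorted, low-cardinality column stored contiguously. `rle_encode` and `rle_decode` are illustrative helpers, not Greenplum's on-disk RLE format, and Python's standard `zlib` module stands in for the zlib codec applied on top.

```python
import json
import zlib
from itertools import groupby


def rle_encode(values: list) -> list[tuple]:
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs: list[tuple]) -> list:
    """Expand (value, run_length) pairs back into the original values."""
    out = []
    for value, count in pairs:
        out.extend([value] * count)
    return out


# A sorted date column stored contiguously, as in a column-oriented table.
column = ["2015-10-20"] * 1000 + ["2015-10-21"] * 1000 + ["2015-10-22"] * 500

runs = rle_encode(column)
print(runs)  # [('2015-10-20', 1000), ('2015-10-21', 1000), ('2015-10-22', 500)]
assert rle_decode(runs) == column  # lossless round trip

# Optionally apply zlib to the RLE output (levels 1-9 trade CPU for size).
raw = json.dumps(column).encode()
packed = zlib.compress(json.dumps(runs).encode(), 5)
print(len(raw), len(packed))
```

In a row-oriented layout the same dates would be interleaved with other columns, so runs like these would not form; that is the intuition behind "columnar storage compresses better".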
Polymorphic Storage™
User Definable Storage Layout
• Columnar storage compresses better
• Optimized for retrieving a subset of the columns when querying
• Compression can be set differently per column: gzip (1-9), quicklz, delta, RLE
• Row-oriented storage is faster when returning all columns
• Use HEAP for many updates and deletes
• Use indexes for drill-through queries
• External HDFS
– Less frequently accessed partitions can live on HDFS as external partitions, so all data can still be queried seamlessly
– Text, CSV, Binary, Avro and Parquet formats
– All major HDP distros
[Figure: TABLE ‘SALES’ partitioned by time (Jun through Dec, Year-1, Year-2), with row-oriented, column-oriented and external HDFS partitions]
CREATE TABLE: Define Data Distributions
• One of the most important aspects of GPDB!
• Every table has a distribution method
• DISTRIBUTED BY (column)
– Uses a hash distribution
• DISTRIBUTED RANDOMLY
– Uses a random distribution, which is not guaranteed to be perfectly even
• Explicitly define a column or random distribution for every table
– Do not rely on the default
Data Distribution: The Key to Parallelism
The primary strategy and goal is to spread data evenly across all segment instances. This matters most in an MPP shared-nothing architecture!
Order
Order#  Order Date   Customer ID
43      Oct 20 2005  12
64      Oct 20 2005  111
45      Oct 20 2005  42
46      Oct 20 2005  64
77      Oct 20 2005  32
48      Oct 20 2005  12
50      Oct 20 2005  34
56      Oct 20 2005  213
63      Oct 20 2005  15
44      Oct 20 2005  102
53      Oct 20 2005  82
55      Oct 20 2005  55
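A minimal sketch of the two distribution policies applied to the order numbers in the table above, assuming a 4-segment cluster and using `zlib.crc32` as a stand-in hash (Greenplum uses its own hash function, so real placements differ). `hash_segment` and `random_segment` are hypothetical names.

```python
import random
import zlib
from collections import Counter

NUM_SEGMENTS = 4  # assumption: a 4-segment cluster


def hash_segment(key: int, num_segments: int = NUM_SEGMENTS) -> int:
    """DISTRIBUTED BY (key): the same key always lands on the same segment."""
    return zlib.crc32(str(key).encode()) % num_segments


def random_segment(rng: random.Random, num_segments: int = NUM_SEGMENTS) -> int:
    """DISTRIBUTED RANDOMLY: placement ignores the row's contents."""
    return rng.randrange(num_segments)


order_numbers = [43, 64, 45, 46, 77, 48, 50, 56, 63, 44, 53, 55]

by_hash = Counter(hash_segment(o) for o in order_numbers)
rng = random.Random(2015)
by_random = Counter(random_segment(rng) for _ in order_numbers)

print("hash:  ", sorted(by_hash.items()))
print("random:", sorted(by_random.items()))

# Distributing on a low-cardinality column (every row has the same order
# date) sends all rows to a single segment: the worst possible skew.
by_date = Counter(hash_segment(20051020) for _ in order_numbers)
print("date:  ", dict(by_date))
```

With only 12 rows neither policy is perfectly even; the useful property of the hash policy is that it is deterministic per key, so rows sharing a key end up on the same segment, whereas hashing a column with a single distinct value collapses the whole table onto one segment.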
CREATE TABLE: Define Partitioning
• Reduces the amount of data scanned by reading only the data needed to satisfy a query
– The only goal of partitioning is to achieve partition elimination, a.k.a. partition pruning
• Is not a substitute for distribution
– A good distribution strategy combined with partitioning that achieves partition elimination unlocks the best performance
• Uses table inheritance and constraints
– Persistent relationship between parent and child tables
Distributions and Partitioning
[Figure: orders data laid out as segment blocks 1A to 3D, repeated for each partition]
SELECT COUNT(*)
FROM orders
WHERE order_date >= 'Oct 20 2007'
AND order_date < 'Oct 27 2007'
Evenly distribute orders data across all segments & only scan the relevant order partitions
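A minimal sketch of partition elimination for the query above, assuming hypothetical weekly range partitions on order_date (the `Partition` type, partition names and bounds are illustrative): only partitions whose [start, end) range overlaps the predicate are scanned.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Partition:
    """Hypothetical range partition holding order_date in [start, end)."""
    name: str
    start: date
    end: date


def weekly_partitions(first: date, weeks: int) -> list[Partition]:
    return [
        Partition(f"orders_w{i}", first + timedelta(weeks=i), first + timedelta(weeks=i + 1))
        for i in range(weeks)
    ]


def prune(partitions: list[Partition], lo: date, hi: date) -> list[Partition]:
    """Keep only partitions that can hold rows with lo <= order_date < hi."""
    return [p for p in partitions if p.start < hi and lo < p.end]


parts = weekly_partitions(date(2007, 10, 6), weeks=6)
# WHERE order_date >= 'Oct 20 2007' AND order_date < 'Oct 27 2007'
scanned = prune(parts, date(2007, 10, 20), date(2007, 10, 27))
print([p.name for p in scanned])  # ['orders_w2']
```

Because each surviving partition is itself distributed across all segments, every segment still takes part in the scan; pruning reduces how much each segment reads, which is why distribution and partitioning complement rather than replace each other.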
Analytics: Bringing the power of parallelism to modeling and analytics
Path Functions
• Identify rows of interest from raw table or view
• Pattern match across rows using regex
• Define one or more windows on the matches
• Apply standard PostgreSQL window functions
or aggregations on the windows
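A minimal sketch of the path-function steps above, written in plain Python rather than SQL and assuming a hypothetical clickstream: each row is mapped to a one-letter symbol, a regex matches patterns across consecutive rows, and an aggregate is applied to each matched window. The event names and symbol mapping are illustrative only.

```python
import re

# Hypothetical clickstream rows for one user, ordered by timestamp: (event, ts).
events = [
    ("view", 1), ("view", 2), ("cart", 3), ("buy", 4),
    ("view", 5), ("cart", 6), ("view", 7), ("cart", 8), ("buy", 9),
]

# 1. Identify rows of interest: map each row to a one-letter symbol.
SYMBOLS = {"view": "V", "cart": "C", "buy": "B"}
encoded = "".join(SYMBOLS[name] for name, _ in events)

# 2. Pattern-match across rows with a regex: one or more views, a cart, a buy.
pattern = re.compile(r"V+CB")

# 3. Each match defines a window of consecutive rows.
windows = [events[m.start():m.end()] for m in pattern.finditer(encoded)]

# 4. Apply an aggregate to each window (row count and elapsed time here).
for window in windows:
    print(len(window), "rows, duration", window[-1][1] - window[0][1])
```

The abandoned cart at rows 5 and 6 is skipped because the pattern requires a purchase to follow; in a database the same idea would run per partition of rows (e.g. per user) in parallel across segments.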
Future Roadmap
• Support Vector Machines
• GP Text
• Time Series, Gap Filling
• Complex Number Support
Highlighted Greenplum successes
• Government detection of benefit payments that should not be made
• Government detection of tax fraud
• Government economic statistics research database
• Commercial banking wealth management data science and product development
• Financial corporation's risk and trade repository reporting
• Pharmaceutical company vaccine potency prediction based on manufacturing sensors
• 401K provider analytics on investment choices
• Auto manufacturer analytics on predictive maintenance
• Corporate/financial internal email and communication surveillance and reporting
• Oil drilling equipment predictive maintenance
• Mobile telephone company enterprise data warehouse
• Retail store chain customer purchase analytics
• Airline loyalty program analytics
• Telecom company network performance and availability analytics
• Corporate network anomalous behavior and intrusion detection
• Semiconductor fab sensor analytics and reporting
Recent Accomplishments
• GPDB 4.3.5.0 (April 2015): GA of Pivotal Query Optimizer, with parallel & incremental analyze
• Releases: GPDB 4.3.5 (May 2015), GPDB 4.3.6 (Sept 2015), GPCC 2.0 (Dec 2015)
• Highlights: Pivotal Query Optimizer, External Partitions, GP-Workload Manager
Recent Accomplishments
• MADlib 1.8 (July 2015): Topic Modelling & Matrix Operations
• Greenplum Open Source (October 2015)
• EMC DCA V3 (Dec 2015)
Pivotal Greenplum Roadmap Highlights
● S3 External Tables
● Performance tuned for AWS
● Dynamic Code Generation using LLVM
● Short running query performance
enhancements
● Faster analyze
● WAL Replication Segment Mirroring
● Incremental restore MVP
● Disk space full warnings
● Snapshot Backup
● Anaconda Python Modules: NLTK, etc
● Time Series Gap Filling
● Complex Numbers
● PostGIS Raster Support
● Geospatial Trajectories
● Path analytics
● Enhanced SVM module
● Py-Madlib
● Lock Free Backup
Greenplum File System Primer
Yon Lew
zData Inc
Directory Structure
• One directory per database per segment
• <base_dir>/<seg_dir>/base/<database oid>
e.g. /d/d2/primary/gpseg_37/base/19002
• Look up database OIDs with: SELECT oid, datname FROM pg_database;
Data Files
• Each file is named after the pg_class.relfilenode column of its relation:
SELECT relfilenode FROM pg_class WHERE oid = 'test.mytable'::regclass;
• Initially relfilenode equals the OID of the relation, but numerous database operations (e.g. TRUNCATE) can change this value
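A minimal sketch that assembles the path of a relation's data file from the pieces above, assuming the segment data directory, the database OID (from pg_database) and the relfilenode (from pg_class) have already been looked up. `relation_file_path` is a hypothetical helper, and relfilenode 3000010 is only an illustrative value.

```python
from pathlib import PurePosixPath


def relation_file_path(seg_datadir: str, database_oid: int, relfilenode: int,
                       segment_file: int = 0) -> PurePosixPath:
    """Return <seg_datadir>/base/<database oid>/<relfilenode>[.<n>]."""
    # Additional files for the same relation carry a numeric suffix (.1, .2, ...).
    name = str(relfilenode) if segment_file == 0 else f"{relfilenode}.{segment_file}"
    return PurePosixPath(seg_datadir) / "base" / str(database_oid) / name


# Directory from the example above; the relfilenode is illustrative.
print(relation_file_path("/d/d2/primary/gpseg_37", 19002, 3000010))
# /d/d2/primary/gpseg_37/base/19002/3000010
```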
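Diagnostics
• Create an external web table that lists every data file of a database on every segment (<database_oid> is the OID from pg_database):
CREATE EXTERNAL WEB TABLE database_files (
    host    TEXT
  , segment INT
  , file    TEXT
  , mtime   TIMESTAMP
  , sz      BIGINT
) EXECUTE E'ls -l --time-style=+%Y%m%d_%H:%M:%S $GP_SEG_DATADIR/base/<database_oid> | awk \'{print ENVIRON["HOSTNAME"] "|" ENVIRON["GP_SEGMENT_ID"] "|" $7 "|" $6 "|" $5}\''
ON ALL
FORMAT 'text' (DELIMITER E'|' NULL '');
• In the ls -l output, $5 is the file size, $6 the timestamp and $7 the file name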
Diagnostics
• Querying this table can produce substantial load, since it stats every file in the cluster
• Views can easily be built on top of this table to join back to pg_class
Heap Tables
• One data file per heap table on each segment for tuple storage
• A non-empty data file is at least one block in size (the database block size: 32768 bytes in this example)
CREATE TABLE test1 (a INT, b VARCHAR, c DATE);
INSERT INTO test1 VALUES (1, 'a', current_date);
SELECT segment, sz FROM database_files WHERE file LIKE '<relfilenode>%';
segment | sz
0 | 0
1 | 0
2 | 32768
3 | 0
AO Tables
• Either row or columnar orientation
• Variable file size
• Columnar tables have one file per column (files with format
<relfilenode>.*)
• Concurrent loads also create a set of new files related to each table
• AO tables initially consist of a single empty file in each data
directory until data is inserted
• Data files are not limited to a minimum size corresponding to the
database blocksize.
AO Tables
CREATE TABLE test1 (a INT, b VARCHAR, c DATE) WITH (appendonly=true, orientation=row);
SELECT segment, file, sz FROM database_files WHERE file LIKE '<relfilenode>%';
segment | file | sz
0 | 3000010 | 0
1 | 3000010 | 0
2 | 3000010 | 0
3 | 3000010 | 0
INSERT INTO test1 VALUES (1, 'a', current_date);
SELECT segment, file, sz FROM database_files WHERE file LIKE '<relfilenode>%';
segment | file | sz
0 | 3000010 | 0
1 | 3000010 | 0
2 | 3000010 | 0
2 | 3000010.1 | 40
3 | 3000010 | 0
AO Tables
CREATE TABLE test1 (a INT, b VARCHAR, c DATE) WITH (appendonly=true, orientation=column);
SELECT segment, file, sz FROM database_files WHERE file LIKE '<relfilenode>%';
segment | file | sz
0 | 3000010 | 0
1 | 3000010 | 0
2 | 3000010 | 0
3 | 3000010 | 0
INSERT INTO test1 VALUES (1, 'a', current_date);
SELECT segment, file, sz FROM database_files WHERE file LIKE '<relfilenode>%';
segment | file | sz
0 | 3000010 | 0
1 | 3000010 | 0
2 | 3000010 | 0
2 | 3000010.1 | 40
2 | 3000010.129 | 40
2 | 3000010.257 | 40
3 | 3000010 | 0
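The suffixes in the listing above (.1, .129 and .257 for a three-column table after one insert) are consistent with each column's files being numbered 128 apart. A minimal sketch under that assumption; the multiplier of 128 is inferred from this listing rather than taken from Greenplum documentation, and `column_file_names` is a hypothetical helper.

```python
FILES_PER_COLUMN = 128  # assumption inferred from the .1/.129/.257 listing


def column_file_names(relfilenode: int, num_columns: int, segfile_no: int) -> list[str]:
    """File names written by one load (segment file number) across all columns."""
    return [
        f"{relfilenode}.{column * FILES_PER_COLUMN + segfile_no}"
        for column in range(num_columns)
    ]


print(column_file_names(3000010, 3, 1))
# ['3000010.1', '3000010.129', '3000010.257']
```

Under the same assumption, a second concurrent load would presumably use segment file number 2 and write .2, .130 and .258, so every extra concurrent load adds one more file per column on each segment.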
AO Tables
• For large fact tables, ADD/DROP COLUMN operations are much faster against AO columnar tables, since no rewrite of the data files is required.
AO Tables
• Beware of large numbers of concurrent loads running against AO tables
• For example, 50 concurrent loads running against an AO columnar table with 500 columns will produce 200,000 primary segment files on a single segment host (500 column files x 50 loads x 8 primary segments)
• File system efficiency can decline drastically as the number of files increases
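The file count in this example is a plain product of three factors; a small helper (hypothetical name) makes the arithmetic explicit: 500 × 50 × 8 = 200,000 files per host.

```python
def ao_column_files_per_host(columns: int, concurrent_loads: int,
                             primaries_per_host: int) -> int:
    """One file per column, per concurrent load, per primary segment."""
    return columns * concurrent_loads * primaries_per_host


print(ao_column_files_per_host(500, 50, 8))  # 200000
print(ao_column_files_per_host(500, 1, 8))   # 4000
```

The load count is the multiplier that is easiest to control, which is what the workarounds below target.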
AO Tables
• Workarounds:
1. Rebuild the partition via batch processing
every night (CTAS followed by a partition
swap)
2. Load into a heap organized staging table
Skew
• Skew is typically discovered through unbalanced storage on one or more segments in the cluster
• The gp_toolkit skew views calculate skew by querying the hidden gp_segment_id column:
SELECT gp_segment_id, count(*) FROM mytable
GROUP BY 1;
• This operation is prohibitively expensive when run against every table in a cluster
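Skew
• Querying file metadata with the diagnostic table is much faster
• Useful measures are the coefficient of variation (stddev / mean) and the interquartile range; this query computes the coefficient of variation per relfilenode:
SELECT
    substring(file, '([0-9]+)') AS relfilenode
  , stddev(sz) / avg(sz) AS coeff_of_variation
FROM database_files
GROUP BY 1
HAVING SUM(sz) != 0;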
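A minimal sketch computing both measures client-side, assuming rows shaped like the database_files table (segment, file, sz); the sample rows are hypothetical. Unlike the query above, it first sums sizes per segment, so a table with several files per segment (e.g. AO columnar) is compared segment-to-segment. `statistics.stdev` is the sample standard deviation, matching PostgreSQL's `stddev()`.

```python
import re
import statistics
from collections import defaultdict

# Hypothetical rows from database_files: (segment, file, size in bytes).
rows = [
    (0, "3000010", 32768), (1, "3000010", 32768),
    (2, "3000010", 32768), (3, "3000010", 32768),
    (0, "3000020", 0), (1, "3000020", 0),
    (2, "3000020.1", 4000), (2, "3000020.129", 4000),
    (3, "3000020", 0),
]


def per_table_segment_sizes(rows):
    """Sum file sizes per (relfilenode, segment)."""
    sizes = defaultdict(lambda: defaultdict(int))
    for segment, file, sz in rows:
        relfilenode = re.match(r"[0-9]+", file).group()  # strip ".N" suffix
        sizes[relfilenode][segment] += sz
    return sizes


def skew_report(rows):
    """Map relfilenode -> (coefficient of variation, interquartile range)."""
    report = {}
    for relfilenode, by_segment in per_table_segment_sizes(rows).items():
        values = list(by_segment.values())
        mean = statistics.mean(values)
        if mean == 0:
            continue  # mirrors HAVING SUM(sz) != 0
        cv = statistics.stdev(values) / mean
        q1, _, q3 = statistics.quantiles(values, n=4)
        report[relfilenode] = (round(cv, 3), q3 - q1)
    return report


print(skew_report(rows))
# {'3000010': (0.0, 0.0), '3000020': (2.0, 6000.0)}
```

A coefficient of variation near 0 means the table is evenly spread; the interquartile range is less sensitive to a single outlier segment, so the two measures complement each other.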
Bloat
• Checking for skew via the gp_segment_id
column will miss physical skew due to bloat
(dead space from deleted/updated tuples).
Join the community!
• Website
• Mailing lists
• Github
• Events
• More ….
43© 2015 Pivotal Software, Inc. All rights reserved.

More Related Content

What's hot

Greenplum Database on HDFS
Greenplum Database on HDFSGreenplum Database on HDFS
Greenplum Database on HDFSDataWorks Summit
 
White Paper: Backup and Recovery of the EMC Greenplum Data Computing Applian...
 White Paper: Backup and Recovery of the EMC Greenplum Data Computing Applian... White Paper: Backup and Recovery of the EMC Greenplum Data Computing Applian...
White Paper: Backup and Recovery of the EMC Greenplum Data Computing Applian...EMC
 
Expert Guide to Migrating Legacy Databases to Postgres
Expert Guide to Migrating Legacy Databases to PostgresExpert Guide to Migrating Legacy Databases to Postgres
Expert Guide to Migrating Legacy Databases to PostgresEDB
 
Postgres in Production - Best Practices 2014
Postgres in Production - Best Practices 2014Postgres in Production - Best Practices 2014
Postgres in Production - Best Practices 2014EDB
 
Overcoming write availability challenges of PostgreSQL
Overcoming write availability challenges of PostgreSQLOvercoming write availability challenges of PostgreSQL
Overcoming write availability challenges of PostgreSQLEDB
 
Beginner's Guide to High Availability for Postgres
Beginner's Guide to High Availability for PostgresBeginner's Guide to High Availability for Postgres
Beginner's Guide to High Availability for PostgresEDB
 
An overview of reference architectures for Postgres
An overview of reference architectures for PostgresAn overview of reference architectures for Postgres
An overview of reference architectures for PostgresEDB
 
Making your PostgreSQL Database Highly Available
Making your PostgreSQL Database Highly AvailableMaking your PostgreSQL Database Highly Available
Making your PostgreSQL Database Highly AvailableEDB
 
New enhancements for security and usability in EDB 13
New enhancements for security and usability in EDB 13New enhancements for security and usability in EDB 13
New enhancements for security and usability in EDB 13EDB
 
Public Sector Virtual Town Hall: High Availability for PostgreSQL
Public Sector Virtual Town Hall: High Availability for PostgreSQLPublic Sector Virtual Town Hall: High Availability for PostgreSQL
Public Sector Virtual Town Hall: High Availability for PostgreSQLEDB
 
PostgreSQL as a Strategic Tool
PostgreSQL as a Strategic ToolPostgreSQL as a Strategic Tool
PostgreSQL as a Strategic ToolEDB
 
Graphene – Microsoft SCOPE on Tez
Graphene – Microsoft SCOPE on Tez Graphene – Microsoft SCOPE on Tez
Graphene – Microsoft SCOPE on Tez DataWorks Summit
 
Beginner's Guide to High Availability for Postgres - French
Beginner's Guide to High Availability for Postgres - FrenchBeginner's Guide to High Availability for Postgres - French
Beginner's Guide to High Availability for Postgres - FrenchEDB
 
Un guide complet pour la migration de bases de données héritées vers PostgreSQL
Un guide complet pour la migration de bases de données héritées vers PostgreSQLUn guide complet pour la migration de bases de données héritées vers PostgreSQL
Un guide complet pour la migration de bases de données héritées vers PostgreSQLEDB
 
Why Care Risk Choose PostgreSQL
Why Care Risk Choose PostgreSQLWhy Care Risk Choose PostgreSQL
Why Care Risk Choose PostgreSQLEDB
 
Database Dumps and Backups
Database Dumps and BackupsDatabase Dumps and Backups
Database Dumps and BackupsEDB
 
9/ IBM POWER @ OPEN'16
9/ IBM POWER @ OPEN'169/ IBM POWER @ OPEN'16
9/ IBM POWER @ OPEN'16Kangaroot
 

What's hot (20)

Greenplum Database on HDFS
Greenplum Database on HDFSGreenplum Database on HDFS
Greenplum Database on HDFS
 
Greenplum Roadmap
Greenplum RoadmapGreenplum Roadmap
Greenplum Roadmap
 
White Paper: Backup and Recovery of the EMC Greenplum Data Computing Applian...
 White Paper: Backup and Recovery of the EMC Greenplum Data Computing Applian... White Paper: Backup and Recovery of the EMC Greenplum Data Computing Applian...
White Paper: Backup and Recovery of the EMC Greenplum Data Computing Applian...
 
Expert Guide to Migrating Legacy Databases to Postgres
Expert Guide to Migrating Legacy Databases to PostgresExpert Guide to Migrating Legacy Databases to Postgres
Expert Guide to Migrating Legacy Databases to Postgres
 
Postgres in Production - Best Practices 2014
Postgres in Production - Best Practices 2014Postgres in Production - Best Practices 2014
Postgres in Production - Best Practices 2014
 
Greenplum hadoop
Greenplum hadoopGreenplum hadoop
Greenplum hadoop
 
Overcoming write availability challenges of PostgreSQL
Overcoming write availability challenges of PostgreSQLOvercoming write availability challenges of PostgreSQL
Overcoming write availability challenges of PostgreSQL
 
Beginner's Guide to High Availability for Postgres
Beginner's Guide to High Availability for PostgresBeginner's Guide to High Availability for Postgres
Beginner's Guide to High Availability for Postgres
 
An overview of reference architectures for Postgres
An overview of reference architectures for PostgresAn overview of reference architectures for Postgres
An overview of reference architectures for Postgres
 
Making your PostgreSQL Database Highly Available
Making your PostgreSQL Database Highly AvailableMaking your PostgreSQL Database Highly Available
Making your PostgreSQL Database Highly Available
 
New enhancements for security and usability in EDB 13
New enhancements for security and usability in EDB 13New enhancements for security and usability in EDB 13
New enhancements for security and usability in EDB 13
 
Public Sector Virtual Town Hall: High Availability for PostgreSQL
Public Sector Virtual Town Hall: High Availability for PostgreSQLPublic Sector Virtual Town Hall: High Availability for PostgreSQL
Public Sector Virtual Town Hall: High Availability for PostgreSQL
 
PostgreSQL as a Strategic Tool
PostgreSQL as a Strategic ToolPostgreSQL as a Strategic Tool
PostgreSQL as a Strategic Tool
 
Graphene – Microsoft SCOPE on Tez
Graphene – Microsoft SCOPE on Tez Graphene – Microsoft SCOPE on Tez
Graphene – Microsoft SCOPE on Tez
 
Beginner's Guide to High Availability for Postgres - French
Beginner's Guide to High Availability for Postgres - FrenchBeginner's Guide to High Availability for Postgres - French
Beginner's Guide to High Availability for Postgres - French
 
Un guide complet pour la migration de bases de données héritées vers PostgreSQL
Un guide complet pour la migration de bases de données héritées vers PostgreSQLUn guide complet pour la migration de bases de données héritées vers PostgreSQL
Un guide complet pour la migration de bases de données héritées vers PostgreSQL
 
Oracle Data Warehouse
Oracle Data WarehouseOracle Data Warehouse
Oracle Data Warehouse
 
Why Care Risk Choose PostgreSQL
Why Care Risk Choose PostgreSQLWhy Care Risk Choose PostgreSQL
Why Care Risk Choose PostgreSQL
 
Database Dumps and Backups
Database Dumps and BackupsDatabase Dumps and Backups
Database Dumps and Backups
 
9/ IBM POWER @ OPEN'16
9/ IBM POWER @ OPEN'169/ IBM POWER @ OPEN'16
9/ IBM POWER @ OPEN'16
 

Similar to Greenplum Database Open Source December 2015

Open Sourcing GemFire - Apache Geode
Open Sourcing GemFire - Apache GeodeOpen Sourcing GemFire - Apache Geode
Open Sourcing GemFire - Apache GeodeApache Geode
 
An Introduction to Apache Geode (incubating)
An Introduction to Apache Geode (incubating)An Introduction to Apache Geode (incubating)
An Introduction to Apache Geode (incubating)Anthony Baker
 
Quieting noisy neighbor with Intel® Resource Director Technology
Quieting noisy neighbor with Intel® Resource Director TechnologyQuieting noisy neighbor with Intel® Resource Director Technology
Quieting noisy neighbor with Intel® Resource Director TechnologyMichelle Holley
 
CA Unified Infrastructure Management for z Systems: Get a Holistic View of Yo...
CA Unified Infrastructure Management for z Systems: Get a Holistic View of Yo...CA Unified Infrastructure Management for z Systems: Get a Holistic View of Yo...
CA Unified Infrastructure Management for z Systems: Get a Holistic View of Yo...CA Technologies
 
Salesforce Multitenant Architecture: How We Do the Magic We Do
Salesforce Multitenant Architecture: How We Do the Magic We DoSalesforce Multitenant Architecture: How We Do the Magic We Do
Salesforce Multitenant Architecture: How We Do the Magic We DoSalesforce Developers
 
What_to_expect_from_oracle_database_12c
What_to_expect_from_oracle_database_12cWhat_to_expect_from_oracle_database_12c
What_to_expect_from_oracle_database_12cMaria Colgan
 
Removing Barriers Between Dev and Ops
Removing Barriers Between Dev and OpsRemoving Barriers Between Dev and Ops
Removing Barriers Between Dev and OpsVMware Tanzu
 
Cloud Native Batch Processing: Beyond the What and How
Cloud Native Batch Processing: Beyond the What and HowCloud Native Batch Processing: Beyond the What and How
Cloud Native Batch Processing: Beyond the What and HowVMware Tanzu
 
Java Micro Edition (ME) 8 Deep Dive
Java Micro Edition (ME) 8 Deep DiveJava Micro Edition (ME) 8 Deep Dive
Java Micro Edition (ME) 8 Deep Diveterrencebarr
 
DOES15 - Sherry Chang - Intel’s Journey to Large Scale DevOps Transformation
DOES15 - Sherry Chang - Intel’s Journey to Large Scale DevOps Transformation DOES15 - Sherry Chang - Intel’s Journey to Large Scale DevOps Transformation
DOES15 - Sherry Chang - Intel’s Journey to Large Scale DevOps Transformation Gene Kim
 
Greenplum Database Open Source December 2015

  • 1. © 2015 Pivotal Software, Inc. All rights reserved.
  • 2. Greenplum Database Open Source, December 2015
  • 3. Forward Looking Statements: This presentation contains “forward-looking statements” as defined under the Federal Securities Laws. Actual results could differ materially from those projected in the forward-looking statements as a result of certain risk factors, including but not limited to: (i) adverse changes in general economic or market conditions; (ii) delays or reductions in information technology spending; (iii) the relative and varying rates of product price and component cost declines and the volume and mixture of product and services revenues; (iv) competitive factors, including but not limited to pricing pressures and new product introductions; (v) component and product quality and availability; (vi) fluctuations in VMware, Inc.’s operating results and risks associated with trading of VMware stock; (vii) the transition to new products, the uncertainty of customer acceptance of new product offerings and rapid technological and market change; (viii) risks associated with managing the growth of our business, including risks associated with acquisitions and investments and the challenges and costs of integration, restructuring and achieving anticipated synergies; (ix) the ability to attract and retain highly qualified employees; (x) insufficient, excess or obsolete inventory; (xi) fluctuating currency exchange rates; (xii) threats and other disruptions to our secure data centers and networks; (xiii) our ability to protect our proprietary technology; (xiv) war or acts of terrorism; and (xv) other one-time events and other important factors disclosed previously and from time to time in the filings of EMC Corporation, the parent company of Pivotal, with the U.S. Securities and Exchange Commission. EMC and Pivotal disclaim any obligation to update any such forward-looking statements after the date of this release.
  • 4. Safe Harbor: “Any information regarding pre-release of Pivotal offerings, future updates or other planned modifications is subject to ongoing evaluation by Pivotal and therefore subject to change. This information is provided without warranty of any kind, express or implied. Customers who purchase Pivotal offerings should make their purchase decision based upon features that are currently available. Pivotal has no obligation to update forward looking information in this presentation.”
  • 5. Greenplum Database Mission & Strategy
    - Relational database system for big data
    - Mission-critical, system-of-record product with supporting tools and ecosystem
    - Fully open source, with a global community of developers and users
    - Implement the world’s leading research in database technology across all components:
      – Optimizer, Query Execution
      – Transaction Processing, Database Storage, Compression, High Availability
      – Embedded Programming Languages (Python, R, Java, etc.)
      – In-Database analytics in domains such as Geospatial, Text, Machine Learning and Mathematics
    - Performance tuned for multiple workload profiles:
      – Analytics, long-running queries, short-running queries, mixed workloads
    - Focused on large industrial deployments:
      – Financial, Government, Telecom, Retail, Manufacturing, Oil & Gas, etc.
  • 6. Greenplum Open Source
    - An ambitious project
      – 10 years in the making
      – Investment of hundreds of millions of dollars
      – Potential to define a new market and disrupt traditional EDW vendors
    - www.greenplum.org
      – GitHub code
      – Mailing lists / community engagement
      – Global project with external contributors
    - Pivotal Greenplum
      – Enterprise software distribution & release management
      – Pivotal expertise
      – 24-hour global support
  • 7. PostgreSQL Compatibility Roadmap
    - Strategically backport key features from PostgreSQL to Greenplum: JSONB, UUID, variadic functions, default function arguments, etc.
    - Consistently backport patches from older PostgreSQL releases to Greenplum …
  • 8. MPP Shared Nothing Architecture
    [Diagram: SQL arrives at the Master Host (with a Standby Master Host), which connects over a high-speed Interconnect to Segment Hosts node1, node2, node3 … nodeN, each running several Segment Instances.]
    - Master Host and Standby Master Host: the master coordinates work with the Segment Hosts
    - Segment Hosts have their own CPU, disk and memory (shared nothing) and run one or more Segment Instances
    - Segment Instances process queries in parallel: performance through segment instance parallelism
    - High-speed interconnect for continuous pipelining of data processing
  • 9. Master Host
    [Diagram: Master Host components: Parser, Query Optimizer, Dispatch, Query Executor, Distributed TM, Catalog and Master Segment.]
    - Master Host: accepts client connections and incoming user requests, and performs authentication
    - Parser: enforces syntax and semantics, and produces a parse tree
  • 10. Pivotal Query Optimizer
    [Diagram: the Master Host (Parser, Query Optimizer, Dispatcher, Query Executor, Distributed TM, Catalog, Master Segment, Local Storage) connected over the Interconnect to Segment Hosts, each running Segment Instances with their own Local TM, Query Executor, Catalog and Local Storage.]
    - Consumes the parse tree and produces the query plan
    - The query execution plan describes how the query is executed
  • 11. Query Dispatcher. The Dispatcher is responsible for communicating the query plan to the segments. It allocates the cluster resources required to perform the job, and accumulates and presents the final results. [Diagram: Master Host and Segment Hosts connected through the interconnect.]
  • 12. Query Executor. The Query Executor is responsible for executing the steps in the plan (e.g. opening files, iterating over tuples) and communicates its intermediate results to other executor processes. [Diagram: Master Host and Segment Hosts connected through the interconnect; each Segment Instance runs its own Query Executor.]
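Executors exchanging intermediate results is what lets a join on a non-distribution column run in parallel: rows are re-hashed on the join key and shipped to the segment that owns that key, after which each segment joins locally. In Greenplum plans this exchange appears as a motion node such as Redistribute Motion. The sketch below is only a toy model under stated assumptions: zlib.crc32 stands in for Greenplum's hash function, dicts of lists stand in for segments, and the helper names and table contents are hypothetical.

```python
import zlib
from collections import defaultdict

NUM_SEGMENTS = 4  # hypothetical cluster size


def segment_for(key):
    # crc32 is a stand-in for Greenplum's own hash function.
    return zlib.crc32(str(key).encode()) % NUM_SEGMENTS


def place(rows, key_index):
    """Hash-distribute rows on one column: {segment: [rows]}."""
    segments = defaultdict(list)
    for row in rows:
        segments[segment_for(row[key_index])].append(row)
    return segments


def redistribute(segments, key_index):
    """Every executor re-hashes its local rows on a new key and sends them on."""
    return place([row for rows in segments.values() for row in rows], key_index)


def local_hash_join(left, right, left_key, right_key):
    """A join that runs inside one segment once matching keys are co-located."""
    index = defaultdict(list)
    for rrow in right:
        index[rrow[right_key]].append(rrow)
    return [lrow + rrow for lrow in left for rrow in index.get(lrow[left_key], [])]


# orders (order_no, customer_id) distributed by order_no;
# customers (customer_id, name) distributed by customer_id.
orders = place([(43, 12), (64, 111), (45, 42), (48, 12)], key_index=0)
customers = place([(12, "Ann"), (111, "Bo"), (42, "Cy")], key_index=0)

# Join on customer_id: move the orders next to their customers first.
moved = redistribute(orders, key_index=1)
result = []
for seg in range(NUM_SEGMENTS):
    result += local_hash_join(moved.get(seg, []), customers.get(seg, []), 1, 0)
print(sorted(result))
# [(43, 12, 12, 'Ann'), (45, 42, 42, 'Cy'), (48, 12, 12, 'Ann'), (64, 111, 111, 'Bo')]
```

If orders were already distributed by customer_id, the redistribution step could be skipped, which is one reason the choice of distribution key matters so much.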
  • 13. Distributed Transaction Management. Segments have their own commit and replay logs and decide when to commit or abort their own transactions. The Distributed TM (DTM) resides on the master and coordinates the commit and abort actions of the segments. [Diagram: Master Host and Segment Hosts connected through the interconnect; each Segment Instance has its own Local TM.]
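Coordinating commit and abort across segments that each keep their own logs is the classic job of two-phase commit: the coordinator first asks every participant to prepare, then broadcasts a single global decision. Below is a toy coordinator in that style, offered as an illustration of the idea rather than Greenplum's DTM; the Segment class, its methods and the log entries are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """Hypothetical segment that keeps its own transaction log."""
    name: str
    can_commit: bool = True
    log: list = field(default_factory=list)

    def prepare(self, xid):
        # Phase 1: the segment votes on whether it can commit its local work.
        self.log.append((xid, "prepared" if self.can_commit else "refused"))
        return self.can_commit

    def finish(self, xid, commit):
        # Phase 2: the segment applies the coordinator's global decision.
        self.log.append((xid, "committed" if commit else "aborted"))


def distributed_commit(xid, segments):
    """Coordinator on the master: commit only if every segment voted yes."""
    votes = [seg.prepare(xid) for seg in segments]  # ask every segment
    decision = all(votes)
    for seg in segments:
        seg.finish(xid, decision)
    return decision


segs = [Segment("seg0"), Segment("seg1")]
print(distributed_commit(1, segs))  # True
segs[1].can_commit = False
print(distributed_commit(2, segs))  # False: one refusal aborts everywhere
```

The key property is atomicity across segments: either every segment commits transaction 2 or none does, even though each segment writes only to its own log.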
  • 14. GPDB High Availability. Master Host mirroring: a warm Standby Master Host holds a replica of the Master Host system catalogs, eliminating the single point of failure; a synchronization process between the Master Host and the Standby Master Host uses PostgreSQL WAL replication. Segment mirroring: a mirror segment is created for every primary segment and kept in sync by a custom file block replication process; if a primary segment becomes unavailable, the system automatically fails over to the mirror.
  • 15. CREATE TABLE: Define the Storage Model. Heap tables versus Append Optimized (AO) tables; row-oriented versus column-oriented storage; optional compression: table-level compression applies to the entire table, and column-level compression applies to a specific column with columnar storage, using zlib levels with run-length encoding as an option.
  • 16. Polymorphic Storage™: User Definable Storage Layout. Columnar storage compresses better and is optimized for retrieving a subset of the columns when querying; compression can be set differently per column: gzip (1-9), quicklz, delta, RLE. Row-oriented storage is faster when returning all columns; use HEAP for many updates and deletes, and use indexes for drill-through queries. External HDFS: less-accessed partitions can be kept on HDFS as external partitions while all data is still queried seamlessly, in Text, CSV, Binary, Avro or Parquet format, on all major HDP distros. [Diagram: TABLE 'SALES' partitioned by month (Jun-Dec), mixing row-oriented and column-oriented partitions, with the Year - 1 and Year - 2 partitions on external HDFS.]
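Why columnar storage compresses better can be seen with RLE, one of the per-column encodings listed above: storing a column's values together puts repeats next to each other, while row storage interleaves values from different columns and breaks the runs. This is a generic run-length encoder written for illustration, not Greenplum's on-disk format; the sample values are hypothetical.

```python
from itertools import groupby


def rle_encode(values):
    """Run-length encode a sequence into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original sequence."""
    return [value for value, n in pairs for _ in range(n)]


# Column-oriented: one column stored contiguously, so repeats are adjacent.
order_dates = ["2005-10-20"] * 6 + ["2005-10-21"] * 2
encoded = rle_encode(order_dates)
print(encoded)  # [('2005-10-20', 6), ('2005-10-21', 2)]

# Row-oriented: values from different columns interleave and break the runs.
rows = [(43, "2005-10-20", 12), (64, "2005-10-20", 111)]
flattened = [v for row in rows for v in row]
print(len(rle_encode(flattened)))  # 6: nothing adjacent to collapse
```

Eight date values shrink to two pairs in the column layout, while the same dates buried in row layout gain nothing.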
  • 17. CREATE TABLE: Define Data Distributions. One of the most important aspects of GPDB! Every table has a distribution method. DISTRIBUTED BY (column) uses a hash distribution. DISTRIBUTED RANDOMLY uses a random distribution, which is not guaranteed to be perfectly even. Explicitly define a column or random distribution for all tables; do not use the default.
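The difference between the two distribution methods can be modelled in a few lines: hash distribution maps a key deterministically to a segment, so equal keys always meet on the same segment, while random distribution ignores row values entirely. This is a sketch, assuming a hypothetical 4-segment cluster and using zlib.crc32 in place of Greenplum's own hash function.

```python
import random
import zlib
from collections import Counter

NUM_SEGMENTS = 4  # hypothetical cluster size


def hash_segment(key):
    """DISTRIBUTED BY (column): the same key always lands on the same segment."""
    # crc32 is a stand-in for Greenplum's own hash function.
    return zlib.crc32(str(key).encode()) % NUM_SEGMENTS


def random_segment(rng):
    """DISTRIBUTED RANDOMLY: placement ignores the row's values entirely."""
    return rng.randrange(NUM_SEGMENTS)


keys = range(1000)
hash_counts = Counter(hash_segment(k) for k in keys)
rng = random.Random(42)  # seeded only so the demo is repeatable
random_counts = Counter(random_segment(rng) for _ in keys)

print(sorted(hash_counts.items()))    # per-segment row counts, hash placement
print(sorted(random_counts.items()))  # per-segment row counts, random placement
```

Both methods can spread distinct keys evenly, but only hash distribution gives co-location: rows that share a key (for example a join key) are guaranteed to be on the same segment.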
  • 18. Data Distribution: The Key to Parallelism. The primary strategy and goal is to spread data evenly across all segment instances; this matters most in an MPP shared-nothing architecture! [Diagram: rows of an Order table (Order#, Order Date, Customer ID) spread across segments, all with Order Date Oct 20 2005; Order#/Customer ID pairs: 43/12, 64/111, 45/42, 46/64, 77/32, 48/12, 50/34, 56/213, 63/15, 44/102, 53/82, 55/55.]
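The twelve example orders make a good test of key choice. Every row has the same Order Date, so hash-distributing on that column would send all of them to one segment, while the unique Order# spreads them out. This sketch assumes a hypothetical 4-segment cluster and uses zlib.crc32 as a stand-in for Greenplum's hash; only the row values come from the slide.

```python
import zlib
from collections import Counter

NUM_SEGMENTS = 4  # hypothetical cluster size

# The twelve example orders from the slide: (order_no, order_date, customer_id)
ORDERS = [
    (43, "2005-10-20", 12), (64, "2005-10-20", 111), (45, "2005-10-20", 42),
    (46, "2005-10-20", 64), (77, "2005-10-20", 32), (48, "2005-10-20", 12),
    (50, "2005-10-20", 34), (56, "2005-10-20", 213), (63, "2005-10-20", 15),
    (44, "2005-10-20", 102), (53, "2005-10-20", 82), (55, "2005-10-20", 55),
]


def rows_per_segment(rows, key_index):
    """Rows per segment when hash-distributing on the given column."""
    counts = Counter(
        zlib.crc32(str(row[key_index]).encode()) % NUM_SEGMENTS for row in rows
    )
    return [counts.get(seg, 0) for seg in range(NUM_SEGMENTS)]


by_order_no = rows_per_segment(ORDERS, 0)
by_order_date = rows_per_segment(ORDERS, 1)
print(by_order_no)    # per-segment counts when hashing on Order#
print(by_order_date)  # all 12 rows on one segment, the other three empty
```

With the date as key, one segment does all the work for every query on this table while the others sit idle, which is exactly the skew the slide warns against.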
  • 19. CREATE TABLE: Define Partitioning. Partitioning reduces the amount of data to be scanned by reading only the data relevant to a query; the only goal of partitioning is to achieve partition elimination, aka partition pruning. It is not a substitute for distribution: a good distribution strategy combined with partitioning that achieves partition elimination unlocks performance magic. Partitioning uses table inheritance and constraints, with a persistent relationship between parent and child tables.
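Partition elimination boils down to comparing the query predicate with each child table's range constraint and skipping children that cannot match. The sketch below assumes hypothetical weekly range partitions on order_date (names and start date are illustrative) and prunes them for the one-week predicate used in the orders query on slide 20.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class RangePartition:
    """One child table holding rows with start <= order_date < end."""
    name: str
    start: date
    end: date


def weekly_partitions(first_day, weeks):
    """Hypothetical weekly child tables starting at first_day."""
    return [
        RangePartition(f"orders_w{i}", first_day + timedelta(weeks=i),
                       first_day + timedelta(weeks=i + 1))
        for i in range(weeks)
    ]


def prune(partitions, lo, hi):
    """Keep only partitions whose range can overlap lo <= order_date < hi."""
    return [p for p in partitions if p.start < hi and lo < p.end]


parts = weekly_partitions(date(2007, 10, 6), weeks=6)
scanned = prune(parts, date(2007, 10, 20), date(2007, 10, 27))
print([p.name for p in scanned])  # ['orders_w2']
```

Only one of six partitions is scanned; combined with even distribution, each segment then reads just its share of that one partition.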
  • 20. Distributions and Partitioning. SELECT COUNT(*) FROM orders WHERE order_date >= 'Oct 20 2007' AND order_date < 'Oct 27 2007'; Evenly distribute the orders data across all segments, and only scan the relevant order partitions. [Diagram: the grid of segments 1A-1D, 2A-2D and 3A-3D repeated for each partition, with only the partitions relevant to the query being scanned.]
  • 22. Analytics: bringing the power of parallelism to modeling and analytics. Path Functions: identify rows of interest from a raw table or view; pattern-match across rows using regex; define one or more windows on the matches; apply standard PostgreSQL window functions or aggregations on the windows. Future Roadmap: Support Vector Machines; GP Text; Time Series, Gap Filling; Complex Number Support.
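The four path-function steps can be mimicked outside the database to make the idea concrete: reduce each row to a symbol, run a regex across the symbol string, and aggregate over the matched rows. This is a hypothetical Python illustration of the technique only; it does not reflect the syntax or API of Greenplum's path functions, and the clickstream rows are invented.

```python
import re

# Hypothetical clickstream rows for one user, already ordered by time:
# (seconds_since_start, event)
ROWS = [
    (0, "view"), (5, "search"), (9, "search"), (15, "buy"),
    (20, "view"), (30, "view"), (31, "search"), (40, "buy"),
]

# 1. Reduce each row of interest to one symbol so a regex can scan across rows.
SYMBOL = {"view": "V", "search": "S", "buy": "B"}
encoded = "".join(SYMBOL[event] for _, event in ROWS)  # "VSSBVVSB"

# 2. Pattern-match across rows: a view, one or more searches, then a purchase.
matches = [m.span() for m in re.finditer(r"VS+B", encoded)]

# 3-4. Treat each match as a window of rows and aggregate over it.
summaries = []
for start, end in matches:
    window = ROWS[start:end]
    summaries.append((len(window), window[-1][0] - window[0][0]))

print(summaries)  # [(4, 15), (3, 10)]
```

Each summary is (rows in the path, seconds from first to last row). Because match spans index directly into the ordered rows, any window function or aggregate can be applied to the matched slice.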
  • 23. Highlighted Greenplum successes: government detection of benefit payments that should not be made; government detection of tax fraud; a government economic statistics research database; commercial banking wealth management data science and product development; a financial corporation's risk and trade repository reporting; pharmaceutical vaccine potency prediction based on manufacturing sensors; 401(k) providers' analytics on investment choices; an auto manufacturer's predictive maintenance analytics; corporate/financial internal email and communication surveillance and reporting; oil drilling equipment predictive maintenance; a mobile telephone company's enterprise data warehouse; retail store chain customer purchase analytics; airline loyalty program analytics; telecom network performance and availability analytics; corporate network anomalous behavior and intrusion detection; semiconductor fab sensor analytics and reporting.
  • 24. Recent Accomplishments. 4.3.5.0 (April 2015): GA of the Pivotal Query Optimizer, with parallel and incremental ANALYZE. [Timeline: External Partitions, GP-Workload Manager, Pivotal Query Optimizer; GPDB 4.3.5 (May 2015), GPDB 4.3.6 (Sept 2015), GPCC 2.0 (Dec 2015).]
  • 25. Recent Accomplishments. 4.3.5.0 (April 2015): GA of the Pivotal Query Optimizer, with parallel and incremental ANALYZE. [Timeline: Greenplum Open Source; EMC DCA V3; MADlib 1.8 (topic modelling and matrix operations); July 2015, October 2015, Dec 2015.]
  • 26. Pivotal Greenplum Roadmap Highlights: S3 external tables; performance tuned for AWS; dynamic code generation using LLVM; short-running query performance enhancements; faster ANALYZE; WAL replication segment mirroring; incremental restore MVP; disk-space-full warnings; snapshot backup; Anaconda Python modules (NLTK, etc.); time series gap filling; complex numbers; PostGIS raster support; geospatial trajectories; path analytics; enhanced SVM module; Py-Madlib; lock-free backup.
  • 27. Greenplum File System Primer. Yon Lew, zData Inc.
  • 28. Directory Structure. There is one directory per database per segment: <base_dir>/<seg_dir>/base/<database oid>, e.g. /d/d2/primary/gpseg_37/base/19002. Look up the database OIDs with: SELECT oid, datname FROM pg_database;
  • 29. Data Files. Each file is named after the pg_class.relfilenode column of its relation: SELECT relfilenode FROM pg_class WHERE oid = 'test.mytable'::regclass; Initially relfilenode equals the OID of the relation, but numerous database operations (e.g. TRUNCATE) can change this value.
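Putting the directory layout and the file naming together, a relation's main data file on one segment can be located as sketched below. Splitting the example path into base_dir /d/d2/primary and seg_dir gpseg_37 is an assumption, as is the relfilenode value 3000010, borrowed from the AO table examples in this primer; the helper names are hypothetical.

```python
from pathlib import PurePosixPath


def database_dir(base_dir, seg_dir, database_oid):
    """Build <base_dir>/<seg_dir>/base/<database oid> for one segment."""
    return PurePosixPath(base_dir) / seg_dir / "base" / str(database_oid)


def relation_file(base_dir, seg_dir, database_oid, relfilenode):
    """A relation's main data file is named after pg_class.relfilenode."""
    return database_dir(base_dir, seg_dir, database_oid) / str(relfilenode)


print(database_dir("/d/d2/primary", "gpseg_37", 19002))
# /d/d2/primary/gpseg_37/base/19002
print(relation_file("/d/d2/primary", "gpseg_37", 19002, 3000010))
# /d/d2/primary/gpseg_37/base/19002/3000010
```

The database OID comes from pg_database and the relfilenode from pg_class; since TRUNCATE and similar operations can change the relfilenode, look it up fresh rather than caching it.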
  • 30. Diagnostics. CREATE EXTERNAL WEB TABLE database_files (host TEXT, segment INT, file TEXT, mtime TIMESTAMP, sz BIGINT) EXECUTE E'ls -l --time-style=+%Y%m%d_%H:%M:%S $GP_SEG_DATADIR/base/<database_oid> | awk ''{print ENVIRON["HOSTNAME"] "|" ENVIRON["GP_SEGMENT_ID"] "|" $7 "|" $6 "|" $5}''' ON ALL FORMAT 'text' (DELIMITER E'|' NULL '');
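The awk step in the web table picks fields 5, 6 and 7 (size, timestamp, name) out of each ls -l line and prefixes the host and segment id. A Python equivalent of that mapping is handy for checking what a row of database_files will contain. It is a sketch, assuming the custom --time-style turns the timestamp into a single whitespace-free field; the host name, segment id and sample line are hypothetical.

```python
from datetime import datetime


def to_record(ls_line, host, segment_id):
    """Mimic the awk step: keep name ($7), mtime ($6) and size ($5) of one ls -l line."""
    fields = ls_line.split()
    if len(fields) < 7:  # e.g. the leading "total N" line of ls -l
        return None
    size, mtime, name = fields[4], fields[5], fields[6]
    return {
        "host": host,
        "segment": segment_id,
        "file": name,
        "mtime": datetime.strptime(mtime, "%Y%m%d_%H:%M:%S"),
        "sz": int(size),
    }


line = "-rw------- 1 gpadmin gpadmin 32768 20151201_10:15:00 3000010"
rec = to_record(line, "sdw1", 2)
print(rec["file"], rec["sz"], rec["mtime"])  # 3000010 32768 2015-12-01 10:15:00
```

The "total N" header line has too few fields; in the web table it becomes a row whose file, mtime and sz are empty (and thus NULL under NULL ''), so filter it out in any view built on top.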
  • 31. Diagnostics. Querying this table can produce substantial load, since it stats every file in the cluster. Views can easily be built on top of the table to join back to pg_class.
  • 32. Heap Tables. There is one data file per heap table for tuple storage. Once a heap data file holds any data, its size is at least the default block size defined for the database (32768 bytes in this example): CREATE TABLE test1 (a INT, b VARCHAR, c DATE); INSERT INTO test1 VALUES (1, 'a', current_date); SELECT segment, sz FROM database_files WHERE file LIKE '<relfilenode>%'; Result (segment | sz): 0 | 0; 1 | 0; 2 | 32768; 3 | 0
  • 33. AO Tables. AO tables have either row or columnar orientation, and their file size is variable. Columnar tables have one file per column (files named <relfilenode>.*). Concurrent loads also create a new set of files for each table. AO tables initially consist of a single empty file in each data directory until data is inserted. Their data files are not subject to a minimum size corresponding to the database block size.
  • 34. AO Tables. CREATE TABLE test1 (a INT, b VARCHAR, c DATE) WITH (appendonly=true, orientation=row); SELECT segment, file, sz FROM database_files WHERE file LIKE '<relfilenode>%'; Result (segment | file | sz): 0 | 3000010 | 0; 1 | 3000010 | 0; 2 | 3000010 | 0; 3 | 3000010 | 0. INSERT INTO test1 VALUES (1, 'a', current_date); SELECT segment, file, sz FROM database_files WHERE file LIKE '<relfilenode>%'; Result (segment | file | sz): 0 | 3000010 | 0; 1 | 3000010 | 0; 2 | 3000010 | 0; 2 | 3000010.1 | 40; 3 | 3000010 | 0
  • 35. AO Tables. CREATE TABLE test1 (a INT, b VARCHAR, c DATE) WITH (appendonly=true, orientation=column); SELECT segment, file, sz FROM database_files WHERE file LIKE '<relfilenode>%'; Result (segment | file | sz): 0 | 3000010 | 0; 1 | 3000010 | 0; 2 | 3000010 | 0; 3 | 3000010 | 0. INSERT INTO test1 VALUES (1, 'a', current_date); SELECT segment, file, sz FROM database_files WHERE file LIKE '<relfilenode>%'; Result (segment | file | sz): 0 | 3000010 | 0; 1 | 3000010 | 0; 2 | 3000010 | 0; 2 | 3000010.1 | 40; 2 | 3000010.129 | 40; 2 | 3000010.257 | 40; 3 | 3000010 | 0
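In the columnar example, one INSERT into a three-column table produced 3000010.1, 3000010.129 and 3000010.257: one file per column, with suffixes spaced 128 apart. Assuming that pattern holds in general (column c written to segment file s is named <relfilenode>.<c × 128 + s>), which is inferred from this example output rather than taken from documentation, the expected file names for a load can be computed. The helper name is hypothetical.

```python
FILES_PER_COLUMN = 128  # spacing inferred from the .1 / .129 / .257 example


def column_file_names(relfilenode, num_columns, segno):
    """File names one load would write: one per column, for segment file number segno."""
    return [f"{relfilenode}.{col * FILES_PER_COLUMN + segno}" for col in range(num_columns)]


print(column_file_names(3000010, 3, 1))
# ['3000010.1', '3000010.129', '3000010.257']
```

Under this assumption, a concurrent load using a different segment file number (say 2) would add another full set (.2, .130, .258), one more file per column.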
  • 36. AO Tables. For large fact tables, ADD/DROP COLUMN operations are much faster against AO columnar tables, because no rewrite of the data files is required.
  • 37. AO Tables. Beware of large numbers of concurrent loads running against AO tables. For example, 50 concurrent loads against an AO columnar table with 500 columns will produce 200,000 primary segment files on a single segment host (500 column files × 50 loads × 8 primary segments). File system efficiency can decline drastically as the number of files increases.
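The file count grows multiplicatively, which is why it gets out of hand so quickly. A one-line model of the arithmetic, assuming (as the example does) that each load writes one file per column on every primary segment; the function name is hypothetical:

```python
def ao_column_files_per_host(columns, concurrent_loads, primaries_per_host):
    """One new file per column, per concurrent load, per primary segment on the host."""
    return columns * concurrent_loads * primaries_per_host


print(ao_column_files_per_host(500, 50, 8))  # 200000
print(ao_column_files_per_host(500, 1, 8))   # 4000 with a single load
```

Cutting concurrency is the most effective lever here: each concurrent load removed saves columns × primaries files (4,000 in this example).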
  • 38. AO Tables. Workarounds: (1) rebuild the partition via batch processing every night (CTAS followed by a partition swap); (2) load into a heap-organized staging table.
  • 39. Skew. Skew is typically discovered through unbalanced storage on one or more segments in the cluster. The gp_toolkit skew views calculate skew by querying the hidden gp_segment_id column: SELECT gp_segment_id, count(*) FROM mytable GROUP BY 1; This is prohibitively expensive when run against all tables in a cluster.
  • 40. Skew. Querying file metadata with the diagnostic table is much faster. Skew can be measured with the coefficient of variation or the interquartile range of the file sizes, e.g.: SELECT substring(file, '([0-9]+)') AS relfilenode, stddev(sz)/avg(sz) AS cv FROM database_files GROUP BY 1 HAVING SUM(sz) != 0;
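Both skew metrics can be prototyped in Python on rows shaped like database_files. Unlike the SQL above, this sketch first sums each relation's files per segment (so a columnar table's per-column files count toward the same segment) and then computes the coefficient of variation (sample standard deviation divided by the mean, as stddev/avg does) and the interquartile range. The FILES rows are hypothetical.

```python
import statistics
from collections import defaultdict

# Hypothetical rows from the database_files diagnostic table: (segment, file, sz)
FILES = [
    (0, "3000010", 1000), (1, "3000010", 1100), (2, "3000010", 900), (3, "3000010", 1000),
    (2, "3000010.129", 100),
    (0, "3000020", 0), (1, "3000020", 0), (2, "3000020", 4000), (3, "3000020", 0),
    (0, "3000030", 0), (1, "3000030", 0), (2, "3000030", 0), (3, "3000030", 0),
]


def skew_by_relfilenode(files):
    """Return {relfilenode: (coefficient of variation, interquartile range)}."""
    per_rel = defaultdict(lambda: defaultdict(int))
    for segment, name, size in files:
        # Fold column files such as 3000010.129 into their relfilenode.
        relfilenode = name.split(".")[0]
        per_rel[relfilenode][segment] += size
    result = {}
    for rel, by_segment in per_rel.items():
        sizes = list(by_segment.values())
        mean = statistics.mean(sizes)
        if len(sizes) < 2 or mean == 0:  # mirrors HAVING SUM(sz) != 0
            continue
        q1, _, q3 = statistics.quantiles(sizes, n=4)
        result[rel] = (statistics.stdev(sizes) / mean, q3 - q1)
    return result


for rel, (cv, iqr) in sorted(skew_by_relfilenode(FILES).items()):
    print(rel, round(cv, 3), iqr)
```

Here 3000020, with all of its data on segment 2, has a coefficient of variation of 2.0, while the evenly spread 3000010 stays below 0.1; empty relations such as 3000030 are skipped, as in the SQL.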
  • 41. Bloat • Checking for skew via the gp_segment_id column will miss physical skew due to bloat (dead space from deleted/updated tuples).
  • 42. Join the community! Website, mailing lists, GitHub, events, and more.