ESA UNCLASSIFIED - For Official Use
Distributing big astronomical catalogues
with Greenplum
Pilar de Teodoro
European Space Astronomy Centre, Madrid, Spain
Greenplum Summit, NYC, 19/03/2019
@pteodoro
ESA UNCLASSIFIED - For Official Use Pilar de Teodoro| Greenplum Summit’19 | 03/19/2019 | Slide 2
Outline
• What are ESA and ESDC
• Why scale out
• Tests performed
• Lessons learnt and what is missing
• Next steps
European Space Agency
Data Flow
• Mission Operations Center (ESOC: Germany)
• Science Operations Center (ESAC: Spain)
• ESAC Science Data Center (ESAC: Spain)
http://archives.esac.esa.int
Euclid Scientific Archive System
Access to scientific data online
The ESA Fleet for Astrophysics
Data at ESDC
Euclid will add
45PB by 2025
Database Sizes at ESDC (January 2019)
[Bar chart: size in GB (axis 0 to 25000) for XSA, CSA, HST, "HSA", PSA, ESASKY, Euclid, LPFSA and GACS]
Database Systems at ESDC
Most archives: PostgreSQL 9.4 to 11.1 (15 operational databases)
• Open source
• Big community
• Spherical queries (pg_sphere, q3c, healpix, postgis)
• Professional support is available from companies (we do not use it, at our own risk)
ISO: Sybase
Euclid DPS: Oracle
ESASky, PSA (some functionalities): ElasticSearch
Euclid Prototype: Spark
Import from Files, MySQL…
Database Hardware
Chosen to scale up.
Largest (GACS): Dell PowerEdge R920
• Memory: 1.5 TB
• 48 cores
• 7 TB SSD
• 13 TB SAS
• 21 TB NetApp
Plan for Scale-out Tests
Objective: query big datasets within the times set by the requirements, and measure ingestion performance.
Prerequisites: resources and data to ingest (catalogues, relational tables).
Comparison: ingestion time, retrieval time, administration, problems encountered.
PostgreSQL Scale-out Solutions Tested
Citus by Citus Data
• Citus 7, extension installed with Postgres-9
Postgres-XL by 2ndQuadrant
• Postgres-10 alpha2 (postgres 10.3) - Euclid
• Postgres-XL 9.5 R1.6 - Gaia
Pivotal Greenplum 5.9
Postgres-XL Architecture
Postgres-XL Architecture for Testing
Scicloud-27 VMs
Citus Architecture
Citus Architecture for Testing
• PostgreSQL-9/PostgreSQL-10
• Citus extension: 7.1
• Scicloud, 4 VMs: one Master Node and three Worker Nodes (Node1 to Node3), 32 GB RAM each
Greenplum Architecture
Greenplum architecture for Testing
[Diagram: three Greenplum test environments, each with a master and data nodes]
• Scicloud VMs: master plus data nodes dn01 to dn06; 32 GB / 8 cores; 4 segments
• Scicloud Bare Metal: 256 GB / 48 cores; 4/8 segments; NetApp and SAS drive/NetApp storage
• Dell Cluster: 256 GB / 48 cores; 32/64 segments; SSD storage
Simple Query Tests: Postgres-XL vs Citus
Select *
from kids_mb_catalogue
Where mag_auto_i_trunc=XX
On the Kids catalogue:
• 16M rows
• 12 GB total
• Structured data
• Comparison of a single instance (SI) on Postgres v10 against Postgres-XL and Citus running on 3 nodes, and Postgres-XL running on 25 nodes. Query time grows with the number of rows returned.
Filter  Rows
10      2389562
11      13
12      46
13      165
14      1615
15      21932
16      66365
17      116562
18      199186
19      341422
20      600520
21      977292
22      1496713
23      2144309
24      2722471
25      2654069
26      1761394
27      730583
28      250883
29      90818
30      34346
31      13285
32      5320
33      2149
34      820
35      361
36      139
37      62
38      27
39      4
40      5
41      3
42      1
43      0
Postgres Single Instance vs XL vs Citus (hash distribution)
[Chart: query time in ms (axis 0 to 300000) against the number of rows returned, from 2389562 down to 0, one point per filter value. Series: postgres-xl-3nodes, postgres-xl-25_nodes, SI-10, citus7.1-3nodes]
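The chart title mentions hash distribution. As a rough illustration of the idea only, the sketch below assigns rows to nodes by hashing a distribution key. The use of zlib.crc32 and a plain modulo is an assumption made for this example; it is not the hash function or shard mapping that Citus or Postgres-XL actually use, and the function names are hypothetical.

```python
import zlib
from collections import Counter


def node_for_key(key, node_count):
    """Map a distribution-key value to a node index in [0, node_count)."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    # crc32 is a stand-in hash for illustration, not what the databases use.
    digest = zlib.crc32(str(key).encode("utf-8"))
    return digest % node_count


def distribute(keys, node_count):
    """Count how many keys land on each node."""
    return Counter(node_for_key(key, node_count) for key in keys)


if __name__ == "__main__":
    # Hypothetical integer keys spread over 3 nodes, as in the 3-node tests.
    for node, rows in sorted(distribute(range(16000), 3).items()):
        print(f"node {node}: {rows} rows")
```

Because the node is a pure function of the key, a lookup on the distribution key can be routed to a single node, while a filter on any other column (such as the magnitude filter in the test query) has to be sent to every node.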
Hardware Testing for
Environment (Postgres-XL or SI)                        Time
Netapp (10 partitions)                                 3h 46m
SSD (10 partitions)                                    1h 17m
Postgres-XL 10 nodes (VM on physical nodes-Netapp)     2h 35m
Postgres-XL 25 nodes (Netapp scicloud)                 35m
Netapp (FDW postgres-xl 25 nodes)                      1h 40m
Flagship_mock table: 1.1 TB, 2.5B rows
Query: Select count(*) from flagship_mock;
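To make the timings easier to compare, the following sketch converts them to minutes and derives a speedup against the Netapp (10 partitions) run, plus an approximate scan rate. The timings are taken from the table above; the row count is the rounded 2.5B figure from the slide, so the rates are indicative only. The function names are hypothetical.

```python
import re

# Elapsed times for the count(*) on flagship_mock, taken from the table above.
TIMINGS = {
    "Netapp (10 partitions)": "3h 46m",
    "SSD (10 partitions)": "1h 17m",
    "Postgres-XL 10 nodes (VM on physical nodes-Netapp)": "2h 35m",
    "Postgres-XL 25 nodes (Netapp scicloud)": "35m",
    "Netapp (FDW postgres-xl 25 nodes)": "1h 40m",
}
ROW_COUNT = 2_500_000_000  # rounded figure from the slide, so approximate


def parse_minutes(text):
    """Convert a duration such as '3h 46m', '2h 35min' or '35m' to minutes."""
    match = re.fullmatch(r"(?:(\d+)h)?\s*(?:(\d+)m(?:in)?)?", text.strip())
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes = (int(group) if group else 0 for group in match.groups())
    return hours * 60 + minutes


def summarize(timings, baseline, row_count):
    """Return {environment: (speedup vs baseline, rows scanned per second)}."""
    base = parse_minutes(timings[baseline])
    summary = {}
    for name, elapsed in timings.items():
        minutes = parse_minutes(elapsed)
        summary[name] = (base / minutes, row_count / (minutes * 60))
    return summary


if __name__ == "__main__":
    result = summarize(TIMINGS, "Netapp (10 partitions)", ROW_COUNT)
    for name, (speedup, rate) in result.items():
        print(f"{name}: {speedup:.2f}x baseline, {rate:,.0f} rows/s")
```

On these figures the 25-node Postgres-XL run is roughly 6.5 times faster than the Netapp (10 partitions) run.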
Greenplum Testing
• Ingestion
• Expansion
• Performance
• High Availability
• Resource Management
• Community edition vs Commercial edition
Greenplum Testing
Community edition does not include:
• Support from Pivotal Greenplum team
• Product packaging and installation script.
• Support for QuickLZ compression. QuickLZ compression is not provided in the open source version of Greenplum Database due to licensing restrictions.
• Support for managing Greenplum Database using Pivotal Greenplum Command Center.
• Support for full text search and text analysis using Pivotal GPText.
• Support for data connectors:
  • Greenplum-Spark Connector
  • Greenplum-Informatica Connector
  • Greenplum-Kafka Integration
  • Gemfire-Greenplum Connector
• Data Direct ODBC/JDBC Drivers
• gpcopy utility for copying or migrating objects between Greenplum systems.
GP Testing: Ingestion
gpfdist: nohup gpfdist -d /spv02/ -p 4000 > gpfdist_4000.log 2>&1 &
Create external table flag_ext()
location ('gpfdist://172.25.4.244:4000/*') FORMAT 'CSV';
273 CSV files of 10 GB each loaded in 3 hours (NetApp), using 6 gpfdist servers
The benefit of using gpfdist is that you are guaranteed maximum parallelism while
reading from or writing to external tables, thereby offering the best performance
as well as easier administration of external tables.
From: https://gpdb.docs.pivotal.io/520/utility_guide/admin_utilities/gpfdist.html
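The slide mentions six gpfdist servers but shows a single location. One way to use several servers from one external table is to list one gpfdist URI per server in the LOCATION clause. The sketch below builds such a statement as text; the host names, the LIKE target table and the helper names are assumptions made for illustration, and it is not known whether the tests were set up this way.

```python
def gpfdist_locations(hosts, port, pattern="*"):
    """Build one gpfdist URI per host serving files that match the pattern."""
    return [f"gpfdist://{host}:{port}/{pattern}" for host in hosts]


def external_table_ddl(ext_table, like_table, locations):
    """Return a CREATE EXTERNAL TABLE statement reading CSV from all locations.

    Table names are inserted as given, so they must come from trusted input.
    """
    if not locations:
        raise ValueError("at least one gpfdist location is required")
    uris = ",\n    ".join(f"'{uri}'" for uri in locations)
    return (
        f"CREATE EXTERNAL TABLE {ext_table} (LIKE {like_table})\n"
        f"LOCATION (\n    {uris}\n)\n"
        "FORMAT 'CSV';"
    )


if __name__ == "__main__":
    # Hypothetical ETL hosts; the port matches the gpfdist command on the slide.
    hosts = [f"etl-host-{n}" for n in range(1, 7)]
    print(external_table_ddl("flag_ext", "flagship_mock",
                             gpfdist_locations(hosts, 4000)))
```

The data would then be loaded with an ordinary INSERT INTO the target table selecting from the external table.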
GP Testing: Expansion
gpexpand: expands the cluster onto new data nodes, or increases the number of segments on the existing ones.
10 Gb network
GP Testing: Performance
Performance:
• Gaia query profiling tests: Catalogues XMatch
• Euclid query profiling tests: 1.3 TB catalogue
What to test:
• Table formats
• Scalability
GP testing: Xmatch Performance
Most relevant queries: Gaia query
SELECT gdr1.source_id AS iddr1, gdr2.source_id AS iddr2
FROM gaiadr1.gaia_source AS gdr1, gaiadr2.gaia_source AS gdr2
WHERE q3c_join(gdr2.ra, gdr2.dec, gdr1.ra, gdr1.dec, 0.5/3600)
[Bar chart: cross-match query results for GACSDB02, GP-LIM-4-64-HEAP_C and GP-BM-5-40-AO; axis from 5.2 to 6.1, units not recoverable]
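The last argument of q3c_join is a radius in degrees, so 0.5/3600 selects pairs closer than 0.5 arcseconds. The sketch below shows the equivalent pairwise test using the haversine formula. It is only meant to show what the predicate means; Q3C itself relies on a sky-indexing scheme rather than a brute-force comparison, and the function names here are hypothetical.

```python
import math


def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees between two sky positions in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    # Haversine formula, numerically stable for very small separations.
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))


def within_radius(ra1, dec1, ra2, dec2, radius_deg):
    """True when the two positions are closer than radius_deg degrees."""
    return angular_separation_deg(ra1, dec1, ra2, dec2) < radius_deg


if __name__ == "__main__":
    radius = 0.5 / 3600  # 0.5 arcseconds, as in the query
    print(within_radius(10.0, 0.0, 10.0, 0.0001, radius))  # 0.36 arcsec apart
    print(within_radius(10.0, 0.0, 10.0, 0.001, radius))   # 3.6 arcsec apart
```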
GP testing: Parameter Query performance
SELECT q3c_dist(public.flagship_mock."ra_gal",public.flagship_mock."dec_gal",220.,0.0) AS dist , * FROM public.flagship_mock
euclid WHERE public.flagship_mock."galaxy_id" = 2 AND public.flagship_mock."gr_cosmos" > 1;
GP Testing: Performance
Query: select count(*) from euclid.flagship_mock;
Table format           Time (s)
GP-BM-5-40-QLZ         14
GP-6-24-QLZ            22
GP-6-24-COL            33
GP-LIM-4-32-HEAP       457
GP-LIM-4-64-HEAP       533
GP-BM-5-40-AO          581
GP-6-24-AO             1693
GP-BM-5-40-HEAP-C      1695
GP-6-24                1808
GP-2-8                 8055
Postgres-XL-10nodes    9324
[Bar chart: the same count(*) times in seconds, axis 0 to 10000]
GP Testing: Performance
Select count(*) from gaiadr2.gaia_source: 1692919135 rows
Table format        Time (s)
GP-BM-5-40-QLZ      6
GP-LIM-4-32-HEAP    7
GP-BM-5-40-AO       10
GP-6-24-COL         14
GP-BM-5-20-AO       59
gacsdb02            323
GP-6-24             980
[Bar chart: the same count(*) times in seconds, axis 0 to 1200]
GP Testing: High Availability
GP Testing: Managing Resources
• Resource groups: set and enforce CPU, memory, and concurrent transaction limits in Greenplum Database. They use Linux Control Groups (cgroups) to manage CPU resources.
• Resource queues: manage the degree of concurrency in a Greenplum Database system. They manage:
  • the number of active queries that may execute concurrently,
  • the amount of memory each type of query is allocated,
  • the priority of queries.
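To illustrate what a limit on concurrently active queries means in practice, the toy model below admits queries in submission order and makes the rest wait for a free slot. It is a simplified sketch under stated assumptions (all queries submitted at time 0, fixed durations, no memory or priority handling) and does not reproduce Greenplum's actual scheduler; the function name is hypothetical.

```python
import heapq


def simulate_queue(durations, active_limit):
    """Return (start, finish) times for queries all submitted at time 0.

    At most `active_limit` queries run at once; the others wait, in
    submission order, until a running query finishes.
    """
    if active_limit < 1:
        raise ValueError("active_limit must be at least 1")
    running = []   # min-heap holding the finish times of running queries
    schedule = []
    for duration in durations:
        if len(running) < active_limit:
            start = 0.0                      # a slot is free immediately
        else:
            start = heapq.heappop(running)   # wait for the earliest finisher
        finish = start + duration
        heapq.heappush(running, finish)
        schedule.append((start, finish))
    return schedule


if __name__ == "__main__":
    # Three hypothetical 10-second queries with two active slots.
    for start, finish in simulate_queue([10, 10, 10], active_limit=2):
        print(f"start={start:.0f}s finish={finish:.0f}s")
```

With two slots the third query starts only once one of the first two completes, which is the trade-off a concurrency limit makes: later queries wait, but running queries are not starved of resources.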
GP Testing: Resource management
Lessons Learnt and what is missing (I)
• Yes, Postgres-XL scales, but:
  • there are still stability issues: GTM memory leak (workaround: GTM proxy)
  • it does not work with foreign data wrappers
  • GTM does not rotate logs
  • there is no direct monitoring tool
  • reshuffling does not work
  • a JDBC bug identified by ESDC is not resolved yet
• Citus:
  • as it is an extension, some workers can be made to behave as a single instance
  • performance when retrieving large numbers of rows still needs work
  • administration is easier than in Postgres-XL
Lessons Learnt and what is missing (II)
• Greenplum is the most mature Postgres-flavoured solution for us.
  • Active developer community
  • GP 5 is based on Postgres 8.3: its use of “reserved words” is a problem for TAP, resolved in the future GP 6
  • Moving to GP 6, based on Postgres 9.4
  • Community edition is currently enough for ESDC
  • Use of existing hardware
  • Resource management available if needed
  • Best performance: load big catalogues in different table formats depending on the query, and decide which one to use when TAP parses the query
  • Relational data models must be adapted: foreign keys are not enforced
Near Future Tests
Euclid: Postgres-XL will be replaced by GP.
We will continue ingesting catalogues in this system and testing.
Thank you

SpringOne Tour: Spring Recipes: A Collection of Common-Sense SolutionsSpringOne Tour: Spring Recipes: A Collection of Common-Sense Solutions
SpringOne Tour: Spring Recipes: A Collection of Common-Sense SolutionsVMware Tanzu
 

Distributing big astronomical catalogues with Greenplum - Greenplum Summit 2019

  • 1. ESA UNCLASSIFIED - For Official Use Distributing big astronomical catalogues with Greenplum Pilar de Teodoro European Space Astronomy Centre, Madrid, Spain Greenplum Summit, NYC, 19/03/2019 @pteodoro
  • 2. Outline
      • What is ESA and ESDC
      • Why scaling out
      • Tests performed
      • Lessons learnt and what is missing
      • Next steps
  • 3. European Space Agency
  • 4. Data Flow
      • Mission Operations Center (ESOC, Germany)
      • Science Operations Center (ESAC, Spain)
      • ESAC Science Data Center (ESAC, Spain)
  • 5. http://archives.esac.esa.int
  • 6. Euclid Scientific Archive System: access to scientific data online
  • 7. The ESA Fleet for Astrophysics: data at ESDC
  • 8. Euclid will add 45 PB by 2025
  • 9. Databases Size at ESDC (January 2019). [Bar chart: size in GB, axis from 0 to 25000, for XSA, CSA, HST, HSA, PSA, ESASKY, Euclid, LPFSA and GACS.]
  • 10. Database Systems at ESDC
      • Most archives: PostgreSQL 9.4 to 11.1 (15 operational databases)
          • Open source
          • Big community
          • Spherical queries (pg_sphere, q3c, healpix, postgis)
          • Professional support is available from companies (we do not use it, at our own risk)
      • ISO: Sybase
      • Euclid DPS: Oracle
      • ESASky, PSA (some functionalities): ElasticSearch
      • Euclid prototype: Spark
      • Import from files, MySQL…
  • 11. Databases HW: chosen to scale up
      • Largest (GACS): Dell PowerEdge R920
      • Memory: 1.5 TB
      • 48 cores
      • 7 TB SSD
      • 13 TB SAS
      • 21 TB NetApp
  • 12. Plan for Scale-out Tests
      • Objective: big datasets queried in efficient times (requirements); ingestion performance.
      • Prerequisites: resources, data to ingest (catalogues, relational tables).
      • Comparison: ingestion time, retrieval time, administration, problems encountered.
  • 13. PostgreSQL Scale-out Solutions Tested
      • Citus by Citus Data: Citus 7, extension installed with Postgres-9
      • Postgres-XL by 2ndQuadrant: Postgres-10 alpha2 (postgres 10.3) for Euclid; Postgres-XL 9.5 R1.6 for Gaia
      • Pivotal Greenplum 5.9
  • 14. Postgres-XL Architecture
  • 15. Postgres-XL Architecture for Testing: Scicloud, 27 VMs
  • 16. Citus Architecture
  • 17. Citus Architecture for Testing
      • PostgreSQL-9 / PostgreSQL-10
      • Citus extension: 7.1
      • Scicloud, 4 VMs: master node and worker nodes 1 to 3, with 32 GB RAM each
  • 18. Greenplum Architecture
  • 19. Greenplum Architecture for Testing. [Diagram: three test environments (Scicloud VMs, Scicloud Bare Metal, Dell Cluster), each with a master and data nodes. Labels on the diagram, in order: dn01 to dn06, 32 GB / 8 cores, 4 segments; 256 GB / 48 cores, 4/8 segments, SAS drive / NetApp; 256 GB / 48 cores, 32/64 segments, SSD.]
  • 20. Simple Query Tests: Postgres-XL vs Citus
      Query: SELECT * FROM kids_mb_catalogue WHERE mag_auto_i_trunc = XX
      On the KiDS catalogue:
      • 16M rows
      • 12 GB total
      • Structured data
      • Comparison of a single instance (SI) on Postgres v10 against Postgres-XL and Citus running on 3 nodes, and Postgres-XL running on 25 nodes. Time grows with the number of rows.
      Rows returned for each filter value:
      Filter  Rows       Filter  Rows
      10      2389562    27      730583
      11      13         28      250883
      12      46         29      90818
      13      165        30      34346
      14      1615       31      13285
      15      21932      32      5320
      16      66365      33      2149
      17      116562     34      820
      18      199186     35      361
      19      341422     36      139
      20      600520     37      62
      21      977292     38      27
      22      1496713    39      4
      23      2144309    40      5
      24      2722471    41      3
      25      2654069    42      1
      26      1761394    43      0
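The test on this slide amounts to running the same filter query once per filter value and recording the elapsed time next to the number of rows returned. A minimal sketch of such a harness follows. The function names and the `run` callable are hypothetical and not from the talk; `run` stands for whatever database driver call executes a statement and returns its row count.

```python
import time
from typing import Callable, Dict, Iterable


def build_query(filter_value: int) -> str:
    """Return the slide's test query for one value of mag_auto_i_trunc."""
    return (
        "SELECT * FROM kids_mb_catalogue "
        f"WHERE mag_auto_i_trunc = {int(filter_value)}"
    )


def time_queries(
    run: Callable[[str], int], filter_values: Iterable[int]
) -> Dict[int, dict]:
    """Run the query for each filter value and record rows and elapsed ms.

    `run` is a hypothetical callable: it executes the SQL text against the
    system under test and returns the number of rows fetched.
    """
    results = {}
    for value in filter_values:
        sql = build_query(value)
        start = time.perf_counter()
        rows = run(sql)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        results[value] = {"rows": rows, "ms": elapsed_ms}
    return results


if __name__ == "__main__":
    # Stand-in for a real connection: pretends every query returns no rows.
    timings = time_queries(lambda sql: 0, range(10, 44))
    for value, info in sorted(timings.items()):
        print(value, info["rows"], f"{info['ms']:.3f} ms")
```

Passing the executor in as a callable keeps the harness identical across the single instance, Postgres-XL and Citus, so only the connection changes between runs.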
  • 21. Postgres Single Instance vs XL vs Citus (hash distribution). [Line chart: query time in ms (axis from 0 to 300000) against number of rows returned, for the series postgres-xl-3nodes, postgres-xl-25_nodes, SI-10 and citus7.1-3nodes.]
  • 22. Hardware Testing
      Query: SELECT count(*) FROM flagship_mock;
      Flagship_mock table: 1.1 TB, 2.5B rows
      Environment (Postgres-XL or SI)                         Time
      NetApp (10 partitions)                                  3h 46m
      SSD (10 partitions)                                     1h 17m
      Postgres-XL 10 nodes (VMs on physical nodes, NetApp)    2h 35m
      Postgres-XL 25 nodes (NetApp, Scicloud)                 35m
      NetApp (FDW, Postgres-XL 25 nodes)                      1h 40m
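To make the timings above easier to compare, they can be converted into an effective scan rate and a speedup over the slowest configuration. The sketch below does only that arithmetic on the slide's figures. It assumes 1 TB = 1000 GB and treats the table size and row count as exact, which the slide does not claim.

```python
import re

TABLE_GB = 1100.0   # 1.1 TB, assuming 1 TB = 1000 GB
TABLE_ROWS = 2.5e9  # 2.5B rows, as stated on the slide

# Timings copied from the slide.
TIMINGS = {
    "NetApp (10 partitions)": "3h 46m",
    "SSD (10 partitions)": "1h 17m",
    "Postgres-XL 10 nodes": "2h 35m",
    "Postgres-XL 25 nodes": "35m",
    "NetApp (FDW, Postgres-XL 25 nodes)": "1h 40m",
}


def to_seconds(text: str) -> int:
    """Convert strings such as '3h 46m' or '35m' to seconds."""
    hours = re.search(r"(\d+)\s*h", text)
    minutes = re.search(r"(\d+)\s*m", text)
    total = 0
    if hours:
        total += int(hours.group(1)) * 3600
    if minutes:
        total += int(minutes.group(1)) * 60
    return total


def summarise(baseline: str = "NetApp (10 partitions)") -> dict:
    """Return seconds, GB/s, rows/s and speedup over the baseline."""
    base_s = to_seconds(TIMINGS[baseline])
    summary = {}
    for name, text in TIMINGS.items():
        seconds = to_seconds(text)
        summary[name] = {
            "seconds": seconds,
            "gb_per_s": TABLE_GB / seconds,
            "rows_per_s": TABLE_ROWS / seconds,
            "speedup": base_s / seconds,
        }
    return summary


if __name__ == "__main__":
    for name, row in summarise().items():
        print(f"{name}: {row['gb_per_s']:.2f} GB/s, x{row['speedup']:.2f}")
```

On these figures the 25-node Postgres-XL run is roughly 6.5 times faster than the single instance on NetApp, which is well below a linear speedup for 25 nodes.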
  • 23. Greenplum Testing
      • Ingestion
      • Expansion
      • Performance
      • High Availability
      • Resource Management
      • Community edition vs Commercial edition
  • 24. Greenplum Testing: the community edition does not include
      • Support from the Pivotal Greenplum team
      • Product packaging and installation script
      • Support for QuickLZ compression, which is not provided in the open source version of Greenplum Database due to licensing restrictions
      • Support for managing Greenplum Database using Pivotal Greenplum Command Center
      • Support for full text search and text analysis using Pivotal GPText
      • Support for data connectors:
          • Greenplum-Spark Connector
          • Greenplum-Informatica Connector
          • Greenplum-Kafka Integration
          • Gemfire-Greenplum Connector
      • Data Direct ODBC/JDBC Drivers
      • gpcopy utility for copying or migrating objects between Greenplum systems
  • 25. GP Testing: Ingestion
      gpfdist:
      nohup gpfdist -d /spv02/ -p 4000 > gpfdist_4000.log 2>&1 &
      CREATE EXTERNAL TABLE flag_ext() LOCATION ('gpfdist://172.25.4.244:4000/*') FORMAT 'CSV';
      273 CSV files of 10 GB each were ingested in 3 hours (NetApp) using 6 gpfdist servers.
      "The benefit of using gpfdist is that you are guaranteed maximum parallelism while reading from or writing to external tables, thereby offering the best performance as well as easier administration of external tables." From: https://gpdb.docs.pivotal.io/520/utility_guide/admin_utilities/gpfdist.html
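With several gpfdist servers, the external table can list one gpfdist URL per server in its LOCATION clause so that segments read from all of them in parallel. The sketch below builds such a statement and works out the aggregate ingest rate implied by the slide's figures. The host names, the helper names and the use of `LIKE` to borrow the column list are illustrative assumptions. The slide elides the column definitions, and how the six servers were actually wired up is not stated.

```python
from typing import Sequence


def gpfdist_locations(hosts: Sequence[str], port: int = 4000,
                      pattern: str = "*") -> str:
    """Return a comma-separated list of quoted gpfdist URLs."""
    return ", ".join(f"'gpfdist://{host}:{port}/{pattern}'" for host in hosts)


def external_table_ddl(ext_table: str, like_table: str,
                       hosts: Sequence[str], port: int = 4000) -> str:
    """Build a CREATE EXTERNAL TABLE statement reading CSV through gpfdist.

    The column list is taken from an existing table with LIKE; that choice
    is an assumption made for this sketch, not the definition used in the talk.
    """
    return (
        f"CREATE EXTERNAL TABLE {ext_table} (LIKE {like_table})\n"
        f"LOCATION ({gpfdist_locations(hosts, port)})\n"
        "FORMAT 'CSV';"
    )


def ingest_rate_mb_s(files: int, gb_per_file: float, hours: float) -> float:
    """Aggregate ingest rate in MB/s, assuming 1 GB = 1000 MB."""
    return files * gb_per_file * 1000.0 / (hours * 3600.0)


if __name__ == "__main__":
    hosts = [f"etl{i}" for i in range(1, 7)]  # hypothetical host names
    print(external_table_ddl("flag_ext", "flagship_mock", hosts))
    rate = ingest_rate_mb_s(273, 10, 3)
    print(f"aggregate: {rate:.0f} MB/s, per gpfdist server: {rate / 6:.0f} MB/s")
```

Loading into the target table is then an ordinary `INSERT INTO ... SELECT * FROM flag_ext`.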
  • 26. GP Testing: Expansion. gpexpand: expanding the cluster onto new data nodes, or onto the same ones by increasing the number of segments. 10 Gb network.
  • 27. GP Testing: Performance
      Performance:
      • Gaia query profiling tests: catalogue cross-match (XMatch)
      • Euclid query profiling tests: 1.3 TB catalogue
      What to test:
      • Table formats
      • Scalability
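"Table formats" here presumably refers to Greenplum's storage options: heap, append-optimized (AO) row or column orientation, and optional compression. The sketch below generates one copy of a table per format so the same query can be timed against each. The mapping of format names to options, the target table names and the choice of `galaxy_id` as distribution key are assumptions for illustration only. The talk does not spell out how its test tables were created.

```python
# Greenplum 5 storage options for a few table formats (sketch).
STORAGE_OPTIONS = {
    "heap": "",
    "ao": "appendonly=true",
    "ao_column": "appendonly=true, orientation=column",
    # QuickLZ is only available in the commercial edition.
    "ao_quicklz": "appendonly=true, compresstype=quicklz",
}


def create_copy_ddl(source: str, target: str, fmt: str,
                    distribution_key: str) -> str:
    """Build a CREATE TABLE statement that clones `source` in another format."""
    options = STORAGE_OPTIONS[fmt]
    with_clause = f" WITH ({options})" if options else ""
    return (
        f"CREATE TABLE {target} (LIKE {source})"
        f"{with_clause} DISTRIBUTED BY ({distribution_key});"
    )


if __name__ == "__main__":
    for fmt in STORAGE_OPTIONS:
        print(create_copy_ddl(
            "euclid.flagship_mock",           # table named on a later slide
            f"euclid.flagship_mock_{fmt}",    # hypothetical target name
            fmt,
            "galaxy_id",                      # assumed distribution key
        ))
```

Each copy would then be filled with `INSERT INTO ... SELECT` from the source table before timing.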
  • 28. GP Testing: XMatch Performance
      Most relevant queries: Gaia query
      SELECT gdr1.source_id AS iddr1, gdr2.source_id AS iddr2
      FROM gaiadr1.gaia_source AS gdr1, gaiadr2.gaia_source AS gdr2
      WHERE q3c_join(gdr2.ra, gdr2.dec, gdr1.ra, gdr1.dec, 0.5/3600)
      [Bar chart: axis from 5.2 to 6.1 (units not given) for GACSDB02, GP-LIM-4-64-HEAP_C and GP-BM-5-40-AO.]
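The `q3c_join` predicate in the query above pairs sources from the two catalogues whose angular separation on the sky is within the given radius in degrees, here 0.5/3600, that is 0.5 arcseconds. The sketch below shows the same matching rule as a brute-force loop using the haversine formula. It illustrates the criterion only. Q3C itself relies on a spatial index rather than comparing every pair, and the tiny catalogues are made-up values.

```python
import math
from typing import List, Sequence, Tuple

Source = Tuple[int, float, float]  # (source_id, ra in degrees, dec in degrees)


def angular_separation_deg(ra1: float, dec1: float,
                           ra2: float, dec2: float) -> float:
    """Great-circle distance in degrees between two sky positions (haversine)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))


def cross_match(cat_a: Sequence[Source], cat_b: Sequence[Source],
                radius_deg: float = 0.5 / 3600) -> List[Tuple[int, int]]:
    """Return (id_a, id_b) pairs closer than radius_deg; O(N*M) brute force."""
    return [
        (id_a, id_b)
        for id_a, ra_a, dec_a in cat_a
        for id_b, ra_b, dec_b in cat_b
        if angular_separation_deg(ra_a, dec_a, ra_b, dec_b) <= radius_deg
    ]


if __name__ == "__main__":
    dr1 = [(1, 10.0, 20.0)]
    dr2 = [
        (101, 10.0, 20.0 + 0.3 / 3600),  # 0.3 arcsec away: inside the radius
        (102, 10.0, 20.0 + 1.0 / 3600),  # 1.0 arcsec away: outside the radius
    ]
    print(cross_match(dr1, dr2))
```

Because both Gaia tables hold on the order of a billion rows or more, the pairwise loop is only a statement of the rule; the indexed join is what makes the query feasible.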
  • 29. GP Testing: Parameter Query Performance
      SELECT q3c_dist(euclid."ra_gal", euclid."dec_gal", 220., 0.0) AS dist, *
      FROM public.flagship_mock euclid
      WHERE euclid."galaxy_id" = 2 AND euclid."gr_cosmos" > 1;
  • 30. GP Testing: Performance
      Query: SELECT count(*) FROM euclid.flagship_mock;
      Table format           Time (s)
      GP-BM-5-40-QLZ         14
      GP-6-24-QLZ            22
      GP-6-24-COL            33
      GP-LIM-4-32-HEAP       457
      GP-LIM-4-64-HEAP       533
      GP-BM-5-40-AO          581
      GP-6-24-AO             1693
      GP-BM-5-40-HEAP-C      1695
      GP-6-24                1808
      GP-2-8                 8055
      Postgres-XL-10nodes    9324
      [Bar chart of the same count(*) times in seconds, axis from 0 to 10000.]
  • 31. GP Testing: Performance
      Query: SELECT count(*) FROM gaiadr2.gaia_source; (1692919135 rows)
      Table format        Time (s)
      GP-BM-5-40-QLZ      6
      GP-LIM-4-32-HEAP    7
      GP-BM-5-40-AO       10
      GP-6-24-COL         14
      GP-BM-5-20-AO       59
      gacsdb02            323
      GP-6-24             980
      [Bar chart of the same count(*) times in seconds, axis from 0 to 1200.]
  • 32. GP Testing: High Availability
  • 33. GP Testing: Managing Resources
      • Resource groups: set and enforce CPU, memory, and concurrent transaction limits in Greenplum Database. They use Linux Control Groups (cgroups) to manage CPU resources.
      • Resource queues: manage the degree of concurrency in a Greenplum Database system:
          • the number of active queries that may execute concurrently
          • the amount of memory each type of query is allocated
          • the priority of queries
  • 34. GP Testing: Resource Management
  • 35. Lessons Learnt and What Is Missing (I)
      • Yes, Postgres-XL scales, but:
          • there are still stability issues: GTM memory leak (workaround: GTM proxy)
          • it does not work with foreign data wrappers
          • GTM does not rotate logs
          • there is no direct monitoring tool
          • reshuffling does not work
          • a JDBC bug identified by ESDC is not resolved yet
      • Citus:
          • as it is an extension, some workers can be made to behave as a single instance
          • performance when retrieving large numbers of rows needs work
          • administration is easier than in Postgres-XL
  • 36. Lessons Learnt and What Is Missing (II)
      • Greenplum is the most mature Postgres-flavoured solution for us.
          • Active developer community
          • GP 5, based on Postgres 8.3, uses "reserved words", which is a problem for TAP; resolved in the future GP 6
          • Moving to GP 6, based on Postgres 9.4
          • Community edition is enough for ESDC currently
          • Use of existing HW
          • Resource management if needed
          • Best performance: load big catalogues in several table formats and, when parsing a query through TAP, decide which copy to use depending on the query
          • Relational data models must be adapted: foreign keys are not enforced
  • 37. Near Future Tests. Euclid: Postgres-XL will be replaced by GP. We will continue ingesting catalogues into this system and testing.
  • 38. Thank you