SlideShare a Scribd company logo
1 of 58
Download to read offline
Tu Pham - CTO @ Eway
Google Cloud Next
-- Surabaya, Indonesia - 06/2019 --
End to End Business Intelligence
on Google Cloud
About Me - CTO at Eway JSC
- Google Developer Expert on Cloud
Platform
- 8 years experience on Big data and
Cloud Computing
- Open source contributor, blogger,
father
2
3
4
5
6
7
8
9
10
11
In Affiliate Marketing, We Are Partners With
- Indonesia
- Go-Jek
- Bukalapak
- Traveloka
- The World
- Lazada
- Shopee
- Aliexpress
- Adcombo
- Leadbit
12
> 5M
Transactions
in 2018
13
>100M dollar
Gross
merchandise
volume in
2018
14
15
16
When You
Have (Big)
Data
17
How Do We
Use This
Data
18
Use Case
- Reporting
- Business Analytics
- Operational Analytics
- Product Features
- System Monitoring
19
- Reporting to Partners, Advertisers,
Publishers, ...
Reporting
20
Business
Analytics
- Analyzing Growth, Users behavior,
Sign up funnels, Sign up referrals
21
Operational
Analytics
- Analyzing Root cause analysis,
Latency analysis, Error analysis
22
Operational
Analytics
- Better Threshold alerts, Security
alerts, Capacity planning
23
Product
Features
- Product Features Top Products ,
Publisher challenge, A/B Testing
24
25
Sample: End-To-End Flow For Mining
User Behavior
26
How do we
collect this
data?
27
Step 1: GC Compute Engine Instances
Collect Raw Data
- Technology: Cloud Load Balancing, Compute Engine
- Why Cloud Load Balancing:
- TCP/UDP Load Balancing
- Seamless Autoscaling
- Scalable
- Why Compute Engine:
- High-Performance
- Scalable
- Low Cost
- Fast Networking
- Custom Machine Types 28
Step 1: GC Compute Engine Instances
Collect Raw Data
29
How do we
process this
data?
30
Step 2: GC Compute Engine Instances
Convert Raw Data To Apache Parquet Files
- Technology: Compute Engine, Parquet file format
- Why Parquet:
- Self-describing, columnar storage format
- Language-independent
- High query-performance
- Spark SQL is much faster with Parquet
- High compression (up to 70%)- less disk IO
31
Step 2: GC Compute Engine Instances
Convert Raw Data To Apache Parquet Files
32
Step 2: GC Compute Engine Instances
Convert Raw Data To Apache Parquet Files
33
- Technology: Compute Engine, Parquet file format, Cloud Storage
- Why Cloud Storage:
- Four storage classes
- Easy to integrate
- Object Lifecycle Management
- Fast Networking
Step 3: GC Compute Engine Upload Parquet
File To GC Cloud Storage
34
Step 3: GC Compute Engine Upload Parquet
File To GC Cloud Storage
35
Step 3: GC Compute Engine Upload Parquet
File To GC Cloud Storage
36
Step 3: GC Compute Engine Upload Parquet
File To GC Cloud Storage
37
Step 3: GC Compute Engine Upload Parquet
File To GC Cloud Storage
38
How do we
visualize this
data
39
Step 4: Explore Dataset Using BI Tools
- Technology: DataPrep, Big Query, Grafana, PowerBI
40
Step 4: Explore Dataset Using BI Tools
- Technology: DataPrep
41
Step 4: Explore Dataset Using BI Tools
- Technology: BigQuery
42
Step 4b: Explore Dataset Using BI Tools
43
44
45
Step 4: Explore Dataset Using BI Tools
- Technology: Grafana
46
Step 4: Explore Dataset Using BI Tools
- Technology: PowerBI
47
Step 4b: Explore Dataset Using BI Tools
48
Step 5: Aggregate Data
> val df = spark.read.parquet(“/log/2017/07/user_engagement/1.snappy.parquet”)
49
> df.show()
+----------------------------+---------------------------+------------------------------+-----------------------------+---------+---------+------------------
+
| id | categoryId | topicId | userId | action | value | created |
+----------------------------+---------------------------+------------------------------+-----------------------------+---------+---------+------------------
+
|"100011125479181_..|"253397751448382"|"253397751448382_...| "100011125479181 "|"view"| "" |1490621079|
|"100004354358107_..|"253397751448382"|"253397751448382_...| "100004354358107"|"view"| "" |1490491531|
|"100014752680147_..|"253397751448382"|"253397751448382_...| "100014752680147"|"like"| "" |1490457109|
Step 5: Aggregate Data
50
> val df_group_count = df.groupBy("userId","categoryId", "action").count().show()
+--------------------------+---------------------------+----------+----------+
| userId | categoryId | action | count |
+--------------------------+---------------------------+----------+----------+
|"100011896037126"|"253397751448382"|"like" | 2 |
|"100010391178709"|"253397751448382"|"like" | 1 |
|"100011186707422"|"253397751448382"|"like" | 1 |
|"100012202096674"|"253397751448382"|"like" | 1 |
Step 5: Aggregate Data
51
Step 5: Aggregate Big Data
52
Step 5: Aggregate Big Data
Number of unique user per topic AVG User engagement per topic
53
Become
Geek
54
Where Are
The AI /
ML
55
Create
Your
Principles
Principles:
- KISS (Keep it simple, stupid)
- DRY (Don’t Repeat Yourself)
- Single Responsibility
- Low Cost
- Scalable
56
Be 1% better everyday
tips
Create your system
principles
Design system
architecture, data flow,
data model, data
structure first
Separate realtime and
batch flows
Separate data storage
strategies between data
types
Save the cost by
network cost, instances
cost, storage cost by
metric monitoring &
alert system 57
Thank You - Q&A
● Eway: https://eway.vn
● My Contact: tupp@eway.vn
58

More Related Content

What's hot

Cloud Developer Days - BigQuery
Cloud Developer Days - BigQueryCloud Developer Days - BigQuery
Cloud Developer Days - BigQueryWlodek Bielski
 
30 days of google cloud event
30 days of google cloud event30 days of google cloud event
30 days of google cloud eventPreetyKhatkar
 
Google cloud big data summit master gcp big data summit la - 10-20-2015
Google cloud big data summit   master gcp big data summit la - 10-20-2015Google cloud big data summit   master gcp big data summit la - 10-20-2015
Google cloud big data summit master gcp big data summit la - 10-20-2015Raj Babu
 
Google Cloud Platform & rockPlace Big Data Event-Mar.31.2016
Google Cloud Platform & rockPlace Big Data Event-Mar.31.2016Google Cloud Platform & rockPlace Big Data Event-Mar.31.2016
Google Cloud Platform & rockPlace Big Data Event-Mar.31.2016Chris Jang
 
Google Cloud Platform (GCP)
Google Cloud Platform (GCP)Google Cloud Platform (GCP)
Google Cloud Platform (GCP)Chetan Sharma
 
Visualising and Linking Open Data from Multiple Sources
Visualising and Linking Open Data from Multiple SourcesVisualising and Linking Open Data from Multiple Sources
Visualising and Linking Open Data from Multiple SourcesData Driven Innovation
 
An overview of BigQuery
An overview of BigQuery An overview of BigQuery
An overview of BigQuery GirdhareeSaran
 
R, Spark, Tensorflow, H20.ai Applied to Streaming Analytics
R, Spark, Tensorflow, H20.ai Applied to Streaming AnalyticsR, Spark, Tensorflow, H20.ai Applied to Streaming Analytics
R, Spark, Tensorflow, H20.ai Applied to Streaming AnalyticsKai Wähner
 
Big Data Best Practices on GCP
Big Data Best Practices on GCPBig Data Best Practices on GCP
Big Data Best Practices on GCPAllCloud
 
StackEngine Demo - Docker Austin
StackEngine Demo - Docker AustinStackEngine Demo - Docker Austin
StackEngine Demo - Docker AustinBoyd Hemphill
 
Critical Breakthroughs and Challenges in Big Data and Analytics
Critical Breakthroughs and Challenges in Big Data and AnalyticsCritical Breakthroughs and Challenges in Big Data and Analytics
Critical Breakthroughs and Challenges in Big Data and AnalyticsData Driven Innovation
 
#DataUnlimited - Google Big Data Unlimited
#DataUnlimited - Google Big Data Unlimited#DataUnlimited - Google Big Data Unlimited
#DataUnlimited - Google Big Data UnlimitedAudrey Huvet
 
Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...
Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...
Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...Imam Raza
 
Google Cloud Platform Introduction - 2016Q3
Google Cloud Platform Introduction - 2016Q3Google Cloud Platform Introduction - 2016Q3
Google Cloud Platform Introduction - 2016Q3Simon Su
 
Google BigQuery - Features & Benefits
Google BigQuery - Features & BenefitsGoogle BigQuery - Features & Benefits
Google BigQuery - Features & BenefitsAndreas Raible
 
Containerizing the Cloud with Kubernetes and Docker
Containerizing the Cloud with Kubernetes and DockerContainerizing the Cloud with Kubernetes and Docker
Containerizing the Cloud with Kubernetes and DockerJames Chittenden
 
Google Cloud Platform: Prototype ->Production-> Planet scale
Google Cloud Platform: Prototype ->Production-> Planet scaleGoogle Cloud Platform: Prototype ->Production-> Planet scale
Google Cloud Platform: Prototype ->Production-> Planet scaleIdan Tohami
 

What's hot (20)

Google and big query
Google and big queryGoogle and big query
Google and big query
 
Cloud Developer Days - BigQuery
Cloud Developer Days - BigQueryCloud Developer Days - BigQuery
Cloud Developer Days - BigQuery
 
30 days of google cloud event
30 days of google cloud event30 days of google cloud event
30 days of google cloud event
 
Google cloud big data summit master gcp big data summit la - 10-20-2015
Google cloud big data summit   master gcp big data summit la - 10-20-2015Google cloud big data summit   master gcp big data summit la - 10-20-2015
Google cloud big data summit master gcp big data summit la - 10-20-2015
 
Google Cloud Platform & rockPlace Big Data Event-Mar.31.2016
Google Cloud Platform & rockPlace Big Data Event-Mar.31.2016Google Cloud Platform & rockPlace Big Data Event-Mar.31.2016
Google Cloud Platform & rockPlace Big Data Event-Mar.31.2016
 
Google Cloud Platform (GCP)
Google Cloud Platform (GCP)Google Cloud Platform (GCP)
Google Cloud Platform (GCP)
 
Visualising and Linking Open Data from Multiple Sources
Visualising and Linking Open Data from Multiple SourcesVisualising and Linking Open Data from Multiple Sources
Visualising and Linking Open Data from Multiple Sources
 
An overview of BigQuery
An overview of BigQuery An overview of BigQuery
An overview of BigQuery
 
Google Bigtable
Google BigtableGoogle Bigtable
Google Bigtable
 
R, Spark, Tensorflow, H20.ai Applied to Streaming Analytics
R, Spark, Tensorflow, H20.ai Applied to Streaming AnalyticsR, Spark, Tensorflow, H20.ai Applied to Streaming Analytics
R, Spark, Tensorflow, H20.ai Applied to Streaming Analytics
 
Big Data Best Practices on GCP
Big Data Best Practices on GCPBig Data Best Practices on GCP
Big Data Best Practices on GCP
 
StackEngine Demo - Docker Austin
StackEngine Demo - Docker AustinStackEngine Demo - Docker Austin
StackEngine Demo - Docker Austin
 
Critical Breakthroughs and Challenges in Big Data and Analytics
Critical Breakthroughs and Challenges in Big Data and AnalyticsCritical Breakthroughs and Challenges in Big Data and Analytics
Critical Breakthroughs and Challenges in Big Data and Analytics
 
#DataUnlimited - Google Big Data Unlimited
#DataUnlimited - Google Big Data Unlimited#DataUnlimited - Google Big Data Unlimited
#DataUnlimited - Google Big Data Unlimited
 
Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...
Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...
Big Data with hadoop, Spark and BigQuery (Google cloud next Extended 2017 Kar...
 
Google Cloud Platform Introduction - 2016Q3
Google Cloud Platform Introduction - 2016Q3Google Cloud Platform Introduction - 2016Q3
Google Cloud Platform Introduction - 2016Q3
 
IoT at Google Scale
IoT at Google ScaleIoT at Google Scale
IoT at Google Scale
 
Google BigQuery - Features & Benefits
Google BigQuery - Features & BenefitsGoogle BigQuery - Features & Benefits
Google BigQuery - Features & Benefits
 
Containerizing the Cloud with Kubernetes and Docker
Containerizing the Cloud with Kubernetes and DockerContainerizing the Cloud with Kubernetes and Docker
Containerizing the Cloud with Kubernetes and Docker
 
Google Cloud Platform: Prototype ->Production-> Planet scale
Google Cloud Platform: Prototype ->Production-> Planet scaleGoogle Cloud Platform: Prototype ->Production-> Planet scale
Google Cloud Platform: Prototype ->Production-> Planet scale
 

Similar to End To End Business Intelligence On Google Cloud

Gimel at Dataworks Summit San Jose 2018
Gimel at Dataworks Summit San Jose 2018Gimel at Dataworks Summit San Jose 2018
Gimel at Dataworks Summit San Jose 2018Romit Mehta
 
Dataworks | 2018-06-20 | Gimel data platform
Dataworks | 2018-06-20 | Gimel data platformDataworks | 2018-06-20 | Gimel data platform
Dataworks | 2018-06-20 | Gimel data platformDeepak Chandramouli
 
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQuery
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQueryCodeCamp Iasi - Creating serverless data analytics system on GCP using BigQuery
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQueryMárton Kodok
 
GCP Online Training | GCP Data Engineer Online Course
GCP Online Training | GCP Data Engineer Online CourseGCP Online Training | GCP Data Engineer Online Course
GCP Online Training | GCP Data Engineer Online CourseJayanthvisualpath
 
Voxxed Days Cluj - Powering interactive data analysis with Google BigQuery
Voxxed Days Cluj - Powering interactive data analysis with Google BigQueryVoxxed Days Cluj - Powering interactive data analysis with Google BigQuery
Voxxed Days Cluj - Powering interactive data analysis with Google BigQueryMárton Kodok
 
How to design and implement a data ops architecture with sdc and gcp
How to design and implement a data ops architecture with sdc and gcpHow to design and implement a data ops architecture with sdc and gcp
How to design and implement a data ops architecture with sdc and gcpJoseph Arriola
 
Google Analytics location data visualised with CARTO & BigQuery
Google Analytics location data visualised with CARTO & BigQueryGoogle Analytics location data visualised with CARTO & BigQuery
Google Analytics location data visualised with CARTO & BigQueryCARTO
 
Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...
Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...
Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...DataWorks Summit
 
Key projects Data Science and Engineering
Key projects Data Science and EngineeringKey projects Data Science and Engineering
Key projects Data Science and EngineeringVijayananda Mohire
 
Key projects Data Science and Engineering
Key projects Data Science and EngineeringKey projects Data Science and Engineering
Key projects Data Science and EngineeringVijayananda Mohire
 
QCon 2018 | Gimel | PayPal's Analytic Platform
QCon 2018 | Gimel | PayPal's Analytic PlatformQCon 2018 | Gimel | PayPal's Analytic Platform
QCon 2018 | Gimel | PayPal's Analytic PlatformDeepak Chandramouli
 
Survey Add-on Showcase: Cloud Transformation
Survey Add-on Showcase: Cloud TransformationSurvey Add-on Showcase: Cloud Transformation
Survey Add-on Showcase: Cloud TransformationLeanIX GmbH
 
Making advanced analytics accessible to more companies
Making advanced analytics accessible to more companiesMaking advanced analytics accessible to more companies
Making advanced analytics accessible to more companiesMárton Kodok
 
5 Benefits of Predictive Analytics for E-Commerce
5 Benefits of Predictive Analytics for E-Commerce5 Benefits of Predictive Analytics for E-Commerce
5 Benefits of Predictive Analytics for E-CommerceEdureka!
 
London atlassian meetup 31 jan 2016 jira metrics-extract slides
London atlassian meetup 31 jan 2016 jira metrics-extract slidesLondon atlassian meetup 31 jan 2016 jira metrics-extract slides
London atlassian meetup 31 jan 2016 jira metrics-extract slidesRudiger Wolf
 
Dynniq & GoDataDriven - Shaping the future of traffic with IoT and AI
Dynniq & GoDataDriven - Shaping the future of traffic with IoT and AIDynniq & GoDataDriven - Shaping the future of traffic with IoT and AI
Dynniq & GoDataDriven - Shaping the future of traffic with IoT and AIBigDataExpo
 
Speeding Up Data Science: From a Data Management Perspective
Speeding Up Data Science: From a Data Management PerspectiveSpeeding Up Data Science: From a Data Management Perspective
Speeding Up Data Science: From a Data Management PerspectiveJiannan Wang
 
Building cost-effective mobile product & marketing app analytics based on GCP...
Building cost-effective mobile product & marketing app analytics based on GCP...Building cost-effective mobile product & marketing app analytics based on GCP...
Building cost-effective mobile product & marketing app analytics based on GCP...GameCamp
 
Customer Presentation - IBM Cloud Pak for Data Overview (Level 100).PPTX
Customer Presentation - IBM Cloud Pak for Data Overview (Level 100).PPTXCustomer Presentation - IBM Cloud Pak for Data Overview (Level 100).PPTX
Customer Presentation - IBM Cloud Pak for Data Overview (Level 100).PPTXtsigitnist02
 

Similar to End To End Business Intelligence On Google Cloud (20)

Gimel at Dataworks Summit San Jose 2018
Gimel at Dataworks Summit San Jose 2018Gimel at Dataworks Summit San Jose 2018
Gimel at Dataworks Summit San Jose 2018
 
Dataworks | 2018-06-20 | Gimel data platform
Dataworks | 2018-06-20 | Gimel data platformDataworks | 2018-06-20 | Gimel data platform
Dataworks | 2018-06-20 | Gimel data platform
 
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQuery
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQueryCodeCamp Iasi - Creating serverless data analytics system on GCP using BigQuery
CodeCamp Iasi - Creating serverless data analytics system on GCP using BigQuery
 
GCP Online Training | GCP Data Engineer Online Course
GCP Online Training | GCP Data Engineer Online CourseGCP Online Training | GCP Data Engineer Online Course
GCP Online Training | GCP Data Engineer Online Course
 
Voxxed Days Cluj - Powering interactive data analysis with Google BigQuery
Voxxed Days Cluj - Powering interactive data analysis with Google BigQueryVoxxed Days Cluj - Powering interactive data analysis with Google BigQuery
Voxxed Days Cluj - Powering interactive data analysis with Google BigQuery
 
How to design and implement a data ops architecture with sdc and gcp
How to design and implement a data ops architecture with sdc and gcpHow to design and implement a data ops architecture with sdc and gcp
How to design and implement a data ops architecture with sdc and gcp
 
Google Analytics location data visualised with CARTO & BigQuery
Google Analytics location data visualised with CARTO & BigQueryGoogle Analytics location data visualised with CARTO & BigQuery
Google Analytics location data visualised with CARTO & BigQuery
 
Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...
Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...
Data Acquisition Automation for NiFi in a Hybrid Cloud environment – the Path...
 
Key projects Data Science and Engineering
Key projects Data Science and EngineeringKey projects Data Science and Engineering
Key projects Data Science and Engineering
 
Key projects Data Science and Engineering
Key projects Data Science and EngineeringKey projects Data Science and Engineering
Key projects Data Science and Engineering
 
QCon 2018 | Gimel | PayPal's Analytic Platform
QCon 2018 | Gimel | PayPal's Analytic PlatformQCon 2018 | Gimel | PayPal's Analytic Platform
QCon 2018 | Gimel | PayPal's Analytic Platform
 
Survey Add-on Showcase: Cloud Transformation
Survey Add-on Showcase: Cloud TransformationSurvey Add-on Showcase: Cloud Transformation
Survey Add-on Showcase: Cloud Transformation
 
Making advanced analytics accessible to more companies
Making advanced analytics accessible to more companiesMaking advanced analytics accessible to more companies
Making advanced analytics accessible to more companies
 
5 Benefits of Predictive Analytics for E-Commerce
5 Benefits of Predictive Analytics for E-Commerce5 Benefits of Predictive Analytics for E-Commerce
5 Benefits of Predictive Analytics for E-Commerce
 
London atlassian meetup 31 jan 2016 jira metrics-extract slides
London atlassian meetup 31 jan 2016 jira metrics-extract slidesLondon atlassian meetup 31 jan 2016 jira metrics-extract slides
London atlassian meetup 31 jan 2016 jira metrics-extract slides
 
Dynniq & GoDataDriven - Shaping the future of traffic with IoT and AI
Dynniq & GoDataDriven - Shaping the future of traffic with IoT and AIDynniq & GoDataDriven - Shaping the future of traffic with IoT and AI
Dynniq & GoDataDriven - Shaping the future of traffic with IoT and AI
 
Modern Thinking área digital MSKM 21/09/2017
Modern Thinking área digital MSKM 21/09/2017Modern Thinking área digital MSKM 21/09/2017
Modern Thinking área digital MSKM 21/09/2017
 
Speeding Up Data Science: From a Data Management Perspective
Speeding Up Data Science: From a Data Management PerspectiveSpeeding Up Data Science: From a Data Management Perspective
Speeding Up Data Science: From a Data Management Perspective
 
Building cost-effective mobile product & marketing app analytics based on GCP...
Building cost-effective mobile product & marketing app analytics based on GCP...Building cost-effective mobile product & marketing app analytics based on GCP...
Building cost-effective mobile product & marketing app analytics based on GCP...
 
Customer Presentation - IBM Cloud Pak for Data Overview (Level 100).PPTX
Customer Presentation - IBM Cloud Pak for Data Overview (Level 100).PPTXCustomer Presentation - IBM Cloud Pak for Data Overview (Level 100).PPTX
Customer Presentation - IBM Cloud Pak for Data Overview (Level 100).PPTX
 

More from Tu Pham

Go from idea to app with no coding using AppSheet.pptx
Go from idea to app with no coding using AppSheet.pptxGo from idea to app with no coding using AppSheet.pptx
Go from idea to app with no coding using AppSheet.pptxTu Pham
 
Secure your app against DDOS, API Abuse, Hijacking, and Fraud
 Secure your app against DDOS, API Abuse, Hijacking, and Fraud Secure your app against DDOS, API Abuse, Hijacking, and Fraud
Secure your app against DDOS, API Abuse, Hijacking, and FraudTu Pham
 
Challenges In Implementing SRE
Challenges In Implementing SREChallenges In Implementing SRE
Challenges In Implementing SRETu Pham
 
IT Strategy
IT Strategy IT Strategy
IT Strategy Tu Pham
 
Set up Learn and Development program
Set up Learn and Development programSet up Learn and Development program
Set up Learn and Development programTu Pham
 
Cost Management For IT Project / Product
Cost Management For IT Project / ProductCost Management For IT Project / Product
Cost Management For IT Project / ProductTu Pham
 
Minimum Viable Product 101
Minimum Viable Product 101Minimum Viable Product 101
Minimum Viable Product 101Tu Pham
 
Understand your customers
Understand your customersUnderstand your customers
Understand your customersTu Pham
 
Let's build great products for mid-size companies
Let's build great products for mid-size companiesLet's build great products for mid-size companies
Let's build great products for mid-size companiesTu Pham
 
Latency Control And Supervision In Resilience Design Patterns
Latency Control And Supervision In Resilience Design Patterns Latency Control And Supervision In Resilience Design Patterns
Latency Control And Supervision In Resilience Design Patterns Tu Pham
 
High Output Tech Management
High Output Tech Management High Output Tech Management
High Output Tech Management Tu Pham
 
Security On The Cloud
Security On The CloudSecurity On The Cloud
Security On The CloudTu Pham
 
Eway Tech Talk #2 Coding Guidelines
Eway Tech Talk #2 Coding GuidelinesEway Tech Talk #2 Coding Guidelines
Eway Tech Talk #2 Coding GuidelinesTu Pham
 
Eway Tech Talk #0 Knowledge Sharing
Eway Tech Talk #0 Knowledge SharingEway Tech Talk #0 Knowledge Sharing
Eway Tech Talk #0 Knowledge SharingTu Pham
 
Php 5.6 vs Php 7 performance comparison
Php 5.6 vs Php 7 performance comparisonPhp 5.6 vs Php 7 performance comparison
Php 5.6 vs Php 7 performance comparisonTu Pham
 
System Security on Cloud
System Security on CloudSystem Security on Cloud
System Security on CloudTu Pham
 
Big Data at DYNO
Big Data at DYNOBig Data at DYNO
Big Data at DYNOTu Pham
 
Understanding Kubernetes
Understanding KubernetesUnderstanding Kubernetes
Understanding KubernetesTu Pham
 
Building Reactive Applications With Akka And Java
Building Reactive Applications With Akka And JavaBuilding Reactive Applications With Akka And Java
Building Reactive Applications With Akka And JavaTu Pham
 
Database, data storage, hosting with Firebase
Database, data storage, hosting with FirebaseDatabase, data storage, hosting with Firebase
Database, data storage, hosting with FirebaseTu Pham
 

More from Tu Pham (20)

Go from idea to app with no coding using AppSheet.pptx
Go from idea to app with no coding using AppSheet.pptxGo from idea to app with no coding using AppSheet.pptx
Go from idea to app with no coding using AppSheet.pptx
 
Secure your app against DDOS, API Abuse, Hijacking, and Fraud
 Secure your app against DDOS, API Abuse, Hijacking, and Fraud Secure your app against DDOS, API Abuse, Hijacking, and Fraud
Secure your app against DDOS, API Abuse, Hijacking, and Fraud
 
Challenges In Implementing SRE
Challenges In Implementing SREChallenges In Implementing SRE
Challenges In Implementing SRE
 
IT Strategy
IT Strategy IT Strategy
IT Strategy
 
Set up Learn and Development program
Set up Learn and Development programSet up Learn and Development program
Set up Learn and Development program
 
Cost Management For IT Project / Product
Cost Management For IT Project / ProductCost Management For IT Project / Product
Cost Management For IT Project / Product
 
Minimum Viable Product 101
Minimum Viable Product 101Minimum Viable Product 101
Minimum Viable Product 101
 
Understand your customers
Understand your customersUnderstand your customers
Understand your customers
 
Let's build great products for mid-size companies
Let's build great products for mid-size companiesLet's build great products for mid-size companies
Let's build great products for mid-size companies
 
Latency Control And Supervision In Resilience Design Patterns
Latency Control And Supervision In Resilience Design Patterns Latency Control And Supervision In Resilience Design Patterns
Latency Control And Supervision In Resilience Design Patterns
 
High Output Tech Management
High Output Tech Management High Output Tech Management
High Output Tech Management
 
Security On The Cloud
Security On The CloudSecurity On The Cloud
Security On The Cloud
 
Eway Tech Talk #2 Coding Guidelines
Eway Tech Talk #2 Coding GuidelinesEway Tech Talk #2 Coding Guidelines
Eway Tech Talk #2 Coding Guidelines
 
Eway Tech Talk #0 Knowledge Sharing
Eway Tech Talk #0 Knowledge SharingEway Tech Talk #0 Knowledge Sharing
Eway Tech Talk #0 Knowledge Sharing
 
Php 5.6 vs Php 7 performance comparison
Php 5.6 vs Php 7 performance comparisonPhp 5.6 vs Php 7 performance comparison
Php 5.6 vs Php 7 performance comparison
 
System Security on Cloud
System Security on CloudSystem Security on Cloud
System Security on Cloud
 
Big Data at DYNO
Big Data at DYNOBig Data at DYNO
Big Data at DYNO
 
Understanding Kubernetes
Understanding KubernetesUnderstanding Kubernetes
Understanding Kubernetes
 
Building Reactive Applications With Akka And Java
Building Reactive Applications With Akka And JavaBuilding Reactive Applications With Akka And Java
Building Reactive Applications With Akka And Java
 
Database, data storage, hosting with Firebase
Database, data storage, hosting with FirebaseDatabase, data storage, hosting with Firebase
Database, data storage, hosting with Firebase
 

Recently uploaded

08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 

Recently uploaded (20)

08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
Unblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen FramesUnblocking The Main Thread Solving ANRs and Frozen Frames
Unblocking The Main Thread Solving ANRs and Frozen Frames
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 

End To End Business Intelligence On Google Cloud

  • 1. Tu Pham - CTO @ Eway Google Cloud Next -- Surabaya, Indonesia - 06/2019 -- End to End Business Intelligence on Google Cloud
  • 2. About Me - CTO at Eway JSC - Google Developer Expert on Cloud Platform - 8 years experience on Big data and Cloud Computing - Open source contributor, blogger, father 2
  • 3. 3
  • 4. 4
  • 5. 5
  • 6. 6
  • 7. 7
  • 8. 8
  • 9. 9
  • 10. 10
  • 11. 11
  • 12. In Affiliate Marketing, We Are Partners With - Indonesia - Go-Jek - Bukalapak - Traveloka - The World - Lazada - Shopee - Aliexpress - Adcombo - Leadbit 12
  • 15. 15
  • 16. 16
  • 18. How Do We Use This Data 18
  • 19. Use Case - Reporting - Business Analytics - Operational Analytics - Product Features - System Monitoring 19
  • 20. - Reporting to Partners, Advertisers, Publishers, ... Reporting 20
  • 21. Business Analytics - Analyzing Growth, Users behavior, Sign up funnels, Sign up referrals 21
  • 22. Operational Analytics - Analyzing Root cause analysis, Latency analysis, Error analysis 22
  • 23. Operational Analytics - Better Threshold alerts, Security alerts, Capacity planning 23
  • 24. Product Features - Product Features Top Products , Publisher challenge, A/B Testing 24
  • 25. 25
  • 26. Sample: End-To-End Flow For Mining User Behavior 26
  • 27. How do we collect this data? 27
  • 28. Step 1: GC Compute Engine Instances Collect Raw Data - Technology: Cloud Load Balancing, Compute Engine - Why Cloud Load Balancing: - TCP/UDP Load Balancing - Seamless Autoscaling - Scalable - Why Compute Engine: - High-Performance - Scalable - Low Cost - Fast Networking - Custom Machine Types 28
  • 29. Step 1: GC Compute Engine Instances Collect Raw Data 29
  • 30. How do we process this data? 30
  • 31. Step 2: GC Compute Engine Instances Convert Raw Data To Apache Parquet Files - Technology: Compute Engine, Parquet file format - Why Parquet: - Self-describing, columnar storage format - Language-independent - High query-performance - Spark SQL is much faster with Parquet - High compression (up to 70%)- less disk IO 31
  • 32. Step 2: GC Compute Engine Instances Convert Raw Data To Apache Parquet Files 32
  • 33. Step 2: GC Compute Engine Instances Convert Raw Data To Apache Parquet Files 33
  • 34. - Technology: Compute Engine, Parquet file format, Cloud Storage - Why Cloud Storage: - Four storage classes - Easy to integrate - Object Lifecycle Management - Fast Networking Step 3: GC Compute Engine Upload Parquet File To GC Cloud Storage 34
  • 35. Step 3: GC Compute Engine Upload Parquet File To GC Cloud Storage 35
  • 36. Step 3: GC Compute Engine Upload Parquet File To GC Cloud Storage 36
  • 37. Step 3: GC Compute Engine Upload Parquet File To GC Cloud Storage 37
  • 38. Step 3: GC Compute Engine Upload Parquet File To GC Cloud Storage 38
  • 39. How do we visualize this data 39
  • 40. Step 4: Explore Dataset Using BI Tools - Technology: DataPrep, Big Query, Grafana, PowerBI 40
  • 41. Step 4: Explore Dataset Using BI Tools - Technology: DataPrep 41
  • 42. Step 4: Explore Dataset Using BI Tools - Technology: BigQuery 42
  • 43. Step 4b: Explore Dataset Using BI Tools 43
  • 44. 44
  • 45. 45
  • 46. Step 4: Explore Dataset Using BI Tools - Technology: Grafana 46
  • 47. Step 4: Explore Dataset Using BI Tools - Technology: PowerBI 47
  • 48. Step 4b: Explore Dataset Using BI Tools 48
  • 49. Step 5: Aggregate Data > val df = spark.read.parquet(“/log/2017/07/user_engagement/1.snappy.parquet”) 49
  • 50. > df.show() +----------------------------+---------------------------+------------------------------+-----------------------------+---------+---------+------------------ + | id | categoryId | topicId | userId | action | value | created | +----------------------------+---------------------------+------------------------------+-----------------------------+---------+---------+------------------ + |"100011125479181_..|"253397751448382"|"253397751448382_...| "100011125479181 "|"view"| "" |1490621079| |"100004354358107_..|"253397751448382"|"253397751448382_...| "100004354358107"|"view"| "" |1490491531| |"100014752680147_..|"253397751448382"|"253397751448382_...| "100014752680147"|"like"| "" |1490457109| Step 5: Aggregate Data 50
  • 51. > val df_group_count = df.groupBy("userId","categoryId", "action").count().show() +--------------------------+---------------------------+----------+----------+ | userId | categoryId | action | count | +--------------------------+---------------------------+----------+----------+ |"100011896037126"|"253397751448382"|"like" | 2 | |"100010391178709"|"253397751448382"|"like" | 1 | |"100011186707422"|"253397751448382"|"like" | 1 | |"100012202096674"|"253397751448382"|"like" | 1 | Step 5: Aggregate Data 51
  • 52. Step 5: Aggregate Big Data 52
  • 53. Step 5: Aggregate Big Data Number of unique user per topic AVG User engagement per topic 53
  • 55. Where Are The AI / ML 55
  • 56. Create Your Principles Principles: - KISS (Keep it simple, stupid) - DRY (Don’t Repeat Yourself) - Single Responsibility - Low Cost - Scalable 56
  • 57. Be 1% better everyday tips Create your system principles Design system architecture, data flow, data model, data structure first Separate realtime and batch flows Separate data storage strategies between data types Save the cost by network cost, instances cost, storage cost by metric monitoring & alert system 57
  • 58. Thank You - Q&A ● Eway: https://eway.vn ● My Contact: tupp@eway.vn 58