© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Olivier Klein
Senior Solutions Architect
June 2016
Big Data Architectural Patterns
and Best Practices on AWS
Three Types of Data Analytics
•  Retrospective analysis and reporting
•  Here-and-now real-time processing and dashboards
•  Predictions to enable smart apps
Simplified Big Data Pipeline
Ingest → Store → Process → Visualize (from data to answers over time)
AWS services across the pipeline stages (Ingest, Store, Process, Visualize):
•  Amazon S3
•  Amazon DynamoDB
•  Amazon RDS
•  Amazon Mobile Analytics
•  Amazon EMR
•  Amazon Redshift
•  AWS Lambda
•  Amazon Kinesis Firehose
•  Amazon Machine Learning
•  Amazon EC2
•  Amazon Glacier
•  Amazon Elasticsearch Service
•  Amazon Kinesis Analytics
•  Amazon QuickSight
•  AWS Import/Export Snowball
•  Amazon Kinesis
Fluentd: Open Source Log Collection
•  Fluentd is an open source data collector that unifies data collection and consumption
•  Integrates with many data sources (app logs, syslog, Twitter, etc.)
•  Direct integration with AWS services such as S3 and Kinesis
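# Tail the Apache access log and tag each event for routing
<source>
  @type tail
  format apache2
  path /var/log/apache2/access_log
  tag s3.apache.access
</source>
# Send every event whose tag matches s3.*.* to the S3 bucket
# (AWS credentials are assumed to come from the environment, e.g. an instance role)
<match s3.*.*>
  @type s3
  s3_bucket myweblogs
  path logs/
</match>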
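The routing in the configuration above depends on the tag `s3.apache.access` matching the pattern `s3.*.*`, where each `*` stands for exactly one dot-separated tag part. The following is a minimal Python sketch of that matching rule only, not Fluentd's own implementation; the function name `tag_matches` is hypothetical, and wildcards other than `*` are not handled.

```python
def tag_matches(pattern: str, tag: str) -> bool:
    """Return True if `tag` matches `pattern`, where `*` stands for exactly one tag part."""
    pattern_parts = pattern.split(".")
    tag_parts = tag.split(".")
    # A single `*` never spans several parts, so the part counts must agree.
    if len(pattern_parts) != len(tag_parts):
        return False
    return all(p == "*" or p == t for p, t in zip(pattern_parts, tag_parts))


if __name__ == "__main__":
    for candidate in ("s3.apache.access", "s3.apache", "kinesis.apache.access"):
        print(candidate, tag_matches("s3.*.*", candidate))
```

Under this rule a two-part tag such as `s3.apache` would not be routed to the S3 output, which is why the source block assigns a three-part tag.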
Amazon S3
•  Highly available object storage designed for 99.999999999% data durability
•  Replicated across 3 facilities
•  Virtually unlimited scale
•  Pay only for what you use; no need to pre-provision
•  Event notifications can trigger further actions
•  Ideal for a data lake (single source of truth)
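As an illustration of the event notification bullet, here is a minimal Python sketch of a function that reads an S3 event notification payload and lists the objects that triggered it. The names `objects_from_s3_event` and `handler` are hypothetical, and the sketch assumes the notification is delivered to an AWS Lambda function written in Python; what to do with each object is left open.

```python
from typing import List, Tuple
from urllib.parse import unquote_plus


def objects_from_s3_event(event: dict) -> List[Tuple[str, str]]:
    """Extract (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        s3_info = record["s3"]
        bucket = s3_info["bucket"]["name"]
        # Object keys arrive URL-encoded in the notification (a space becomes '+').
        key = unquote_plus(s3_info["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def handler(event, context):
    """Hypothetical Lambda entry point: log each newly written object."""
    for bucket, key in objects_from_s3_event(event):
        print(f"new object: s3://{bucket}/{key}")
```

The parsing is kept separate from the handler so that it can be exercised locally with a hand-written sample event.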
Amazon DynamoDB
•  Schemaless Data Model
•  Seamless scalability
•  No storage or throughput limits
•  Consistent low latency performance
•  High durability and availability
•  Replicated across 3 facilities
DynamoDB data model: table → items → attributes
Fully Managed NoSQL Database Service
500,000 writes / second to their Amazon
DynamoDB tables
Stream in Real Time: Amazon Kinesis
•  Real-time data processing over large distributed streams
•  Elastic capacity that scales to millions of events per second
•  React in real time to incoming stream events
•  Reliable stream storage replicated across 3 facilities
Amazon EMR
•  Amazon EMR is a fully managed Hadoop service
•  Supports transient and long-running clusters
•  Direct integration with Amazon S3 and Amazon Kinesis
•  Easy to scale, with burstable capacity
•  Integration with the Amazon EC2 Spot market
1 instance x 100 hours = 100 instances x 1 hour
(with Spot pricing, the larger cluster is not only faster but can also be cheaper)
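The instance-hour equation above can be checked with a few lines of arithmetic. This is a sketch under stated assumptions: the two hourly prices are made-up placeholders, not real AWS prices, and the job is assumed to parallelize perfectly across instances.

```python
def cluster_cost(instances: int, hours: int, price_per_instance_hour: float) -> float:
    """Cost of running `instances` machines for `hours` hours at a flat hourly price."""
    return instances * hours * price_per_instance_hour


# Hypothetical prices in USD per instance-hour, for illustration only.
ON_DEMAND_PRICE = 0.10
SPOT_PRICE = 0.03

one_slow_instance = cluster_cost(1, 100, ON_DEMAND_PRICE)     # finishes after 100 hours
hundred_on_demand = cluster_cost(100, 1, ON_DEMAND_PRICE)     # finishes after 1 hour
hundred_on_spot = cluster_cost(100, 1, SPOT_PRICE)            # finishes after 1 hour

if __name__ == "__main__":
    print(one_slow_instance, hundred_on_demand, hundred_on_spot)
```

Both on-demand configurations consume 100 instance-hours and therefore cost the same, while the wide cluster returns results 100 times sooner; any Spot discount then lowers the cost further.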
Process – Amazon EMR
•  Amazon EMR supports common Hadoop ecosystem frameworks such as:
   •  Spark, Pig, Hive, Hue, Oozie …
   •  HBase, Presto, Impala …
•  Decouples storage from compute
   •  Allows independent scaling
•  Direct integration with DynamoDB and S3 (EMRFS)
•  FINRA regulates the trading practices of brokerage firms and exchange markets to protect market integrity
•  Its market surveillance platform stores 30 billion market events every day
•  It uses Amazon S3 to store events and lets analysts interactively query market dynamics using Amazon EMR Hive and HBase clusters, with increased agility
Re-Architecting Compliance
Unlimited Storage
Distributed Computing
Interactive Market Queries
Ensure compliance
30 billion market events
Apache Spark
•  Apache Spark is an in-memory cluster computing engine built on RDDs (Resilient Distributed Datasets) for fast processing
•  Often faster than MapReduce because intermediate results can stay in memory instead of being written to HDFS between stages
•  Apache Spark Streaming can read directly from DynamoDB, S3 and a Kinesis stream
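Processing Amazon Kinesis streams
Amazon Kinesis → EMR with Spark Streaming
// Simplified for the slide: the actual KinesisUtils.createStream call also needs a
// StreamingContext and stream connection settings, and countByWindow takes both a
// window length and a slide interval.
KinesisUtils.createStream("twitter-stream")
  .filter(_.getText.contains("Big Data"))
  .countByWindow(Seconds(5))
Counting tweets on a sliding window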
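To make the sliding-window count concrete, here is a plain-Python sketch of the same idea without Spark or Kinesis. It is an illustration only: the function name `count_by_window`, the event tuples, and the choice of a half-open window `(t - window, t]` are assumptions, not the Spark Streaming API.

```python
from typing import Iterable, List, Tuple


def count_by_window(events: Iterable[Tuple[float, str]], keyword: str,
                    window: float, slide: float, end_time: float) -> List[Tuple[float, int]]:
    """Count events containing `keyword` in a window of `window` seconds, evaluated every `slide` seconds."""
    # Keep only the timestamps of events whose text contains the keyword.
    matching = [ts for ts, text in events if keyword in text]
    results = []
    t = slide
    while t <= end_time:
        # The window covers the half-open interval (t - window, t].
        count = sum(1 for ts in matching if t - window < ts <= t)
        results.append((t, count))
        t += slide
    return results


if __name__ == "__main__":
    tweets = [(0.5, "Big Data rocks"), (1.2, "hello"), (3.0, "Big Data on AWS"), (6.5, "More Big Data")]
    for window_end, count in count_by_window(tweets, "Big Data", window=5, slide=1, end_time=7):
        print(window_end, count)
```

Because the window (5 seconds) is longer than the slide (1 second), consecutive windows overlap and a single tweet is counted in several of them, which is what distinguishes a sliding window from fixed batches.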
Amazon Redshift
•  Fully managed petabyte-scale data warehouse
•  Scalable number of cluster nodes
•  ODBC/JDBC connectors for BI tools using SQL
•  Loads data from Amazon DynamoDB and Amazon S3
•  Less than a tenth of the cost of traditional solutions
AWS Marketplace
•  Pre-configured machine images ready to be launched as virtual server instances
•  Launch applications with 1-Click
•  Pay for software licenses by the hour or bring your own license (BYOL)
Fu Ting Chan, Founder
InfoForce.co
June 2016
Customer Sharing: Scalable Big Data
Intelligence with Machine Learning
InfoCluster
InfoForce is a cloud-based big data solution provider specializing in near-real-time, high-throughput analytics and machine learning algorithms.
Organization data lakes: e-commerce marketplace (~4 million products, > 1,800 sellers), historical purchases, log files, relational DBs, TBs of product catalog
InfoForce.co: data pipes, analytics, machine learning
Application server via REST:
1. Price Optimizer
2. Cross-sell Bundler
3. Recommendation Engine
We reviewed 1,800+ merchants with a total of 3.5 million SKUs and a transaction volume of US $80 million in the last 90 days, and found that over 86% of listings had no sales in the last 60 days.*
Condition                 %
High exposure             1.9%
Mid-high exposure         3.4%
Mid-low exposure          2.8%
New listed in 7 days      0.6%
New listed in 30 days     5%
Low exposure              86.3%
Total                     100%
Definition
1.  High exposure - sold in last 7 days
2.  Mid-high exposure - sold in last 30 days but not in last 7 days
3.  Mid-low exposure - sold in last 60 days but not in last 7 days
4.  Low exposure - no sales history in last 60 days
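The four definitions above can be read as a simple rule on the number of days since a listing's last sale. The sketch below is one possible reading, not InfoForce's implementation: the function name `exposure_bucket` is hypothetical, the day boundaries are assumed to be inclusive, rules are checked in order so the first match wins, and the two "new listed" categories from the table are not covered because the slide does not define them.

```python
from typing import Optional


def exposure_bucket(days_since_last_sale: Optional[int]) -> str:
    """Classify a listing by how recently it last sold (None means it never sold)."""
    if days_since_last_sale is None or days_since_last_sale > 60:
        return "Low exposure"
    if days_since_last_sale <= 7:
        return "High exposure"
    if days_since_last_sale <= 30:
        return "Mid-high exposure"
    # Sold within the last 60 days, but not recently enough for the buckets above.
    return "Mid-low exposure"


if __name__ == "__main__":
    for days in (3, 20, 45, 90, None):
        print(days, exposure_bucket(days))
```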
Understand the e-commerce market
Instead of setting a fixed selling price, Price Optimizer offers a dynamic pricing strategy. It uses each product's sales cycle, compared with the sales cycles of similar products in the market, to design price rules for each item of merchandise.
6050 Strategies
Product Similarity
Automatically Managed Listings
1. Price Optimization
To discover relationships between products, InfoForce's textual analysis engine derives relationships from catalog metadata and learns from consumption patterns.
Product title: Flora By Gucci Eau De Parfum Spray 50ml
Extracted tags (brand, model, product): gucci, flora, eau_de_parfum_spray, 50ml
Related products by tags:
Gucci, eau_de_toilette
Gucci, flora, eau_de_parfum, 75ml
Vera_Wang, floral, eau_de_parfum_spray, 50ml
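One simple way to compare tag sets like the ones above is Jaccard similarity (shared tags divided by all distinct tags). This is a swapped-in illustration, not the method InfoForce describes: the slides do not say how relationships are scored, and the names `jaccard`, `query`, and `candidates` are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity: size of the intersection divided by size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Tag sets taken from the slide example, lowercased for comparison.
query = {"gucci", "flora", "eau_de_parfum_spray", "50ml"}
candidates = {
    "gucci eau de toilette": {"gucci", "eau_de_toilette"},
    "gucci flora 75ml": {"gucci", "flora", "eau_de_parfum", "75ml"},
    "vera wang floral 50ml": {"vera_wang", "floral", "eau_de_parfum_spray", "50ml"},
}

# Rank candidate products from most to least similar to the query product.
ranked = sorted(candidates, key=lambda name: jaccard(query, candidates[name]), reverse=True)

if __name__ == "__main__":
    for name in ranked:
        print(name, round(jaccard(query, candidates[name]), 3))
```

Exact tag overlap treats `flora` and `floral` as unrelated, which suggests why a tag vocabulary learned from data matters for this kind of matching.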
2. Cross-sell Bundler
2. Machine Learning
gucci, flora, eau_de_parfum_spray, 50ml
gucci, eau_de_toilette
gucci, flora, eau_de_parfum, 75ml
vera_wang, floral, eau_de_parfum_spray, 50ml
Derived Tags
gucci -> salvatore_ferragamo; fendi; prada; hermes; cartier
flora -> floral; flower; pageant; flowers; wedding
eau_de_parfum -> eau_de_toilette; colonia; perfume; eau_de_parfum; deodorant
3. Recommendation Engine
InfoCluster Architecture
Scalability
•  TBs of data
•  Compute and I/O heavy
•  Must grow dynamically
Timeliness
•  Streaming data
•  Results need to be available fast
•  Always available
Secure
•  Privacy
•  Fine-grained control over resources
•  Persistence and backup
Challenges
Challenge No More
EC2
A solid platform for scaling our compute capacity. We use AWS IAM to grant API-level access so that additional machines can be spun up when necessary.
Kinesis Streaming
Core to our architecture are the real-time data pipes, which are built on top of Amazon Kinesis. We provision the number of shards based on throughput requirements, without having to worry about the complexity of setting up a production-grade streaming service.
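Provisioning shards from throughput requirements comes down to a small calculation. The sketch below assumes the commonly documented per-shard limits for Amazon Kinesis Streams of roughly 1 MB/s or 1,000 records/s for writes and 2 MB/s for reads; the function name `shards_needed` and the example workloads are hypothetical, so check the current service limits before relying on the numbers.

```python
# Assumed per-shard limits (approximate; verify against current Kinesis documentation).
SHARD_WRITE_BYTES_PER_SEC = 1_000_000
SHARD_WRITE_RECORDS_PER_SEC = 1_000
SHARD_READ_BYTES_PER_SEC = 2_000_000


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division without floating point."""
    return -(-a // b)


def shards_needed(records_per_sec: int, avg_record_bytes: int, consumers: int = 1) -> int:
    """Smallest shard count that satisfies the write and read limits above."""
    write_bytes = records_per_sec * avg_record_bytes
    by_write_bytes = ceil_div(write_bytes, SHARD_WRITE_BYTES_PER_SEC)
    by_write_records = ceil_div(records_per_sec, SHARD_WRITE_RECORDS_PER_SEC)
    # Every consumer application reads the full stream.
    by_read_bytes = ceil_div(write_bytes * consumers, SHARD_READ_BYTES_PER_SEC)
    return max(1, by_write_bytes, by_write_records, by_read_bytes)


if __name__ == "__main__":
    print(shards_needed(5000, 500))                 # many small records
    print(shards_needed(1000, 4000, consumers=3))   # larger records, three readers
```

The binding constraint changes with the workload: many small records hit the records-per-second limit first, while several consumers reading large records hit the read bandwidth limit first.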
DynamoDB
Serves as a reliable way to persist important metadata such as checkpoints and stream schema information.
Kinesis Firehose
Firehose archives all data received from the data pipes for long-term batch and real-time analytics by the calculation cluster.
S3 and CloudFront
S3 is used to persist data; it also has handy integrations with Apache Spark for easy distributed computing. CloudFront is used as a CDN for fast raw data delivery to apps.
CloudWatch
InfoCluster uses built-in and custom CloudWatch metrics to monitor its services. The alarm functionality notifies the operations team of any issues.
Price Optimizer
•  More than 700,000 items actively managed
•  Generated more than US $1.4 million in revenue for merchants, from 51,014 pieces sold
Cross-Sell Bundler
•  Successful integration with the word tagging infrastructure
•  Soft launch in early June 2016
•  Generates independent bundling results in 30 seconds for 2,000 items, with 10 bundle recommendations each
Winner of HK ICT Awards 2016
Best Startup
Best Smart Hong Kong (Big Data Application)
Results
A merchant provided 50K listings to our price optimizer; 2% of them (around 1,000 listings) were reactivated into sales and generated over US $80K GMV in 20 days.
Successful Case
•  Business
•  Look for applications of the technology in other domains and industries
•  Plug-in / API-level access to InfoCluster’s functionality
•  Further investment in machine learning
•  Technology
•  Storage optimization using ST1 and SC1 EBS volumes
•  Optimize the balance between continuously running resources and on-demand compute services such as AWS Lambda
•  Full integration of custom processes with Amazon CloudWatch
•  PubSub notification framework
Future Plan
•  We have progressed far beyond justifying the cloud on cost alone...
•  Builders like to build
•  Unlock your organization's data lakes; use InfoForce (preferably!)
Summary
“We see our customers as invited guests to a party, and we are the hosts. It’s our job to make every important aspect of the customer experience a little bit better.”
Jeff Bezos, CEO, Amazon.com
Data analysis for a better customer experience
•  Your business creates and stores data and logs all the time
•  Data points and logs allow you to understand the individual customer experience and improve it
•  Analysis of logs and trails helps you gain insights
Big Data:
•  Massive datasets
•  Experimental style of data manipulation and analysis
•  Not a steady-state workload; peaks and valleys
•  Combination of structured and unstructured data in many formats
AWS Cloud:
•  Virtually unlimited capacity
•  Low cost of experimentation through on-demand infrastructure
•  Scalable infrastructure for highly variable workloads
•  Tools and services for managing structured, unstructured and stream data
Thank you!
Olivier Klein
Senior Solutions Architect
Fu Ting Chan, Founder
InfoForce.co

Driving Business Insights with a Modern Data Architecture AWS Summit SG 2017Amazon Web Services
 

Big Data Architectural Patterns and Best Practices on AWS

  • 1. © 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Olivier Klein Senior Solutions Architect June 2016 Big Data Architectural Patterns and Best Practices on AWS
  • 2. Three Types of Data Analytics: retrospective analysis and reporting; here-and-now real-time processing and dashboards; predictions to enable smart apps
  • 3. Simplified Big Data Pipeline: Ingest → Store → Process → Visualize, from data to answers over time
  • 4. AWS services across the pipeline stages (Ingest, Store, Process, Visualize): Amazon S3, Amazon DynamoDB, Amazon RDS, Amazon Mobile Analytics, Amazon EMR, Amazon Redshift, AWS Lambda, Amazon Kinesis Firehose, Amazon Machine Learning, Amazon EC2, Amazon Glacier, Amazon Elasticsearch Service, Amazon Kinesis Analytics, Amazon QuickSight, AWS Import/Export Snowball, Amazon Kinesis
  • 5. (Service overview diagram repeated from slide 4.)
  • 6. Fluentd: Open Source Log Collection
      • Fluentd is an open source data collector that unifies data collection and consumption
      • Integrates with many data sources (application logs, syslogs, Twitter, etc.)
      • Integrates directly with AWS services such as S3 and Kinesis
      Example configuration (tail an Apache access log and ship it to an S3 bucket):
        <source>
          type tail
          format apache2
          path /var/log/apache2/access_log
          tag s3.apache.access
        </source>
        <match s3.*.*>
          type s3
          s3_bucket myweblogs
          path logs/
        </match>
  • 7. (Service overview diagram repeated from slide 4.)
  • 8. Amazon S3
      • Highly available object storage designed for 99.999999999% data durability
      • Replicated across 3 facilities
      • Virtually unlimited scale
      • Pay only for what you use; there is no need to pre-provision
      • Event notifications can trigger further action
      • Ideal for a data lake (single source of truth)
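To put the 99.999999999% durability figure on slide 8 in perspective, the arithmetic can be worked through in a few lines. This is a back-of-the-envelope sketch that reads the design target as an annual per-object survival probability, which is an assumption made for illustration; the function names and the object count in the test are hypothetical.

```python
from fractions import Fraction

# Design target from the slide, read here as an annual per-object
# survival probability (an assumption made for this illustration).
DURABILITY = Fraction(99_999_999_999, 100_000_000_000)


def expected_annual_loss(object_count: int) -> Fraction:
    """Expected number of objects lost per year under the stated assumption."""
    return object_count * (1 - DURABILITY)


def years_per_lost_object(object_count: int) -> Fraction:
    """Average number of years between single-object losses."""
    return 1 / expected_annual_loss(object_count)
```

Exact fractions are used instead of floats because subtracting a number this close to 1 in floating point introduces visible rounding error.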
  • 9. (Service overview diagram repeated from slide 4.)
  • 10. Amazon DynamoDB: Fully Managed NoSQL Database Service
      • Schemaless data model
      • Seamless scalability
      • No storage or throughput limits
      • Consistent low-latency performance
      • High durability and availability
      • Replicated across 3 facilities
      Diagram: a DynamoDB table contains items, and items contain attributes
  • 11.
  • 12. 500,000 writes / second to their Amazon DynamoDB tables
  • 13. (Service overview diagram repeated from slide 4.)
  • 14. Stream in Real Time: Amazon Kinesis
      • Real-time data processing over large distributed streams
      • Elastic capacity that scales to millions of events per second
      • React in real time to incoming stream events
      • Reliable stream storage replicated across 3 facilities
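The "millions of events per second" claim on slide 14 comes from adding shards to a stream. The sketch below estimates how many shards a given ingest rate would need. The per-shard write limits (1,000 records per second and about 1 MB per second) are taken as assumptions and should be checked against the current service quotas; the function name and sample workloads are hypothetical.

```python
# Per-shard ingest limits for an Amazon Kinesis stream, taken as assumptions
# here; check the current service quotas before relying on them.
RECORDS_PER_SHARD_PER_SEC = 1_000
BYTES_PER_SHARD_PER_SEC = 1_000_000  # roughly 1 MB/s, taken as 10^6 bytes


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division without going through floats."""
    return -(-a // b)


def shards_needed(events_per_sec: int, avg_event_bytes: int) -> int:
    """Return the minimum shard count that satisfies both ingest limits."""
    by_count = ceil_div(events_per_sec, RECORDS_PER_SHARD_PER_SEC)
    by_bytes = ceil_div(events_per_sec * avg_event_bytes, BYTES_PER_SHARD_PER_SEC)
    # A stream always has at least one shard.
    return max(1, by_count, by_bytes)
```

Whichever limit is hit first decides the shard count: small events are bounded by the record rate, large events by the byte rate.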
  • 16. (Service overview diagram repeated from slide 4.)
  • 17. Amazon EMR
      • Amazon EMR provides fully managed Hadoop clusters
      • Supports both transient and long-running clusters
      • Integrates directly with Amazon S3 and Amazon Kinesis
      • Easy to scale, with burstable capacity
      • Integrates with the AWS Spot Market
  • 18. 1 instance x 100 hours = 100 instances x 1 hour (and with Spot Pricing not only faster but also cheaper)
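The equivalence on slide 18 can be checked with simple arithmetic. The sketch below assumes a perfectly parallel job and per-instance-hour billing, and ignores any EMR service surcharge; both prices are hypothetical placeholders, not real AWS rates.

```python
def cluster_cost(instances: int, hours: float, hourly_price: float) -> float:
    """Total cost in dollars: instances x hours x price per instance-hour."""
    return instances * hours * hourly_price


ON_DEMAND_PRICE = 0.25  # hypothetical on-demand $/instance-hour
SPOT_PRICE = 0.125      # hypothetical Spot $/instance-hour

one_slow = cluster_cost(1, 100, ON_DEMAND_PRICE)     # 1 instance for 100 hours
many_fast = cluster_cost(100, 1, ON_DEMAND_PRICE)    # 100 instances for 1 hour
many_fast_spot = cluster_cost(100, 1, SPOT_PRICE)    # same cluster on Spot
```

Under these assumptions the two on-demand runs cost the same while the wide cluster finishes in one hour instead of 100, and running it at a lower Spot price makes it cheaper as well as faster.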
  • 19. Process: Amazon EMR
      • Amazon EMR supports all common Hadoop frameworks, such as:
          • Spark, Pig, Hive, Hue, Oozie …
          • HBase, Presto, Impala …
      • Decouples storage from compute
      • Allows independent scaling
      • Direct integration with DynamoDB and S3 (EMRFS)
  • 20. Re-Architecting Compliance (FINRA)
      • FINRA regulates the trading practices of brokerage firms and exchange markets to protect market integrity
      • Its market surveillance platform stores 30 billion market events every day
      • It uses Amazon S3 to store events and lets analysts interactively query market dynamics using Amazon EMR Hive and HBase clusters, with increased agility
      Highlights: unlimited storage, distributed computing, interactive market queries, ensured compliance, 30 billion market events
  • 21. Apache Spark
      • Apache Spark is an in-memory cluster analytics engine that uses RDDs (Resilient Distributed Datasets) for fast processing
      • Faster than MapReduce because intermediate results stay in memory instead of being written to HDFS between stages
      • Apache Spark Streaming can read directly from DynamoDB, S3 and a Kinesis stream
  • 22. Processing Amazon Kinesis streams: Amazon Kinesis → EMR with Spark Streaming
      Counting tweets on a sliding window (simplified pseudocode; the real Spark calls take additional arguments):
        KinesisUtils.createStream("twitter-stream")
          .filter(_.getText.contains("Big Data"))
          .countByWindow(Seconds(5))
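The filter-then-count-over-a-window logic on slide 22 can be illustrated without a cluster. The sketch below is plain Python, not the Spark Streaming API: it emits a count after every event rather than once per batch interval, and the tweet shape and function name are hypothetical.

```python
from collections import deque
from typing import Deque, Iterable, Iterator, Tuple

Tweet = Tuple[float, str]  # (timestamp in seconds, tweet text); hypothetical shape


def count_matching_in_window(
    tweets: Iterable[Tweet], keyword: str, window_seconds: float
) -> Iterator[Tuple[float, int]]:
    """Yield (timestamp, count) after each tweet, where count is the number of
    keyword-matching tweets seen in the last `window_seconds` seconds.

    Tweets are assumed to arrive in timestamp order.
    """
    window: Deque[float] = deque()  # timestamps of matching tweets still in the window
    for ts, text in tweets:
        if keyword in text:
            window.append(ts)
        # Evict matches that have fallen out of the interval (ts - window, ts].
        while window and window[0] <= ts - window_seconds:
            window.popleft()
        yield ts, len(window)


sample_stream = [
    (0.0, "Big Data rocks"),
    (1.0, "hello"),
    (2.0, "Big Data on AWS"),
    (6.0, "Big Data again"),
    (12.0, "nothing"),
]
sample_counts = [c for _, c in count_matching_in_window(sample_stream, "Big Data", 5)]
```

A deque keeps eviction cheap, since expired timestamps are always at the front when events arrive in order.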
  • 24. Amazon Redshift •  Fully managed petabyte-scale data warehouse •  Scalable number of cluster nodes •  ODBC/JDBC connectors for BI tools using SQL •  Loads data from Amazon DynamoDB and Amazon S3 •  Less than a tenth of the cost of traditional solutions
  • 26. AWS Marketplace •  Pre-configured machine images ready to be launched into virtual server instances •  Launch applications with 1-Click •  Pay for software licenses by the hour or bring your own license (BYOL)
  • 27. Customer Sharing: Scalable Big Data Intelligence with Machine Learning •  Fu Ting Chan, Founder, InfoForce.co •  June 2016
  • 28. InfoCluster •  InfoForce is a cloud-based big data solution provider specializing in near-real-time, high-throughput analytics and machine learning algorithms •  Application Server via REST, Organization Data Lakes, InfoForce.co Data Pipes, Analytics, Machine Learning
  • 29. E-commerce marketplace •  ~4 million products, > 1,800 sellers •  Historical purchases, log files, relational DBs, TBs of product catalog •  1. Price Optimizer 2. Cross-sell Bundler 3. Recommendation Engine
  • 30. Understand the e-commerce market •  We reviewed 1,800+ merchants with 3.5 million SKUs in total and a transaction volume of US $80 million in the last 90 days, and found that over 86% of listings had no sales in the last 60 days.* •  Share of listings by condition: High exposure 1.9%; Mid-high exposure 3.4%; Mid-low exposure 2.8%; New listed in 7 days 0.6%; New listed in 30 days 5%; Low exposure 86.3%; Total 100% •  Definitions: 1. High exposure: sold in the last 7 days 2. Mid-high exposure: sold in the last 30 days but not in the last 7 days 3. Mid-low exposure: sold in the last 60 days but not in the last 7 days 4. Low exposure: no sales history in the last 60 days
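The four exposure definitions on this slide can be read as a simple cascade on the number of days since a listing last sold. The sketch below is an illustration under assumptions: the function name is hypothetical, the categories are checked in order so that each listing gets exactly one label, and the two "new listed" categories are left out because the slide does not define them.

```python
def exposure(days_since_last_sale):
    """Classify a listing by days since its last sale (None = never sold).

    Thresholds (7, 30, 60 days) come from the slide; checking them in
    ascending order is an assumption that makes the categories exclusive.
    """
    if days_since_last_sale is None:
        return "low"       # no sales history at all
    if days_since_last_sale <= 7:
        return "high"      # sold in the last 7 days
    if days_since_last_sale <= 30:
        return "mid-high"  # sold in the last 30 days, not the last 7
    if days_since_last_sale <= 60:
        return "mid-low"   # sold in the last 60 days, not more recently
    return "low"           # no sale in the last 60 days


# Hypothetical sample listings: days since last sale.
labels = [exposure(d) for d in (3, 20, 45, 90, None)]
print(labels)
```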
  • 31. 1. Price Optimization •  Instead of setting a fixed selling price, Price Optimizer offers a dynamic pricing strategy: it designs price rules for each item based on the product's sales cycle and the sales cycles of similar products in the market. •  6050 Strategies, Product Similarity, Automatically Managed Listings
  • 32. 2. Cross-sell Bundler •  To discover product relationships, InfoForce's textual analysis engine derives relationships from catalog metadata and learns from consumption patterns. •  Example: Flora By Gucci Eau De Parfum Spray 50ml is tagged gucci, flora, eau_de_parfum_spray, 50ml •  Tag types: Brand, Model, Product •  Related catalog entries: Gucci, eau_de_toilette; Gucci, flora eau_de_parfum 75ml; Vera_Wang, floral eau_de_parfum_spray 50ml
  • 33. 2. Machine Learning •  Catalog entries: gucci, flora, eau_de_parfum_spray, 50ml; gucci, eau_de_toilette; gucci, flora eau_de_parfum 75ml; vera_wang, floral eau_de_parfum_spray 50ml •  Derived tags: •  gucci -> salvatore_ferragamo; fendi; prada; hermes; cartier •  flora -> floral; flower; pageant; flowers; wedding •  eau_de_parfum -> eau_de_toilette; colonia; perfume; eau_de_parfum; deodorant
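One way derived tags could help relate products is to expand each product's tags before comparing them. The sketch below is a hypothetical illustration, not InfoForce's method: the derived-tag table is copied from the slide, while the overlap-coefficient scoring, the function names, and the way the slide's entries are split into tags are assumptions.

```python
# Derived tags as listed on the slide.
DERIVED_TAGS = {
    "gucci": {"salvatore_ferragamo", "fendi", "prada", "hermes", "cartier"},
    "flora": {"floral", "flower", "pageant", "flowers", "wedding"},
    "eau_de_parfum": {"eau_de_toilette", "colonia", "perfume",
                      "eau_de_parfum", "deodorant"},
}


def expand(tags):
    """Return the tags plus every derived tag listed for them."""
    expanded = set(tags)
    for tag in tags:
        expanded |= DERIVED_TAGS.get(tag, set())
    return expanded


def overlap(a, b):
    """Overlap coefficient: shared tags divided by the smaller set's size."""
    a, b = set(a), set(b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


# Tag splits below are assumed from the slide's example entries.
seed = ["gucci", "flora", "eau_de_parfum_spray", "50ml"]
candidate = ["vera_wang", "floral", "eau_de_parfum_spray", "50ml"]

plain_score = overlap(seed, candidate)
expanded_score = overlap(expand(seed), expand(candidate))
print(plain_score, expanded_score)
```

With the raw tags, only the product type and size match; after expansion, `flora` also links to `floral`, so the two perfumes score as more closely related.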
  • 36. Challenges •  Scalability: TBs of data; compute and I/O heavy; dynamic growth •  Timeliness: streaming data; results need to be available fast; always available •  Security: privacy; fine-grained control over resources; persistence and backup
  • 37. Challenge No More (Scalability, Timeliness, Secure) •  EC2: A solid platform to scale our compute capabilities. We leverage AWS IAM to grant API-level access to spin up additional machines when necessary. •  Kinesis Streaming: Core to our architecture are the real-time data pipes, which are built on top of Amazon Kinesis; we can provision the number of shards based on throughput requirements without having to worry about the complexities of setting up a production-grade streaming service. •  DynamoDB: Serves as a reliable way to persist important metadata such as checkpointing and stream schema info. •  Kinesis Firehose: Used to archive any data received from the data pipes for long-term batch and real-time analytics by the calculation cluster. •  S3 and CloudFront: S3 is used to persist data; it also has handy integrations with Apache Spark for easy distributed computing. CloudFront is used as a CDN for fast raw data delivery into apps. •  CloudWatch: InfoCluster uses built-in and custom CloudWatch metrics to monitor its services; the alarm functionality notifies the operations team of any issues.
  • 38. AWS Services Usage (1) •  Amazon EC2: A solid platform to scale our compute capabilities. We leverage AWS IAM to grant API-level access to spin up additional machines when necessary. •  Amazon Kinesis Firehose: Firehose is used to archive any data received from the data pipes for long-term batch and real-time analytics by the calculation cluster.
  • 39. AWS Services Usage (2) •  Amazon Kinesis: Core to our architecture are the real-time data pipes, which are built on top of Amazon Kinesis; we can provision the number of shards based on throughput requirements without having to worry about the complexities of setting up a production-grade streaming service. •  Amazon S3 and Amazon CloudFront: S3 is used to persist data; it also has handy integrations with Apache Spark for easy distributed computing. CloudFront is used as a CDN for fast raw data delivery into apps.
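Provisioning shards from throughput requirements comes down to dividing expected load by the per-shard limits. The sketch below assumes the commonly documented Kinesis stream limits of 1 MB/s or 1,000 records/s written per shard and 2 MB/s read per shard, shared by all consumers; the function name, the 1 MB = 1024 KB conversion, and the sample workload are assumptions, and current limits should be checked against the AWS documentation.

```python
import math

# Assumed per-shard limits for a Kinesis stream.
SHARD_WRITE_MB_PER_SEC = 1.0
SHARD_WRITE_RECORDS_PER_SEC = 1000
SHARD_READ_MB_PER_SEC = 2.0


def shards_needed(records_per_sec, avg_record_kb, consumers=1):
    """Estimate the shard count that satisfies write and read throughput."""
    ingress_mb = records_per_sec * avg_record_kb / 1024  # assumes 1 MB = 1024 KB
    egress_mb = ingress_mb * consumers  # every consumer reads the full stream
    return max(
        1,  # a stream always has at least one shard
        math.ceil(ingress_mb / SHARD_WRITE_MB_PER_SEC),
        math.ceil(records_per_sec / SHARD_WRITE_RECORDS_PER_SEC),
        math.ceil(egress_mb / SHARD_READ_MB_PER_SEC),
    )


# Hypothetical workload: 5,000 records/s of 2 KB each, read by 3 consumers.
estimate = shards_needed(5000, 2, consumers=3)
print(estimate)
```

In this example the read side is the bottleneck: three consumers sharing 2 MB/s per shard need more shards than the write volume alone would.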
  • 40. AWS Services Usage (3) •  Amazon DynamoDB: Serves as a reliable way to persist important metadata such as checkpointing and stream schema info. •  Amazon CloudWatch: InfoCluster uses built-in and custom CloudWatch metrics to monitor its services. The alarm functionality notifies the operations team of any issues.
  • 41. Results •  Price Optimizer: > 700,000 items actively managed; generated > US $1.4 million in revenue for merchants across 51,014 pieces sold. •  Cross-Sell Bundler: successful integration with the word-tagging infrastructure; soft launch in early June 2016; generates independent bundling results in 30 seconds for 2,000 items with 10 bundle recommendations each. •  Winner of HK ICT Awards 2016: Best Startup, Best Smart Hong Kong (Big Data Application)
  • 42. Successful Case •  A merchant put 50K listings under our Price Optimizer; 2% of them (around 1,000 listings) were reactivated and generated over US $80K GMV in 20 days.
  • 43. Future Plan •  Business •  Look for applications of the technology in other domains and industries •  Plug-in / API-level access to InfoCluster's functionality •  Further investment in machine learning •  Technology •  Storage optimization using ST1 and SC1 EBS volumes •  Optimizing the balance between continuously running resources and compute services such as AWS Lambda •  Full integration of custom processes with Amazon CloudWatch •  PubSub notification framework
  • 44. Summary •  We have progressed far beyond justifying the cloud on cost alone... •  Builders like to build •  Unlock your organization's data lakes, use InfoForce (preferably!)
  • 45. “We see our customers as invited guests to a party, and we are the hosts. It’s our job to make every important aspect of the customer experience a little bit better.” Jeff Bezos CEO, Amazon.com
  • 46. Data analysis for a better customer experience •  Your business creates and stores data and logs all the time •  Data points and logs allow you to understand the individual customer experience and improve it •  Analysis of logs and trails helps you gain insights
  • 47. Big Data: •  Massive Datasets •  Experimental style of data manipulation and analysis •  Not a steady-state workload; peaks and valleys •  Combination of structured and unstructured data in many formats AWS Cloud: •  Virtually unlimited capacity •  Experimental usage cost through on-demand infrastructure •  Scalable infrastructure for highly variable workloads •  Tools & Services for managing structured, unstructured and stream data
  • 48. Thank you! Olivier Klein Senior Solutions Architect Fu Ting Chan, Founder InfoForce.co