Designing and building a data architecture on AWS that helps you gain insights and uncover new opportunities to scale and grow your business has never been easier. Join this workshop to learn how you can gain insights at scale with the right big data applications.
2. Today's workshop
2:00pm - 2:15pm Overview of using modern data architectures on AWS
2:15pm - 3:40pm Modern data architectures for business insights at scale (includes live demos)
3:40pm - 4:00pm Break
4:00pm - 5:15pm Modern data architectures for real-time analytics and engagement (includes live demos)
4. What is driving the requests for information?
- What information is needed?
- Where does the source data live?
- Freshness - how real-time?
What kind of persona are you serving?
- Measurable business outcome?
- Speed to access / urgency
- UI - interactive vs file vs embedded
- On-demand vs published
5. [Chart: data volume gap between generated data and data available for analysis, 1990 to 2020]
Sources: Gartner, User Survey Analysis: Key Trends Shaping the Future of Data Center Infrastructure Through 2011; IDC, Worldwide Business Analytics Software 2012–2016 Forecast and 2011 Vendor Shares
Should we collect "all the data" and see what's in it?
6. Starting by amassing "all your data" and dumping
into a large repository for the data gurus to start
finding "insights" is like trying to win the lottery
12. Starting small is powerful, when you can scale
up fast
Scaling up your analytics systems | With AWS | Traditional IT *
get a new BI server | 20 minutes | 3 months
upgrade your analytics server to the newest Intel processors and add 16GB memory | 15 minutes | 2 months
add 500TB of storage | instant | 2 months
grow a DWH cluster from 8GB to 1PB | 1 hour | 8 months
build a 1024-node Hadoop cluster | 30 minutes | unlikely
roll out multi-region production environment | hours | months
* actual provisioning times in a well-organized IT division
13. Big Data:
• Potentially massive datasets
• Iterative, experimental style of
data manipulation and analysis
• Frequently not a steady-state
workload; peaks and valleys
• Data is a combination of
structured and unstructured
data in many formats
AWS Cloud:
• Virtually unlimited capacity
• Iterative, experimental usage cost
through on-demand
infrastructure
• Fully scalable infrastructure for
highly variable workloads
• Tools & Services for managing
structured, unstructured and
stream data
15. Outcome 1 : Modernize and consolidate
• Insights to enhance business applications and
create new digital services
Outcome 2 : Innovate for new revenues
• Personalization, demand forecasting, risk analysis
Outcome 3 : Real-time engagement
• Interactive customer experience, event-driven
automation, fraud detection
Outcome 4 : Automate for expansive reach
• Automation of business processes and physical
infrastructure
Driving Business Outcomes via Data Analytics
17. Insights to enhance business applications, new digital services
Technology: Backend system integration, on-prem data center extension, business application
integration, BI provisioning, data lakes, external APIs, access control and logging
Common initiatives
Insights: 360 view of the business
• Legacy data systems migration to enable self-service for business analysts
• Integration of all customer data, from orders, payments, interactions
• Supplier performance for inventory and vendor management
Digitization: Web-service that gives on-demand insights
• Delivery of digital content, with behavior tracking, and upsell (or ads)
• Ordering system for enterprise customers or consumers
Data monetization: Enrich, aggregate, and sell business data
• External data enrichment API, including digital marketing platforms
• Purchasable data sets of anonymized, domain-enriched insights
Outcome 1 : Modernize and Consolidate
18. Modernize and consolidate
Insights to enhance business applications, new digital services
[Diagram: Data sources, Ingest, Serving; Speed (Real-time), Scale (Batch)]
Enhancing business applications and creating new digital services takes a few
steps. Business goals often include becoming an agile, well-run organization
and no longer missing opportunities because people make decisions
without accurate insights. These initiatives focus on giving key
personas fast and secure access to business-relevant insights.
19. Modernize and consolidate
Insights to enhance business applications, new digital services
1. Define personas and use case requirements (including UI)
[Diagram: Data sources, Ingest, Serving; Speed (Real-time), Scale (Batch)]
Personas: Data analysts, Business users, External buyers
20. Modernize and consolidate
Insights to enhance business applications, new digital services
2. Locate the data sources that have the information to extract
[Diagram: Data sources, Ingest, Serving; Speed (Real-time), Scale (Batch)]
Data sources: Transactions, Web logs / cookies, ERP
Personas: Data analysts, Business users, External buyers
21. Fluentd: Open Source Log Collection
https://github.com/fluent/fluentd/
• Fluentd is an open source
data collector to unify data
collection and consumption
• Integration into many data
sources (App Logs, Syslogs,
Twitter etc.)
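• Direct integration into AWS
# Tail the Apache access log and tag each event
<source>
  type tail
  format apache2
  path /var/log/apache2/access_log
  tag s3.apache.access
</source>
# Send every event whose tag matches s3.*.* to the S3 bucket
<match s3.*.*>
  type s3
  s3_bucket myweblogs
  path logs/
</match>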
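The match block in the configuration above selects events by tag. As a rough illustration of that routing rule, the Python sketch below treats `*` as exactly one dot-separated tag part and `**` as zero or more parts. The function name `tag_matches` is hypothetical, other pattern forms are not modelled, and this is not Fluentd's own implementation.

```python
def tag_matches(pattern: str, tag: str) -> bool:
    """Return True if a dot-separated tag matches a Fluentd-style pattern.

    Only two wildcards are modelled: `*` (exactly one tag part) and
    `**` (zero or more tag parts).
    """
    def match(pattern_parts, tag_parts):
        if not pattern_parts:
            return not tag_parts
        head, rest = pattern_parts[0], pattern_parts[1:]
        if head == "**":
            # Let `**` consume 0, 1, 2, ... of the remaining tag parts.
            return any(match(rest, tag_parts[i:]) for i in range(len(tag_parts) + 1))
        if not tag_parts:
            return False
        if head == "*" or head == tag_parts[0]:
            return match(rest, tag_parts[1:])
        return False

    return match(pattern.split("."), tag.split("."))


if __name__ == "__main__":
    for candidate in ("s3.apache.access", "s3.apache", "web.apache.access"):
        print(candidate, tag_matches("s3.*.*", candidate))
```

With this rule, the tag `s3.apache.access` set in the source block is picked up by the `s3.*.*` match block, while a two-part tag such as `s3.apache` is not.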
22. Modernize and consolidate
Insights to enhance business applications, new digital services
3. Ingest data through incremental or full loads, across secure connections
[Diagram: Data sources, Ingest, Serving; Speed (Real-time), Scale (Batch)]
Data sources: Transactions, Web logs / cookies, ERP
Ingest: AWS Database Migration Service, AWS Direct Connect, AWS Storage Gateway, Internet Interfaces, Changed Data
Personas: Data analysts, Business users, External buyers
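Step 3 on this slide distinguishes full loads from incremental loads. A minimal sketch of the difference, assuming each source row carries an `updated_at` change-tracking column (a hypothetical name, as are the sample rows and both helper functions): a full load copies everything, while an incremental load copies only rows newer than the last recorded watermark. Managed services such as AWS Database Migration Service handle change capture themselves; this only illustrates the idea.

```python
from datetime import datetime

# Hypothetical source rows; `updated_at` is an assumed change-tracking column.
SOURCE_ROWS = [
    {"id": 1, "updated_at": datetime(2017, 1, 1)},
    {"id": 2, "updated_at": datetime(2017, 1, 2)},
    {"id": 3, "updated_at": datetime(2017, 1, 3)},
]


def select_rows(rows, watermark=None):
    """Full load when no watermark is given, otherwise only newer rows."""
    if watermark is None:
        return list(rows)
    return [row for row in rows if row["updated_at"] > watermark]


def advance_watermark(loaded_rows, watermark=None):
    """Return the latest `updated_at` seen, keeping the old value if nothing loaded."""
    stamps = [row["updated_at"] for row in loaded_rows]
    if watermark is not None:
        stamps.append(watermark)
    return max(stamps) if stamps else None


if __name__ == "__main__":
    first_batch = select_rows(SOURCE_ROWS)                  # full load
    mark = advance_watermark(first_batch)
    next_batch = select_rows(SOURCE_ROWS, watermark=mark)   # incremental load
    print(len(first_batch), mark, len(next_batch))
```

Keeping the watermark outside the load itself means a failed run can be retried from the same point.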
23. A single, large system may perform a single task
well, but is often too difficult to adapt and scale
24. A system that is decoupled can adapt to a fast
moving business, and can scale up and down with
significantly lower barriers
25. Decouple Storage and Compute
Traditionally analytical workloads
required large databases or data
warehouses, with storage and
compute close to each other
Big Data often benefits from
decoupling storage and compute
Amazon S3 offers virtually unlimited
storage at a per GB/month rate
26. Amazon S3
Highly available object storage
99.999999999% data durability
Replicated across 3 facilities
Virtually unlimited scale
Pay only for usage, no pre-provisioning
Event notifications to trigger actions
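To put the eleven-nines durability figure in perspective, the arithmetic can be sketched as below. It assumes the figure is read as an annual, per-object value, so the average annual loss rate per object is 1 - 0.99999999999 = 1e-11; the function names and the object count are illustrative only.

```python
# Average annual loss rate per object implied by 99.999999999% durability.
ANNUAL_LOSS_RATE = 1e-11


def expected_annual_losses(num_objects: int, loss_rate: float = ANNUAL_LOSS_RATE) -> float:
    """Expected number of objects lost per year for a given object count."""
    return num_objects * loss_rate


def years_between_losses(num_objects: int, loss_rate: float = ANNUAL_LOSS_RATE) -> float:
    """Average number of years between single-object losses."""
    return 1.0 / expected_annual_losses(num_objects, loss_rate)


if __name__ == "__main__":
    objects = 10_000_000  # illustrative object count
    print(expected_annual_losses(objects))
    print(years_between_losses(objects))
```

For ten million stored objects this works out to roughly one lost object every 10,000 years on average.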
28. 1 instance x 100 hours = 100 instances x 1 hour
(and with Spot Pricing not only faster but also cheaper)
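The equality on this slide is instance-hour arithmetic. The sketch below assumes a job that parallelizes perfectly and is billed per instance-hour; the hourly rate and the Spot discount are made-up numbers for illustration, not AWS prices.

```python
HOURLY_RATE = 0.10    # hypothetical on-demand price, USD per instance-hour
SPOT_DISCOUNT = 0.70  # hypothetical Spot discount relative to on-demand


def job_cost(instances: int, hours_each: float, hourly_rate: float) -> float:
    """Cost of running `instances` machines for `hours_each` hours each."""
    return instances * hours_each * hourly_rate


if __name__ == "__main__":
    serial = job_cost(1, 100, HOURLY_RATE)      # one instance, 100 hours
    parallel = job_cost(100, 1, HOURLY_RATE)    # 100 instances, one hour
    spot = job_cost(100, 1, HOURLY_RATE * (1 - SPOT_DISCOUNT))
    print(serial, parallel, spot)
```

Both on-demand runs consume 100 instance-hours and cost the same, but the parallel run finishes in one hour instead of 100; applying a discounted rate to the parallel run makes it cheaper as well.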
29. Amazon EMR
• Amazon EMR supports all common
Hadoop frameworks, such as:
• Spark, Pig, Hive, Hue, Oozie …
• HBase, Presto, Impala …
• Decouples storage from compute
• Allows independent scaling
• Direct integration with DynamoDB
and S3
[Diagram: Amazon S3, Amazon DynamoDB, Amazon EMR]
30. AWS Glue
Managed Transform Engine
Job Scheduler
Data Catalog
Built on Apache Spark
Integrated with S3, RDS, Redshift & any
JDBC-compliant data store
32. Modernize and consolidate
Insights to enhance business applications, new digital services
4. Use Hadoop for large scale ETL, data quality, and preparation [*EMRFS]
[Diagram: Data sources, Ingest, Serving; Speed (Real-time), Scale (Batch)]
Data sources: Transactions, Web logs / cookies, ERP
Ingest: AWS Database Migration Service, AWS Direct Connect, AWS Storage Gateway, Internet Interfaces, Changed Data
Components: AWS Glue, Amazon S3 (Raw Data), Amazon EMR (ETL), Amazon S3 (Clean Data)
Personas: Data analysts, Business users, External buyers
33. Modernize and consolidate
Insights to enhance business applications, new digital services
5. Stage all data into centralized, highly available, durable storage for further access
[Diagram: Data sources, Ingest, Serving; Speed (Real-time), Scale (Batch)]
Data sources: Transactions, Web logs / cookies, ERP
Ingest: AWS Database Migration Service, AWS Direct Connect, AWS Storage Gateway, Internet Interfaces, Changed Data
Components: AWS Glue, Amazon S3 (Raw Data), Amazon S3 (Staged Data / Data Lake), Amazon EMR (ETL), Amazon S3 (Clean Data)
Personas: Data analysts, Business users, External buyers
34. Amazon Redshift
Fully managed
MPP SQL database - fully relational
Optimised for analytics
Gigabytes to Petabytes
Less than 1/10th the cost of traditional
35. Modernize and consolidate
Insights to enhance business applications, new digital services
6. Load semi-structured into Hadoop, structured into the DWH, and application data into managed legacy application databases
[Diagram: Data sources, Ingest, Serving; Speed (Real-time), Scale (Batch)]
Data sources: Transactions, Web logs / cookies, ERP
Ingest: AWS Database Migration Service, AWS Direct Connect, AWS Storage Gateway, Internet Interfaces, Changed Data
Components: AWS Glue, Amazon S3 (Raw Data), Amazon S3 (Staged Data / Data Lake), Amazon EMR (ETL), Amazon S3 (Clean Data), Amazon EMR (Semi-structured), Amazon Redshift (Data Warehouse), Amazon RDS (Legacy Apps)
Personas: Data analysts, Business users, External buyers
36. Modernize and consolidate
Insights to enhance business applications, new digital services
7. Data is protected through identity and access management and logging
[Diagram: Data sources, Ingest, Serving; Speed (Real-time), Scale (Batch)]
Data sources: Transactions, Web logs / cookies, ERP
Ingest: AWS Database Migration Service, AWS Direct Connect, AWS Storage Gateway, Internet Interfaces, Changed Data
Components: AWS Glue, Amazon S3 (Raw Data), Amazon S3 (Staged Data / Data Lake), Amazon EMR (ETL), Amazon S3 (Clean Data), Amazon EMR (Semi-structured), Amazon Redshift (Data Warehouse), Amazon RDS (Legacy Apps)
Personas: Data analysts, Business users, External buyers
Also shown: AWS CloudTrail, AWS IAM, Amazon CloudWatch, AWS KMS
37. Amazon QuickSight
Fast, cloud-powered BI service
Visualizations and ad-hoc analysis
Connectors for AWS and 3rd party sources
In-memory calculation engine (SPICE)
$9 per user per month
39. AWS Marketplace
• Pre-Configured machine images
ready to be launched into virtual
server instances
• Launch applications with 1-Click
• Pay software licenses by the
hour or bring your own license
(BYOL)
40. Modernize and consolidate
Insights to enhance business applications, new digital services
8. Data analysts use BI tools of choice to access all serving services
[Diagram: Data sources, Ingest, Serving; Speed (Real-time), Scale (Batch)]
Data sources: Transactions, Web logs / cookies, ERP
Ingest: AWS Database Migration Service, AWS Direct Connect, AWS Storage Gateway, Internet Interfaces, Changed Data
Components: AWS Glue, Amazon S3 (Raw Data), Amazon S3 (Staged Data / Data Lake), Amazon EMR (ETL), Amazon S3 (Clean Data), Amazon EMR (Semi-structured), Amazon Redshift (Data Warehouse), Amazon RDS (Legacy Apps), Amazon QuickSight
Personas: Data analysts, Business users, External buyers
Also shown: AWS CloudTrail, AWS IAM, Amazon CloudWatch, AWS KMS
41. Modernize and consolidate
Insights to enhance business applications, new digital services
9. Business users have enterprise applications enhanced by analytics
[Diagram: Data sources, Ingest, Serving; Speed (Real-time), Scale (Batch)]
Data sources: Transactions, Web logs / cookies, ERP
Ingest: AWS Database Migration Service, AWS Direct Connect, AWS Storage Gateway, Internet Interfaces, Changed Data
Components: AWS Glue, Amazon S3 (Raw Data), Amazon S3 (Staged Data / Data Lake), Amazon EMR (ETL), Amazon S3 (Clean Data), Amazon EMR (Semi-structured), Amazon Redshift (Data Warehouse), Amazon RDS (Legacy Apps), Amazon QuickSight
Personas: Data analysts, Business users, External buyers
Also shown: AWS CloudTrail, AWS IAM, Amazon CloudWatch, AWS KMS
42. Modernize and consolidate
Insights to enhance business applications, new digital services
10. External parties can buy services or data in a governed, secure way
[Diagram: Data sources, Ingest, Serving; Speed (Real-time), Scale (Batch)]
Data sources: Transactions, Web logs / cookies, ERP
Ingest: AWS Database Migration Service, AWS Direct Connect, AWS Storage Gateway, Internet Interfaces, Changed Data
Components: AWS Glue, Amazon S3 (Raw Data), Amazon S3 (Staged Data / Data Lake), Amazon EMR (ETL), Amazon S3 (Clean Data), Amazon EMR (Semi-structured), Amazon Redshift (Data Warehouse), Amazon RDS (Legacy Apps), Amazon QuickSight, Amazon API Gateway
Personas: Data analysts, Business users, External buyers
Also shown: AWS CloudTrail, AWS IAM, Amazon CloudWatch, AWS KMS
43. Modernize and consolidate
Insights to enhance business applications, new digital services
[Diagram: Data sources, Ingest, Serving; Speed (Real-time), Scale (Batch)]
Data sources: Transactions, Web logs / cookies, ERP
Ingest: AWS Database Migration Service, AWS Direct Connect, AWS Storage Gateway, Internet Interfaces, Changed Data
Components: AWS Glue, Amazon S3 (Raw Data), Amazon S3 (Staged Data / Data Lake), Amazon EMR (ETL), Amazon S3 (Clean Data), Amazon EMR (Semi-structured), Amazon Redshift (Data Warehouse), Amazon RDS (Legacy Apps), Amazon QuickSight, Amazon API Gateway
Personas: Data analysts, Business users, External buyers
Also shown: AWS CloudTrail, AWS IAM, Amazon CloudWatch, AWS KMS
44. Personalization, demand forecasting, risk analysis
Technology: Advanced analytics, customer segmentations, high volume transactional data, un/semi-
structured data, design of experiment, A/B & hypothesis testing, machine learning
Common initiatives
Personalization: Refine market approaches based on optimal segments
• Offer products to new customers based on clusters of similar individuals
• Launch share of wallet initiatives, understanding likely total spend
• Targeted marketing to capture interests and increase conversion rates
Predict demand: Guide business owners to select the best scenarios
• Launch items or promotions at the optimal time to maximize response
• Modeling for store assortment, product selection, and merchandizing
• New product design, based on known market propensities
Risk measurement: Create freedom to act by quantifying exposures
• Scenario simulation to encourage investments and new offerings
• Supply chain analytics allows for faster confirmation of goods to customers
Outcome 2 : Innovate for new revenues
47. Innovate for new revenues
Personalization, demand forecasting, risk analysis
[Diagram: Data sources, Ingest, Serving; Speed (Real-time), Scale (Batch)]
Net new revenue is driven by business teams that have access to
skilled analysts and to platforms that can scale up and out without IT
bottlenecks. Organizations start operating based on what they know about
their customers, and can approach new ventures in terms of confidence
levels. Product launches, campaigns, supply chain management, packaged
services, and customized offerings are designed and executed based on
predictive models.
48. Innovate for new revenues
Personalization, demand forecasting, risk analysis
1. Personas involved in generating new revenues are data scientists, data analysts (often embedded), business users, and customers/suppliers
[Diagram: Data sources, Ingest, Serving; Speed (Real-time), Scale (Batch)]
Personas and platforms: Data analysts, Data scientists, Business users, Engagement platforms
Also shown: AWS CloudTrail, AWS IAM, Amazon CloudWatch, AWS KMS
49. Innovate for new revenues
Personalization, demand forecasting, risk analysis
2. Advanced analytics are built from a base of traditional data processing
[Diagram: Data sources, Ingest, Serving; Speed (Real-time), Scale (Batch)]
Data sources: Transactions, ERP
Ingest: AWS Direct Connect
Components: Amazon S3 (Raw Data), Amazon S3 (Staged Data / Data Lake), Amazon EMR (ETL), Amazon S3 (Clean Data), AWS Glue, Amazon EMR, Amazon Redshift, Amazon RDS
Personas and platforms: Data analysts, Data scientists, Business users, Engagement platforms
Also shown: AWS CloudTrail, AWS IAM, Amazon CloudWatch, AWS KMS
50. Innovate for new revenues
Personalization, demand forecasting, risk analysis
3. On-premise storage and databases are connected and converted
[Diagram: Data sources, Ingest, Serving; Speed (Real-time), Scale (Batch)]
Data sources: Transactions, ERP
Ingest: AWS Direct Connect, AWS Database Migration Service, AWS Storage Gateway
Components: Amazon S3 (Raw Data), Amazon S3 (Staged Data / Data Lake), Amazon EMR (ETL), Amazon S3 (Clean Data), AWS Glue, Amazon EMR, Amazon Redshift, Amazon RDS
Personas and platforms: Data analysts, Data scientists, Business users, Engagement platforms
Also shown: AWS CloudTrail, AWS IAM, Amazon CloudWatch, AWS KMS
51. Innovate for new revenues
Personalization, demand forecasting, risk analysis
4. Internet-native data sources, like web and mobile, are captured
[Diagram: Data sources, Ingest, Serving; Speed (Real-time), Scale (Batch)]
Data sources: Transactions, ERP, Web logs / cookies
Ingest: AWS Direct Connect, Internet Interfaces, AWS Database Migration Service, AWS Storage Gateway
Components: Amazon S3 (Raw Data), Amazon S3 (Staged Data / Data Lake), Amazon EMR (ETL), Amazon S3 (Clean Data), AWS Glue, Amazon EMR, Amazon Redshift, Amazon RDS
Personas and platforms: Data analysts, Data scientists, Business users, Engagement platforms
Also shown: AWS CloudTrail, AWS IAM, Amazon CloudWatch, AWS KMS
52. Stream in Real Time: Amazon Kinesis
• Real-Time Data Processing over
large distributed streams
• Elastic capacity that scales to
millions of events per second
• React in real time to incoming
stream events
• Reliable stream storage
replicated across 3 facilities
Amazon Kinesis
53. Innovate for new revenues
Personalization, demand forecasting, risk analysis
5. Streaming un/semi-structured data feeds, like social and devices, are captured
[Diagram: Data sources, Ingest, Serving; Speed (Real-time), Scale (Batch)]
Data sources: Transactions, ERP, Web logs / cookies, Connected devices, Social media
Ingest: AWS Database Migration Service, AWS Direct Connect, Internet Interfaces, Amazon Kinesis, AWS Storage Gateway
Components: Amazon S3 (Raw Data), Amazon S3 (Staged Data / Data Lake), Amazon EMR (ETL), Amazon S3 (Clean Data), AWS Glue, Amazon EMR, Amazon Redshift, Amazon RDS
Personas and platforms: Data analysts, Data scientists, Business users, Engagement platforms
Also shown: AWS CloudTrail, AWS IAM, Amazon CloudWatch, AWS KMS
54. Innovate for new revenues
Personalization, demand forecasting, risk analysis
6. Log files and other schemaless data converted to Parquet and staged
[Diagram: Data sources, Ingest, Serving; Speed (Real-time), Scale (Batch)]
Data sources: Transactions, ERP, Web logs / cookies, Connected devices, Social media
Ingest: AWS Database Migration Service, AWS Direct Connect, Internet Interfaces, Amazon Kinesis, AWS Storage Gateway
Components: Amazon S3 (Raw Data), Amazon S3 (Staged Data / Data Lake), Amazon EMR (ETL), Amazon S3 (Clean Data), Amazon S3 (Schemaless), AWS Glue, Amazon EMR, Amazon Redshift, Amazon RDS
Personas and platforms: Data analysts, Data scientists, Business users, Engagement platforms
Also shown: AWS CloudTrail, AWS IAM, Amazon CloudWatch, AWS KMS
55. Amazon Athena
Interactive query service to analyze data
in Amazon S3 directly using standard SQL
No need to move data
No infrastructure to set up and manage
Fast: results within seconds
Pay only for the queries you run
56. Innovate for new revenues
Personalization, demand forecasting, risk analysis
7. Data analysts explore and visualize un/semi-structured data
[Diagram: Data sources, Ingest, Serving; Speed (Real-time), Scale (Batch)]
Data sources: Transactions, ERP, Web logs / cookies, Connected devices, Social media
Ingest: AWS Database Migration Service, AWS Direct Connect, Internet Interfaces, Amazon Kinesis, AWS Storage Gateway
Components: Amazon S3 (Raw Data), Amazon S3 (Staged Data / Data Lake), Amazon EMR (ETL), Amazon S3 (Clean Data), Amazon S3 (Schemaless), AWS Glue, Amazon EMR, Amazon Elasticsearch Service, Amazon Redshift, Amazon RDS, Amazon Athena
Personas and platforms: Data analysts, Data scientists, Business users, Engagement platforms
Also shown: AWS CloudTrail, AWS IAM, Amazon CloudWatch, AWS KMS
57. Amazon Machine Learning
• Easy to use, managed machine
learning service built for developers
• Machine learning technology based
on Amazon’s internal systems
• Create models using data stored in
Amazon S3, Amazon RDS or Amazon
Redshift
• Request predictions in batch or in real
time
58. Innovate for new revenues
Personalization, demand forecasting, risk analysis
8. Simple analytical models are built against Amazon Machine Learning
[Diagram: Data sources, Ingest, Serving; Speed (Real-time), Scale (Batch)]
Data sources: Transactions, ERP, Web logs / cookies, Connected devices, Social media
Ingest: AWS Database Migration Service, AWS Direct Connect, Internet Interfaces, Amazon Kinesis, AWS Storage Gateway
Components: Amazon S3 (Raw Data), Amazon S3 (Staged Data / Data Lake), Amazon EMR (ETL), Amazon S3 (Clean Data), Amazon S3 (Schemaless), AWS Glue, Amazon Machine Learning, Amazon EMR, Amazon Redshift, Amazon RDS, Amazon Elasticsearch Service, Amazon Athena
Personas and platforms: Data analysts, Data scientists, Business users, Engagement platforms
Also shown: AWS CloudTrail, AWS IAM, Amazon CloudWatch, AWS KMS
59. Apache Spark
• In-memory analytics cluster using RDD
(Resilient Distributed Dataset) for fast
processing
• Spark MLlib offers machine learning out of the box
• Apache Spark can read directly from Amazon S3
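# Imports needed outside the pyspark shell, where `sc` is predefined
from numpy import array
from pyspark import SparkContext
from pyspark.mllib.clustering import KMeans, KMeansModel
sc = SparkContext.getOrCreate()
data = sc.textFile("s3://...")
parsedData = data.map(lambda line: array([float(x) for x in line.split(' ')]))
model = KMeans.train(parsedData, 2, maxIterations=10, initializationMode="random")
model.save(sc, "MyModel")
sameModel = KMeansModel.load(sc, "MyModel")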
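To show what the k-means training call above computes, here is a plain-Python sketch of the basic iteration (assign each point to its nearest center, then move each center to the mean of its points). It is a single-machine illustration with random initialization, not MLlib's distributed implementation; the function names and the sample points in the usage are made up.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def kmeans(points, k, max_iterations=10, seed=0):
    """Return k cluster centers found by repeated assign-and-update steps."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # random initialization from the data
    for _ in range(max_iterations):
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: squared_distance(point, centers[i]))
            clusters[nearest].append(point)
        # Keep the old center when a cluster ends up empty.
        new_centers = [
            centroid(cluster) if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:
            break  # assignments are stable, so further passes change nothing
        centers = new_centers
    return centers


if __name__ == "__main__":
    sample = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
              (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    print(sorted(kmeans(sample, 2)))
```

The arguments mirror the slide's call: `k=2` clusters and at most 10 iterations.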
60. Intel® Processor Technologies
Intel® AVX – Dramatically increases performance for highly parallel HPC workloads
such as life science engineering, data mining, financial analysis, media processing
Intel® AES-NI – Enhances security with new encryption instructions that reduce the
performance penalty associated with encrypting/decrypting data
Intel® Turbo Boost Technology – Increases computing power with performance that
adapts to spikes in workloads
Intel® Transactional Synchronization Extensions (TSX) – Enables execution of
transactions that are independent to accelerate throughput
P state & C state control – provides granular performance tuning for cores and sleep
states to improve overall application performance
61. New X1 Instance - Tons of Memory
• Designed for large-scale, in-memory
applications in the cloud
• Ideal for in-memory databases like SAP
HANA and big data processing apps like
Spark and Presto
• Powered by Intel® Xeon® E7 8880 v3
Haswell processors
• Features up to 2TB of memory and up to
128 vCPUs per instance
• 8X the memory offered by any other Amazon EC2
instance
62. Machine Learning Algorithms
• Classification
• Sentiment analysis – Do people like my new product?
• Linear Regression
• Trend prediction – How much revenue next month?
• Clustering
• Recommendation - Other people bought this!
• Association
• Market basket analysis – Bundled products
• Neural Networks
• Pattern recognition - Speech recognition
[Diagram labels: Amazon Machine Learning, Amazon EMR + Spark MLlib, GPU Optimized EC2 Instance]
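As a small worked example of the "trend prediction" case listed on this slide, the sketch below fits a straight line by ordinary least squares and extrapolates one month ahead. The revenue figures are invented and perfectly linear to keep the arithmetic easy to follow; the function names are hypothetical, and a managed service or MLlib would normally do this work.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope, intercept, x):
    """Evaluate the fitted line at x."""
    return slope * x + intercept


if __name__ == "__main__":
    months = [1, 2, 3, 4, 5, 6]
    revenue = [100.0, 110.0, 120.0, 130.0, 140.0, 150.0]  # made-up figures
    slope, intercept = fit_line(months, revenue)
    print(slope, intercept, predict(slope, intercept, 7))
```

With these figures the fitted line rises by 10 per month from a base of 90, so the month-7 estimate is 160.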
63. Innovate for new revenues
Personalization, demand forecasting, risk analysis
9. Complex analytical models are built against EMR (Spark) clusters
[Diagram: Data sources, Ingest, Serving; Speed (Real-time), Scale (Batch)]
Data sources: Transactions, ERP, Web logs / cookies, Connected devices, Social media
Ingest: AWS Database Migration Service, AWS Direct Connect, Internet Interfaces, Amazon Kinesis, AWS Storage Gateway
Components: Amazon S3 (Raw Data), Amazon S3 (Staged Data / Data Lake), Amazon EMR (ETL), Amazon S3 (Clean Data), Amazon S3 (Schemaless), AWS Glue, Amazon EMR (MLlib), Amazon EMR, Amazon Redshift, Amazon RDS, Amazon Elasticsearch Service, Amazon Machine Learning, Amazon Athena
Personas and platforms: Data analysts, Data scientists, Business users, Engagement platforms
Also shown: AWS CloudTrail, AWS IAM, Amazon CloudWatch, AWS KMS
64. Innovate for new revenues
Personalization, demand forecasting, risk analysis
10. Deep learning models are built against MXNet clusters
[Diagram: Data sources, Ingest, Serving; Speed (Real-time), Scale (Batch)]
Data sources: Transactions, ERP, Web logs / cookies, Connected devices, Social media
Ingest: AWS Database Migration Service, AWS Direct Connect, Internet Interfaces, Amazon Kinesis, AWS Storage Gateway
Components: Amazon S3 (Raw Data), Amazon S3 (Staged Data / Data Lake), Amazon EMR (ETL), Amazon S3 (Clean Data), Amazon S3 (Schemaless), AWS Glue, Amazon EMR (MLlib), Deep Learning, Amazon EMR, Amazon Redshift, Amazon RDS, Amazon Elasticsearch Service, Amazon Machine Learning, Amazon Athena
Personas and platforms: Data analysts, Data scientists, Business users, Engagement platforms
Also shown: AWS CloudTrail, AWS IAM, Amazon CloudWatch, AWS KMS
65. Innovate for new revenues
Personalization, demand forecasting, risk analysis
11. Predictive models and scored datasets are published to data staging
[Diagram: Data sources, Ingest, Serving; Speed (Real-time), Scale (Batch)]
Data sources: Transactions, ERP, Web logs / cookies, Connected devices, Social media
Ingest: AWS Database Migration Service, AWS Direct Connect, Internet Interfaces, Amazon Kinesis, AWS Storage Gateway
Components: Amazon S3 (Raw Data), Amazon S3 (Staged Data / Data Lake), Amazon EMR (ETL), Amazon S3 (Clean Data), Amazon S3 (Schemaless), AWS Glue, Amazon EMR, Amazon Redshift, Amazon RDS, Amazon Elasticsearch Service, Amazon EMR (MLlib), Deep Learning, Amazon Machine Learning, Amazon Athena
Personas and platforms: Data analysts, Data scientists, Business users, Engagement platforms
Also shown: AWS CloudTrail, AWS IAM, Amazon CloudWatch, AWS KMS
66. Innovate for new revenues
Personalization, demand forecasting, risk analysis
12. Analysts use DWH, EMR, ES to find patterns & measure performance
[Diagram: Data sources, Ingest, Serving; Speed (Real-time), Scale (Batch)]
Data sources: Transactions, ERP, Web logs / cookies, Connected devices, Social media
Ingest: AWS Database Migration Service, AWS Direct Connect, Internet Interfaces, Amazon Kinesis, AWS Storage Gateway
Components: Amazon S3 (Raw Data), Amazon S3 (Staged Data / Data Lake), Amazon EMR (ETL), Amazon S3 (Clean Data), Amazon S3 (Schemaless), AWS Glue, Amazon EMR, Amazon Elasticsearch Service, Amazon Redshift, Amazon RDS, Amazon EMR (MLlib), Deep Learning, Amazon Machine Learning, Amazon Athena
Personas and platforms: Data analysts, Data scientists, Business users, Engagement platforms
Also shown: AWS CloudTrail, AWS IAM, Amazon CloudWatch, AWS KMS
67. Innovate for new revenues
Personalization, demand forecasting, risk analysis
13. Risk models evaluated to create new products and assess customers
[Diagram: Data sources, Ingest, Serving; Speed (Real-time), Scale (Batch)]
Data sources: Transactions, ERP, Web logs / cookies, Connected devices, Social media
Ingest: AWS Database Migration Service, AWS Direct Connect, Internet Interfaces, Amazon Kinesis, AWS Storage Gateway
Components: Amazon S3 (Raw Data), Amazon S3 (Staged Data / Data Lake), Amazon EMR (ETL), Amazon S3 (Clean Data), Amazon S3 (Schemaless), AWS Glue, Amazon EMR, Amazon Elasticsearch Service, Amazon Redshift, Amazon RDS, Amazon EMR (MLlib), Deep Learning, Amazon Machine Learning, Amazon Athena
Personas and platforms: Data analysts, Data scientists, Business users, Engagement platforms
Also shown: AWS CloudTrail, AWS IAM, Amazon CloudWatch, AWS KMS
68. Innovate for new revenues
Personalization, demand forecasting, risk analysis
14. Demand forecasts loaded into supply chain management systems
[Diagram: Data sources, Ingest, Serving; Speed (Real-time), Scale (Batch)]
Data sources: Transactions, ERP, Web logs / cookies, Connected devices, Social media
Ingest: AWS Database Migration Service, AWS Direct Connect, Internet Interfaces, Amazon Kinesis, AWS Storage Gateway
Components: Amazon S3 (Raw Data), Amazon S3 (Staged Data / Data Lake), Amazon EMR (ETL), Amazon S3 (Clean Data), Amazon S3 (Schemaless), AWS Glue, Amazon EMR, Amazon Elasticsearch Service, Amazon Redshift, Amazon RDS, Amazon EMR (MLlib), Deep Learning, Amazon Machine Learning, Amazon Athena
Personas and platforms: Data analysts, Data scientists, Business users, Engagement platforms
Also shown: AWS CloudTrail, AWS IAM, Amazon CloudWatch, AWS KMS
69. Amazon SNS & Amazon Pinpoint
• Amazon SNS is a fully
managed, cross-platform
mobile push intermediary
service
• Fully scalable to millions
of devices
• Amazon Pinpoint lets you
create targeted
campaigns and measure
engagement and results
[Diagram: Amazon SNS delivers push notifications through platform services]
• Apple APNS: Apple iPhones and iPads (iOS)
• Google GCM: Android phones and tablets
• Amazon ADM: Kindle Fire devices
• Windows WNS and MPNS: Windows Phone devices
• Baidu CP: Android phones and tablets in China
70. Innovate for new revenues
Personalization, demand forecasting, risk analysis
15. Personalized offers are broadcast out over notification channels
[Diagram: Data sources, Ingest, Serving; Speed (Real-time), Scale (Batch)]
Data sources: Transactions, ERP, Web logs / cookies, Connected devices, Social media
Ingest: AWS Database Migration Service, AWS Direct Connect, Internet Interfaces, Amazon Kinesis, AWS Storage Gateway
Components: Amazon S3 (Raw Data), Amazon S3 (Staged Data / Data Lake), Amazon EMR (ETL), Amazon S3 (Clean Data), Amazon S3 (Schemaless), AWS Glue, Amazon EMR, Amazon Elasticsearch Service, Amazon Redshift, Amazon RDS, Amazon SNS, Amazon Pinpoint, Amazon EMR (MLlib), Deep Learning, Amazon Machine Learning, Amazon Athena
Personas and platforms: Data analysts, Data scientists, Business users, Engagement platforms
Also shown: AWS CloudTrail, AWS IAM, Amazon CloudWatch, AWS KMS
71. Innovate for new revenues
Personalization, demand forecasting, risk analysis
[Diagram: Data sources, Ingest, Serving; Speed (Real-time), Scale (Batch)]
Data sources: Transactions, ERP, Web logs / cookies, Connected devices, Social media
Ingest: AWS Database Migration Service, AWS Direct Connect, Internet Interfaces, Amazon Kinesis, AWS Storage Gateway
Components: Amazon S3 (Raw Data), Amazon S3 (Staged Data / Data Lake), Amazon EMR (ETL), Amazon S3 (Clean Data), Amazon S3 (Schemaless), AWS Glue, Amazon EMR, Amazon Elasticsearch Service, Amazon Redshift, Amazon RDS, Amazon SNS, Amazon Pinpoint, Amazon EMR (MLlib), Deep Learning, Amazon Machine Learning, Amazon Athena
Personas and platforms: Data analysts, Data scientists, Business users, Engagement platforms
Also shown: AWS CloudTrail, AWS IAM, Amazon CloudWatch, AWS KMS
73. Athena & QuickSight Demo
[Diagram: Amazon S3, Amazon Athena, Amazon QuickSight]
Analyze past flight performance data stored in S3
Bureau of Transportation Flight Data Statistics
www.transtats.bts.gov
Create visualizations from S3 with Athena & QuickSight
76. Interactive customer experience, event-driven automation, fraud detection
Technology: Clickstream/mobile apps/sensor/video (computer vision)/audio (intent comprehension), event
detection and pipelining, in-line scoring, serverless compute, computer vision, deep learning
Common initiatives
Interactive CX: Natural customer journeys with adaptive interfaces
• Behavior-based recommendations, improving personalization along the journey
• Seamless session transfer across UI, from browser to mobile to physical location
• Voice-driven commands, and use of gestures and other natural interfaces
Event-driven automation: Full execution of business process driven by an action
• Order fulfillment, with real-time update notifications to customer
• Fast response to customer complaints/comments over direct or social channels
Fraud detection: Protect customer and business w/ real-time anomaly detection
• Purchase and payment verification, using behavioral models and location assessment
• Application and account opening validation
Outcome 3 : Real-time Engagement
80. The Power of Speech: Alexa
Alexa, the voice service that powers
Echo, provides capabilities, or skills,
that enable customers to interact with
devices using voice
Alexa Skills Kit (ASK) allows everyone
to build and publish their own skills
Skills can be powered by AWS
Lambda
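As the slide notes, a skill can be backed by an AWS Lambda function. The sketch below shows one possible shape for such a handler in Python: it inspects the request type in the incoming event and returns a plain-text speech response. The intent name `HelloIntent` and the spoken phrases are hypothetical, and a real skill would also need an interaction model defined in the Alexa Skills Kit.

```python
def build_response(text, end_session=True):
    """Wrap plain text in the JSON structure an Alexa skill returns."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def lambda_handler(event, context):
    """Entry point invoked by AWS Lambda for each Alexa request."""
    request = event.get("request", {})
    request_type = request.get("type")
    if request_type == "LaunchRequest":
        # The skill was opened without a specific request.
        return build_response("Welcome. What would you like to do?", end_session=False)
    if request_type == "IntentRequest":
        intent_name = request.get("intent", {}).get("name")
        if intent_name == "HelloIntent":  # hypothetical intent name
            return build_response("Hello from AWS Lambda.")
    return build_response("Sorry, I did not understand that.")


if __name__ == "__main__":
    sample_event = {"request": {"type": "IntentRequest", "intent": {"name": "HelloIntent"}}}
    print(lambda_handler(sample_event, None))
```

Keeping the response construction in one helper makes it easy to add further intents without repeating the JSON structure.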
81. Automated Speech Recognition (ASR)
Natural Language Processing (NLP)
Alexa Skills Kit (ASK)
Over 80 services, including Core, Security, Database,
Artificial Intelligence, Analytics, Mobile Development
83. Build your own Alexa Skill!
[Diagram: Amazon Echo, Alexa Skills Kit, AWS Lambda, Facebook Page]
84. Real-time engagement
Interactive customer experience, event-driven automation, fraud detection
[Diagram: Data sources, Ingest, Serving; Speed (Real-time), Scale (Batch)]
Provide superior customer service by responding to opportunities in real
time. Fulfill requests for products or services in an automated fashion to
create a strong competitive advantage over those that cannot.
Assurance becomes a different challenge as speeds increase: fraud
prevention must be adaptive and fast. Adding another layer of opportunity and
complexity is the use of vast streams of data from devices that
measure location, video, behaviors, environmental conditions, and more.
85. Real-time engagement
Interactive customer experience, event-driven automation, fraud detection
1. Real-time engagement requires personas that develop the analytics, and platforms for engaging and automating processes
[Diagram: Data sources, Ingest, Serving; Speed (Real-time), Scale (Batch)]
Personas and platforms: Data analysts, Data scientists, Business users, Engagement platforms, Automation / events
Also shown: AWS CloudTrail, AWS IAM, Amazon CloudWatch, AWS KMS
86. Real-time engagement
Interactive customer experience, event-driven automation, fraud detection
2. Real-time systems are built from a base of advanced data processing
[Diagram: Data sources, Ingest, Serving; Speed (Real-time), Scale (Batch)]
Data sources: Transactions, ERP, Web logs / cookies, Connected devices, Social media
Ingest: AWS Database Migration Service, AWS Direct Connect, Internet Interfaces, Amazon Kinesis, AWS Storage Gateway
Components: Amazon S3 (Raw Data), Amazon S3 (Staged Data / Data Lake), Amazon EMR (ETL), Amazon S3 (Clean Data), Amazon S3 (Schemaless), AWS Glue, Amazon EMR, Amazon Elasticsearch Service, Amazon Redshift, Amazon RDS, Amazon EMR (MLlib), Deep Learning, Amazon Machine Learning, Amazon Athena
Personas and platforms: Data analysts, Data scientists, Business users, Engagement platforms, Automation / events
Also shown: AWS CloudTrail, AWS IAM, Amazon CloudWatch, AWS KMS
87. Real-time engagement
Interactive customer experience, event-driven automation, fraud detection
3. Events are pipelined through Kinesis, into multiple streams, at scale
[Diagram: Data sources, Ingest, Serving; Speed (Real-time), Scale (Batch)]
Data sources: Transactions, ERP, Web logs / cookies, Connected devices, Social media
Ingest: AWS Database Migration Service, AWS Direct Connect, Internet Interfaces, Amazon Kinesis, AWS Storage Gateway
Components: Amazon S3 (Raw Data), Amazon S3 (Staged Data / Data Lake), Amazon EMR (ETL), Amazon S3 (Clean Data), Amazon S3 (Schemaless), AWS Glue, Amazon Kinesis, Amazon EMR, Amazon Elasticsearch Service, Amazon Redshift, Amazon RDS, Amazon EMR (MLlib), Deep Learning, Amazon Machine Learning, Amazon Athena
Personas and platforms: Data analysts, Data scientists, Business users, Engagement platforms, Automation / events
Also shown: AWS CloudTrail, AWS IAM, Amazon CloudWatch, AWS KMS
88. Also possible with Spark Streaming!
Amazon Kinesis, EMR with Spark Streaming
// Simplified sketch: the actual Spark calls take additional arguments (e.g. a StreamingContext)
KinesisUtils.createStream("twitter-stream")
  .filter(_.getText.contains("Big Data"))
  .countByWindow(Seconds(5))
Counting tweets on a sliding window
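The windowed count on this slide can be illustrated without a cluster. The following is a minimal Python sketch, not the Spark Streaming implementation: it assumes tweets arrive as (timestamp, text) pairs in time order, and the class and function names are made up for illustration.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events whose timestamps fall in the window (now - window, now]."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def add(self, timestamp):
        # Assumes timestamps are added in non-decreasing order.
        self._timestamps.append(timestamp)

    def count(self, now):
        cutoff = now - self.window_seconds
        # Drop events that have slid out of the window.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps)


def count_matching(tweets, keyword, window_seconds, now):
    """Filter tweets by keyword, then count those inside the window."""
    counter = SlidingWindowCounter(window_seconds)
    for timestamp, text in tweets:
        if keyword in text:
            counter.add(timestamp)
    return counter.count(now)


tweets = [
    (0.0, "Big Data rocks"),
    (3.0, "hello world"),
    (4.0, "Big Data again"),
    (6.0, "more Big Data"),
]
print(count_matching(tweets, "Big Data", window_seconds=5, now=7.0))
```

With a 5-second window evaluated at t=7, only the matching tweets at t=4 and t=6 are counted.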
89. Real-time engagement
Interactive customer experience, event-driven automation, fraud detection
[Architecture diagram with the same components as slide 87, plus Amazon S3 (stream data) and an additional Amazon EMR element.]
4. Event data is given context and structure in EMR and pushed for batch
90. Amazon Kinesis Firehose
• Fully managed data streaming service to ingest and
capture data into your storage or data warehouse
• Ability to batch load, compress or encrypt streaming
data
• Elastic to scale to any throughput (no more sharding)
• Charged only per GB processed ($0.035 per GB)
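To make the batching, compression, and per-GB pricing points concrete, here is a small local sketch in Python. It does not call Firehose and is not how the service is implemented. The function names and the batch size are hypothetical, the price is the figure quoted on this slide, and treating 1 GB as 1024^3 bytes is an assumption.

```python
import gzip
import json

PRICE_PER_GB = 0.035        # figure quoted on the slide; check current pricing
BYTES_PER_GB = 1024 ** 3    # assumption: binary gigabytes


def batch_and_compress(records, max_records=500):
    """Group records into newline-delimited JSON batches and gzip each batch."""
    batches = []
    for start in range(0, len(records), max_records):
        chunk = records[start:start + max_records]
        payload = "\n".join(json.dumps(r) for r in chunk).encode("utf-8")
        batches.append(gzip.compress(payload))
    return batches


def ingest_cost(total_bytes, price_per_gb=PRICE_PER_GB):
    """Estimated charge for a given volume of ingested data."""
    return total_bytes / BYTES_PER_GB * price_per_gb


events = [{"id": i, "type": "click"} for i in range(1200)]
batches = batch_and_compress(events)
print(len(batches), "compressed batches")
print("Estimated cost for 1 TB:", ingest_cost(1024 * BYTES_PER_GB))
```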
91. Real-time engagement
Interactive customer experience, event-driven automation, fraud detection
[Architecture diagram with the same components as slide 89, plus Amazon Kinesis Firehose.]
5. Kinesis Firehose pumps events into a DWH for near real-time analysis
92. AWS Lambda
• Use AWS Lambda to clean and
massage incoming data
• Write code to load data sources
(S3, DynamoDB) automatically in your
data warehouse (e.g. Amazon Redshift)
• React in real-time to incoming events in
Amazon Kinesis
[Diagram: Amazon Kinesis, AWS Lambda, Amazon Redshift]
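As a rough illustration of the "clean and massage" step, the sketch below shows a Python Lambda handler for a Kinesis trigger. Kinesis events delivered to Lambda carry each record's payload base64-encoded under `Records[].kinesis.data`. The cleaning rules and the `clean_record` helper are hypothetical, and the actual load into a data warehouse is left out.

```python
import base64
import json


def clean_record(raw):
    """Hypothetical cleaning step: trim strings and drop empty fields."""
    cleaned = {}
    for key, value in raw.items():
        if isinstance(value, str):
            value = value.strip()
        if value is None or value == "":
            continue
        cleaned[key] = value
    return cleaned


def handler(event, context):
    """Decode each Kinesis record, clean it, and return the cleaned rows."""
    rows = []
    for record in event.get("Records", []):
        payload = base64.b64decode(record["kinesis"]["data"])
        rows.append(clean_record(json.loads(payload)))
    # A real function would write `rows` to the warehouse or to S3 here.
    return rows


# Local usage example with a hand-built event (no AWS account needed).
sample = {"user": "  alice ", "note": "", "clicks": 0}
event = {
    "Records": [
        {"kinesis": {"data": base64.b64encode(json.dumps(sample).encode()).decode()}}
    ]
}
print(handler(event, None))
```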
93. Real-time engagement
Interactive customer experience, event-driven automation, fraud detection
[Architecture diagram with the same components as slide 91, plus Event Scoring and AWS Lambda.]
6. The event is streamed to a scoring server for processing
95. Amazon Polly
Turn text into lifelike speech using deep learning technologies to
synthesize speech that sounds like a human voice
• Unlimited replays
• Returns an MP3 or audio stream
• Lightning-fast response
• Fully managed and low cost
96. Amazon Polly: Text In, Life-like Speech Out
Text in: “The temperature in WA is 75°F”
Speech out: “The temperature in Washington is 75 degrees Fahrenheit”
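The expansion in this example ("WA" to "Washington", "75°F" to "75 degrees Fahrenheit") is a text normalization step. The toy Python sketch below reproduces it for this one sentence only. It is not how Amazon Polly works internally, and the lookup tables are hypothetical.

```python
import re

# Hypothetical lookup tables, just large enough for the slide's example.
ABBREVIATIONS = {"WA": "Washington"}
UNITS = {"F": "Fahrenheit", "C": "Celsius"}


def normalize(text):
    """Expand temperature units and known two-letter abbreviations."""
    text = re.sub(
        r"(\d+)\s*°\s*([FC])\b",
        lambda m: f"{m.group(1)} degrees {UNITS[m.group(2)]}",
        text,
    )
    text = re.sub(
        r"\b([A-Z]{2})\b",
        lambda m: ABBREVIATIONS.get(m.group(1), m.group(1)),
        text,
    )
    return text


print(normalize("The temperature in WA is 75°F"))
```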
97. Amazon Lex
Conversational interfaces for your applications, powered by the same
Natural Language Understanding (NLU) & Automatic Speech Recognition
(ASR) models as Alexa
• Integrated development in AWS console
• Trigger AWS Lambda functions
• Multi-step conversations
• Continually improving ASR & NLU models
• Enterprise connectors
• Fully managed
98. Intents: A particular goal that the user wants to achieve
Utterances: Spoken or typed phrases that invoke your intent
Slots: Data the user must provide to fulfill the intent
Prompts: Questions that ask the user to input data
Fulfillment: The business logic required to fulfill the user’s intent
Example intent: BookHotel
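The relationship between intents, utterances, slots, prompts, and fulfillment can be sketched as a small data structure. The Python below is an illustrative model only, not the Amazon Lex API. The slot names, prompts, and utterance for BookHotel are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Slot:
    name: str
    prompt: str  # question asked when the slot value is missing


@dataclass
class Intent:
    name: str
    utterances: list
    slots: list
    values: dict = field(default_factory=dict)

    def next_prompt(self):
        """Return the prompt for the first unfilled slot, or None if all are filled."""
        for slot in self.slots:
            if slot.name not in self.values:
                return slot.prompt
        return None

    def fulfill(self):
        """Stand-in for the business logic; returns the collected slot values."""
        if self.next_prompt() is not None:
            raise ValueError("cannot fulfill: some slots are still empty")
        return dict(self.values)


book_hotel = Intent(
    name="BookHotel",
    utterances=["Book a hotel"],
    slots=[
        Slot("Location", "Which city will you be staying in?"),
        Slot("Nights", "How many nights?"),
    ],
)
print(book_hotel.next_prompt())
book_hotel.values["Location"] = "Singapore"
print(book_hotel.next_prompt())
book_hotel.values["Nights"] = 2
print(book_hotel.fulfill())
```

Each turn of the conversation fills one slot; fulfillment runs only once no prompt remains.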
99. Amazon Rekognition
Image recognition and analysis powered by deep learning, which
allows you to search, verify, and organize millions of images
• Easy to use
• Batch analysis
• Real-time analysis
• Continually improving
• Low cost
113. Demo: Live Twitter Feed Analysis
[Diagram: Twitter Stream, Amazon Kinesis, AWS Lambda, Amazon Elasticsearch Service]
Twitter Blog* - On a typical day (in 2013):
• More than 500 million Tweets sent
• Average 5,700 TPS
* https://blog.twitter.com/2013/new-tweets-per-second-record-and-how
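The two figures on this slide are consistent with each other, which a quick calculation shows. The Python sketch below also estimates a minimum shard count for such a stream, assuming a write limit of 1,000 records per second per Kinesis shard. That limit and the function names are assumptions to verify against current service quotas, and record size is ignored.

```python
import math

TWEETS_PER_DAY = 500_000_000
SECONDS_PER_DAY = 24 * 60 * 60
SHARD_RECORDS_PER_SECOND = 1_000  # assumed per-shard write limit


def average_tps(events_per_day):
    """Average events per second over a full day."""
    return events_per_day / SECONDS_PER_DAY


def shards_needed(tps, per_shard=SHARD_RECORDS_PER_SECOND):
    """Minimum shards for the average rate, ignoring peaks and record size."""
    return math.ceil(tps / per_shard)


tps = average_tps(TWEETS_PER_DAY)
print(round(tps))          # about 5,787 per second, in line with the quoted 5,700 TPS
print(shards_needed(tps))  # 6 shards at the assumed limit
```

Peak traffic can be far above the daily average, so a real stream would need headroom beyond this estimate.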
114. Outcome 4: Automate for expansive reach
Automation of self-service, deployment, policy, and quality assurance
Technology: Self-service, on-demand provisioning, DevOps, Spot pricing, AWS CloudFormation, security
automation, performance monitoring (CW&XR), global rollouts
Common initiatives
Self-service:
• Application catalog or portal for all employees, availability determined by role
• Service provisioning backed by automation of policy and governance
Agile development: Use of DevOps to allow very few resources to deploy globally
• CI/CD for software release, build/test, and deployment automation
• Templated infrastructure provisioning, and configuration management
• Business rules and policies are "gold coded" to be used for all deployments
• Use of Security by Design (SbD) to codify network, O/S, and encryption
Comprehensive monitoring: Assurance of SLA and issue remediation
• Logging and monitoring of all API calls and executions to ensure SLAs are met
• Analysis of performance variance for faster root cause analysis
115. Automate for expansive reach
Automation of self-service, deployment, policy, and quality assurance
[Architecture diagram with the same components as slide 93, plus Amazon DynamoDB, Amazon SQS, Amazon AI, a second AWS Lambda element, and AWS DevOps.]
117. Sharpen your skills (Singapore)
Attend the official AWS Training course organized by AWS Authorized local
training partner – Bespoke Training Services (www.bespoketraining.com).
Join the AWS Jumpstart (2 hr) session and hear from our customers and
partners on how they enabled their teams and successfully deployed on
AWS. You also stand a chance to win a free seat in the above courses.
Point of contact – Gilbert Cheo - gilbert@bespoketraining.com
Courses and dates:
• Architecting on AWS: 28 Feb-2 Mar / 14-16 March
• System Operations on AWS: 22-24 Feb
• Developing on AWS: 4-6 April
• Big Data on AWS: 4-6 April
Venue: AWS Singapore, Church Street, Capital Square, #10-01, Singapore 049481
Amazon EMR simplifies big data processing, providing a managed Hadoop framework that makes it easy, fast, and cost-effective for you to distribute and process vast amounts of your data across dynamically scalable Amazon EC2 instances.
The EMR File System allows EMR clusters to efficiently and securely use Amazon S3 as an object store for Hadoop. You can store your data in Amazon S3 and use multiple Amazon EMR clusters to process the same data set. Each cluster can be optimized for a particular workload, which can be more efficient than a single cluster serving multiple workloads with different requirements. For example, you might have one cluster that is optimized for I/O and another that is optimized for CPU, each processing the same data set in Amazon S3. Additionally, by storing your input and output data in Amazon S3, you can shut down clusters when they are no longer needed.
Amazon EMR makes it easy to use Spot Instances so you can save both time and money. Amazon EMR clusters include 'core nodes' that run HDFS and 'task nodes' that do not; task nodes are ideal for Spot because if the Spot price increases and you lose those instances, you will not lose data stored in HDFS.
Amazon EMR supports powerful and proven Hadoop tools such as Hive, Pig, HBase, and Impala. Additionally, it can run distributed computing frameworks besides Hadoop MapReduce such as Spark or Presto using bootstrap actions. You can also use Hue and Zeppelin as GUIs for interacting with applications on your cluster.
And by the way, if you thought that these innovative digital use cases were only happening outside India, you could not be more wrong. Many large Indian companies are increasing their digital presence and seeing massive success in doing so.
More: https://aws.amazon.com/blogs/aws/ec2-instance-update-x1-sap-hana-t2-nano-websites/