(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics

© 2015, Amazon Web Services, Inc. or its Affiliates. All rights reserved.
Jon Handler, Principal Solutions Architect
Pravin Pillai, Senior Product Manager
October 2015
BDT209
Amazon Elasticsearch Service for Real-time Data Analysis and Visualization

What to Expect from the Session
• Context: Managing your growing data
• Introducing Amazon Elasticsearch Service (Amazon ES)
• Configuring, securing, connecting, monitoring, and
scaling your Amazon ES cluster

Your data is constantly growing
Product usage

System logs

Customer conversations

“Big data is not about the data”
- Gary King, Harvard University, making the point that while data is
plentiful and easy to collect, the real value is in the analytics.

So what can you do with all this data?
• Share information
• Extract insight
• Recognize patterns
• Track performance
Ultimately, make better business,
technical, and operational decisions

Scenario 1: Full-text search
Knowledge Sharing Systems
• Your team is constantly generating
content
• You are tasked with making this
knowledge base searchable and
accessible
• You need key search features including
text matching, faceting, filtering, fuzzy
search, auto complete, and highlighting

Scenario 2: Streaming data analytics
Intrusion detection
• You have to protect your system from
attacks
• You need easy to use, yet powerful
analytics and data visualization tools to
detect issues in near real-time
• Easy and flexible data ingestion is
important to capture information from a
variety of key data sources

Scenario 3: Batch data analytics
Usage Monitoring
• You are a mobile app developer
• You have to monitor/manage users
across multiple app versions
• You want to analyze and report on
usage and migration between app
versions

How Elasticsearch can help
A powerful, real-time, distributed, open-source search and
analytics engine:
• Built on top of Apache Lucene
• Schema free
• Developer friendly RESTful API

How Elasticsearch can help
Combined with Logstash and Kibana, the ELK stack
provides a tool for real-time analytics and data visualization

Operating Elasticsearch is time-consuming
“Elasticsearch allows us to easily and quickly build bleeding edge big data
and analytics applications using the ELK stack. By offering direct access
to the Elasticsearch API while offloading administrative tasks, Amazon
Elasticsearch Service gives us the manageability, flexibility and control we
need.”
- Sean Curtis, SVP Engineering at Major League Baseball Advanced Engineering

Introducing Amazon Elasticsearch Service
Amazon Elasticsearch Service is
a managed service from AWS that
makes it easy to set up, operate,
and scale Elasticsearch clusters
in the cloud.

Key benefits
• Easy cluster creation and configuration management
• Support for ELK
• Security with AWS IAM
• Monitoring with Amazon CloudWatch
• Auditing with AWS CloudTrail
• Integration options with other AWS services (CloudWatch Logs, Amazon DynamoDB, Amazon S3, Amazon Kinesis)

AWS CLI commands
add-tags
create-elasticsearch-domain
delete-elasticsearch-domain
describe-elasticsearch-domain
describe-elasticsearch-domain-config
describe-elasticsearch-domains
list-domain-names
list-tags
remove-tags
update-elasticsearch-domain-config

Example:
aws es create-elasticsearch-domain --domain-name my-domain \
  --elasticsearch-cluster-config \
    InstanceType=m3.xlarge.elasticsearch,InstanceCount=3 \
  --ebs-options \
    EBSEnabled=true,VolumeType=gp2,VolumeSize=512
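
The same domain could also be requested from a script. The following is a minimal sketch, assuming the third-party boto3 SDK is installed and credentials are configured; the helper names `create_domain_params` and `create_domain` are hypothetical, and the values mirror the CLI example.

```python
def create_domain_params(domain_name):
    """Build the request parameters that mirror the CLI example."""
    return {
        "DomainName": domain_name,
        "ElasticsearchClusterConfig": {
            "InstanceType": "m3.xlarge.elasticsearch",
            "InstanceCount": 3,
        },
        "EBSOptions": {
            "EBSEnabled": True,
            "VolumeType": "gp2",
            "VolumeSize": 512,  # size in GB per node
        },
    }


def create_domain(domain_name, region="us-east-1"):
    """Send the request. Needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the parameter builder has no dependencies

    client = boto3.client("es", region_name=region)
    return client.create_elasticsearch_domain(**create_domain_params(domain_name))


if __name__ == "__main__":
    # Only prints the parameters; call create_domain() to actually create a domain.
    print(create_domain_params("my-domain"))
```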

Amazon ES domain overview
[Diagram: Amazon Route 53, Elastic Load Balancing, IAM, CloudWatch, CloudTrail, Elasticsearch API]
• Nodes under management
• Single endpoint, REST API
• IAM integration
• CloudWatch/CloudTrail for monitoring

Data partitioning for search
[Diagram: an index of documents, each with an ID, split across Shard 1 and Shard 2]
• Document: The unit of search
• ID: Unique identifier, one per
document
• Field: Documents comprise a
collection of fields
• Shard: An instance of Lucene with
a portion of an index
• Index: A collection of data
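
To make the document-to-shard relationship concrete: Elasticsearch picks a shard by hashing the document's routing value (its ID by default) and taking the result modulo the number of primary shards. The sketch below illustrates that idea only; `shard_for` is a hypothetical helper, and CRC32 is a stand-in for Elasticsearch's internal hash function.

```python
import zlib


def shard_for(doc_id, number_of_shards):
    """Map a document ID to a shard number in [0, number_of_shards)."""
    # CRC32 is a stand-in chosen because it is stable across runs;
    # Elasticsearch uses its own hash function internally.
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_shards


if __name__ == "__main__":
    for doc_id in ["1", "2", "3"]:
        print("document", doc_id, "-> shard", shard_for(doc_id, 3))
```

Because the shard is derived from the ID and the shard count, the same ID always lands on the same shard, which is why the number of primary shards is fixed when the index is created.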

Deployment of indices to a cluster
• Index 1
• Shard 1
• Shard 2
• Shard 3
• Index 2
• Shard 1
• Shard 2
• Shard 3
Amazon ES cluster
[Diagram: primary and replica copies of shards 1-3 of both indices, spread across Instance 1, Instance 2, and Instance 3]
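
One way to picture this deployment is a simple round-robin placement in which a replica never shares an instance with its primary. The sketch below is an illustration under that assumption, not the allocation algorithm Elasticsearch actually runs; `place_shards` is a hypothetical helper.

```python
def place_shards(indices, shards_per_index, instance_count):
    """Return {instance_number: [(index, shard, role), ...]}."""
    if instance_count < 2:
        raise ValueError("need at least 2 instances to separate primary and replica")
    layout = {n: [] for n in range(1, instance_count + 1)}
    for index in indices:
        for shard in range(1, shards_per_index + 1):
            primary = (shard - 1) % instance_count + 1
            replica = primary % instance_count + 1  # always the next instance
            layout[primary].append((index, shard, "primary"))
            layout[replica].append((index, shard, "replica"))
    return layout


if __name__ == "__main__":
    for instance, shards in place_shards(["index1", "index2"], 3, 3).items():
        print("Instance", instance, shards)
```

With two indices of three shards each and one replica per shard, every instance ends up holding four shard copies, and losing any single instance still leaves one copy of every shard.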

Instance type recommendations
Instance  Workload
T2        Entry point. Dev and test. OK for dedicated masters.
M3        Equal read and write volumes. Up to 5 TB of storage with EBS.
R3        Read-heavy or workloads with high query demands (e.g., aggregations).
I2        Up to 16 TB of SSD instance storage.

Secure access to your domain
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:user/susan"
      },
      "Action": [
        "es:ESHttpGet", "es:ESHttpPut", "es:ESHttpPost",
        "es:CreateElasticsearchDomain",
        "es:ListDomainNames"
      ],
      "Resource": "arn:aws:es:us-east-1:###:domain/logs-domain/<index>/*"
    }
  ]
}
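
A policy like this can also be generated rather than hand-edited, which helps when the index name varies. This is a minimal sketch using only the standard library; the helper `index_access_policy`, the index name `logs-2015`, and the account ID in the domain ARN are hypothetical placeholders.

```python
import json


def index_access_policy(principal_arn, domain_arn, index, actions):
    """Return a JSON access policy scoped to a single index of a domain."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": principal_arn},
                "Action": list(actions),
                # Appending /<index>/* limits the grant to that index.
                "Resource": f"{domain_arn}/{index}/*",
            }
        ],
    }
    return json.dumps(policy, indent=2)


if __name__ == "__main__":
    print(index_access_policy(
        "arn:aws:iam::123456789012:user/susan",
        "arn:aws:es:us-east-1:123456789012:domain/logs-domain",
        "logs-2015",
        ["es:ESHttpGet", "es:ESHttpPut", "es:ESHttpPost"],
    ))
```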

• Principal: control access by user with signed requests
• Action: allow/deny HTTP methods and config operations per policy
• Resource: fine-grained control to the index level

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "AWS": "*"
      },
      "Resource": "arn:aws:es:us-east-1:###:domain/logs-domain/<index>/*",
      "Condition": {
        "IpAddress": {
          "aws:SourceIp": [ "xx.xx.xx.xx/yy" ]
        }
      }
    }
  ]
}
And/or use IP-based access control
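
The `aws:SourceIp` values are CIDR blocks, so a request is allowed when its source address falls inside one of the listed ranges. The sketch below shows that matching rule with Python's standard `ipaddress` module; `ip_allowed` is a hypothetical helper and the addresses are documentation ranges, not values from the talk.

```python
import ipaddress


def ip_allowed(source_ip, allowed_cidrs):
    """Return True when source_ip falls inside any of the CIDR blocks."""
    addr = ipaddress.ip_address(source_ip)
    # strict=False tolerates blocks written with host bits set, e.g. 203.0.113.5/24
    return any(
        addr in ipaddress.ip_network(cidr, strict=False) for cidr in allowed_cidrs
    )


if __name__ == "__main__":
    allowed = ["203.0.113.0/24"]
    print(ip_allowed("203.0.113.7", allowed))
    print(ip_allowed("198.51.100.1", allowed))
```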

Direct access to the Elasticsearch API
$ curl -XPUT 'https://<endpoint>/blog' -d '{
  "settings" : { "number_of_shards" : 3, "number_of_replicas" : 1 } }'
$ curl -XPOST 'https://<endpoint>/blog/post/1' -d '{
  "author":"jon handler",
  "title":"Amazon ES Launch" }'
$ curl -XPOST 'https://<endpoint>/blog/post/_bulk' -d '
{ "index" : { "_index" : "blog", "_type" : "post", "_id" : "2" } }
{ "title":"Amazon ES for search", "author": "pravin pillai" }
{ "index" : { "_index":"blog", "_type":"post", "_id":"3" } }
{ "title":"Analytics too", "author": "vivek sriram" }
'
$ curl -XGET 'https://<endpoint>/_search?q=ES'
{"took":16,"timed_out":false,"_shards":{"total":3,"successful":3,"failed":0},
 "hits":{"total":2,"max_score":0.13424811,"hits":[
  {"_index":"blog","_type":"post","_id":"1","_score":0.13424811,
   "_source":{"author":"jon handler","title":"Amazon ES Launch"}},
  {"_index":"blog","_type":"post","_id":"2","_score":0.11506981,
   "_source":{"title":"Amazon ES for search","author":"pravin pillai"}}]}}
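
The bulk endpoint expects newline-delimited JSON: one action line, then one source line per document, with no commas between lines and a newline after the last one. A small sketch of building such a body with the standard library follows; `build_bulk_body` is a hypothetical helper, and sending the body is left out.

```python
import json


def build_bulk_body(index, doc_type, docs):
    """Build a bulk request body from a {doc_id: source} mapping."""
    lines = []
    for doc_id, source in docs.items():
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))  # action line
        lines.append(json.dumps(source))  # document line
    # Every line, including the last, must end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    body = build_bulk_body("blog", "post", {
        "2": {"title": "Amazon ES for search", "author": "pravin pillai"},
        "3": {"title": "Analytics too", "author": "vivek sriram"},
    })
    print(body, end="")
```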

Loading data using Logstash
[Diagram: application nodes/Logstash forwarders -> Logstash indexer -> Amazon Elasticsearch Service]

Logstash plugin for Amazon ES
https://github.com/awslabs/logstash-output-amazon_es
output {
  amazon_es {
    hosts => ["foo.us-east-1.es.amazonaws.com"]  # required
    region => "us-east-1"                        # required
    access_key => 'ACCESS_KEY'                   # optional
    secret_key => 'SECRET_KEY'                   # optional
    codec => "plain"
    workers => 1
    index => "logstash-%{+YYYY.MM.dd}"
  }
}
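
The `index` setting produces one index per day: Logstash substitutes the event's timestamp, in UTC, into the date pattern. The sketch below reproduces that naming outside Logstash, which can be handy when querying or deleting a specific day's index; `daily_index_name` is a hypothetical helper.

```python
from datetime import datetime, timezone


def daily_index_name(event_time, prefix="logstash-"):
    """Return the daily index name for a timezone-aware event timestamp."""
    utc = event_time.astimezone(timezone.utc)  # Logstash formats the date in UTC
    return prefix + utc.strftime("%Y.%m.%d")


if __name__ == "__main__":
    event = datetime(2015, 10, 8, 12, 0, tzinfo=timezone.utc)
    print(daily_index_name(event))  # logstash-2015.10.08
```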

Loading data using Lambda
[Diagram: Amazon S3, DynamoDB, and Amazon Kinesis -> AWS Lambda -> Amazon Elasticsearch Service]

Lambda code snippet (node.js) for upload
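var AWS = require('aws-sdk');
// Credentials that Lambda exposes through AWS_* environment variables
var creds = new AWS.EnvironmentCredentials('AWS');

// `endpoint` is assumed to be an AWS.Endpoint for the domain, defined elsewhere
function postDocumentToES(doc, context) {
  var req = new AWS.HttpRequest(endpoint);
  // Sign the request with Signature Version 4 for the 'es' service
  var signer = new AWS.Signers.V4(req, 'es');
  signer.addAuthorization(creds, new Date());
  // Send the signed request
  var send = new AWS.NodeHttpClient();
  send.handleRequest(req, null, function(httpResp)...
Full sample: https://github.com/awslabs/amazon-elasticsearch-lambda-samples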
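
For readers curious what the signer does, Signature Version 4 derives a signing key from the secret key through a chain of HMAC-SHA256 operations over the date, region, and service name. The sketch below shows only that derivation step with the standard library; it is not a full request signer, `derive_signing_key` is a hypothetical helper, and the secret shown is a placeholder.

```python
import hashlib
import hmac


def _hmac_sha256(key, message):
    """Return the raw HMAC-SHA256 digest of message under key."""
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key, date_stamp, region, service="es"):
    """Derive the SigV4 signing key for a date (YYYYMMDD), region, and service."""
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


if __name__ == "__main__":
    # Placeholder secret; never hard-code real credentials.
    key = derive_signing_key("EXAMPLE-SECRET", "20151008", "us-east-1")
    print(key.hex())
```

Because the key is scoped to a day, a region, and the `es` service, a signature made for one of them cannot be replayed against another.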

Export logs to Amazon ES
[Diagram: CloudWatch -> Amazon Elasticsearch Service]

Monitor and
audit
CloudWatch
CloudTrail

What should I monitor?
• FreeStorageSpace – monitor and alarm before the
cluster runs out of space
• CPUUtilization – alarm at 80% CPU to signal the need to
scale up
• ClusterStatus.yellow – check whether replication
requires additional nodes
• JVMMemoryPressure – check instance type and count
for sufficient resources
• MasterCPUUtilization – monitoring for master nodes is
separated from data nodes
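
The checks above boil down to comparing a few metric values against thresholds. The sketch below expresses that logic on a plain dictionary of the latest metric values; `raised_alarms` is a hypothetical helper, the 80% CPU threshold comes from the list, and the free-storage threshold is an assumption to be chosen per cluster, in whatever unit the metric is reported in.

```python
def raised_alarms(metrics, min_free_storage, max_cpu_percent=80.0):
    """Return the names of metrics that breach their thresholds."""
    raised = []
    # Alarm before the cluster runs out of space.
    if metrics.get("FreeStorageSpace", float("inf")) < min_free_storage:
        raised.append("FreeStorageSpace")
    # Sustained high CPU signals the need to scale up.
    if metrics.get("CPUUtilization", 0.0) >= max_cpu_percent:
        raised.append("CPUUtilization")
    # A yellow cluster has unassigned replica shards.
    if metrics.get("ClusterStatus.yellow", 0) >= 1:
        raised.append("ClusterStatus.yellow")
    return raised


if __name__ == "__main__":
    sample = {"FreeStorageSpace": 500, "CPUUtilization": 85, "ClusterStatus.yellow": 0}
    print(raised_alarms(sample, min_free_storage=1000))
```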

Snapshot and
restore for data
durability

Daily automated snapshots
• No additional charges
• Snapshots retained for 14 days

Taking manual snapshots
[Diagram: role, Amazon S3 snapshot repository]
Trust relationship:
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "",
"Effect": "Allow",
"Principal": {
"Service": "es.amazonaws.com"
},
"Action": "sts:AssumeRole"
}
]
}

[Diagram: role, Amazon S3 snapshot repository]
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [ "s3:ListBucket" ],
      "Effect": "Allow",
      "Resource": [ "arn:aws:s3:::bucket" ]
    },
    {
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "iam:PassRole"
      ],
      "Effect": "Allow",
      "Resource": [ "arn:aws:s3:::bucket/*" ]
    }
  ]
}

Register the bucket
curl -XPUT http://<endpoint>/_snapshot/<repo-name>
-d '{"type":"s3",
"settings": {
"bucket":"<bucket>",
"region":"<region>",
"role-arn":"<arn>"}}'
Take a snapshot
curl -XPUT http://<endpoint>/_snapshot/<repo-name>/snapshot1
Snapshot time is proportional to size.

Application overview
[Diagram: application nodes/Logstash forwarders -> Logstash indexer -> Amazon Elasticsearch Service]

Securing Kibana
IAM
Proxy (optional)

IAM policy for Kibana
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": { "AWS": "*" },
      "Action": [
        "es:ESHttpGet",
        "es:ESHttpPut",
        "es:ESHttpPost",
        "es:ESHttpHead"
      ],
      "Resource": [ "arn:aws:es:us-east-1:####:domain/<domain>/*" ],
      "Condition": { "IpAddress": { "aws:SourceIp": [ "xx.xx.xx.xx" ] } }
    }
  ]
}

Pay for compute and storage you use
With Amazon Elasticsearch Service, you pay only for the
compute and storage resources you use. AWS Free Tier for
qualifying customers.

Amazon Elasticsearch Service is publicly available now!
You can use Amazon Elasticsearch Service in these regions:
• us-east-1
• us-west-1
• us-west-2
• eu-west-1
• eu-central-1
• ap-southeast-1
• ap-southeast-2
• ap-northeast-1
• sa-east-1

Wrap up
1. Elasticsearch is a tool for full-text search, analysis, and
visualization of time series data that helps you get the
most out of your growing data set
2. Amazon Elasticsearch Service makes it easy to deploy
and manage an Elasticsearch cluster in the AWS cloud
3. Amazon Elasticsearch Service is a drop-in replacement
for your existing Elasticsearch cluster

Remember to complete
your evaluations!
