Maziyar PANAHI
Big Data engineer / Cloud Architect
ARGOS - NoSQL / Big Data

Université Paris-Sud, LAL

25 November 2015
Engineer at CNRS
ISCPIF
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE
CNRS UNIT UPS3611 - HTTP://ISCPIF.FR Creative commons, open science, open data, shared resources
http://iscpif.fr/services/
• Core Services
• ROOM RESERVATION

• EVENT ANNOUNCEMENT

• PROJECT HOSTING AND RESIDENCIES

• HIGH PERFORMANCE COMPUTING

• TRAINING SESSIONS

• COMMUNITY EXPLORER

• Open Platforms
• OpenMole

• Gargantext

• Big Data
• Linkrbrain
External Collaborators:
ISCPIF Partners:
• Elasticsearch
• MongoDB
• Redis
• RabbitMQ
• Big Data Platform
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
HOW TO SCALE FROM ZERO TO BILLIONS!
“You Know, for Search!” - Elasticsearch
Elasticsearch
Elasticsearch | Real-Time Data
Elasticsearch | Real-Time Advanced Analytics
Elasticsearch | Massively Distributed
Elasticsearch | High Availability
Elasticsearch | Multi-tenancy
Elasticsearch | Full-Text Search
Elasticsearch | Document-Oriented & Schema-Free
Elasticsearch | Developer-Friendly, RESTful API
• Single document APIs
• Index API

• Get API

• Delete API

• Update API
• Multi-document APIs
• Multi Get API

• Bulk API

• Bulk UDP API (removed in Elasticsearch 2.0)

• Delete By Query API (moved to a plugin in Elasticsearch 2.0)
Elasticsearch | Developer-Friendly, RESTful API
Index
Type
ID
Document
Elasticsearch | Search & Analyze Data in Real Time
Apache 2 open source license. Built on top of Apache Lucene™
Elasticsearch 2.0 | Installation - Package
1. curl -L -O https://download.elastic.co/elasticsearch/release/org/elasticsearch/distribution/tar/elasticsearch/2.0.0/elasticsearch-2.0.0.tar.gz

2. tar -xvf elasticsearch-2.0.0.tar.gz

3. cd elasticsearch-2.0.0/bin

4. ./elasticsearch
Elasticsearch 2.0 | Installation - Repositories
Download and install the Public Signing Key:
wget -qO - https://packages.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
Repository definition APT -> /etc/apt/sources.list.d/elasticsearch-2.x.list
echo "deb http://packages.elastic.co/elasticsearch/2.x/debian stable main" | sudo tee -a /etc/apt/sources.list.d/elasticsearch-2.x.list
Install Elasticsearch 2.0:
sudo apt-get update && sudo apt-get install elasticsearch
Elasticsearch 2.0 | Installation - That’s it!
Simply run:
curl 'http://localhost:9200/?pretty'
Elasticsearch 2.0 | Configuration - System
curl 'localhost:9200/_nodes/process?pretty'
• #File descriptors
• Setting the limit to 32k or even 64k is recommended

• #Memory settings
• Disable swap
• Temporarily: sudo swapoff -a
• Permanently: comment out the swap entries in /etc/fstab
Elasticsearch 2.0 | Configuration - System
-> /etc/default/elasticsearch
• ES_HEAP_SIZE
• Leave enough for the OS

• Leave enough for the

• Never ever go over 30.1 GB!!

• I go with half of the RAM, kept below 30 GB
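The heap sizing rule of thumb on this slide can be sketched in a few lines of Python. The function name and the 30 GB cap are illustrative assumptions read off the bullets (half of the machine's RAM, kept below roughly 30 GB); this is not an official formula.

```python
def suggested_heap_gb(total_ram_gb, cap_gb=30):
    """Rule of thumb from the slide: half of the RAM, never above ~30 GB.

    cap_gb=30 is an assumption taken from the "half < 30 GB" bullet.
    """
    if total_ram_gb <= 0:
        raise ValueError("total_ram_gb must be positive")
    return min(total_ram_gb / 2, cap_gb)


# ES_HEAP_SIZE is the variable named on the slide (/etc/default/elasticsearch).
for ram in (8, 64, 128):
    print(f"{ram} GB RAM -> ES_HEAP_SIZE={suggested_heap_gb(ram):g}g")
```

On a 64 GB or 128 GB machine the cap wins, so the rest of the memory stays available to the operating system.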
Elasticsearch 2.0 | Configuration - Elasticsearch
• #/etc/elasticsearch/elasticsearch.yml
• network:

• host : <MACHINE IP ADDRESS>

• path:
• logs: /var/log/elasticsearch

• data: /var/data/elasticsearch

• cluster:
• name: <NAME OF YOUR CLUSTER>

• node:
• name: <NAME OF YOUR NODE>
Elasticsearch 2.0 | Configuration - Elasticsearch
Node.name
Cluster.name
mlockall
# Elasticsearch performs poorly when JVM starts swapping: you should ensure that it _never_ swaps.
Elasticsearch 2.0 | Configuration - Elasticsearch
#Shards
#Replicas
Remember this?
• 1 cluster
• 3 nodes
• 6 shards
• 1 replica
Elasticsearch 2.0 | Shards and Replicas
Why Shards and Replicas?
• ES has built in clustering

• Scaling out index: (shards)

• Parallel work on an index: (shards)

• Increasing availability: (replicas)
• Can change number of replicas anytime!

• Cannot change number of shards after index creation! (must reindex)
Elasticsearch 2.0 | Shards and Replicas
What is a Shard?
• You can't actually split an index!

• ES uses multiple Lucene indexes (AKA shards)

• Simply put, a shard is a Lucene index!

• Overhead: a query hits all shards for scoring

• So they don't come for free!
Elasticsearch 2.0 | How many Shards and Replicas?
• Replicas:
• More replicas = More availability = Longer indexing!

• Shards
• How much data?

• How many queries?

• How complex are those queries?

• How many resources does each node have?

• Number of nodes in your cluster

• Don't know? Over-allocate a few shards (but not too many, they are not free!)
Elasticsearch 2.0 | How many Shards and Replicas?
• Changing number of replicas: easy
curl -XPUT 'localhost:9200/my_index/_settings' -d '
{
"index" : {
"number_of_replicas" : 4
}
}'
• Changing number of shards: must be re-indexed

• For some, not a big deal.

• For some, it is a big deal!
Elasticsearch 2.0 | How many Shards and Replicas?
• StackOverflow http://stackexchange.com/performance
• Scaling out:

• More shards than the #nodes

• Multiple shards in one node
Elasticsearch 2.0 | How many Shards and Replicas?
• 3x nodes - 3x shards - 2x replicas

• Failing 2x nodes = a full copy of the data is still available

• Storage need: 2 extra copies of the index on top of the primaries

• each replica shard = 1/3 of index size

• Storage is cheap, small price to pay for availability
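The arithmetic behind this 3 nodes / 3 shards / 2 replicas layout can be checked with a small Python sketch. The function name and the 90 GB index size are hypothetical, and the failure count assumes copies of the same shard always sit on different nodes (which is how Elasticsearch allocates them).

```python
def shard_layout(primaries, replicas, index_size_gb):
    """Count shard copies and storage for one index.

    Each primary shard has `replicas` extra copies, so every piece of
    data exists (1 + replicas) times in the cluster.
    """
    copies_per_shard = 1 + replicas
    return {
        "total_shards": primaries * copies_per_shard,
        "shard_size_gb": index_size_gb / primaries,
        "total_storage_gb": index_size_gb * copies_per_shard,
        # Worst case: node losses that still leave one copy of each shard,
        # assuming copies of the same shard live on different nodes.
        "node_failures_tolerated": replicas,
    }


# Hypothetical 90 GB index with the settings from the slide.
layout = shard_layout(primaries=3, replicas=2, index_size_gb=90)
print(layout)
```

With these numbers each shard is a third of the index and the two replicas add two more full copies, which is the storage price mentioned in the bullets.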
Elasticsearch 2.0 | Let’s Use It!
Full-text search, structured search, analytics, and all three in combination:
• Wikipedia uses Elasticsearch to provide full-text search with highlighted search snippets, and search-as-you-type and did-you-mean suggestions.

• The Guardian uses Elasticsearch to combine visitor logs with social-network data to provide real-time feedback to its editors about the public’s response to new articles.

• Stack Overflow combines full-text search with geolocation queries and uses more-like-this to find related questions and answers.

• GitHub uses Elasticsearch to query 130 billion lines of code!
Elasticsearch 2.0 | RESTful API with JSON over HTTP
curl -X<VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -d '<BODY>'
• VERB: GET, POST, PUT, HEAD, or DELETE
• PROTOCOL: http or https
• HOST: hostname of any node
• PORT: Elasticsearch HTTP service, which defaults to 9200
• PATH: API endpoint (_count, _cluster/stats, _nodes/stats/jvm, etc.)
• QUERY_STRING: any optional query-string parameters, for example ?pretty
• BODY: a JSON-encoded request body (if the request needs one)
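The same request anatomy can be assembled programmatically. The sketch below uses only the Python standard library; `build_request` is a hypothetical helper name, and the defaults (http, localhost, 9200) mirror the values on the slide. It builds the request without sending it, so no running node is needed.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_request(verb, path, query=None, body=None,
                  protocol="http", host="localhost", port=9200):
    """Assemble VERB PROTOCOL://HOST:PORT/PATH?QUERY_STRING plus a JSON body."""
    url = f"{protocol}://{host}:{port}/{path.lstrip('/')}"
    if query:
        url += "?" + urlencode(query)
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return Request(url, data=data, method=verb)


req = build_request("GET", "_count", query={"pretty": "true"},
                    body={"query": {"match_all": {}}})
print(req.get_method(), req.full_url)
# Sending it requires a running node, e.g. urllib.request.urlopen(req).
```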
Elasticsearch 2.0 | RESTful API with JSON over HTTP
curl -XGET 'http://localhost:9200/_count?pretty' -d '
{
"query": {
"match_all": {}
}
}
'
{
"count" : 0,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
}
}
Elasticsearch 2.0 | Clarification
Relational DB -> Databases -> Tables -> Rows -> Columns
Elasticsearch -> Indices -> Types -> Documents -> Fields
• Index (noun)
• Like a database in a traditional relational database. It is the place to store related documents.

• Index (verb)
• To index a document is to store a document in an index (noun) so that it can be retrieved and queried. (Like INSERT in SQL)

• Inverted index
• Relational databases add an index, such as a B-tree index, to specific columns; Elasticsearch and Lucene use a structure called an inverted index. Both exist to improve the speed of data retrieval.
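To make the inverted index idea concrete, here is a minimal Python sketch. It is a toy under stated assumptions: terms are produced by lowercasing and splitting on whitespace, whereas Lucene runs a configurable analysis chain and stores much more than document IDs. The two sample sentences are illustrative.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase term to the sorted IDs of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


docs = {
    1: "I love to go rock climbing",
    2: "I like to collect rock albums",
}
inverted = build_inverted_index(docs)
print(inverted["rock"])      # IDs of documents containing "rock"
print(inverted["climbing"])  # IDs of documents containing "climbing"
```

Looking up a term is a single dictionary access, which is why search by word is fast even when there are many documents.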
Elasticsearch 2.0 | Indexing a document
PUT /cnrs/employee/1
{
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
• cnrs
• The index name

• employee
• The type name

• /1
• The ID of this particular employee
PUT /cnrs/employee/2
{
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests": [ "music" ]
}
Elasticsearch 2.0 | Retrieving a document
GET /cnrs/employee/1
{
"_index" : "cnrs",
"_type" : "employee",
"_id" : "1",
"_version" : 1,
"found" : true,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
}
Elasticsearch 2.0 | Deleting a document
DELETE /cnrs/employee/1
Elasticsearch 2.0 | Search
GET /cnrs/employee/_search
{
"took": 6,
"timed_out": false,
"_shards": { ... },
"hits": {
"total": 2,
"max_score": 1,
"hits": [
{
"_index": "cnrs",
"_type": "employee",
"_id": "1",
"_score": 1,
"_source": {
"first_name": "John",
"last_name": "Smith",
"age": 25,
"about": "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
},
{
"_index": "cnrs",
"_type": "employee",
"_id": "2",
"_score": 1,
"_source": {
"first_name": "Jane",
"last_name": "Smith",
"age": 32,
"about": "I like to collect rock albums",
"interests": [ "music" ]
}
}
]
}
}
GET /cnrs/employee/_search?q=last_name:Smith
{
...
"hits": {
"total": 2,
"max_score": 0.30685282,
"hits": [
{
...
"_source": {
"first_name": "John",
"last_name": "Smith",
"age": 25,
"about": "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
},
{
...
"_source": {
"first_name": "Jane",
"last_name": "Smith",
"age": 32,
"about": "I like to collect rock albums",
"interests": [ "music" ]
}
}
]
}
}
Elasticsearch 2.0 | Search with Query DSL
GET /cnrs/employee/_search
{
"query" : {
"match" : {
"last_name" : "Smith"
}
}
}
Elasticsearch provides a rich, flexible, query language called the query DSL, which allows us to build much more complicated,
robust queries.
Elasticsearch 2.0 | Search with Query DSL
GET /cnrs/employee/_search
{
"query" : {
"filtered" : {
"filter" : {
"range" : {
"age" : { "gt" : 30 }
}
},
"query" : {
"match" : {
"last_name" : "smith"
}
}
}
}
}
{
...
"hits": {
"total": 1,
"max_score": 0.30685282,
"hits": [
{
...
"_source": {
"first_name": "Jane",
"last_name": "Smith",
"age": 32,
"about": "I like to collect rock albums",
"interests": [ "music" ]
}
}
]
}
}
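The `filtered` query shown on this slide still works in Elasticsearch 2.0 but is deprecated there in favour of a `bool` query with a `filter` clause. The Python sketch below rewrites one request body into the other; `filtered_to_bool` is a hypothetical helper name, and it only handles the simple shape used on the slide.

```python
import json


def filtered_to_bool(body):
    """Rewrite {"filtered": {"query": q, "filter": f}} as {"bool": {"must": q, "filter": f}}."""
    filtered = body["query"]["filtered"]
    bool_query = {}
    if "query" in filtered:
        bool_query["must"] = filtered["query"]
    if "filter" in filtered:
        bool_query["filter"] = filtered["filter"]
    return {"query": {"bool": bool_query}}


old_body = {
    "query": {
        "filtered": {
            "filter": {"range": {"age": {"gt": 30}}},
            "query": {"match": {"last_name": "smith"}},
        }
    }
}
new_body = filtered_to_bool(old_body)
print(json.dumps(new_body, indent=2))
```

The `must` clause keeps contributing to the relevance score, while the `filter` clause only includes or excludes documents.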
Elasticsearch 2.0 | Full-text Search
GET /cnrs/employee/_search
{
"query" : {
"match" : {
"about" : "rock climbing"
}
}
}
{
...
"hits": {
"total": 2,
"max_score": 0.16273327,
"hits": [
{
...
"_score": 0.16273327,
"_source": {
"first_name": "John",
"last_name": "Smith",
"age": 25,
"about": "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
},
{
...
"_score": 0.016878016,
"_source": {
"first_name": "Jane",
"last_name": "Smith",
"age": 32,
"about": "I like to collect rock albums",
"interests": [ "music" ]
}
}
]
}
}
Elasticsearch 2.0 | Phrase Search
GET /cnrs/employee/_search
{
"query" : {
"match_phrase" : {
"about" : "rock climbing"
}
}
}
{
...
"hits": {
"total": 1,
"max_score": 0.23013961,
"hits": [
{
...
"_score": 0.23013961,
"_source": {
"first_name": "John",
"last_name": "Smith",
"age": 25,
"about": "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
}
]
}
}
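Why does `match` return both employees for "rock climbing" while `match_phrase` returns only John? The toy Python sketch below illustrates the difference. It is a simplification under stated assumptions: whitespace tokenization, no relevance scoring, and hypothetical function names; Elasticsearch itself works on analyzed terms and term positions.

```python
def tokens(text):
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def match_any(field_text, query):
    """Toy `match`: true if at least one query term occurs in the field."""
    terms = set(tokens(field_text))
    return any(term in terms for term in tokens(query))


def match_phrase(field_text, query):
    """Toy `match_phrase`: all query terms, adjacent and in the same order."""
    doc, wanted = tokens(field_text), tokens(query)
    return any(doc[i:i + len(wanted)] == wanted
               for i in range(len(doc) - len(wanted) + 1))


about = {
    "John": "I love to go rock climbing",
    "Jane": "I like to collect rock albums",
}
any_hits = [name for name, text in about.items() if match_any(text, "rock climbing")]
phrase_hits = [name for name, text in about.items() if match_phrase(text, "rock climbing")]
print(any_hits, phrase_hits)
```

Jane's document contains "rock" but not "climbing" right after it, so it passes the term test and fails the phrase test.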
Elasticsearch 2.0 | Highlighting Searches
GET /cnrs/employee/_search
{
"query" : {
"match_phrase" : {
"about" : "rock climbing"
}
},
"highlight": {
"fields" : {
"about" : {}
}
}
}
{
...
"hits": {
"total": 1,
"max_score": 0.23013961,
"hits": [
{
...
"_score": 0.23013961,
"_source": {
"first_name": "John",
"last_name": "Smith",
"age": 25,
"about": "I love to go rock climbing",
"interests": [ "sports", "music" ]
},
"highlight": {
"about": [
"I love to go <em>rock</em> <em>climbing</em>"
]
}
}
]
}
}
The highlighted fragment from the original text
Elasticsearch 2.0 | Search with Query DSL
• Full text queries
• Match Query

• Multi Match Query

• Common Terms Query

• Query String Query

• Simple Query String Query

• Term level queries
• Term Query

• Terms Query

• Range Query

• Exists Query

• Missing Query

• Prefix Query

• Wildcard Query

• Regexp Query

• Fuzzy Query

• Type Query

• Ids Query

• Compound queries
• Constant Score Query

• Bool Query

• Dis Max Query

• Function Score Query

• Boosting Query

• Indices Query

• And Query

• Not Query

• Or Query

• Filtered Query

• Limit Query

• Joining queries
• Nested Query

• Has Child Query

• Has Parent Query

• Geo queries
• GeoShape Query

• Geo Bounding Box Query

• Geo Distance Query

• Geo Distance Range Query

• Geo Polygon Query

• Geohash Cell Query

• Specialized queries
• More Like This Query

• Template Query

• Script Query
Elasticsearch 2.0 | Aggregations
GET /cnrs/employee/_search
{
"query": {
"match": {
"last_name": "smith"
}
},
"aggs": {
"all_interests": {
"terms": {
"field": "interests"
}
}
}
}
...
"all_interests": {
"buckets": [
{
"key": "music",
"doc_count": 2
},
{
"key": "sports",
"doc_count": 1
}
]
}
Elasticsearch 2.0 | Aggregations
GET /cnrs/employee/_search
{
"aggs" : {
"all_interests" : {
"terms" : { "field" : "interests" },
"aggs" : {
"avg_age" : {
"avg" : { "field" : "age" }
}
}
}
}
}
...
"all_interests": {
"buckets": [
{
"key": "music",
"doc_count": 2,
"avg_age": {
"value": 28.5
}
},
{
"key": "sports",
"doc_count": 1,
"avg_age": {
"value": 25
}
}
]
}
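The numbers in this response can be reproduced locally. The Python sketch below mimics a `terms` aggregation with a nested `avg` on the two sample employees; it is an in-memory illustration with a hypothetical function name, not how Elasticsearch computes aggregations internally.

```python
from collections import defaultdict

employees = [
    {"first_name": "John", "age": 25, "interests": ["sports", "music"]},
    {"first_name": "Jane", "age": 32, "interests": ["music"]},
]


def terms_with_avg(docs, terms_field, avg_field):
    """Bucket docs by each value of terms_field, then average avg_field per bucket."""
    values = defaultdict(list)
    for doc in docs:
        for key in doc[terms_field]:
            values[key].append(doc[avg_field])
    buckets = [
        {"key": key, "doc_count": len(vals), "avg": sum(vals) / len(vals)}
        for key, vals in values.items()
    ]
    # Largest buckets first, like the default ordering of a terms aggregation.
    return sorted(buckets, key=lambda bucket: -bucket["doc_count"])


buckets = terms_with_avg(employees, "interests", "age")
for bucket in buckets:
    print(bucket)
```

A document with several interests lands in several buckets, which is why John counts towards both "music" and "sports".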
Elasticsearch 2.0 | Aggregations
{
"aggs" : {
"my_ip_ranges" : {
"ip_range" : {
"field" : "ip",
"ranges" : [
{ "to" : "10.0.0.5" },
{ "from" : "10.0.0.5" }
]
}
}
}
}
{
...
"aggregations": {
"my_ip_ranges": {
"buckets" : [
{
"to": 167772165,
"to_as_string": "10.0.0.5",
"doc_count": 4
},
{
"from": 167772165,
"from_as_string": "10.0.0.5",
"doc_count": 6
}
]
}
}
}
Elasticsearch 2.0 | Aggregations
{
"aggs" : {
"articles_over_time" : {
"date_histogram" : {
"field" : "date",
"interval" : "month"
}
}
}
}
{
"aggs" : {
"articles_over_time" : {
"date_histogram" : {
"field" : "date",
"interval" : "1.5h"
}
}
}
}
Elasticsearch 2.0 | Aggregations
• Metrics Aggregations
• Avg Aggregation

• Cardinality Aggregation

• Extended Stats Aggregation

• Geo Bounds Aggregation

• Max Aggregation

• Min Aggregation

• Percentiles Aggregation

• Percentile Ranks Aggregation

• Scripted Metric Aggregation

• Stats Aggregation

• Sum Aggregation

• Top hits Aggregation

• Value Count Aggregation

• Bucket Aggregations
• Children Aggregation

• Date Histogram Aggregation

• Date Range Aggregation

• Filter Aggregation

• Geo Distance Aggregation

• GeoHash grid Aggregation

• Histogram Aggregation

• IPv4 Range Aggregation

• Missing Aggregation

• Nested Aggregation

• Range Aggregation

• Reverse nested Aggregation

• Sampler Aggregation

• Significant Terms Aggregation

• Terms Aggregation
Elasticsearch 2.0 | Plugins
Plugins enhance the core Elasticsearch functionality.
• Plugin types
• Java plugins
• JAR files

• Must be installed on all nodes in the cluster

• Each node must be restarted

• Site plugins
• Web content: JS, HTML, CSS etc.

• Can be installed on only one node

• Do not require a restart

• Mixed plugins
• Both JAR files and web content
Elasticsearch 2.0 | Plugins
• API extension Plugins
• Alerting Plugins
• Analysis Plugins
• Discovery Plugins
• Management and Site Plugins
• Mapper Plugins
• Scripting Plugins
• Security Plugins
• Snapshot/Restore Plugins
• Transport Plugins
• Integrations
Elasticsearch 2.0 | Plugins - kopf
Web administration tool for Elasticsearch cluster
sudo [/usr/share/elasticsearch/]bin/plugin install [plugin_name]

sudo [/usr/share/elasticsearch/]bin/plugin install lmenezes/elasticsearch-kopf 

sudo [/usr/share/elasticsearch/]bin/plugin install lmenezes/elasticsearch-kopf/2.x
open http://localhost:9200/_plugin/kopf
Indexing rate by document size:
• 5k/s with larger documents (full tweets)
• 23k/s with smaller documents (tweet date, text, etc.)
• 67k/s with just 140 characters and an ID
sudo service elasticsearch stop
sudo service elasticsearch start
Elasticsearch 2.0 | ISCPIF Use Cases
Full-text search and Data Analytics
• Web of Science
• Text Mining and NLP
• Keyword Extractions
• Phrase Occurrence

• Phrase Co-Occurrence

• Keyword Analytics
• Date Histogram

• Significant Terms

• N-Grams

• etc.
Elasticsearch 2.0 | ISCPIF Use Cases
Full-text search and Data Analytics
• Boolean
• Query String
• Aggregation
• Date histogram

• Query cache
• nope!
Elasticsearch 2.0 | ISCPIF Use Cases
Full-text search and Data Analytics
{
"bool": {
"must": { "match": { "title": "how to make millions" }},
"must_not": { "match": { "tag": "spam" }},
"should": [
{ "match": { "tag": "starred" }},
{ "range": { "date": { "gte": "2014-01-01" }}}
]
}
}
• title field matches “how to make millions”

• not marked as spam

• documents that are starred or are from 2014 onward will rank higher

• documents that match both conditions will rank even higher
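The behaviour described in these bullets can be imitated with a toy Python evaluator. It is only a sketch under loud assumptions: the score is simply 1 plus the number of matching `should` clauses (real scoring is done by Lucene), the `match` on the title is treated as "any term matches", and the four documents are made up for illustration.

```python
from datetime import date

QUERY_TERMS = {"how", "to", "make", "millions"}


def evaluate(doc):
    """Return None if the doc is excluded, else a toy score (1 + matched should clauses)."""
    # must: a match query on title needs at least one of the query terms
    if not QUERY_TERMS & set(doc["title"].lower().split()):
        return None
    # must_not: drop anything tagged as spam
    if "spam" in doc["tags"]:
        return None
    score = 1
    score += int("starred" in doc["tags"])            # should clause 1
    score += int(doc["date"] >= date(2014, 1, 1))     # should clause 2
    return score


# Hypothetical documents.
docs = {
    "a": {"title": "How to make millions", "tags": [], "date": date(2013, 5, 1)},
    "b": {"title": "How to make millions", "tags": ["starred"], "date": date(2015, 3, 10)},
    "c": {"title": "How to make millions", "tags": ["spam", "starred"], "date": date(2015, 3, 10)},
    "d": {"title": "Cooking pasta", "tags": ["starred"], "date": date(2015, 3, 10)},
}
scores = {}
for doc_id, doc in docs.items():
    result = evaluate(doc)
    if result is not None:
        scores[doc_id] = result
print(scores)
```

Document "b" satisfies both `should` clauses and therefore ranks above "a", while "c" and "d" never make it into the result set.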
Elasticsearch 2.0 | ISCPIF Use Cases
Full-text search and Data Analytics
• Filters
• for binary yes/no searches

• for queries on exact values

• Exists
• just the ones with abstract != null

• Query cache
• true!
Elasticsearch 2.0 | ISCPIF Use Cases
Full-text search and Data Analytics
My bool query vs. filter: ~500 ms vs. ~50-120 ms
Elasticsearch 2.0 | ISCPIF Use Cases
Full-text search and Data Analytics
• Query DSL: Filters
• And Filter

• Bool Filter

• Exists Filter

• Geo Bounding Box Filter

• Geo Distance Filter

• Geo Distance Range Filter

• Geo Polygon Filter

• GeoShape Filter

• Geohash Cell Filter

• Has Child Filter

• Has Parent Filter

• Ids Filter

• Indices Filter

• Limit Filter

• Match All Filter

• Missing Filter

• Nested Filter

• Not Filter

• Or Filter

• Prefix Filter

• Query Filter

• Range Filter

• Regexp Filter

• Script Filter

• Term Filter

• Terms Filter

• Type Filter
Elasticsearch 2.0 | ISCPIF Use Cases
Full-text search and Data Analytics
• Significant Terms Aggregation
• JLH score

• mutual information
• Chi square

• google normalized distance

• Percentage

• scripted
See Yang and Pedersen, "A Comparative Study on Feature Selection in Text Categorization", 1997 (http://courses.ischool.berkeley.edu/i256/f06/papers/yang97comparative.pdf) for a study on using significant terms for feature selection in text classification.
"script_heuristic": {

"script": "_subset_freq/(_superset_freq - _subset_freq + 1)"

}
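The scripted heuristic above is plain arithmetic, so it can be tried out in Python. In this sketch `subset_freq` stands for the number of documents containing the term in the subset (the query results) and `superset_freq` for the number containing it in the superset (the background set); the term counts are hypothetical.

```python
def script_heuristic(subset_freq, superset_freq):
    """Python rendering of "_subset_freq/(_superset_freq - _subset_freq + 1)"."""
    return subset_freq / (superset_freq - subset_freq + 1)


# Hypothetical counts: (docs with the term in the subset, docs with it in the superset)
counts = {
    "elasticsearch": (40, 50),   # rare overall, concentrated in the subset
    "the": (90, 9000),           # frequent everywhere
}
scores = {term: script_heuristic(sub, sup) for term, (sub, sup) in counts.items()}
for term, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{term}: {score:.4f}")
```

A term that occurs almost only inside the subset gets a high score, while a term that is common in the background is pushed towards zero.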
Elasticsearch 2.0 | ISCPIF Use Cases
The Elastic Platform | Make Sense of Your Data
Elasticsearch 2.0 | ISCPIF Use Cases
Logs!
• Check Nginx access logs between 2015-11-23T10:23:10 and 2015-11-24T21:53:30

• Check Suricata alert >= 2 between 2015-11-23T10:23:10 and 2015-11-24T21:53:30 and with type
of DNS

• Now correlate the results!!!
Elasticsearch 2.0 | ISCPIF Use Cases
Logs!
• Old School tools
• grep/sed/awk/cut/sort

• Manually analyze the output

• Different formats

• Customized fields and details

• Not centralized

• Modern way (the right way!)
• Define endpoints (input/output)

• Correlate patterns

• Store data (searchable and visualizable)
Elasticsearch 2.0 | ISCPIF Use Cases
Logs!
• Symantec Security Information Manager

• Splunk

• HP / Arcsight

• Tripwire

• NetIQ

• Quest Software

• IBM/Q1 Labs

• Novell

• graylog

• fluentd
Elasticsearch 2.0 | ISCPIF Use Cases
Logstash | Collect, Enrich & Transport Data
• Process Any Data, From Any Source

• Centralize data processing of all types

• Normalize varying schema and formats

• Quickly extend to custom log formats
• Easily add plugins for custom data sources

• https://github.com/elastic/logstash
Elasticsearch 2.0 | ISCPIF Use Cases
Logstash | Collect, Enrich & Transport Data
input { stdin { } }

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}

output {
  elasticsearch { hosts => ["localhost:9200"] }
  stdout { codec => rubydebug }
}
Elasticsearch 2.0 | ISCPIF Use Cases
Logstash | Collect, Enrich & Transport Data
127.0.0.1 - - [11/Dec/2013:00:01:45 -0800] "GET /xampp/status.php HTTP/1.1" 200 3891 "http://cadenza/xampp/navi.php" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0"
{
  "message" => "127.0.0.1 - - [11/Dec/2013:00:01:45 -0800] \"GET /xampp/status.php HTTP/1.1\" 200 3891 \"http://cadenza/xampp/navi.php\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0\"",
  "@timestamp" => "2013-12-11T08:01:45.000Z",
  "@version" => "1",
  "host" => "cadenza",
  "clientip" => "127.0.0.1",
  "ident" => "-",
  "auth" => "-",
  "timestamp" => "11/Dec/2013:00:01:45 -0800",
  "verb" => "GET",
  "request" => "/xampp/status.php",
  "httpversion" => "1.1",
  "response" => "200",
  "bytes" => "3891",
  "referrer" => "\"http://cadenza/xampp/navi.php\"",
  "agent" => "\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0\""
}
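What the grok and date filters do to this log line can be approximated in Python. The regular expression below is a simplified stand-in for the combined log format, not the actual `%{COMBINEDAPACHELOG}` pattern, and `parse` is a hypothetical helper; unlike grok's output above, the referrer and agent are captured without their surrounding quotes.

```python
import re
from datetime import datetime, timezone

# Simplified approximation of the Apache combined log format.
COMBINED = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse(line):
    """Split one access-log line into named fields and add a UTC @timestamp."""
    match = COMBINED.match(line)
    if match is None:
        return None
    event = match.groupdict()
    # Counterpart of the date filter's "dd/MMM/yyyy:HH:mm:ss Z" format.
    parsed = datetime.strptime(event["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    event["@timestamp"] = parsed.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z")
    return event


line = ('127.0.0.1 - - [11/Dec/2013:00:01:45 -0800] "GET /xampp/status.php HTTP/1.1" '
        '200 3891 "http://cadenza/xampp/navi.php" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; '
        'rv:25.0) Gecko/20100101 Firefox/25.0"')
event = parse(line)
print(event["clientip"], event["verb"], event["response"], event["@timestamp"])
```

The timestamp conversion shows why the event's `@timestamp` is eight hours ahead of the local time written in the log line.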
Elasticsearch 2.0 | ISCPIF Use Cases
An input plugin enables a specific source of events to be read by Logstash.
• Input Plugins
• beats

• couchdb_changes

• drupal_dblog

• elasticsearch

• exec

• eventlog

• file
• ganglia
• gelf

• generator

• graphite

• github

• heartbeat

• heroku

• http

• http_poller

• irc

• imap

• jdbc

• jmx

• kafka

• log4j

• lumberjack
• meetup

• pipe

• puppet_facter

• relp

• rss

• rackspace

• rabbitmq
• redis
• salesforce

• snmptrap

• stdin
• sqlite
• s3

• sqs

• stomp

• syslog
• tcp

• twitter
• unix

• udp

• varnishlog

• wmi

• websocket
• xmpp

• zenoss

• zeromq
Elasticsearch 2.0 | ISCPIF Use Cases
• Output Plugins
• boundary

• circonus

• csv

• cloudwatch

• datadog

• datadog_metrics

• email

• elasticsearch
• elasticsearch_java

• exec

• file
• google_bigquery

• google_cloud_storage

• ganglia
• gelf

• graphtastic

• graphite
• hipchat

• http

• irc

• influxdb

• juggernaut

• jira

• kafka
• lumberjack

• librato

• loggly
• mongodb
• metriccatcher

• nagios
• null

• nagios_nsca

• opentsdb

• pagerduty

• pipe

• riemann

• redmine

• rackspace

• rabbitmq
• redis
• riak

• s3

• sqs

• stomp

• statsd

• solr_http

• sns

• syslog
• stdout
• tcp

• udp

• webhdfs

• websocket
• xmpp

• zabbix

• zeromq
An output plugin sends event data to a particular destination.
Elasticsearch 2.0 | ISCPIF Use Cases
Logstash-forwarder: syslog, auth, ufw and nginx
{
  "network": {
    "servers": [ "10.0.0.2:5000" ],
    "timeout": 15,
    "ssl ca": "/etc/pki/tls/certs/logstash-forwarder.crt"
  },
  "files": [
    {
      "paths": [
        "/var/log/syslog",
        "/var/log/auth.log"
      ],
      "fields": { "type": "syslog" }
    },
    {
      "paths": [
        "/var/log/ufw.log"
      ],
      "fields": { "type": "firewall" }
    },
    {
      "paths": [
        "/var/log/nginx/*.log"
      ],
      "exclude": ["*.gz", "err*.log", "*.log.*"],
      "fields": { "type": "nginx-api" }
    }
  ]
}
Elasticsearch 2.0 | ISCPIF Use Cases
Logstash Server: syslog, auth, ufw and nginx
input {
lumberjack {
port => 5000
type => "logs"
ssl_certificate => "/etc/pki/tls/certs/logstash-forwarder.crt"
ssl_key => "/etc/pki/tls/private/logstash-forwarder.key"
}
}
Elasticsearch 2.0 | ISCPIF Use Cases
Logstash Server: syslog, auth, ufw and nginx
filter {
  if [type] == "syslog" {
    grok {
      match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
      add_field => [ "received_at", "%{@timestamp}" ]
      add_field => [ "received_from", "%{host}" ]
    }
    syslog_pri { }
    date {
      match => [ "syslog_timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
    }
  }
}
Elasticsearch 2.0 | ISCPIF Use Cases
Logstash-forwarder: syslog, auth, ufw and nginx
output {
  elasticsearch {
    host => "10.0.0.25"
    port => "9300"
    cluster => "iscpif-es"
  }
}
Elasticsearch 2.0 | ISCPIF Use Cases
Logstash: Suricata | Open Source IDS / IPS / NSM engine
• Highly Scalable
• Suricata is multi-threaded

• Protocol Identification
• Suricata is a malware command-and-control (CnC) channel hunter.

• It catches off-port HTTP CnC channels, which normally slide right by most IDS systems

• Thanks to dedicated keywords, you can match on protocol fields ranging from an HTTP URI to an SSL certificate identifier.

• File Identification, MD5 Checksums, and File Extraction
• Identify thousands of file types as they cross your network!
Elasticsearch 2.0 | ISCPIF Use Cases
Suricata.yml
- eve-log:
    enabled: yes
    type: file #file|syslog|unix_dgram|unix_stream
    filename: eve.json
    types:
      - alert
      - http:
          extended: yes
      - dns
      - tls:
          extended: yes # enable this for extended logging information
      - files:
          force-magic: yes # force logging magic on all logged files
          force-md5: yes # force logging of md5 checksums
      - ssh
Elasticsearch 2.0 | ISCPIF Use Cases
Logstash-forwarder: Suricata
{
  # The network section covers network configuration :)
  "network": {
    "servers": [ "10.0.0.2:5000" ],
    "timeout": 15,
    "ssl ca": "/etc/pki/tls/certs/logstash-forwarder.crt"
  },
  "files": [{
    "paths": ["/var/log/suricata/eve.json"],
    "fields": { "type": "suricata" },
    "sincedb_path": "/var/logstash/suricata.db",
    "sincedb_write_interval": 1,
    "codec": "json",
    "type":"suricata"
  }]
}
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
Elasticsearch 2.0 | ISCPIF Use Cases
Logstash server: Suricata
filter {
  if [type] == "suricata" {
    json {
      source => "message"
    }
    if [src_ip] {
      geoip {
        source => "src_ip"
        target => "geoip"
        database => "/etc/logstash/GeoLiteCity.dat"
        add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
        add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ]
      }
      mutate {
        convert => [ "[geoip][coordinates]", "float" ]
        remove_field => [ "timestamp" ]
      }
    }
  }
}
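As a rough illustration of what this filter does to a single event, the Python sketch below mimics the geoip and mutate steps on an already-decoded Suricata record. The lookup table stands in for GeoLiteCity.dat and its values are invented; `enrich` and `FAKE_GEO_DB` are hypothetical names, not part of Logstash.

```python
# Hypothetical stand-in for the GeoLiteCity.dat lookup; the values are made up.
FAKE_GEO_DB = {"203.0.113.9": {"longitude": "2.35", "latitude": "48.85"}}


def enrich(event, geo_db=FAKE_GEO_DB):
    """Mimic the geoip + mutate steps for one decoded Suricata event."""
    if event.get("type") != "suricata" or "src_ip" not in event:
        return event  # the filter only touches Suricata events with a src_ip
    geo = geo_db.get(event["src_ip"])
    if geo is not None:
        event["geoip"] = dict(geo)
        # Longitude first, then latitude, converted to floats.
        event["geoip"]["coordinates"] = [
            float(geo["longitude"]),
            float(geo["latitude"]),
        ]
    event.pop("timestamp", None)  # mirrors remove_field => [ "timestamp" ]
    return event


event = enrich({"type": "suricata", "src_ip": "203.0.113.9", "timestamp": "t0"})
```

The order of the two `add_field` lines matters because Elasticsearch reads a geo_point given as an array as [longitude, latitude].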
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
Elasticsearch 2.0 | ISCPIF Use Cases
Logstash: Suricata
• Logstash daily index
• index template

• easy to retire index

• close/delete

• 22 machines
• only 2 with public IP

• Logs
• between 1 and 3 million per day
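The daily-index pattern makes retirement a matter of computing index names by date. The sketch below shows one way to decide which indices to close and which to delete; the `logstash-YYYY.MM.dd` naming is Logstash's default, while the retention periods and function names are assumptions chosen only for illustration.

```python
from datetime import date, timedelta


def daily_index(day, prefix="logstash-"):
    """Name of the daily index for a given date, e.g. logstash-2015.11.25."""
    return prefix + day.strftime("%Y.%m.%d")


def retirement_plan(today, close_after=7, delete_after=30, history=60):
    """Split the last `history` daily indices into ones to close and ones to delete.

    The thresholds (in days) are illustrative defaults, not ISCPIF's policy.
    """
    to_close, to_delete = [], []
    for age in range(history):
        name = daily_index(today - timedelta(days=age))
        if age >= delete_after:
            to_delete.append(name)
        elif age >= close_after:
            to_close.append(name)
    return to_close, to_delete


to_close, to_delete = retirement_plan(date(2015, 11, 25))
```

The resulting names can then be passed to the close and delete index APIs, one index per day, without touching the documents inside.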
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
Elasticsearch 2.0 | ISCPIF Use Cases
Logstash: Suricata
• 73 million docs in < 2 days!
• during a mongodump

• transferring remotely

• Watch out for Suricata
• Stream events!

• "SURICATA STREAM Packet with invalid ack"

• And lots of other stream alerts!

• I disabled them! Maybe I am wrong :)
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
Elasticsearch 2.0 | ISCPIF Use Cases
Kibana | Explore & Visualize Your Data
• Seamless Integration with Elasticsearch

• Give Shape to Your Data
• Sophisticated Analytics

• Empower More Team Members

• Flexible Interface, Easy to Share

• Easy Setup
• Visualize Data from Many Sources

• Simple Data Export
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
Elasticsearch 2.0 | ISCPIF Use Cases
Kibana 4.2.1 | Compatible with Elasticsearch 2.x
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
Elasticsearch 2.0 | ISCPIF Use Cases
The ELK Stack | Elasticsearch, Logstash and Kibana
Elasticsearch 1.7.3
(2.0.0)
Logstash 1.5.5
(2.0.0)
Kibana 3
(4.2.1)
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
Elasticsearch 2.0 | Elasticsearch Mapping
• Which string fields should be full text fields.

• Which fields contain numbers, dates, or geolocations.

• The format of date values.

• a simple type like string, date, long, double, boolean or ip.

• a type which supports the hierarchical nature of JSON such as object or nested.

• or a specialized type like geo_point, geo_shape, or completion.

• multi-fields
• A string field could be indexed as an analyzed field for full-text search, and as a not_analyzed field for sorting or aggregations.

• Alternatively, you could index a string field with the standard analyzer, the english analyzer, and the french analyzer.
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
Elasticsearch 2.0 | Elasticsearch Mapping
• Dynamic mapping
• Fields and mapping types do not need to be defined before being used.

• Explicit mappings
• You can create mapping types and field mappings when you create an index

• Updating existing mappings

• Existing type and field mappings cannot be updated

• Create a new index with the correct mappings and reindex your data
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
Elasticsearch 2.0 | Mapping Explicit
PUT my_index
{
  "mappings": {
    "user": {
      "_all": { "enabled": false },
      "properties": {
        "title": { "type": "string" },
        "name": { "type": "string" },
        "age": { "type": "integer" }
      }
    },
    "blogpost": {
      "properties": {
        "title": { "type": "string" },
        "body": { "type": "string" },
        "user_id": {
          "type": "string",
          "index": "not_analyzed"
        },
        "created": {
          "type": "date",
          "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
        }
      }
    }
  }
}
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
Elasticsearch 2.0 | Mapping Dynamic templates
PUT my_index
{
  "mappings": {
    "my_type": {
      "dynamic_templates": [
        {
          "integers": {
            "match_mapping_type": "long",
            "mapping": {
              "type": "integer"
            }
          }
        },
        {
          "strings": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "fields": {
                "raw": {
                  "type": "string",
                  "index": "not_analyzed",
                  "ignore_above": 256
                }
              }
            }
          }
        }
      ]
    }
  }
}
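To show how these two templates are applied, here is a simplified Python model of dynamic-template selection: the JSON type of a new field is detected, and the first template whose `match_mapping_type` equals it supplies the mapping. This is a toy model under that assumption only; it ignores date detection, name-based matching (`match`, `path_match`) and everything else Elasticsearch does, and the function names are hypothetical.

```python
DYNAMIC_TEMPLATES = [
    {"integers": {"match_mapping_type": "long", "mapping": {"type": "integer"}}},
    {"strings": {
        "match_mapping_type": "string",
        "mapping": {
            "type": "string",
            "fields": {
                "raw": {"type": "string", "index": "not_analyzed", "ignore_above": 256}
            },
        },
    }},
]


def detected_type(value):
    """Very rough stand-in for the type detected from a JSON value."""
    if isinstance(value, bool):  # bool must be checked before int in Python
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    return "object"


def mapping_for(value, templates=DYNAMIC_TEMPLATES):
    """Return the mapping of the first matching template, else the default type."""
    dtype = detected_type(value)
    for template in templates:
        for rule in template.values():
            if rule["match_mapping_type"] == dtype:
                return rule["mapping"]
    return {"type": dtype}
```

With these templates, a new whole-number field is stored as `integer` instead of `long`, and every new string field gets an extra not_analyzed `raw` sub-field for sorting and aggregations.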
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
Elasticsearch 2.0 | Mapping Dynamic templates
PUT _template/my_template
{
  "template": "logs-*",
  "settings": {
    "index.number_of_replicas": "0",
    "index.number_of_shards": "3"
  },
  "mappings": {
    "my_type": {
      "dynamic_templates": [
        {
          "integers": {
            "match_mapping_type": "long",
            "mapping": {
              "type": "integer"
            }
          }
        },
        {
          "strings": {
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "fields": {
                "raw": {
                  "type": "string",
                  "index": "not_analyzed",
                  "ignore_above": 256
                }
              }
            }
          }
        }
      ]
    }
  }
}
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
Elasticsearch 2.0 | Production
• Hardware
• Memory

• 64 GB of RAM is the ideal sweet spot

• Give half of the RAM to the heap (e.g. a 16 GB heap on a 32 GB machine)

• And don’t let the heap cross 30.5 GB!

• CPU
• 2-8 cores of CPU

• faster CPUs vs. more cores = choose more cores

• Disk
• SSDs (monitor the I/O)

• high-performance server disks (15k RPM)

• RAID 0 (ES is highly available through replicas)
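The memory rule of thumb on this slide can be written as a one-line calculation. This is only a sketch of that rule (half of the RAM, capped at the 30.5 GB ceiling quoted above); the function name is hypothetical and the result is a starting point, not a tuned value.

```python
def recommended_heap_gb(total_ram_gb, ceiling_gb=30.5):
    """Half of the machine's RAM for the heap, never above the ceiling."""
    return min(total_ram_gb / 2, ceiling_gb)


for ram in (16, 32, 64, 128):
    print(f"{ram} GB RAM -> {recommended_heap_gb(ram)} GB heap")
```

The remaining memory is not wasted: it is left to the operating system's file cache, which Lucene relies on heavily.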
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
Elasticsearch 2.0 | Security
• No built-in authentication

• Do not expose Elasticsearch to the world
• Watch out for Denial of Service

• Do not let users define index names (like ",*" )

• Turn off dynamic scripting (off by default)

• Control HTTP methods (DELETE, PUT, etc.)

• Nginx (reverse proxy, SSL and auth)

• Apache
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
Elasticsearch 2.0 | Lots of things!
• Analyzer and tokenizer (human language)

• Log slow ops

• Index settings
• refresh interval

• flush interval

• Differentiate your nodes
• Data nodes

• Master nodes

• Client nodes

• Cluster health
• Heap size

• Thread pools

• Merging time

• etc.
“Launch your GIANT idea”
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
MongoDB v3.2 is coming soon. Learn more.
• Document Database
• Documents (i.e. objects)

• Embedded documents and arrays (no expensive joins!)

• Dynamic schema supports fluent polymorphism.

• High Availability
• automatic failover

• data redundancy

• Automatic Scaling
• Automatic sharding
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
MongoDB | Key Features
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
MongoDB | Platforms
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
MongoDB | Install MongoDB on Ubuntu
Import the public key used by the package management system
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 7F0CEB10
Create a list file for MongoDB
echo "deb http://repo.mongodb.org/apt/ubuntu trusty/mongodb-org/3.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.0.list
Install MongoDB 3.0.7
sudo apt-get update && sudo apt-get install -y mongodb-org
sudo service mongod start
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
MongoDB | SQL to MongoDB
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
MongoDB | SQL to MongoDB
https://docs.mongodb.org/manual/reference/sql-comparison/
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
MongoDB | Insert Documents
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
MongoDB | Query Documents
All documents
db.inventory.find( {} )
Equality
db.inventory.find( { type: "snacks" } )
Query Operation
db.inventory.find( { type: { $in: [ 'food', 'snacks' ] } } )
AND condition
db.inventory.find( { type: 'food', price: { $lt: 9.95 } } )
OR condition
db.inventory.find(
  {
    $or: [ { qty: { $gt: 100 } }, { price: { $lt: 9.95 } } ]
  }
)
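To make the semantics of these five query shapes explicit, here is a toy in-memory matcher in Python. It is not how MongoDB evaluates queries and supports only the operators used on this slide ($in, $lt, $gt, $or); the `inventory` documents are invented for the example.

```python
import operator

# Only the operators that appear on the slide.
OPS = {
    "$lt": operator.lt,
    "$gt": operator.gt,
    "$in": lambda value, candidates: value in candidates,
}


def matches(doc, query):
    """True if doc satisfies every clause of query (clauses are ANDed)."""
    for key, cond in query.items():
        if key == "$or":
            if not any(matches(doc, sub) for sub in cond):
                return False
        elif isinstance(cond, dict):
            if key not in doc:
                return False
            if not all(OPS[op](doc[key], arg) for op, arg in cond.items()):
                return False
        elif doc.get(key) != cond:
            return False
    return True


def find(collection, query):
    return [doc for doc in collection if matches(doc, query)]


# Hypothetical sample data.
inventory = [
    {"type": "food", "price": 5.0, "qty": 10},
    {"type": "snacks", "price": 12.0, "qty": 200},
    {"type": "toys", "price": 20.0, "qty": 5},
]
```

Several fields in one query document are implicitly ANDed, which is why only the OR case needs an explicit operator.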
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
MongoDB | Query Documents
db.tweets.find(
  {
    "coordinates.coordinates": {
      $near: {
        $geometry: { type: "Point", coordinates: [ 2.3325923, 48.8537095 ] },
        $minDistance: 1,
        $maxDistance: 500
      }
    }
  }
)
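With a GeoJSON `$geometry`, `$near` needs a 2dsphere index and interprets `$minDistance` and `$maxDistance` in meters. The Python sketch below approximates that distance test with the haversine formula on a spherical Earth; it illustrates what "within 500 m of this point" means and is not MongoDB's implementation. The function names and the second test point are assumptions.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two (longitude, latitude) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def within(point, center, min_m=1, max_m=500):
    """Same window as the query: at least min_m and at most max_m from center."""
    return min_m <= haversine_m(*point, *center) <= max_m


center = (2.3325923, 48.8537095)  # [longitude, latitude], as in the query
```

Note the GeoJSON order: longitude comes first, then latitude.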
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
MongoDB | Aggregation
Sample document:
{
  "_id": "10280",
  "city": "NEW YORK",
  "state": "NY",
  "pop": 5574,
  "loc": [
    -74.016323,
    40.710537
  ]
}
Pipeline (states with a total population of 10 million or more):
db.zipcodes.aggregate( [
  { $group: { _id: "$state", totalPop: { $sum: "$pop" } } },
  { $match: { totalPop: { $gte: 10*1000*1000 } } }
] )
Documents after the $group stage resemble:
{
  "_id" : "AK",
  "totalPop" : 550043
}
Equivalent SQL:
SELECT state, SUM(pop) AS totalPop
FROM zipcodes
GROUP BY state
HAVING totalPop >= (10*1000*1000)
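The same group-then-filter logic can be spelled out in plain Python, which may help when reading the pipeline stage by stage. This is a sketch of the semantics only; the `sample` documents and their populations are invented, and `states_above` is a hypothetical name.

```python
from collections import defaultdict


def states_above(zipcodes, threshold=10 * 1000 * 1000):
    """Emulate $group (sum pop by state) followed by $match (totalPop >= threshold)."""
    totals = defaultdict(int)
    for z in zipcodes:  # $group: { _id: "$state", totalPop: { $sum: "$pop" } }
        totals[z["state"]] += z["pop"]
    return [
        {"_id": state, "totalPop": pop}
        for state, pop in sorted(totals.items())
        if pop >= threshold  # $match, i.e. the SQL HAVING clause
    ]


# Hypothetical data with made-up populations.
sample = [
    {"state": "NY", "pop": 6000000},
    {"state": "NY", "pop": 5000000},
    {"state": "AK", "pop": 550043},
]
result = states_above(sample)
```

Because `$match` runs after `$group`, it filters on the computed total rather than on individual zip codes, exactly like HAVING versus WHERE in SQL.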
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
db.tweets.aggregate([
  {
    $geoNear: {
      near: [ 2.348730, 48.840982 ],
      distanceField: "dist.calculated",
      maxDistance: 100,
      includeLocs: "dist.location",
      query: {"coordinates.type": "Point"},
      limit: 100
    }
  },
  { $group: {
      _id: "$user.id",
      count: { $sum: 1 },
      name: { $addToSet: "$user.name" },
      date: { $addToSet: "$created_at" },
      text: { $addToSet: "$text" },
      coordinates: { $addToSet: "$coordinates" }
  } },
  {$sort: {"count": -1}},
  {$limit: 10}
]);
MongoDB | Aggregation
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
MongoDB | MapReduce
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
MongoDB | Index
• Creating an index

• db.ships.createIndex({name : 1})

• Dropping an index

• db.ships.dropIndex({name : 1})

• Creating a compound index

• db.ships.createIndex({name : 1, operator : 1, class : -1})

• Dropping a compound index

• db.ships.dropIndex({name : 1, operator : 1, class : -1})
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
MongoDB | Replica Set
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
MongoDB | Replica Set
• Replica Set Members

• Replica Set Primary
• Accepts write operations

• Replica Set Secondary Members
• Replicate the primary’s data set and accept read operations

• Priority 0 Replica Set Members
• Priority 0 members are secondaries that cannot become the primary.

• Hidden Replica Set Members
• Invisible to applications

• Replica Set Arbiter
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
MongoDB | Sharding
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
MongoDB | 3.0
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
MongoDB | 3.0
• 7-10x Better Performance
• Up to 80% Less Storage
• Reduce Operational Overhead By Up to 95%

• Pluggable Storage Optimized For Your Workload

• Low Latency Across the Globe
• Enhancements That Make You More Productive

• Faster Loading and Export
• Easier Query Optimization
• Faster Debugging
• Richer Geospatial Apps
• Better Time-Series Analytics
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
MongoDB | 3.0
Datacenter in France and Italy
60M/week - 8M/day - 360K/hour
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
MongoDB | ISCPIF Use Cases
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
MongoDB | ISCPIF Use Cases
Avg. usage of Twitter in Paris in October
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
MongoDB | ISCPIF Use Cases
24HOURS NEWS: Real-time Breaking News
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
MongoDB | ISCPIF Use Cases
24HOURS NEWS: Real-time Breaking News
• 50-100 Updates /s

• Time-series Queries

• Grid-FS

• FTS (full-text search)

• Tokenizes and stems

• Scoring

• 140 characters/small dataset!

• TTL Index

• Session store
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
MongoDB | ISCPIF Use Cases
TIMOTHY: Real-time Dashboards
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
MongoDB | ISCPIF Use Cases
TIMOTHY: Real-time Dashboards
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
MongoDB | ISCPIF Use Cases
TIMOTHY: Real-time Dashboards
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
MongoDB | ISCPIF Use Cases
HIGH THROUGHPUT: Real-time aggregations, over 50K inserts /s
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
MongoDB | ISCPIF Use Cases
Most generic
Most specific Calculating the new graph
HIGH THROUGHPUT: Real-time aggregations, over 50K inserts /s
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
MongoDB | ISCPIF Use Cases
130K
inserts /s
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
MongoDB | ISCPIF Use Cases
10K-80K
inserts /s
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
MongoDB | ISCPIF Use Cases
9K queries /s
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
MongoDB | ISCPIF Use Cases
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
MongoDB | ISCPIF Use Cases
Gephi Streaming
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
MongoDB | Monitoring System
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
MongoDB | Monitoring System Network and Cache
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
MongoDB | Monitoring System Hardware
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
MongoDB | ISCPIF Use Cases
Monitoring NewRelic
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
MongoDB | ISCPIF Use Cases
Monitoring NewRelic
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
MongoDB | Monitoring System Objects
3.02 Billion Documents
64 Collections
195 Indexes
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
MongoDB | Monitoring System Objects
“in-memory data structure store”
used as database, cache and message broker
• Data structures
• strings
• hashes
• lists
• sets
• sorted sets
• bitmaps

• hyperloglogs
• geospatial indexes

• Built-in
• replication

• Lua scripting

• transactions

• on-disk persistence
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
Redis | Key Features
Redis KEYS:
> set mykey somevalue
OK

> get mykey
"somevalue"
Redis LISTS:
> rpush mylist A
(integer) 1

> rpush mylist B
(integer) 2

> lpush mylist first
(integer) 3
> lrange mylist 0 -1
1) "first"
2) "A"
3) "B"
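For readers new to Redis lists, the session above can be mirrored with a Python deque. This is only a model of the RPUSH, LPUSH and LRANGE semantics (push to the tail, push to the head, inclusive range with negative indexes counted from the end); it does not talk to a Redis server, and the helper names are hypothetical.

```python
from collections import deque


def rpush(lst, value):
    """Append to the tail; like RPUSH, return the new length."""
    lst.append(value)
    return len(lst)


def lpush(lst, value):
    """Insert at the head; like LPUSH, return the new length."""
    lst.appendleft(value)
    return len(lst)


def lrange(lst, start, stop):
    """Inclusive range; negative indexes count from the end, as in LRANGE."""
    items = list(lst)
    n = len(items)
    if start < 0:
        start = max(n + start, 0)
    if stop < 0:
        stop = n + stop
    return items[start:stop + 1]


mylist = deque()
lengths = [rpush(mylist, "A"), rpush(mylist, "B"), lpush(mylist, "first")]
```

Both ends are cheap to push to, which is what makes a Redis list usable as a queue or a stack.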
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
Redis | Data Structures just a little!
maziyar-beautiful-MacBook$ redis-benchmark -q -n 100000
PING_INLINE: 85178.88 requests per second

PING_BULK: 80000.00 requests per second

SET: 86580.09 requests per second

GET: 83263.95 requests per second

INCR: 83963.05 requests per second

LPUSH: 86880.97 requests per second

LPOP: 90252.70 requests per second

SADD: 84388.19 requests per second

SPOP: 92936.80 requests per second

LPUSH (needed to benchmark LRANGE): 87336.24 requests per second

LRANGE_100 (first 100 elements): 25614.75 requests per second

LRANGE_300 (first 300 elements): 10455.88 requests per second

LRANGE_500 (first 450 elements): 7125.04 requests per second

LRANGE_600 (first 600 elements): 5369.13 requests per second

MSET (10 keys): 50000.00 requests per second
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
Redis | How fast is Redis?
maziyar-beautiful-MacBook$ redis-benchmark -n 1000000 -t set,get -P 16 -q
PING_INLINE: 735294.12 requests per second

PING_BULK: 988142.31 requests per second

SET: 681198.88 requests per second

GET: 831255.25 requests per second

INCR: 778210.12 requests per second

LPUSH: 682593.81 requests per second

LPOP: 713775.88 requests per second

SADD: 732600.75 requests per second

SPOP: 885739.62 requests per second

LPUSH (needed to benchmark LRANGE): 656598.81 requests per second
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
Redis | How fast is Redis? Pipelining of 16 commands
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
Redis | ISCPIF Use Cases
Scientific Games
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
Redis | ISCPIF Use Cases
Scientific Games
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
Redis | ISCPIF Use Cases
Scientific Games
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
Redis | ISCPIF Use Cases
• Scientific Operations
• Occurrence

• Co-Occurrence

• Scientific Games
• Pub/Sub
• Rate Limiter (IP-based with TTL)
• Chat rooms
• TTL
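An IP-based rate limiter with a TTL is typically a counter per client that expires after a fixed window. The sketch below keeps the counters in a Python dict to show the logic; in Redis the same idea is usually an INCR on a per-IP key plus an EXPIRE. The class name, limits and injected clock are assumptions for illustration, not the ISCPIF implementation.

```python
import time


class FixedWindowRateLimiter:
    """Allow at most `limit` requests per IP in each `window_seconds` window."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so the behaviour can be tested
        self._counters = {}  # ip -> (count, expires_at)

    def allow(self, ip):
        now = self.clock()
        count, expires_at = self._counters.get(ip, (0, 0.0))
        if now >= expires_at:  # the key's TTL has elapsed: start a new window
            count, expires_at = 0, now + self.window
        count += 1  # the INCR step
        self._counters[ip] = (count, expires_at)
        return count <= self.limit


# Usage with a fake clock so the window can be advanced by hand.
fake_now = [0.0]
limiter = FixedWindowRateLimiter(limit=2, window_seconds=60, clock=lambda: fake_now[0])
first_three = [limiter.allow("203.0.113.9") for _ in range(3)]
```

Letting the key expire on its own is the design choice worth noting: no cleanup job is needed, since idle clients simply disappear from the store.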
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
Redis | ISCPIF Use Cases
Monitoring NewRelic
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
Redis | ISCPIF Use Cases
Monitoring NewRelic
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
Redis | ISCPIF Use Cases
Monitoring NewRelic
“Messaging that just works”
well actually it’s more than that, but OK!
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
RabbitMQ | Feature List
A messaging broker
• Highlights

• Reliability

• Flexible Routing

• Clustering

• Federation

• Highly Available Queues

• Multi-protocol

• Many Clients

• Management UI

• Plugin System

• For what?
• Data delivery

• Non-blocking operations

• Push notifications

• Publish / subscribe

• Asynchronous processing (work queues)
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
RabbitMQ | Feature List
A messaging broker
RabbitMQ Routing
[Diagram: an exchange of type Topic routes messages to the queues Q1, Q2 and Q3 through the binding patterns climate.*, risk.* and news.*]
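Topic routing can be sketched in a few lines of Python. In AMQP topic exchanges, routing keys are dot-separated words, `*` matches exactly one word and `#` matches zero or more. The pairing of queues to patterns below (Q1 with climate.*, and so on) is assumed from the order of the labels on the slide, and the routing keys are invented.

```python
def topic_matches(pattern, routing_key):
    """AMQP-style topic match: '*' is exactly one word, '#' is zero or more."""
    p = pattern.split(".")
    k = routing_key.split(".")

    def match(i, j):
        if i == len(p):
            return j == len(k)
        if p[i] == "#":
            # '#' may swallow any number of remaining words, including none.
            return any(match(i + 1, j2) for j2 in range(j, len(k) + 1))
        if j == len(k):
            return False
        return (p[i] == "*" or p[i] == k[j]) and match(i + 1, j + 1)

    return match(0, 0)


# Assumed pairing of queues and binding patterns.
BINDINGS = {"Q1": "climate.*", "Q2": "risk.*", "Q3": "news.*"}


def route(routing_key, bindings=BINDINGS):
    """Names of the queues that would receive a message with this routing key."""
    return sorted(q for q, pat in bindings.items() if topic_matches(pat, routing_key))
```

With these bindings, a key such as `climate.cop21` reaches only Q1, while a three-word key reaches no queue, because `*` stands for a single word.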
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
RabbitMQ | ISCPIF Use Cases
Monitoring NewRelic
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
RabbitMQ | ISCPIF Use Cases
Monitoring RabbitMQ Management UI
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
RabbitMQ | ISCPIF Use Cases
Monitoring RabbitMQ Management UI
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
RabbitMQ | ISCPIF Use Cases
Monitoring RabbitMQ Management UI
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
RabbitMQ | ISCPIF Use Cases
Monitoring RabbitMQ Management UI
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
RabbitMQ | ISCPIF Use Cases
• Distributed Computations
• Parsing text files

• Scientific calculations

• Realtime Processing

• Text mining

• NLP

• Annotation

• Keyword extractions

• Job Queues
• RPC (Remote procedure call)
• Topic based routing
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
RabbitMQ | ISCPIF Use Cases
• Parsing
• 225 files

• 10-20 million lines each

• Avg. total of 3.3 billion lines

• RPC
• Post-process each document

• Output 

• MongoDB

• ElasticSearch

• Redis
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
ISCPIF Big Data
• Multivac (Open Data Platform)
• ISCPIF APIs
• Science en Poche
• Climatique (COP21)
• Risk (AXA research fund)
• Scientific Dashboards
• Distributed Computing
• Nobel Game (scientific game)
• Twitter streaming (UN, France, Climate Change, Risk, etc)
• Instagram streaming (Paris)
[Diagram: Backend Architecture. Labels: data streams (Twitter, Instagram, Foursquare, Facebook, files; web, mobile, wearable and sensor-based devices); real-time streaming system (Web Socket, Flash Socket, xhr-polling, jsonp-polling; XML/JSON; authorization, authentication, identification); storing data; indexing data; real-time processing; data analytics; text mining; RPC system (NLP, annotation, extraction); streaming data; crowd sourcing; end user.]
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
Real-Time Data Stream Processing
High Performance Infrastructure
Highly Available Infrastructure
• Multivac (Open Data)
• ISCPIF APIs
• Scientific Dashboards
• Science en Poche
• Distributed Computing
• Climatique (COP21)
• Risk (AXA research fund)
• Nobel Game (scientific game)
• Twitter streaming (UN, France, Climate Change, Risk)
• Instagram streaming (Paris)
Python
Scala
Script
Java
Erlang
iOS
Node
JS
current projects
“You are your best benchmark!”
–just a regular Geek :)
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
SHOWCASE
FRANCE 2014
Real-time processing and visualization of Twitter in France
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
News Tracking
Real-time tracking of the news with the highest impact on networks
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
Aviation Accidents
50K retweets/10min
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
Aviation Accidents
120K retweets/10min
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
Robin Williams
180K retweets/10min
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
#Ferguson Michael Brown
75K retweets/10min
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
Paris
13 Novembre
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
22 VMs
Distributed Systems
320K Ops /S
130K Insert /S
64K Index /S
…
8 WEB SERVERS
4 API SERVERS
Search Engine Cluster
900 million data
45%
Database
2.9 billion data
22%
120 Cores
2.5TB RAM
30 TB SSD
8000+ lines of code
14 Web Apps
4 Mobile Apps
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
• Starting 2016 with CouchBase in parallel

• Graph Databases

• Spark Streaming / machine learning

• Clustering and categorizing in real-time

• Creative Hardware
• SlipStream, StratusLab and EGI Cloud
• Healthcare and Wearable devices

• No, no drones! ;-)
What’s next for Big Data at ISCPIF
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
“Every piece of information is worth something to somebody. And in the hands of the wrong person, that could be deadly.”
–The Blacklist: Lord Baltimore (No. 104)
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
“Reddington: People love to decry big brother, the NSA, the government listening in on their most private lives, yet they all willingly go online and hand over the most intimate details of those lives to big data.
Elizabeth: Most people don't care that Google knows their search history.
Reddington: They know more than that. They know your habits, the banks you use, the pills you pop, the men or women you sleep with.”
Thanks!
maziyar.panahi@iscpif.fr
25 November 2015
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
http://iscpif.fr/maziyar

More Related Content

What's hot

Sparkly Notebook: Interactive Analysis and Visualization with Spark
Sparkly Notebook: Interactive Analysis and Visualization with SparkSparkly Notebook: Interactive Analysis and Visualization with Spark
Sparkly Notebook: Interactive Analysis and Visualization with Sparkfelixcss
 
Scaling terraform
Scaling terraformScaling terraform
Scaling terraformPaolo Tonin
 
A new execution model for Nashorn in Java 9
A new execution model for Nashorn in Java 9A new execution model for Nashorn in Java 9
A new execution model for Nashorn in Java 9Marcus Lagergren
 
OpenStack on the Fabric - OpenStack Korea January Seminar 2014
OpenStack on the Fabric - OpenStack Korea January Seminar 2014OpenStack on the Fabric - OpenStack Korea January Seminar 2014
OpenStack on the Fabric - OpenStack Korea January Seminar 2014Jun Lee
 
Dapper Tool - A Bundle to Make your ECL Neater
Dapper Tool - A Bundle to Make your ECL NeaterDapper Tool - A Bundle to Make your ECL Neater
Dapper Tool - A Bundle to Make your ECL NeaterHPCC Systems
 

What's hot (6)

Sparkly Notebook: Interactive Analysis and Visualization with Spark
Sparkly Notebook: Interactive Analysis and Visualization with SparkSparkly Notebook: Interactive Analysis and Visualization with Spark
Sparkly Notebook: Interactive Analysis and Visualization with Spark
 
Scaling terraform
Scaling terraformScaling terraform
Scaling terraform
 
A new execution model for Nashorn in Java 9
A new execution model for Nashorn in Java 9A new execution model for Nashorn in Java 9
A new execution model for Nashorn in Java 9
 
Dive into PySpark
Dive into PySparkDive into PySpark
Dive into PySpark
 
OpenStack on the Fabric - OpenStack Korea January Seminar 2014
OpenStack on the Fabric - OpenStack Korea January Seminar 2014OpenStack on the Fabric - OpenStack Korea January Seminar 2014
OpenStack on the Fabric - OpenStack Korea January Seminar 2014
 
Dapper Tool - A Bundle to Make your ECL Neater
Dapper Tool - A Bundle to Make your ECL NeaterDapper Tool - A Bundle to Make your ECL Neater
Dapper Tool - A Bundle to Make your ECL Neater
 

Similar to HOW TO SCALE FROM ZERO TO BILLIONS!

Attack monitoring using ElasticSearch Logstash and Kibana
Attack monitoring using ElasticSearch Logstash and KibanaAttack monitoring using ElasticSearch Logstash and Kibana
Attack monitoring using ElasticSearch Logstash and KibanaPrajal Kulkarni
 
"Esup CAS Packaging" : Deploy and customize easily a CAS4 server
"Esup CAS Packaging" : Deploy and customize easily a CAS4 server"Esup CAS Packaging" : Deploy and customize easily a CAS4 server
"Esup CAS Packaging" : Deploy and customize easily a CAS4 serverLudovic A
 
Ansible- Durham Meetup: Using Ansible for Cisco ACI deployment
Ansible- Durham Meetup: Using Ansible for Cisco ACI deploymentAnsible- Durham Meetup: Using Ansible for Cisco ACI deployment
Ansible- Durham Meetup: Using Ansible for Cisco ACI deploymentJoel W. King
 
Web Scale Reasoning and the LarKC Project
Web Scale Reasoning and the LarKC ProjectWeb Scale Reasoning and the LarKC Project
Web Scale Reasoning and the LarKC ProjectSaltlux Inc.
 
ETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupETL with SPARK - First Spark London meetup
ETL with SPARK - First Spark London meetupRafal Kwasny
 
Johnny Miller – Cassandra + Spark = Awesome- NoSQL matters Barcelona 2014
Johnny Miller – Cassandra + Spark = Awesome- NoSQL matters Barcelona 2014Johnny Miller – Cassandra + Spark = Awesome- NoSQL matters Barcelona 2014
Johnny Miller – Cassandra + Spark = Awesome- NoSQL matters Barcelona 2014NoSQLmatters
 
Cassandra Day SV 2014: Spark, Shark, and Apache Cassandra
Cassandra Day SV 2014: Spark, Shark, and Apache CassandraCassandra Day SV 2014: Spark, Shark, and Apache Cassandra
Cassandra Day SV 2014: Spark, Shark, and Apache CassandraDataStax Academy
 
Cloud Architect Alliance #15: Openstack
Cloud Architect Alliance #15: OpenstackCloud Architect Alliance #15: Openstack
Cloud Architect Alliance #15: OpenstackMicrosoft
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkRahul Jain
 
Smuggling Multi-Cloud Support into Cloud-native Applications using Elastic Co...
Smuggling Multi-Cloud Support into Cloud-native Applications using Elastic Co...Smuggling Multi-Cloud Support into Cloud-native Applications using Elastic Co...
Smuggling Multi-Cloud Support into Cloud-native Applications using Elastic Co...Nane Kratzke
 
AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09AWS (Hadoop) Meetup 30.04.09
AWS (Hadoop) Meetup 30.04.09Chris Purrington
 
OCCIware@POSS 2016 - an extensible, standard XaaS cloud consumer platform
OCCIware@POSS 2016 - an extensible, standard XaaS cloud consumer platformOCCIware@POSS 2016 - an extensible, standard XaaS cloud consumer platform
OCCIware@POSS 2016 - an extensible, standard XaaS cloud consumer platformMarc Dutoo
 
Null Bachaav - May 07 Attack Monitoring workshop.
Null Bachaav - May 07 Attack Monitoring workshop.Null Bachaav - May 07 Attack Monitoring workshop.
Null Bachaav - May 07 Attack Monitoring workshop.Prajal Kulkarni
 
introduction to node.js
introduction to node.jsintroduction to node.js
introduction to node.jsorkaplan
 
OCCIware, an extensible, standard-based XaaS consumer platform to manage ever...
OCCIware, an extensible, standard-based XaaS consumer platform to manage ever...OCCIware, an extensible, standard-based XaaS consumer platform to manage ever...
OCCIware, an extensible, standard-based XaaS consumer platform to manage ever...OCCIware
 
OCCIware: Extensible and Standard-based XaaS Platform To Manage Everything in...
OCCIware: Extensible and Standard-based XaaS Platform To Manage Everything in...OCCIware: Extensible and Standard-based XaaS Platform To Manage Everything in...
OCCIware: Extensible and Standard-based XaaS Platform To Manage Everything in...OW2
 

HOW TO SCALE FROM ZERO TO BILLIONS!

  • 1. Maziyar PANAHI Big Data engineer / Cloud Architect ARGOS - NoSQL / Big Data Université Paris-Sud, LAL 25 November 2015 Engineer at CNRS
  • 2. ISCPIF INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE UNITÉ CNRS UPS3611 - HTTP://ISCPIF.FR Creative Commons, open science, open data, shared resources
  • 3. ISCPIF INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE UNITÉ CNRS UPS3611 - HTTP://ISCPIF.FR Creative Commons, open science, open data, shared resources
  • 4. ISCPIF INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE http://iscpif.fr/services/
  • 5. ISCPIF INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE http://iscpif.fr/services/
  • 6. ISCPIF INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE • Core Services • ROOM RESERVATION • EVENT ANNOUNCEMENT • PROJECT HOSTING AND RESIDENCIES • HIGH PERFORMANCE COMPUTING • TRAINING SESSIONS • COMMUNITY EXPLORER • Open Platforms • OpenMole • Gargantext • Big Data • Linkrbrain http://iscpif.fr/services/
  • 7. ISCPIF INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE External Collaborators: ISCPIF Partners:
  • 8. • Elasticsearch • MongoDB • Redis • RabbitMQ • Big Data Platform INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) HOW TO SCALE FROM ZERO TO BILLIONS!
  • 9. “You Know, for Search!” (Elasticsearch)
  • 10. Elasticsearch
  • 11. Elasticsearch | Real-Time Data
  • 12. Elasticsearch | Real-Time Advanced Analytics
  • 13. Elasticsearch | Massively Distributed
  • 14. Elasticsearch | High Availability
  • 15. Elasticsearch | Multi-tenancy Host Index
  • 16. Elasticsearch | Full-Text Search
  • 17. Elasticsearch | Document-Oriented & Schema-Free
  • 18. Elasticsearch | Developer-Friendly, RESTful API
      • Single document APIs: Index API, Get API, Delete API, Update API
      • Multi-document APIs: Multi Get API, Bulk API, Bulk UDP API, Delete By Query API
  • 19. Elasticsearch | Developer-Friendly, RESTful API: Index / Type / ID / Document
  • 20. Elasticsearch | Search & Analyze Data in Real Time
      • Apache 2 Open Source License
      • Built on top of Apache Lucene™
  • 21. Elasticsearch 2.0 | Installation - Package
      1. curl -L -O https://download.elastic.co/elasticsearch/release/org/elasticsearch/distribution/tar/elasticsearch/2.0.0/elasticsearch-2.0.0.tar.gz
      2. tar -xvf elasticsearch-2.0.0.tar.gz
      3. cd elasticsearch-2.0.0/bin
      4. ./elasticsearch
  • 22. Elasticsearch 2.0 | Installation - Repositories
      Download and install the Public Signing Key:
          wget -qO - https://packages.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
      Repository definition (APT -> /etc/apt/sources.list.d/elasticsearch-2.x.list):
          echo "deb http://packages.elastic.co/elasticsearch/2.x/debian stable main" | sudo tee -a /etc/apt/sources.list.d/elasticsearch-2.x.list
      Install Elasticsearch 2.0:
          sudo apt-get update && sudo apt-get install elasticsearch
  • 23. Elasticsearch 2.0 | Installation - That’s it!
      Simply run:
          curl 'http://localhost:9200/?pretty'
  • 24. Elasticsearch 2.0 | Configuration - System
      • File descriptors
          • Setting the limit to 32k or even 64k is recommended
          • Check the current value: curl 'localhost:9200/_nodes/process?pretty'
      • Memory settings
          • Disable swap
          • For the running system: sudo swapoff -a
          • Permanently: /etc/fstab
  • 25. Elasticsearch 2.0 | Configuration - System -> /etc/default/elasticsearch
      • ES_HEAP_SIZE
          • Leave enough for the OS
          • Leave enough for the
          • Never ever go over 30.1 GB!!
          • I’ll go with half of the RAM, staying below 30 GB
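The heap rule on this slide (half of the RAM, but never above roughly 30 GB) can be written down as a small helper. This is only a sketch: the function names are hypothetical, and the 30 GB cap is taken from the slide's warning rather than measured for any particular JVM.

```python
def recommended_heap_gb(total_ram_gb: float, cap_gb: float = 30.0) -> float:
    """Return half of the machine's RAM, capped at cap_gb.

    The default cap mirrors the slide's "never go over 30.1 GB" warning;
    treat the exact number as an assumption, not a guarantee.
    """
    if total_ram_gb <= 0:
        raise ValueError("total_ram_gb must be positive")
    return min(total_ram_gb / 2, cap_gb)


def heap_setting(total_ram_gb: float) -> str:
    """Format the result as an ES_HEAP_SIZE line for /etc/default/elasticsearch."""
    heap_mb = int(recommended_heap_gb(total_ram_gb) * 1024)
    return f"ES_HEAP_SIZE={heap_mb}m"


print(heap_setting(16))   # machine with 16 GB of RAM
print(heap_setting(128))  # machine with 128 GB of RAM: the cap applies
```

The other half of the memory is left to the operating system, as the slide recommends.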
  • 26. Elasticsearch 2.0 | Configuration - Elasticsearch
      curl localhost:9200/_nodes/process?pretty
      # /etc/elasticsearch/elasticsearch.yml
      network:
        host: <MACHINE IP ADDRESS>
      path:
        logs: /var/log/elasticsearch
        data: /var/data/elasticsearch
      cluster:
        name: <NAME OF YOUR CLUSTER>
      node:
        name: <NAME OF YOUR NODE>
  • 27. Elasticsearch 2.0 | Configuration - Elasticsearch
      • node.name
      • cluster.name
      • mlockall
          # Elasticsearch performs poorly when JVM starts swapping: you should ensure that it _never_ swaps.
  • 28. Elasticsearch 2.0 | Configuration - Elasticsearch
      #Shards #Replicas
      Remember this?
      • 1 cluster
      • 3 nodes
      • 6 shards
      • 1 replica
  • 29. Elasticsearch 2.0 | Shards and Replicas
      Why shards and replicas?
      • ES has built-in clustering
      • Scaling out an index: shards
      • Parallel work on an index: shards
      • Increasing availability: replicas
      • The number of replicas can be changed at any time!
      • The number of shards cannot be changed after index creation! (you must reindex)
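One way to see why the shard count is fixed at index creation: Elasticsearch picks the shard for a document as hash(routing value) modulo the number of primary shards, where the routing value defaults to the document ID. The sketch below imitates that rule. CRC32 is used only as a stand-in hash (Elasticsearch uses its own hash function), and the function names are hypothetical.

```python
import zlib


def shard_for(doc_id: str, number_of_primary_shards: int) -> int:
    """Pick a shard as hash(id) % number_of_primary_shards.

    CRC32 is a stand-in for illustration; it is not the hash
    Elasticsearch really uses.
    """
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards


def moved_fraction(doc_ids, old_shards: int, new_shards: int) -> float:
    """Fraction of documents whose shard changes when the shard count changes."""
    ids = list(doc_ids)
    moved = sum(
        1 for d in ids if shard_for(d, old_shards) != shard_for(d, new_shards)
    )
    return moved / len(ids)


ids = [str(i) for i in range(1000)]
print(moved_fraction(ids, 5, 5))  # same shard count: nothing moves
print(moved_fraction(ids, 5, 6))  # one more shard: most documents would move
```

Because most documents would land on a different shard, the existing shards cannot simply be kept, which is why the data has to be reindexed into a new index.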
  • 30. Elasticsearch 2.0 | Shards and Replicas
      What is a shard?
      • You can't actually split an index!
      • ES uses multiple Lucene indexes (AKA shards)
      • Simply put, a shard is a Lucene index!
      • Overhead: a query hits all shards for scoring
      • So they don’t come for free!
  • 31. Elasticsearch 2.0 | How many Shards and Replicas?
      • Replicas:
          • More replicas = more availability = slower indexing!
      • Shards:
          • How much data?
          • How many queries?
          • How complex are those queries?
          • How many resources does each node have?
          • How many nodes are in your cluster?
          • Don’t know? Over-allocate a few shards (but not too many, they are not free!)
  • 32. Elasticsearch 2.0 | How many Shards and Replicas?
      • Changing the number of replicas: easy
          curl -XPUT 'localhost:9200/my_index/_settings' -d '
          { "index" : { "number_of_replicas" : 4 } }'
      • Changing the number of shards: the index must be re-indexed
          • For some, not a big deal.
          • For some, it is a big deal!
  • 33. Elasticsearch 2.0 | How many Shards and Replicas?
      • Stack Overflow: http://stackexchange.com/performance
      • Scaling out:
          • More shards than the number of nodes
          • Multiple shards on one node
  • 34. Elasticsearch 2.0 | How many Shards and Replicas?
      • 3 nodes, 3 shards, 2 replicas
      • If 2 nodes fail, the cluster is still healthy
      • Doubling the storage need
      • Each replica = 1/3 of index size
      • Storage is cheap, a small price to pay for availability
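The arithmetic behind these layouts is easy to check by hand. The sketch below uses a hypothetical helper name and assumes that every copy of a shard sits on a different node, which is how Elasticsearch allocates primaries and replicas. It counts shard copies, the storage multiplier, and how many node losses still leave at least one copy of every shard. It says nothing about master election.

```python
def shard_layout(primaries: int, replicas: int, nodes: int) -> dict:
    """Summarise a primaries/replicas/nodes layout.

    Assumes copies of the same shard are placed on different nodes.
    """
    copies_per_shard = 1 + replicas
    total_copies = primaries * copies_per_shard
    return {
        "total_shard_copies": total_copies,
        # Total stored data relative to the primary data alone.
        "storage_multiplier": copies_per_shard,
        "avg_copies_per_node": total_copies / nodes,
        # Node losses that still leave one copy of every shard.
        "node_losses_with_data_intact": min(replicas, nodes - 1),
    }


print(shard_layout(primaries=3, replicas=2, nodes=3))  # slide 34
print(shard_layout(primaries=6, replicas=1, nodes=3))  # slide 28
```

For the 3 nodes / 3 shards / 2 replicas case, the two replica sets add twice the primary data on top of the primaries, so the total stored volume is three times the primary size.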
  • 35. Elasticsearch 2.0 | Let’s Use It!
      Full-text search, structured search, analytics, and all three in combination:
      • Wikipedia uses Elasticsearch to provide full-text search with highlighted search snippets, and search-as-you-type and did-you-mean suggestions.
      • The Guardian uses Elasticsearch to combine visitor logs with social-network data to provide real-time feedback to its editors about the public’s response to new articles.
      • Stack Overflow combines full-text search with geolocation queries and uses more-like-this to find related questions and answers.
      • GitHub uses Elasticsearch to query 130 billion lines of code!
  • 36. Elasticsearch 2.0 | RESTful API with JSON over HTTP
      curl -X<VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -d '<BODY>'
      • VERB: GET, POST, PUT, HEAD, or DELETE.
      • PROTOCOL: http or https
      • HOST: hostname of any node
      • PORT: Elasticsearch HTTP service, which defaults to 9200
      • PATH: API endpoint (_count, _cluster/stats, _nodes/stats/jvm, etc.)
      • QUERY_STRING: any optional query-string parameters, for example ?pretty
      • BODY: a JSON-encoded request body (if the request needs one)
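The same request anatomy can be assembled in Python with the standard library alone. This is a sketch: build_request is a hypothetical helper, and it only constructs the request object without sending anything, so no running cluster is needed to try it.

```python
import json
from urllib.parse import urlencode, urlunsplit
from urllib.request import Request


def build_request(verb, path, body=None, query=None,
                  protocol="http", host="localhost", port=9200):
    """Assemble VERB, PROTOCOL, HOST, PORT, PATH, QUERY_STRING and BODY."""
    netloc = f"{host}:{port}"
    query_string = urlencode(query or {})
    url = urlunsplit(
        (protocol, netloc, "/" + path.lstrip("/"), query_string, "")
    )
    # The body, when present, is JSON-encoded bytes.
    data = json.dumps(body).encode("utf-8") if body is not None else None
    return Request(url, data=data, method=verb)


count = build_request("GET", "_count", query={"pretty": "true"})
replicas = build_request(
    "PUT", "my_index/_settings", body={"index": {"number_of_replicas": 4}}
)
print(count.get_method(), count.full_url)
print(replicas.get_method(), replicas.full_url, replicas.data)
```

Passing such an object to urllib.request.urlopen would send it; that step is left out here on purpose.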
  • 37. Elasticsearch 2.0 | RESTful API with JSON over HTTP
      curl -XGET 'http://localhost:9200/_count?pretty' -d '
      { "query": { "match_all": {} } }'
      Response:
      { "count" : 0, "_shards" : { "total" : 5, "successful" : 5, "failed" : 0 } }
  • 38. Elasticsearch 2.0 | Clarification
      Relational DB:  Databases -> Tables -> Rows      -> Columns
      Elasticsearch:  Indices   -> Types  -> Documents -> Fields
      • Index (noun): like a database in a traditional relational database. It is the place to store related documents.
      • Index (verb): to index a document is to store it in an index (noun) so that it can be retrieved and queried (like INSERT in SQL).
      • Inverted index: relational databases add an index, such as a B-tree index, to improve the speed of data retrieval. Elasticsearch and Lucene use a structure called an inverted index for the same purpose.
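A toy inverted index makes the last point concrete: it maps each term to the set of documents that contain it, so a term lookup does not have to scan every document. The sketch below uses the two "about" texts from the employee examples in this deck and splits on whitespace after lowercasing, which is far simpler than the analysis Lucene really performs.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercased term to the set of document IDs containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


docs = {
    1: "I love to go rock climbing",
    2: "I like to collect rock albums",
}
index = build_inverted_index(docs)
print(sorted(index["rock"]))      # documents containing "rock"
print(sorted(index["climbing"]))  # documents containing "climbing"
```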
  • 39. Elasticsearch 2.0 | Indexing a document
      PUT /cnrs/employee/1
      { "first_name" : "John", "last_name" : "Smith", "age" : 25, "about" : "I love to go rock climbing", "interests": [ "sports", "music" ] }
      PUT /cnrs/employee/2
      { "first_name" : "Jane", "last_name" : "Smith", "age" : 32, "about" : "I like to collect rock albums", "interests": [ "music" ] }
      • cnrs: the index name
      • employee: the type name
      • 1: the ID of this particular employee
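The two documents above could also be sent in a single request through the Bulk API listed earlier in the deck. Its body is newline-delimited JSON: one action line followed by one source line per document, with a final newline. The helper below is a hypothetical sketch that only builds that body (it would be POSTed to /_bulk) and does not contact a cluster.

```python
import json


def bulk_index_body(index, doc_type, docs):
    """Build a Bulk API body: an action line, then a source line, per document."""
    lines = []
    for doc_id, source in docs.items():
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    # The bulk body must end with a newline.
    return "\n".join(lines) + "\n"


employees = {
    "1": {"first_name": "John", "last_name": "Smith", "age": 25,
          "about": "I love to go rock climbing",
          "interests": ["sports", "music"]},
    "2": {"first_name": "Jane", "last_name": "Smith", "age": 32,
          "about": "I like to collect rock albums",
          "interests": ["music"]},
}
body = bulk_index_body("cnrs", "employee", employees)
print(body)
```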
  • 40. Elasticsearch 2.0 | Retrieving a document
      GET /cnrs/employee/1
      Response:
      {
        "_index" : "cnrs", "_type" : "employee", "_id" : "1", "_version" : 1, "found" : true,
        "_source" : { "first_name" : "John", "last_name" : "Smith", "age" : 25, "about" : "I love to go rock climbing", "interests": [ "sports", "music" ] }
      }
  • 41. Elasticsearch 2.0 | Deleting a document
      DELETE /cnrs/employee/1
  • 42. Elasticsearch 2.0 | Search
      GET /cnrs/employee/_search
      Response:
      {
        "took": 6, "timed_out": false, "_shards": { ... },
        "hits": {
          "total": 2, "max_score": 1,
          "hits": [
            { "_index": "cnrs", "_type": "employee", "_id": "1", "_score": 1,
              "_source": { "first_name": "John", "last_name": "Smith", "age": 25, "about": "I love to go rock climbing", "interests": [ "sports", "music" ] } },
            { "_index": "cnrs", "_type": "employee", "_id": "2", "_score": 1,
              "_source": { "first_name": "Jane", "last_name": "Smith", "age": 32, "about": "I like to collect rock albums", "interests": [ "music" ] } }
          ]
        }
      }
      GET /cnrs/employee/_search?q=last_name:Smith
      Response:
      {
        ...
        "hits": {
          "total": 2, "max_score": 0.30685282,
          "hits": [
            { ... "_source": { "first_name": "John", "last_name": "Smith", "age": 25, "about": "I love to go rock climbing", "interests": [ "sports", "music" ] } },
            { ... "_source": { "first_name": "Jane", "last_name": "Smith", "age": 32, "about": "I like to collect rock albums", "interests": [ "music" ] } }
          ]
        }
      }
  • 43. Elasticsearch 2.0 | Search with Query DSL
      Elasticsearch provides a rich, flexible query language called the query DSL, which allows us to build much more complicated, robust queries.
      GET /cnrs/employee/_search
      { "query" : { "match" : { "last_name" : "Smith" } } }
  • 44. Elasticsearch 2.0 | Search with Query DSL
      GET /cnrs/employee/_search
      {
        "query" : {
          "filtered" : {
            "filter" : { "range" : { "age" : { "gt" : 30 } } },
            "query" : { "match" : { "last_name" : "smith" } }
          }
        }
      }
      Response:
      {
        ...
        "hits": {
          "total": 1, "max_score": 0.30685282,
          "hits": [
            { ... "_source": { "first_name": "Jane", "last_name": "Smith", "age": 32, "about": "I like to collect rock albums", "interests": [ "music" ] } }
          ]
        }
      }
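The filtered query shown on this slide still works in Elasticsearch 2.0, but it is deprecated there in favour of a bool query with a must clause and a filter clause. As a sketch, the same search can be built as a Python dictionary ready to be JSON-encoded. The function name is hypothetical.

```python
import json


def name_and_min_age_query(last_name: str, min_age: int) -> dict:
    """Build a bool query: scored match on last_name, unscored range filter on age."""
    return {
        "query": {
            "bool": {
                "must": {"match": {"last_name": last_name}},
                "filter": {"range": {"age": {"gt": min_age}}},
            }
        }
    }


body = name_and_min_age_query("smith", 30)
print(json.dumps(body, indent=2))
```

The match clause contributes to the relevance score, while the range clause only includes or excludes documents.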
  • 45. Elasticsearch 2.0 | Full-text Search
      GET /cnrs/employee/_search
      { "query" : { "match" : { "about" : "rock climbing" } } }
      Response:
      {
        ...
        "hits": {
          "total": 2, "max_score": 0.16273327,
          "hits": [
            { ... "_score": 0.16273327,
              "_source": { "first_name": "John", "last_name": "Smith", "age": 25, "about": "I love to go rock climbing", "interests": [ "sports", "music" ] } },
            { ... "_score": 0.016878016,
              "_source": { "first_name": "Jane", "last_name": "Smith", "age": 32, "about": "I like to collect rock albums", "interests": [ "music" ] } }
          ]
        }
      }
  • 46. Elasticsearch 2.0 | Phrase Search
      GET /cnrs/employee/_search
      { "query" : { "match_phrase" : { "about" : "rock climbing" } } }
      Response:
      {
        ...
        "hits": {
          "total": 1, "max_score": 0.23013961,
          "hits": [
            { ... "_score": 0.23013961,
              "_source": { "first_name": "John", "last_name": "Smith", "age": 25, "about": "I love to go rock climbing", "interests": [ "sports", "music" ] } }
          ]
        }
      }
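The difference between the two previous slides can be imitated in memory: match keeps any document that shares at least one term with the query, while match_phrase requires the terms to appear next to each other and in order. The sketch below ignores scoring and uses plain lowercase whitespace splitting instead of a real analyzer. The function names simply echo the query names and are not an Elasticsearch client API.

```python
def tokens(text):
    """Very rough stand-in for an analyzer: lowercase and split on whitespace."""
    return text.lower().split()


def match(docs, field, query):
    """IDs of documents sharing at least one term with the query."""
    wanted = set(tokens(query))
    return [doc_id for doc_id, doc in docs.items()
            if wanted & set(tokens(doc[field]))]


def match_phrase(docs, field, query):
    """IDs of documents containing the query terms consecutively and in order."""
    phrase = tokens(query)
    hits = []
    for doc_id, doc in docs.items():
        terms = tokens(doc[field])
        windows = range(len(terms) - len(phrase) + 1)
        if any(terms[i:i + len(phrase)] == phrase for i in windows):
            hits.append(doc_id)
    return hits


employees = {
    1: {"about": "I love to go rock climbing"},
    2: {"about": "I like to collect rock albums"},
}
print(match(employees, "about", "rock climbing"))
print(match_phrase(employees, "about", "rock climbing"))
```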
  • 47. Elasticsearch 2.0 | Highlighting Searches
      GET /cnrs/employee/_search
      {
        "query" : { "match_phrase" : { "about" : "rock climbing" } },
        "highlight": { "fields" : { "about" : {} } }
      }
      Response:
      {
        ...
        "hits": {
          "total": 1, "max_score": 0.23013961,
          "hits": [
            { ... "_score": 0.23013961,
              "_source": { "first_name": "John", "last_name": "Smith", "age": 25, "about": "I love to go rock climbing", "interests": [ "sports", "music" ] },
              "highlight": { "about": [ "I love to go <em>rock</em> <em>climbing</em>" ] } }
          ]
        }
      }
      The "highlight" section holds the highlighted fragment from the original text.
  • 48. Elasticsearch 2.0 | Search with Query DSL
      • Full text queries: Match Query, Multi Match Query, Common Terms Query, Query String Query, Simple Query String Query
      • Term level queries: Term Query, Terms Query, Range Query, Exists Query, Missing Query, Prefix Query, Wildcard Query, Regexp Query, Fuzzy Query, Type Query, Ids Query
      • Compound queries: Constant Score Query, Bool Query, Dis Max Query, Function Score Query, Boosting Query, Indices Query, And Query, Not Query, Or Query, Filtered Query, Limit Query
      • Joining queries: Nested Query, Has Child Query, Has Parent Query
      • Geo queries: GeoShape Query, Geo Bounding Box Query, Geo Distance Query, Geo Distance Range Query, Geo Polygon Query, Geohash Cell Query
      • Specialized queries: More Like This Query, Template Query, Script Query
  • 49. Elasticsearch 2.0 | Search with Query DSL
  • 50. Elasticsearch 2.0 | Aggregations
      GET /cnrs/employee/_search
      {
        "query": { "match": { "last_name": "smith" } },
        "aggs": { "all_interests": { "terms": { "field": "interests" } } }
      }
      Response:
      ...
      "all_interests": {
        "buckets": [
          { "key": "music", "doc_count": 2 },
          { "key": "sports", "doc_count": 1 }
        ]
      }
  • 51. Elasticsearch 2.0 | Aggregations
      GET /cnrs/employee/_search
      {
        "aggs" : {
          "all_interests" : {
            "terms" : { "field" : "interests" },
            "aggs" : { "avg_age" : { "avg" : { "field" : "age" } } }
          }
        }
      }
      Response:
      ...
      "all_interests": {
        "buckets": [
          { "key": "music", "doc_count": 2, "avg_age": { "value": 28.5 } },
          { "key": "sports", "doc_count": 1, "avg_age": { "value": 25 } }
        ]
      }
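What the terms aggregation with a nested avg computes can be reproduced in a few lines on the two employee documents: group the documents by each value of "interests", count them, and average "age" inside each group. This is an in-memory sketch with a hypothetical function name, not how Elasticsearch executes aggregations.

```python
from collections import defaultdict


def terms_with_avg(docs, terms_field, avg_field):
    """Bucket documents by each term of terms_field and average avg_field per bucket."""
    values = defaultdict(list)
    for doc in docs:
        # set() so a document counts once per term even if the term repeats.
        for term in set(doc[terms_field]):
            values[term].append(doc[avg_field])
    buckets = [
        {"key": term, "doc_count": len(v), "avg": sum(v) / len(v)}
        for term, v in values.items()
    ]
    # Largest buckets first, ties broken alphabetically.
    buckets.sort(key=lambda b: (-b["doc_count"], b["key"]))
    return buckets


employees = [
    {"first_name": "John", "age": 25, "interests": ["sports", "music"]},
    {"first_name": "Jane", "age": 32, "interests": ["music"]},
]
for bucket in terms_with_avg(employees, "interests", "age"):
    print(bucket)
```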
  • 52. Elasticsearch 2.0 | Aggregations
      {
        "aggs" : {
          "my_ip_ranges" : {
            "ip_range" : {
              "field" : "ip",
              "ranges" : [ { "to" : "10.0.0.5" }, { "from" : "10.0.0.5" } ]
            }
          }
        }
      }
      Response:
      {
        ...
        "aggregations": {
          "my_ip_ranges": {
            "buckets" : [
              { "to": 167772165, "to_as_string": "10.0.0.5", "doc_count": 4 },
              { "from": 167772165, "from_as_string": "10.0.0.5", "doc_count": 6 }
            ]
          }
        }
      }
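The numeric "to" and "from" values in the response are the IPv4 addresses written as 32-bit integers, which can be verified with Python's ipaddress module. The bucket helper is a hypothetical illustration that treats "from" as inclusive and "to" as exclusive, as Elasticsearch range aggregations do.

```python
import ipaddress


def ip_to_int(address: str) -> int:
    """Convert a dotted IPv4 address to its 32-bit integer form."""
    return int(ipaddress.IPv4Address(address))


def ip_range_bucket(address: str, boundary: str = "10.0.0.5") -> str:
    """Name the bucket an address falls into for the two ranges on the slide.

    "from" is inclusive and "to" is exclusive.
    """
    return "from" if ip_to_int(address) >= ip_to_int(boundary) else "to"


print(ip_to_int("10.0.0.5"))  # 10 * 2**24 + 5 = 167772165
print(ip_range_bucket("10.0.0.4"), ip_range_bucket("10.0.0.5"))
```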
  • 53. Elasticsearch 2.0 | Aggregations
      { "aggs" : { "articles_over_time" : { "date_histogram" : { "field" : "date", "interval" : "month" } } } }
      { "aggs" : { "articles_over_time" : { "date_histogram" : { "field" : "date", "interval" : "1.5h" } } } }
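For a fixed interval such as "1.5h" (5400 seconds), a date histogram amounts to rounding each timestamp down to the start of its interval and counting per bucket. The sketch below assumes buckets aligned to the Unix epoch in UTC and uses made-up sample timestamps. Calendar intervals such as "month" need calendar arithmetic and are not covered.

```python
from collections import Counter
from datetime import datetime, timezone


def date_histogram(timestamps, interval_seconds):
    """Count timestamps per fixed-size bucket, keyed by bucket start (epoch seconds)."""
    counts = Counter()
    for ts in timestamps:
        epoch = int(ts.timestamp())
        bucket_start = epoch - (epoch % interval_seconds)
        counts[bucket_start] += 1
    return dict(sorted(counts.items()))


# Hypothetical article timestamps, all in UTC.
articles = [
    datetime(2015, 11, 25, 0, 10, tzinfo=timezone.utc),
    datetime(2015, 11, 25, 1, 20, tzinfo=timezone.utc),
    datetime(2015, 11, 25, 1, 40, tzinfo=timezone.utc),
    datetime(2015, 11, 25, 3, 5, tzinfo=timezone.utc),
]
histogram = date_histogram(articles, interval_seconds=5400)
for start, doc_count in histogram.items():
    print(datetime.fromtimestamp(start, tz=timezone.utc).isoformat(), doc_count)
```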
  • 54. Elasticsearch 2.0 | Aggregations
      • Metrics Aggregations: Avg, Cardinality, Extended Stats, Geo Bounds, Max, Min, Percentiles, Percentile Ranks, Scripted Metric, Stats, Sum, Top hits, Value Count
      • Bucket Aggregations: Children, Date Histogram, Date Range, Filter, Geo Distance, GeoHash grid, Histogram, IPv4 Range, Missing, Nested, Range, Reverse nested, Sampler, Significant Terms, Terms
  • 55. Elasticsearch 2.0 | Plugins to enhance the core Elasticsearch functionality
      • Plugin types
          • Java plugins
              • JAR files
              • Must be installed on all nodes in the cluster
              • Each node must be restarted
          • Site plugins
              • Web content: JS, HTML, CSS etc.
              • Can be installed on only one node
              • Do not require a restart
          • Mixed plugins
              • Both JAR files and web content
  • 56. Elasticsearch 2.0 | Plugins to enhance the core Elasticsearch functionality
      • API extension Plugins • Alerting Plugins • Analysis Plugins • Discovery Plugins • Management and Site Plugins • Mapper Plugins • Scripting Plugins • Security Plugins • Snapshot/Restore Plugins • Transport Plugins • Integrations
  • 57. Elasticsearch 2.0 | Plugins - kopf: web administration tool for Elasticsearch cluster
      sudo [/usr/share/elasticsearch/]bin/plugin install [plugin_name]
      sudo [/usr/share/elasticsearch/]bin/plugin install lmenezes/elasticsearch-kopf
      sudo [/usr/share/elasticsearch/]bin/plugin install lmenezes/elasticsearch-kopf/2.x
      open http://localhost:9200/_plugin/kopf
  • 58. Elasticsearch 2.0 | Plugins - kopf: web administration tool for Elasticsearch cluster
  • 59. Elasticsearch 2.0 | Plugins - kopf: web administration tool for Elasticsearch cluster
  • 60. Elasticsearch 2.0 | Plugins - kopf: web administration tool for Elasticsearch cluster
  • 61. Elasticsearch 2.0 | Plugins - kopf: web administration tool for Elasticsearch cluster
  • 62. Elasticsearch 2.0 | Plugins - kopf: web administration tool for Elasticsearch cluster
  • 63. Elasticsearch 2.0 | Plugins - kopf: web administration tool for Elasticsearch cluster
      • index 5k/s: larger document (full tweets)
      • index 23k/s: smaller document (tweet date, text, etc.)
      • index 67k/s: just 140 characters and ID
  • 64. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) Elasticsearch 2.0 | Plugins - kopf Web administration tool for Elasticsearch cluster
  • 65. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) Elasticsearch 2.0 | Plugins - kopf Web administration tool for Elasticsearch cluster sudo service elasticsearch stop
  • 66. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) Elasticsearch 2.0 | Plugins - kopf Web administration tool for Elasticsearch cluster sudo service elasticsearch start
  • 67. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) Elasticsearch 2.0 | Plugins - kopf Web administration tool for Elasticsearch cluster sudo service elasticsearch start
  • 68. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) Elasticsearch 2.0 | Plugins - kopf Web administration tool for Elasticsearch cluster
  • 69. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) Elasticsearch 2.0 | ISCPIF Use Cases Full-text search and Data Analytics • Web of Science • Text Mining and NLP • Keyword Extractions • Phrase Occurrence • Phrase Co-Occurrence • Keyword Analytics • Date Histogram • Significant Terms • N-Grams • etc.
  • 75. Elasticsearch 2.0 | ISCPIF Use Cases: Full-text search and Data Analytics
      • Boolean
      • Query String
      • Aggregation
      • Date histogram
      • Query cache: no
  • 76. Elasticsearch 2.0 | ISCPIF Use Cases: Full-text search and Data Analytics
      {
        "bool": {
          "must":     { "match": { "title": "how to make millions" }},
          "must_not": { "match": { "tag": "spam" }},
          "should": [
            { "match": { "tag": "starred" }},
            { "range": { "date": { "gte": "2014-01-01" }}}
          ]
        }
      }
      • The title field must match "how to make millions"
      • The document must not be tagged as spam
      • Documents that are starred or that date from 2014 onward rank higher
      • Documents that satisfy both conditions rank higher still
  • 77. Elasticsearch 2.0 | ISCPIF Use Cases: Full-text search and Data Analytics
      • Filters
          • For binary yes/no searches
          • For queries on exact values
      • Exists
          • Keeps only the documents where abstract != null
      • Query cache: yes
  • 78. Elasticsearch 2.0 | ISCPIF Use Cases: Full-text search and Data Analytics
      • In my tests, bool query vs. filter: ~500 ms vs. ~50-120 ms
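The timing gap above comes from moving yes/no clauses out of the scoring path. As a minimal sketch, assuming Elasticsearch 2.0 (where the `bool` query accepts a `filter` clause), the request body could be assembled as follows. The field names `title`, `date`, `tag` and `abstract` come from the slides; the helper name `build_query` is hypothetical, and nothing is sent to a cluster.

```python
import json


def build_query(title, since, excluded_tag="spam"):
    """Build a bool query whose exact-value clauses run in filter context."""
    return {
        "query": {
            "bool": {
                # Scored full-text clause: contributes to relevance.
                "must": {"match": {"title": title}},
                # Yes/no clauses: scoring is skipped and results can be cached.
                "filter": [
                    {"range": {"date": {"gte": since}}},
                    {"exists": {"field": "abstract"}},
                ],
                "must_not": {"term": {"tag": excluded_tag}},
            }
        }
    }


body = build_query("how to make millions", "2014-01-01")
print(json.dumps(body, indent=2))
```

Only the `must` clause influences ranking here; the range and exists checks simply narrow the candidate set.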
  • 79. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) Elasticsearch 2.0 | ISCPIF Use Cases Full-text search and Data Analytics
  • 80. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) Elasticsearch 2.0 | ISCPIF Use Cases Full-text search and Data Analytics • Query DSL: Filters • And Filter • Bool Filter • Exists Filter • Geo Bounding Box Filter • Geo Distance Filter • Geo Distance Range Filter • Geo Polygon Filter • GeoShape Filter • Geohash Cell Filter • Has Child Filter • Has Parent Filter • Ids Filter • Indices Filter • Limit Filter • Match All Filter • Missing Filter • Nested Filter • Not Filter • Or Filter • Prefix Filter • Query Filter • Range Filter • Regexp Filter • Script Filter • Term Filter • Terms Filter • Type Filter
  • 81. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) Elasticsearch 2.0 | ISCPIF Use Cases Full-text search and Data Analytics
  • 82. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) Elasticsearch 2.0 | ISCPIF Use Cases Full-text search and Data Analytics
  • 83. Elasticsearch 2.0 | ISCPIF Use Cases: Full-text search and Data Analytics
      • Significant Terms Aggregation: available heuristics
          • JLH score
          • Mutual information
          • Chi square
          • Google normalized distance
          • Percentage
          • Scripted, e.g. "script_heuristic": { "script": "_subset_freq/(_superset_freq - _subset_freq + 1)" }
      • For a study on using significant terms for feature selection in text classification, see Yang and Pedersen, "A Comparative Study on Feature Selection in Text Categorization", 1997 (http://courses.ischool.berkeley.edu/i256/f06/papers/yang97comparative.pdf)
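To make two of these heuristics concrete, here is a small Python sketch. It assumes the JLH score is the absolute change in a term's frequency multiplied by its relative change, as the Elasticsearch documentation describes it; returning zero for terms that are not over-represented is an assumption of this sketch. The scripted variant mirrors the expression on the slide, and the sample counts are made up.

```python
def jlh_score(subset_freq, subset_size, superset_freq, superset_size):
    """JLH: absolute change times relative change of a term's frequency."""
    if subset_size == 0 or superset_size == 0 or superset_freq == 0:
        return 0.0
    foreground = subset_freq / subset_size        # share of matching docs with the term
    background = superset_freq / superset_size    # share of all docs with the term
    if foreground <= background:
        return 0.0                                # not over-represented (sketch assumption)
    return (foreground - background) * (foreground / background)


def scripted_score(subset_freq, superset_freq):
    """The scripted heuristic from the slide: subset / (superset - subset + 1)."""
    return subset_freq / (superset_freq - subset_freq + 1)


# Made-up counts: the term appears in 50 of 1,000 matching documents
# and in 200 of 100,000 documents overall.
print(jlh_score(50, 1000, 200, 100000))
print(scripted_score(50, 200))
```

A term that is rare overall but common in the result set scores high under both heuristics, which is what makes it "significant".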
  • 84. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) Elasticsearch 2.0 | ISCPIF Use Cases Full-text search and Data Analytics
  • 85. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) Elasticsearch 2.0 | ISCPIF Use Cases The Elastic Platform | Make Sense of Your Data
  • 86. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) Elasticsearch 2.0 | ISCPIF Use Cases The Elastic Platform | Make Sense of Your Data
  • 87. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) Elasticsearch 2.0 | ISCPIF Use Cases The Elastic Platform | Make Sense of Your Data
  • 88. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) Elasticsearch 2.0 | ISCPIF Use Cases Logs! • Check Nginx access logs between 2015-11-23T10:23:10 and 2015-11-24T21:53:30 • Check Suricata alert >= 2 between 2015-11-23T10:23:10 and 2015-11-24T21:53:30 and with type of DNS • Now correlate the results!!!
  • 89. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) Elasticsearch 2.0 | ISCPIF Use Cases Logs!
  • 90. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) Elasticsearch 2.0 | ISCPIF Use Cases Logs! • Old School tools • grep/sed/awk/cut/sort • Manually analyze the output • Different formats • Customized fields and details • Not centralized • Modern way (the right way!) • Define endpoints (input/output) • Correlate patterns • Store data (searchable and visualizable)
  • 91. Elasticsearch 2.0 | ISCPIF Use Cases: Logs! • Symantec Security Information Manager • Splunk • HP / ArcSight • Tripwire • NetIQ • Quest Software • IBM / Q1 Labs • Novell • Graylog • Fluentd
  • 92. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) Elasticsearch 2.0 | ISCPIF Use Cases Logstash | Collect, Enrich & Transport Data • Process Any Data, From Any Source • Centralize data processing of all types • Normalize varying schema and formats • Quickly extend to custom log formats • Easily add plugins for custom data sources • https://github.com/elastic/logstash
  • 93. Elasticsearch 2.0 | ISCPIF Use Cases: Logstash | Collect, Enrich & Transport Data
      input { stdin { } }

      filter {
        grok {
          match => { "message" => "%{COMBINEDAPACHELOG}" }
        }
        date {
          match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
        }
      }

      output {
        elasticsearch { hosts => ["localhost:9200"] }
        stdout { codec => rubydebug }
      }
  • 94. Elasticsearch 2.0 | ISCPIF Use Cases: Logstash | Collect, Enrich & Transport Data
      Input line:
      127.0.0.1 - - [11/Dec/2013:00:01:45 -0800] "GET /xampp/status.php HTTP/1.1" 200 3891 "http://cadenza/xampp/navi.php" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0"
      Resulting event:
      {
            "message" => "127.0.0.1 - - [11/Dec/2013:00:01:45 -0800] \"GET /xampp/status.php HTTP/1.1\" 200 3891 \"http://cadenza/xampp/navi.php\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0\"",
         "@timestamp" => "2013-12-11T08:01:45.000Z",
           "@version" => "1",
               "host" => "cadenza",
           "clientip" => "127.0.0.1",
              "ident" => "-",
               "auth" => "-",
          "timestamp" => "11/Dec/2013:00:01:45 -0800",
               "verb" => "GET",
            "request" => "/xampp/status.php",
        "httpversion" => "1.1",
           "response" => "200",
              "bytes" => "3891",
           "referrer" => "\"http://cadenza/xampp/navi.php\"",
              "agent" => "\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0\""
      }
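What the grok and date filters do to this line can be approximated in plain Python. The regular expression below is a simplified stand-in for `%{COMBINEDAPACHELOG}` (an assumption: the real grok pattern is more permissive, and Logstash keeps the quotes around the referrer and agent values), and `parse_line` is a hypothetical helper name. Month parsing with `%b` assumes an English or C locale.

```python
import re
from datetime import datetime, timezone

# Simplified stand-in for the COMBINEDAPACHELOG grok pattern.
COMBINED = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Split an access-log line into named fields and add a UTC @timestamp."""
    match = COMBINED.match(line)
    if match is None:
        return None  # the line does not look like the combined log format
    event = match.groupdict()
    # Equivalent of the date filter: parse the local time, normalise to UTC.
    local = datetime.strptime(event["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    event["@timestamp"] = local.astimezone(timezone.utc).strftime(
        "%Y-%m-%dT%H:%M:%S.000Z"
    )
    return event


line = (
    '127.0.0.1 - - [11/Dec/2013:00:01:45 -0800] '
    '"GET /xampp/status.php HTTP/1.1" 200 3891 '
    '"http://cadenza/xampp/navi.php" '
    '"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) '
    'Gecko/20100101 Firefox/25.0"'
)
event = parse_line(line)
print(event["verb"], event["request"], event["response"], event["@timestamp"])
```

The point of the date step is that every event ends up with a comparable UTC timestamp, whatever time zone the original log was written in.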
  • 95. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) Elasticsearch 2.0 | ISCPIF Use Cases An input plugin enables a specific source of events to be read by Logstash. • Input Plugins • beats • couchdb_changes • drupal_dblog • elasticsearch • exec • eventlog • file • ganglia • gelf • generator • graphite • github • heartbeat • heroku • http • http_poller • irc • imap • jdbc • jmx • kafka • log4j • lumberjack • meetup • pipe • puppet_facter • relp • rss • rackspace • rabbitmq • redis • salesforce • snmptrap • stdin • sqlite • s3 • sqs • stomp • syslog • tcp • twitter • unix • udp • varnishlog • wmi • websocket • xmpp • zenoss • zeromq
  • 96. Elasticsearch 2.0 | ISCPIF Use Cases: An output plugin sends event data to a particular destination. • Output Plugins • boundary • circonus • csv • cloudwatch • datadog • datadog_metrics • email • elasticsearch • elasticsearch_java • exec • file • google_bigquery • google_cloud_storage • ganglia • gelf • graphtastic • graphite • hipchat • http • irc • influxdb • juggernaut • jira • kafka • lumberjack • librato • loggly • mongodb • metriccatcher • nagios • null • nagios_nsca • opentsdb • pagerduty • pipe • riemann • redmine • rackspace • rabbitmq • redis • riak • s3 • sqs • stomp • statsd • solr_http • sns • syslog • stdout • tcp • udp • webhdfs • websocket • xmpp • zabbix • zeromq
  • 97. Elasticsearch 2.0 | ISCPIF Use Cases: Logstash-forwarder: syslog, auth, ufw and nginx
      {
        "network": {
          "servers": [ "10.0.0.2:5000" ],
          "timeout": 15,
          "ssl ca": "/etc/pki/tls/certs/logstash-forwarder.crt"
        },
        "files": [
          {
            "paths": [ "/var/log/syslog", "/var/log/auth.log" ],
            "fields": { "type": "syslog" }
          },
          {
            "paths": [ "/var/log/ufw.log" ],
            "fields": { "type": "firewall" }
          },
          {
            "paths": [ "/var/log/nginx/*.log" ],
            "exclude": [ "*.gz", "err*.log", "*.log.*" ],
            "fields": { "type": "nginx-api" }
          }
        ]
      }
  • 98. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) Elasticsearch 2.0 | ISCPIF Use Cases Logstash Server: syslog, auth, ufw and nginx input { lumberjack { port => 5000 type => "logs" ssl_certificate => "/etc/pki/tls/certs/logstash-forwarder.crt" ssl_key => "/etc/pki/tls/private/logstash-forwarder.key" } }
  • 99. Elasticsearch 2.0 | ISCPIF Use Cases: Logstash Server: syslog, auth, ufw and nginx
      filter {
        if [type] == "syslog" {
          grok {
            match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
            add_field => [ "received_at", "%{@timestamp}" ]
            add_field => [ "received_from", "%{host}" ]
          }
          syslog_pri { }
          date {
            match => [ "syslog_timestamp", "MMM  d HH:mm:ss", "MMM dd HH:mm:ss" ]
          }
        }
      }
  • 100. Elasticsearch 2.0 | ISCPIF Use Cases: Logstash Server: syslog, auth, ufw and nginx
      output {
        elasticsearch {
          host => "10.0.0.25"
          port => "9300"
          cluster => "iscpif-es"
        }
      }
  • 101. Elasticsearch 2.0 | ISCPIF Use Cases: Logstash: Suricata | Open Source IDS / IPS / NSM engine
      • Highly scalable
          • Suricata is multi-threaded
      • Protocol identification
          • Suricata is a malware command and control (CnC) channel hunter
          • It catches off-port HTTP CnC channels, which normally slide right by most IDS systems
          • Dedicated keywords let you match on protocol fields, ranging from an HTTP URI to an SSL certificate identifier
      • File identification, MD5 checksums, and file extraction
          • Identify thousands of file types as they cross your network
  • 102. Elasticsearch 2.0 | ISCPIF Use Cases: suricata.yaml
      - eve-log:
          enabled: yes
          type: file  # file|syslog|unix_dgram|unix_stream
          filename: eve.json
          types:
            - alert
            - http:
                extended: yes
            - dns
            - tls:
                extended: yes      # enable this for extended logging information
            - files:
                force-magic: yes   # force logging magic on all logged files
                force-md5: yes     # force logging of md5 checksums
            - ssh
  • 103. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) Elasticsearch 2.0 | ISCPIF Use Cases Logstash-forwarder: Suricata { # The network section covers network configuration :) "network": { "servers": [ "10.0.0.2:5000" ], "timeout": 15, "ssl ca": "/etc/pki/tls/certs/logstash-forwarder.crt" }, "files": [{ "paths": ["/var/log/suricata/eve.json"], "fields": { "type": "suricata" }, "sincedb_path": "/var/logstash/suricata.db", "sincedb_write_interval": 1, "codec": "json", "type":"suricata" }] }
  • 104. Elasticsearch 2.0 | ISCPIF Use Cases: Logstash server: Suricata
      filter {
        if [type] == "suricata" {
          json { source => "message" }
          if [src_ip] {
            geoip {
              source => "src_ip"
              target => "geoip"
              database => "/etc/logstash/GeoLiteCity.dat"
              add_field => [ "[geoip][coordinates]", "%{[geoip][longitude]}" ]
              add_field => [ "[geoip][coordinates]", "%{[geoip][latitude]}" ]
            }
            mutate {
              convert => [ "[geoip][coordinates]", "float" ]
              remove_field => [ "timestamp" ]
            }
          }
        }
      }
  • 105. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) Elasticsearch 2.0 | ISCPIF Use Cases Logstash: Suricata
  • 106. Elasticsearch 2.0 | ISCPIF Use Cases: Logstash: Suricata
      • One Logstash index per day
          • Created from an index template
          • Easy to retire an index: close or delete it
      • 22 machines
          • Only 2 with a public IP
      • Logs
          • Between 1 and 3 million per day
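Daily indices make retention a matter of comparing names. The sketch below assumes the default Logstash naming scheme `logstash-YYYY.MM.dd`; the helper names and the three-day retention window are hypothetical, and the code only decides which indices to retire, it does not call Elasticsearch.

```python
from datetime import date, timedelta


def index_name(day, prefix="logstash"):
    """Name of the daily index for a given date, e.g. logstash-2015.11.25."""
    return f"{prefix}-{day:%Y.%m.%d}"


def indices_to_retire(existing, today, keep_days):
    """Return daily indices older than the retention window, oldest first."""
    cutoff = index_name(today - timedelta(days=keep_days))
    # Names sort chronologically because every date part is zero-padded.
    return sorted(name for name in existing if name < cutoff)


today = date(2015, 11, 25)
existing = [index_name(today - timedelta(days=n)) for n in range(6)]
print(indices_to_retire(existing, today, keep_days=3))
```

Each name returned would then be closed or deleted as a whole, which is far cheaper than deleting individual documents from one large index.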
  • 107. Elasticsearch 2.0 | ISCPIF Use Cases: Logstash: Suricata
      • 73 million docs in less than 2 days!
          • During a mongodump
          • While transferring it remotely
      • Watch out for Suricata stream events!
          • SURICATA STREAM Packet with invalid ack
          • And lots of other stream alerts
          • I disabled them! Maybe I am wrong :)
  • 108. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) Elasticsearch 2.0 | ISCPIF Use Cases Kibana | Explore & Visualize Your Data • Seamless Integration with Elasticsearch • Give Shape to Your Data • Sophisticated Analytics • Empower More Team Members • Flexible Interface, Easy to Share • Easy Setup • Visualize Data from Many Sources • Simple Data Export
  • 109. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) Elasticsearch 2.0 | ISCPIF Use Cases Kibana 4.2.1 | Compatible with Elasticsearch 2.x
  • 110. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) Elasticsearch 2.0 | ISCPIF Use Cases The ELK Stack | Elasticsearch, Logstash and Kibana Elasticsearch 1.7.3 (2.0.0) Logstash 1.5.5 (2.0.0) Kibana 3 (4.2.1)
  • 120. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) Elasticsearch 2.0 | Elasticsearch Mapping • Which string fields should be full text fields. • Which fields contain numbers, dates, or geolocations. • The format of date values. • a simple type like string, date, long, double, boolean or ip. • a type which supports the hierarchical nature of JSON such as object or nested. • or a specialized type like geo_point, geo_shape, or completion. • multi-fields • a string field could be indexed as an analyzed field for full-text search, and as a not_analyzed field for sorting or aggregations. • Alternatively, you could index a string field with the standard analyzer, the english analyzer, and the french analyzer.
  • 121. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) Elasticsearch 2.0 | Elasticsearch Mapping • Dynamic mapping • Fields and mapping types do not need to be defined before being used. • Explicit mappings • You can create mapping types and field mappings when you create an index • Updating existing mappings • Existing type and field mappings cannot be updated • Create a new index with the correct mappings and reindex your data
  • 122. Elasticsearch 2.0 | Mapping: Explicit
      PUT my_index
      {
        "mappings": {
          "user": {
            "_all": { "enabled": false },
            "properties": {
              "title": { "type": "string" },
              "name":  { "type": "string" },
              "age":   { "type": "integer" }
            }
          },
          "blogpost": {
            "properties": {
              "title":   { "type": "string" },
              "body":    { "type": "string" },
              "user_id": { "type": "string", "index": "not_analyzed" },
              "created": { "type": "date", "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis" }
            }
          }
        }
      }
  • 123. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) Elasticsearch 2.0 | Mapping Dynamic templates PUT my_index { "mappings": { "my_type": { "dynamic_templates": [ { "integers": { "match_mapping_type": "long", "mapping": { "type": "integer" } } }, { "strings": { "match_mapping_type": "string", "mapping": { "type": "string", "fields": { "raw": { "type": "string", "index": "not_analyzed", "ignore_above": 256 } } } } } ] } } }
  • 124. Elasticsearch 2.0 | Mapping: Index template with dynamic templates
      PUT _template/my_template
      {
        "template": "logs-*",
        "settings": {
          "index.number_of_replicas": "0",
          "index.number_of_shards": "3"
        },
        "mappings": {
          "my_type": {
            "dynamic_templates": [
              { "integers": {
                  "match_mapping_type": "long",
                  "mapping": { "type": "integer" } } },
              { "strings": {
                  "match_mapping_type": "string",
                  "mapping": {
                    "type": "string",
                    "fields": {
                      "raw": { "type": "string", "index": "not_analyzed", "ignore_above": 256 } } } } }
            ]
          }
        }
      }
  • 125. Elasticsearch 2.0 | Production
      • Hardware
          • Memory
              • 64 GB of RAM is the ideal sweet spot
              • Give about half of the RAM to the heap (e.g. a 16 GB heap on a 32 GB machine)
              • And don't let the heap cross 30.5 GB!
          • CPU
              • 2-8 cores
              • Faster CPUs vs. more cores: choose more cores
          • Disk
              • SSDs (monitor the I/O)
              • High-performance server disks (15k RPM)
              • RAID 0 (ES is already highly available through replicas)
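The memory rule can be written as a one-line calculation. This is only a sketch of the rule of thumb stated on the slide (half of the RAM for the heap, never above roughly 30.5 GB); the function name `recommended_heap_gb` is hypothetical, and the right value for a given cluster still depends on its workload.

```python
def recommended_heap_gb(total_ram_gb, ceiling_gb=30.5):
    """Half of the machine's RAM, capped at the ~30.5 GB ceiling."""
    return min(total_ram_gb / 2, ceiling_gb)


for ram in (16, 32, 64, 128):
    print(f"{ram} GB RAM -> {recommended_heap_gb(ram)} GB heap")
```

On a 64 GB machine the cap applies, and the memory left over is what the operating system uses for its file-system cache.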
  • 126. Elasticsearch 2.0 | Security
      • No built-in authentication
      • Do not expose Elasticsearch to the world
      • Watch out for denial of service
      • Do not let users define index names (like ",*")
      • Turn off dynamic scripts (off by default)
      • Restrict HTTP methods (DELETE, PUT, etc.)
      • Nginx (reverse proxy, SSL and auth)
      • Apache
  • 127. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) Elasticsearch 2.0 | Lots of things! • Analyzer and tokenizer (human language) • Log slow ops • Index settings • refresh interval • flush interval • Differentiate your nodes • Data nodes • Master nodes • Client nodes • Cluster health • Heap size • Thread pools • Merging time • etc.
  • 128. MongoDB: “Launch your GIANT idea”
  • 129. • Document Database • Documents (i.e. objects) • Embedded documents and arrays (no expensive joins!) • Dynamic schema supports fluent polymorphism. • High Availability • automatic failover • data redundancy • Automatic Scaling • Automatic sharding INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) MongoDB | Key Features
  • 130. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) MongoDB | Platforms
  • 131. MongoDB | Install MongoDB on Ubuntu (MongoDB 3.0.7)
      1. Import the public key used by the package management system
         sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 7F0CEB10
      2. Create a list file for MongoDB
         echo "deb http://repo.mongodb.org/apt/ubuntu trusty/mongodb-org/3.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.0.list
      3. Install MongoDB and start the service
         sudo apt-get update && sudo apt-get install -y mongodb-org
         sudo service mongod start
  • 132. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) MongoDB | SQL to MongoDB
  • 133. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) MongoDB | SQL to MongoDB https://docs.mongodb.org/manual/reference/sql-comparison/
  • 134. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) MongoDB | Insert Documents
  • 135. MongoDB | Query Documents
      • All documents
          db.inventory.find( {} )
      • Equality
          db.inventory.find( { type: "snacks" } )
      • Query operator
          db.inventory.find( { type: { $in: [ 'food', 'snacks' ] } } )
      • AND condition
          db.inventory.find( { type: 'food', price: { $lt: 9.95 } } )
      • OR condition
          db.inventory.find( { $or: [ { qty: { $gt: 100 } }, { price: { $lt: 9.95 } } ] } )
  • 136. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) MongoDB | Query Documents db.tweets.find( { "coordinates.coordinates": { $near : { $geometry: { type: "Point", coordinates: [ 2.3325923, 48.8537095] }, $minDistance: 1, $maxDistance: 500 } } } )
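With a GeoJSON point, `$near` measures `$minDistance` and `$maxDistance` in meters. As a rough illustration of what the query above selects, the following Python sketch applies the same 1 m to 500 m ring with the haversine formula. It assumes a spherical Earth with a mean radius of 6,371 km, so results are approximate and not identical to MongoDB's own computation; the helper names are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000.0  # mean Earth radius (spherical approximation)


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two (longitude, latitude) points."""
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def within_ring(center, point, min_m=1, max_m=500):
    """True if point lies between min_m and max_m meters from center."""
    distance = haversine_m(center[0], center[1], point[0], point[1])
    return min_m <= distance <= max_m


CENTER = (2.3325923, 48.8537095)    # [longitude, latitude], as in the query
NEARBY = (2.3325923, 48.8547095)    # about 0.001 degrees further north
print(round(haversine_m(*CENTER, *NEARBY)), within_ring(CENTER, NEARBY))
```

Note the coordinate order: GeoJSON, and therefore the query, puts longitude first and latitude second.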
  • 137. MongoDB | Aggregation
      Sample document:
        { "_id": "10280", "city": "NEW YORK", "state": "NY", "pop": 5574, "loc": [ -74.016323, 40.710537 ] }
      Pipeline:
        db.zipcodes.aggregate( [
          { $group: { _id: "$state", totalPop: { $sum: "$pop" } } },
          { $match: { totalPop: { $gte: 10*1000*1000 } } }
        ] )
      After the $group stage, each document looks like:
        { "_id" : "AK", "totalPop" : 550043 }
      SQL equivalent:
        SELECT state, SUM(pop) AS totalPop FROM zipcodes GROUP BY state HAVING totalPop >= (10*1000*1000)
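The two pipeline stages map directly onto a group-then-filter loop. The Python sketch below reproduces that logic in memory on invented toy data (the populations are not real figures), purely to show what `$group` with `$sum` followed by `$match` computes; the function name `states_over` is hypothetical.

```python
from collections import defaultdict


def states_over(zipcodes, threshold):
    """Sum population per state, then keep states at or above the threshold."""
    totals = defaultdict(int)
    for doc in zipcodes:                       # $group: { _id: "$state", $sum: "$pop" }
        totals[doc["state"]] += doc["pop"]
    return [                                   # $match: { totalPop: { $gte: threshold } }
        {"_id": state, "totalPop": pop}
        for state, pop in sorted(totals.items())
        if pop >= threshold
    ]


# Invented toy data, for illustration only.
zipcodes = [
    {"state": "NY", "pop": 6_000_000},
    {"state": "NY", "pop": 5_000_000},
    {"state": "AK", "pop": 300_000},
    {"state": "AK", "pop": 250_000},
]
print(states_over(zipcodes, 10 * 1000 * 1000))
```

The filter runs on the grouped totals, not on the input documents, which is exactly the role of HAVING in the SQL version.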
  • 138. MongoDB | Aggregation
      db.tweets.aggregate([
        { $geoNear: {
            near: [ 2.348730, 48.840982 ],
            distanceField: "dist.calculated",
            maxDistance: 100,
            includeLocs: "dist.location",
            query: { "coordinates.type": "Point" },
            limit: 100
        } },
        { $group: {
            _id: "$user.id",
            count: { $sum: 1 },
            name: { $addToSet: "$user.name" },
            date: { $addToSet: "$created_at" },
            text: { $addToSet: "$text" },
            coordinates: { $addToSet: "$coordinates" }
        } },
        { $sort: { "count": -1 } },
        { $limit: 10 }
      ]);
  • 139. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) MongoDB | MapReduce
  • 140. MongoDB | Index
      • Creating an index
          db.ships.ensureIndex({name : 1})
      • Dropping an index
          db.ships.dropIndex({name : 1})
      • Creating a compound index
          db.ships.ensureIndex({name : 1, operator : 1, class : -1})
      • Dropping a compound index
          db.ships.dropIndex({name : 1, operator : 1, class : -1})
  • 141. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) MongoDB | Replica Set
  • 142. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) MongoDB | Replica Set • Replica Set Members • Replica Set Primary • Accepts write operations • Replica Set Secondary Members • Replicate the primary’s data set and accept read operations • Priority 0 Replica Set Members • Priority 0 members are secondaries that cannot become the primary. • Hidden Replica Set Members • Invisible to applications • Replica Set Arbiter
  • 143. MongoDB | Sharding
  • 144. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) MongoDB | 3.0
  • 145. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) MongoDB | 3.0 • 7-10x Better Performance • Up to 80% Less Storage • Reduce Operational Overhead By Up to 95% • Pluggable Storage Optimized For Your Workload • Low Latency Across the Globe • Enhancements That Make You More Productive • Faster Loading and Export • Easier Query Optimization • Faster Debugging • Richer Geospatial Apps • Better Time-Series Analytics
  • 146. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) MongoDB | 3.0
  • 147. Datacenter in France and Italy 60M/week - 8M/day - 360K/hour INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) MongoDB | ISCPIF Use Cases
  • 148. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) MongoDB | ISCPIF Use Cases Avg. usage of Twitter in Paris in October
  • 149. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) MongoDB | ISCPIF Use Cases 24HOURS NEWS: Real-time Breaking News
  • 150. MongoDB | ISCPIF Use Cases: 24HOURS NEWS: Real-time Breaking News
      • 50-100 updates/s
      • Time-series queries
      • GridFS
      • FTS (full-text search)
          • Tokenizes and stems
          • Scoring
          • 140 characters / small dataset!
      • TTL index
      • Session store
  • 151. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) MongoDB | ISCPIF Use Cases TIMOTHY: Real-time Dashboards
  • 152. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) MongoDB | ISCPIF Use Cases TIMOTHY: Real-time Dashboards
  • 153. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) MongoDB | ISCPIF Use Cases TIMOTHY: Real-time Dashboards
  • 154. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) MongoDB | ISCPIF Use Cases HIGH THROUGHPUT: Real-time aggregations, over 50K inserts /s
  • 155. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) MongoDB | ISCPIF Use Cases Most generic Most specific Calculating the new graph HIGH THROUGHPUT: Real-time aggregations, over 50K inserts /s
  • 156. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) MongoDB | ISCPIF Use Cases 130K inserts /s
  • 157. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) MongoDB | ISCPIF Use Cases 10K-80K inserts /s
  • 158. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) MongoDB | ISCPIF Use Cases 9K queries /s
  • 159. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) MongoDB | ISCPIF Use Cases
  • 160. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) MongoDB | ISCPIF Use Cases Gephi Streaming
  • 161. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) MongoDB | Monitoring System
  • 162. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) MongoDB | Monitoring System Network and Cache
  • 163. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) MongoDB | Monitoring System Hardware
  • 164. MongoDB | Monitoring System Hardware INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF)
  • 165. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) MongoDB | ISCPIF Use Cases Monitoring NewRelic
  • 166. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) MongoDB | ISCPIF Use Cases Monitoring NewRelic
  • 167. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) MongoDB | Monitoring System Objects 3.02 Billion Documents 64 Collections 195 Indexes
  • 168. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) MongoDB | Monitoring System Objects
  • 169. Redis: “in-memory data structure store” used as database, cache and message broker
  • 170. Redis | Key Features
      • Data structures
          • strings
          • hashes
          • lists
          • sets
          • sorted sets
          • bitmaps
          • hyperloglogs
          • geospatial indexes
      • Built-in
          • replication
          • Lua scripting
          • transactions
          • on-disk persistence
  • 171. Redis | Data Structures (just a little!)
      Redis KEYS:
        > set mykey somevalue
        OK
        > get mykey
        "somevalue"
      Redis LISTS:
        > rpush mylist A
        (integer) 1
        > rpush mylist B
        (integer) 2
        > lpush mylist first
        (integer) 3
        > lrange mylist 0 -1
        1) "first"
        2) "A"
        3) "B"
  • 172. Redis | How fast is Redis?
      maziyar-beautiful-MacBook$ redis-benchmark -q -n 100000
      PING_INLINE: 85178.88 requests per second
      PING_BULK: 80000.00 requests per second
      SET: 86580.09 requests per second
      GET: 83263.95 requests per second
      INCR: 83963.05 requests per second
      LPUSH: 86880.97 requests per second
      LPOP: 90252.70 requests per second
      SADD: 84388.19 requests per second
      SPOP: 92936.80 requests per second
      LPUSH (needed to benchmark LRANGE): 87336.24 requests per second
      LRANGE_100 (first 100 elements): 25614.75 requests per second
      LRANGE_300 (first 300 elements): 10455.88 requests per second
      LRANGE_500 (first 450 elements): 7125.04 requests per second
      LRANGE_600 (first 600 elements): 5369.13 requests per second
      MSET (10 keys): 50000.00 requests per second
  • 173. Redis | How fast is Redis? Pipelining of 16 commands
      maziyar-beautiful-MacBook$ redis-benchmark -n 1000000 -t set,get -P 16 -q
      PING_INLINE: 735294.12 requests per second
      PING_BULK: 988142.31 requests per second
      SET: 681198.88 requests per second
      GET: 831255.25 requests per second
      INCR: 778210.12 requests per second
      LPUSH: 682593.81 requests per second
      LPOP: 713775.88 requests per second
      SADD: 732600.75 requests per second
      SPOP: 885739.62 requests per second
      LPUSH (needed to benchmark LRANGE): 656598.81 requests per second
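Putting the two benchmark runs side by side shows what pipelining buys. The short Python calculation below uses only figures reported in the two outputs above (three commands picked as examples) and divides the pipelined throughput by the non-pipelined one; it is a back-of-the-envelope comparison from a single laptop run, not a general performance claim.

```python
# Requests per second reported by redis-benchmark in the two runs above.
WITHOUT_PIPELINE = {"SET": 86580.09, "GET": 83263.95, "LPUSH": 86880.97}
WITH_PIPELINE_16 = {"SET": 681198.88, "GET": 831255.25, "LPUSH": 682593.81}


def speedup(command):
    """Throughput with 16-command pipelining divided by throughput without."""
    return WITH_PIPELINE_16[command] / WITHOUT_PIPELINE[command]


for command in sorted(WITHOUT_PIPELINE):
    print(f"{command}: {speedup(command):.1f}x")
```

The gain comes from sending 16 commands per network round trip instead of one, so the server spends less time waiting on the client.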
  • 174. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) Redis | ISCPIF Use Cases Scientific Games
  • 175. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) Redis | ISCPIF Use Cases Scientific Games
  • 176. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) Redis | ISCPIF Use Cases Scientific Games
  • 177. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) Redis | ISCPIF Use Cases • Scientific Operations • Occurrence • Co-Occurrence • Scientific Games • Pub/Sub • Rate Limiter (IP-based with TTL) • Chat rooms • TTL
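An IP-based rate limiter with a TTL is usually built in Redis from a counter that is incremented per request and expires after a fixed window (the `INCR` and `EXPIRE` commands). The Python class below is an in-memory stand-in for that pattern, so it can run without a Redis server; the class name, the limit and the window length are hypothetical and not taken from the ISCPIF implementation.

```python
import time


class FixedWindowLimiter:
    """Allow at most `limit` requests per client within each time window."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self._counters = {}  # ip -> (count, expires_at), like a key with a TTL

    def allow(self, ip):
        now = self.clock()
        entry = self._counters.get(ip)
        if entry is None or now >= entry[1]:
            # No counter yet, or its TTL has passed: start a new window.
            count, expires_at = 0, now + self.window
        else:
            count, expires_at = entry
        count += 1  # the INCR step
        self._counters[ip] = (count, expires_at)
        return count <= self.limit


fake_now = [0.0]  # controllable clock so the example is deterministic
limiter = FixedWindowLimiter(limit=2, window_seconds=60, clock=lambda: fake_now[0])
results = [limiter.allow("203.0.113.7") for _ in range(3)]
fake_now[0] = 61.0
results.append(limiter.allow("203.0.113.7"))
print(results)
```

With Redis the expiry does the cleanup automatically, which is why the TTL matters: idle clients leave no state behind.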
  • 178. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) Redis | ISCPIF Use Cases Monitoring NewRelic
  • 179. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) Redis | ISCPIF Use Cases Monitoring NewRelic
  • 180. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) Redis | ISCPIF Use Cases Monitoring NewRelic
  • 181. RabbitMQ: “Messaging that just works” (well, actually it's more than that, but OK!)
  • 182. INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE (ISCPIF) RabbitMQ | Feature List A messaging broker • Highlights • Reliability • Flexible Routing • Clustering • Federation • Highly Available Queues • Multi-protocol • Many Clients • Management UI • Plugin System • For what? • Data delivery • Non-blocking operations • Push notifications • Publish / subscribe • Asynchronous processing (work queues)
  • 183. RabbitMQ | Feature List: A messaging broker
    [Diagram: RabbitMQ routing. Exchange type: topic. Queues: Q1, Q2, Q3. Binding patterns: climate.*, risk.*, news.*]
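The routing diagram can be illustrated with a small matcher for topic patterns. In AMQP topic exchanges, routing keys are dot-separated words, `*` matches exactly one word, and `#` matches zero or more words. The sketch below implements only that matching rule in plain Python, without a broker; the pairing of Q1, Q2, Q3 with the three patterns is an assumption read off the slide, and the function names are hypothetical.

```python
def topic_matches(pattern, routing_key):
    """Return True if a dot-separated routing key matches a topic pattern."""
    p = pattern.split(".")
    k = routing_key.split(".")

    def match(i, j):
        if i == len(p):
            return j == len(k)          # pattern used up: key must be too
        if p[i] == "#":
            # '#' may absorb zero or more words.
            return any(match(i + 1, j2) for j2 in range(j, len(k) + 1))
        if j == len(k):
            return False                # key used up, pattern is not
        if p[i] == "*" or p[i] == k[j]:
            return match(i + 1, j + 1)  # '*' or literal word consumes one word
        return False

    return match(0, 0)


# Assumed pairing of queues and binding patterns.
BINDINGS = {"Q1": "climate.*", "Q2": "risk.*", "Q3": "news.*"}


def route(routing_key, bindings=BINDINGS):
    """List the queues whose binding pattern matches the routing key."""
    return [q for q, pat in bindings.items() if topic_matches(pat, routing_key)]


print(route("climate.cop21"))
print(route("climate.cop21.paris"))
```

With `climate.*`, a key with two words after `climate` is not delivered; a binding such as `climate.#` would be needed to catch keys of any depth.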
  • 184. RabbitMQ | ISCPIF Use Cases: Monitoring (NewRelic)
  • 185. RabbitMQ | ISCPIF Use Cases: Monitoring (RabbitMQ Management UI)
  • 189. RabbitMQ | ISCPIF Use Cases
    • Distributed Computations
    • Parsing text files
    • Scientific calculations
    • Realtime Processing
    • Text mining
    • NLP
    • Annotation
    • Keyword extraction
    • Job Queues
    • RPC (Remote procedure call)
    • Topic based routing
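The job-queue use case can be sketched as a producer that enqueues lines of text and several workers that consume them. The example below uses Python's standard `queue` and `threading` modules as a stand-in for a RabbitMQ work queue, so it runs without a broker; the function name and the token-counting step are hypothetical placeholders for the real text-mining work.

```python
import queue
import threading


def run_work_queue(lines, n_workers=4):
    """Distribute lines across worker threads and collect token counts."""
    jobs = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            line = jobs.get()
            if line is None:          # sentinel: no more work for this worker
                jobs.task_done()
                return
            tokens = line.lower().split()   # placeholder for real processing
            with lock:
                results.append((line, len(tokens)))
            jobs.task_done()

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for line in lines:
        jobs.put(line)
    for _ in threads:
        jobs.put(None)                # one sentinel per worker
    jobs.join()
    for t in threads:
        t.join()
    return dict(results)


print(run_work_queue(["climate change risk", "Paris"]))
```

With a real broker, the workers would run as separate processes or machines and acknowledge each message after processing, so unfinished jobs can be redelivered.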
  • 190. RabbitMQ | ISCPIF Use Cases
    • Parsing
      • 225 files
      • 10m-20m lines
      • Avg. total of 3.3 billion
    • RPC
      • Post-process each document
    • Output
      • MongoDB
      • ElasticSearch
      • Redis
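A quick sanity check of the parsing figures, assuming the 10m-20m range is the number of lines per file (the slide does not say so explicitly):

```python
FILES = 225
LINES_PER_FILE_MIN = 10_000_000   # assumed to be per file
LINES_PER_FILE_MAX = 20_000_000   # assumed to be per file

low = FILES * LINES_PER_FILE_MIN
high = FILES * LINES_PER_FILE_MAX
midpoint = (low + high) // 2

print(f"{low:,} to {high:,} lines, midpoint {midpoint:,}")
```

The midpoint of about 3.4 billion lines is in line with the quoted average total of 3.3 billion.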
  • 191. ISCPIF Big Data
    • Multivac (Open Data Platform)
    • ISCPIF APIs
    • Science en Poche
    • Climatique (COP21)
    • Risk (AXA research fund)
    • Scientific Dashboards
    • Distributed Computing
    • Nobel Game (scientific game)
    • Twitter streaming (UN, France, Climate Change, Risk, etc.)
    • Instagram streaming (Paris)
  • 192. Real-Time Data Stream Processing
    [Architecture diagram. Labels: Data Streams (Twitter, Facebook, Instagram, Foursquare, Files, Crowd Sourcing); Web, Mobile Devices, Wearable Devices, Sensor-based devices; Real-time Streaming System (Web Socket, Flash Socket, xhr-polling, jsonp-polling; XML, JSON); Authorization, Authentication, Identification; Backend Architecture; RPC System; Real-time Processing (Text Mining, NLP, Annotation, Extraction); Streaming Data; Indexing Data; Storing Data; Data Analytics; End User]
  • 195. Current projects
    • Multivac (Open Data)
    • ISCPIF APIs
    • Scientific Dashboards
    • Science en Poche
    • Distributed Computing
    • Climatique (COP21)
    • Risk (AXA research fund)
    • Nobel Game (scientific game)
    • Twitter streaming (UN, France, Climate Change, Risk)
    • Instagram streaming (Paris)
    [Word cloud: Python Scala Script Java Erlang iOS Node JS]
  • 196.
  • 197. –just a regular Geek :) “You are your best benchmark!”
  • 198. SHOWCASE
  • 199. FRANCE 2014: Real-time processing and visualizing Twitter in France
  • 201. News Tracking: Real-time tracking of the news with the highest impact across networks
  • 202. Aviation Accidents: 50K retweets/10min
  • 203.
  • 204. Aviation Accidents: 120K retweets/10min
  • 205. Robin Williams: 180K retweets/10min
  • 206. #Ferguson, Michael Brown: 75K retweets/10min
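To compare these peaks with per-second throughput figures, the 10-minute counts can be converted to average retweets per second. The conversion below is plain arithmetic on the numbers from the slides; the labels and variable names are only for illustration.

```python
WINDOW_S = 10 * 60  # each figure covers a 10-minute window

# (slide label, retweets in the window), taken from the showcase slides
PEAKS = [
    ("Aviation Accidents (slide 202)", 50_000),
    ("Aviation Accidents (slide 204)", 120_000),
    ("Robin Williams", 180_000),
    ("#Ferguson", 75_000),
]

rates = [(label, count / WINDOW_S) for label, count in PEAKS]
for label, per_second in rates:
    print(f"{label}: {per_second:.1f} retweets/s on average")
```

These are averages over the window; instantaneous bursts inside the 10 minutes can be higher.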
  • 207. Paris, 13 November
  • 220.
  • 221. ISCPIF Big Data in numbers
    • 22 VMs
    • Distributed Systems: 320K ops/s, 130K inserts/s, 64K index/s, …
    • 8 web servers, 4 API servers
    • Search Engine Cluster: 900 million data, 45%
    • Database: 2.9 billion data, 22%
    • 120 cores, 2.5 TB RAM, 30 TB SSD
    • +8000 lines of code
    • 14 web apps, 4 mobile apps
  • 222. What’s next for Big Data at ISCPIF
    • Starting 2016 with CouchBase in parallel
    • Graph Databases
    • Spark Streaming / machine learning
    • Clustering and categorizing in real-time
    • Creative Hardware
    • SlipStream, StratusLab and EGI Cloud
    • Healthcare and Wearable devices
    • No, no drones! ;-)
  • 223. –The Blacklist: Lord Baltimore (No. 104) “Every piece of information is worth something to somebody. And in the hands of the wrong person, that could be deadly.”
    Reddington: “People love to decry Big Brother, the NSA, the government listening in on their most private lives, yet they all willingly go online and hand over the most intimate details of those lives - to big data.”
    Elizabeth: “Most people don't care that Google knows their search history.”
    Reddington: “They know more than that. They know your habits, the banks you use, the pills you pop, the men or women you sleep with.”
  • 224. Thanks! maziyar.panahi@iscpif.fr 25 November 2015 http://iscpif.fr/maziyar