How the Big Data platform at ISCPIF (CNRS) scaled from zero to billions of records within 6 months.
This talk covers our use of Elasticsearch, MongoDB, Redis, RabbitMQ, and scalable, highly available web services built on a Big Data architecture.
This presentation was given at Université Paris-Sud, LAL, Building 200, at an event organized by ARGOS. https://indico.mathrice.fr/event/2/overview
ISCPIF: http://iscpif.fr
Big Data at ISCPIF: http://bigdata.iscpif.fr
Climate at ISCPIF: http://climate.iscpif.fr
Playground for climate: http://climate.iscpif.fr/playground
Tweetoscope: http://tweetoscope.iscpif.fr
1. Maziyar PANAHI
Big Data engineer / Cloud Architect
ARGOS - NoSQL / Big Data
Université Paris-Sud, LAL
25 November 2015
Engineer at CNRS
2. ISCPIF
INSTITUT DES SYSTÈMES COMPLEXES DE PARIS ÎLE-DE-FRANCE
CNRS UNIT UPS3611 - HTTP://ISCPIF.FR Creative Commons, open science, open data, shared resources
6. ISCPIF
• Core Services
• ROOM RESERVATION
• EVENT ANNOUNCEMENT
• PROJECT HOSTING AND RESIDENCIES
• HIGH PERFORMANCE COMPUTING
• TRAINING SESSIONS
• COMMUNITY EXPLORER
• Open Platforms
• OpenMole
• Gargantext
• Big Data
• Linkrbrain
http://iscpif.fr/services/
8. HOW TO SCALE FROM ZERO TO BILLIONS!
• Elasticsearch
• MongoDB
• Redis
• RabbitMQ
• Big Data Platform
11. Elasticsearch | Real-Time Data
12. Elasticsearch | Real-Time Advanced Analytics
13. Elasticsearch | Massively Distributed
14. Elasticsearch | High Availability
15. Elasticsearch | Multi-tenancy
Host Index
16. Elasticsearch | Full-Text Search
17. Elasticsearch | Document-Oriented & Schema-Free
18. Elasticsearch | Developer-Friendly, RESTful API
• Single document APIs
• Index API
• Get API
• Delete API
• Update API
• Multi-document APIs
• Multi Get API
• Bulk API
• Bulk UDP API
• Delete By Query API
19. Elasticsearch | Developer-Friendly, RESTful API
Index
Type
ID
Document
20. Elasticsearch | Search & Analyze Data in Real Time
Apache 2 open-source license. Built on top of Apache Lucene™.
21. Elasticsearch 2.0 | Installation - Package
1. curl -L -O https://download.elastic.co/elasticsearch/release/org/elasticsearch/distribution/tar/elasticsearch/2.0.0/elasticsearch-2.0.0.tar.gz
2. tar -xvf elasticsearch-2.0.0.tar.gz
3. cd elasticsearch-2.0.0/bin
4. ./elasticsearch
22. Elasticsearch 2.0 | Installation - Repositories
Download and install the Public Signing Key:
wget -qO - https://packages.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -
Add the APT repository definition to /etc/apt/sources.list.d/elasticsearch-2.x.list:
echo "deb http://packages.elastic.co/elasticsearch/2.x/debian stable main" | sudo tee -a /etc/apt/sources.list.d/elasticsearch-2.x.list
Install Elasticsearch 2.0:
sudo apt-get update && sudo apt-get install elasticsearch
23. Elasticsearch 2.0 | Installation - That’s it!
Simply run:
curl 'http://localhost:9200/?pretty'
24. Elasticsearch 2.0 | Configuration - System
curl localhost:9200/_nodes/process?pretty
• Number of file descriptors
• Setting the limit to 32k or even 64k is recommended
• Memory settings
• Disable swap
• sudo swapoff -a
• To disable it permanently, comment out the swap entries in /etc/fstab
25. Elasticsearch 2.0 | Configuration - System
• ES_HEAP_SIZE (set in /etc/default/elasticsearch)
• Leave enough for the OS
• Leave enough for the
• Never ever go over 30.1 GB!! (above roughly 30 to 32 GB the JVM can no longer use compressed object pointers)
• I’ll go with half of the RAM, but < 30 GB
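The heap rule of thumb above can be written as a small helper. This is a minimal sketch that assumes the "half of the RAM, capped below 30 GB" rule from this slide; the function names and the 30 GB cap constant are our own choices, not Elasticsearch settings.

```python
def recommended_heap_gb(total_ram_gb: float, cap_gb: float = 30.0) -> float:
    """Half of the machine's RAM, but never above the cap, so the JVM keeps
    compressed object pointers and the rest of RAM stays available to the OS."""
    if total_ram_gb <= 0:
        raise ValueError("total_ram_gb must be positive")
    return min(total_ram_gb / 2, cap_gb)


def es_heap_size_line(total_ram_gb: float) -> str:
    """Format the value as an ES_HEAP_SIZE line for /etc/default/elasticsearch."""
    heap = int(recommended_heap_gb(total_ram_gb))  # whole gigabytes
    return f"ES_HEAP_SIZE={heap}g"


if __name__ == "__main__":
    for ram in (16, 32, 64, 128):
        print(ram, "GB RAM ->", es_heap_size_line(ram))
```

With 32 GB of RAM this gives a 16 GB heap, and from 64 GB upward the cap takes over.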
26. Elasticsearch 2.0 | Configuration - Elasticsearch
# /etc/elasticsearch/elasticsearch.yml
network:
  host: <MACHINE IP ADDRESS>
path:
  logs: /var/log/elasticsearch
  data: /var/data/elasticsearch
cluster:
  name: <NAME OF YOUR CLUSTER>
node:
  name: <NAME OF YOUR NODE>
27. Elasticsearch 2.0 | Configuration - Elasticsearch
node.name
cluster.name
bootstrap.mlockall: true
# Elasticsearch performs poorly when the JVM starts swapping: you should ensure that it _never_ swaps.
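The `curl localhost:9200/_nodes/process?pretty` call shown earlier can be used to verify both the file-descriptor limit and mlockall. This is a minimal sketch that inspects an already-fetched response. It assumes each node entry exposes `process.max_file_descriptors` and `process.mlockall`, as Elasticsearch 1.x node info does (in some versions `max_file_descriptors` is reported by `_nodes/stats/process` instead). The sample response is made up for illustration.

```python
import json

# Hypothetical, trimmed response of `curl localhost:9200/_nodes/process?pretty`.
SAMPLE = """
{
  "cluster_name": "iscpif-es",
  "nodes": {
    "abc123": {"name": "node-1",
               "process": {"max_file_descriptors": 65535, "mlockall": true}},
    "def456": {"name": "node-2",
               "process": {"max_file_descriptors": 4096, "mlockall": false}}
  }
}
"""

MIN_FILE_DESCRIPTORS = 32_768  # lower end of the 32k-64k advice


def check_nodes(response: dict) -> list[str]:
    """Return one human-readable message per misconfigured setting."""
    problems = []
    for node in response.get("nodes", {}).values():
        name = node.get("name", "?")
        proc = node.get("process", {})
        fds = proc.get("max_file_descriptors")  # may be absent in some versions
        if fds is not None and fds < MIN_FILE_DESCRIPTORS:
            problems.append(f"{name}: max_file_descriptors={fds} is too low")
        if proc.get("mlockall") is False:
            problems.append(f"{name}: mlockall is off, the heap may be swapped out")
    return problems


if __name__ == "__main__":
    for problem in check_nodes(json.loads(SAMPLE)):
        print(problem)
```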
29. Elasticsearch 2.0 | Shards and Replicas
Why Shards and Replicas?
• ES has built-in clustering
• Scaling out an index: shards
• Parallel work on an index: shards
• Increasing availability: replicas
• You can change the number of replicas at any time!
• You cannot change the number of shards after index creation! (you must reindex)
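Why the shard count is fixed: Elasticsearch picks a document's shard as `hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the document ID (2.x uses Murmur3 as the hash). The sketch below swaps in `zlib.crc32` for Murmur3, so its shard numbers will not match a real cluster. It does show that changing the modulus moves most documents, which is why a new shard count requires a reindex.

```python
import zlib


def shard_for(doc_id: str, number_of_primary_shards: int) -> int:
    """hash(routing) % number_of_primary_shards.
    crc32 stands in for Elasticsearch's Murmur3 hash (illustration only)."""
    return zlib.crc32(doc_id.encode("utf-8")) % number_of_primary_shards


def moved_fraction(doc_ids, old_shards: int, new_shards: int) -> float:
    """Fraction of documents whose shard changes when the shard count changes."""
    moved = sum(1 for d in doc_ids
                if shard_for(d, old_shards) != shard_for(d, new_shards))
    return moved / len(doc_ids)


if __name__ == "__main__":
    ids = [str(i) for i in range(10_000)]
    print("fraction moved going from 5 to 6 shards:", moved_fraction(ids, 5, 6))
```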
30. Elasticsearch 2.0 | Shards and Replicas
What is a Shard?
• You can't actually split an index!
• ES uses multiple Lucene indexes (AKA SHARDS)
• Simply put, a shard is a Lucene index!
• Overhead: a query hits all shards for scoring
• So they don’t come free!
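The per-shard overhead comes from how a search runs. The query is sent to every shard, each shard scores its own documents and returns its local top hits, and the coordinating node merges them. Below is a simplified sketch of that merge step with made-up shard results; real Elasticsearch also handles paging, ties, and a separate fetch phase.

```python
import heapq


def merge_top_hits(shard_results, size):
    """Merge per-shard hit lists into a global top-`size` list.

    shard_results: one list per shard of (score, doc_id) tuples. Every shard
    has to score its documents and return up to `size` hits, which is part
    of why extra shards are not free.
    """
    all_hits = (hit for shard in shard_results for hit in shard)
    return heapq.nlargest(size, all_hits, key=lambda hit: hit[0])


if __name__ == "__main__":
    shards = [
        [(0.9, "a"), (0.4, "b")],   # shard 0
        [(0.7, "c"), (0.6, "d")],   # shard 1
        [(0.95, "e")],              # shard 2
    ]
    print(merge_top_hits(shards, 3))  # [(0.95, 'e'), (0.9, 'a'), (0.7, 'c')]
```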
31. Elasticsearch 2.0 | How many Shards and Replicas?
• Replicas:
• More replicas = more availability = slower indexing! (every write is applied to each copy)
• Shards
• How much data?
• How many queries?
• How complex are those queries?
• How many resources does each node have?
• How many nodes are in your cluster?
• Don’t know? Over-allocate a few shards (but not too many: they are not free!)
32. Elasticsearch 2.0 | How many Shards and Replicas?
• Changing the number of replicas: easy
curl -XPUT 'localhost:9200/my_index/_settings' -d '
{
"index" : {
"number_of_replicas" : 4
}
}'
• Changing the number of shards: the index must be reindexed
• For some, not a big deal.
• For some, it is a big deal!
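The same replica update can be issued from Python using only the standard library. This is a minimal sketch that assumes a node on localhost:9200 and the index name `my_index` from the slide. The request is only built here; it is sent when you pass it to `urlopen`.

```python
import json
import urllib.request


def replica_update_request(index: str, replicas: int,
                           host: str = "http://localhost:9200") -> urllib.request.Request:
    """Build the PUT /<index>/_settings request that changes number_of_replicas."""
    body = json.dumps({"index": {"number_of_replicas": replicas}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{host}/{index}/_settings",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    req = replica_update_request("my_index", 4)
    print(req.get_method(), req.full_url, req.data.decode())
    # To actually send it (requires a running node):
    # with urllib.request.urlopen(req) as resp:
    #     print(resp.read().decode())
```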
33. Elasticsearch 2.0 | How many Shards and Replicas?
• StackOverflow http://stackexchange.com/performance
• Scaling out:
• More shards than nodes
• Multiple shards on one node
34. Elasticsearch 2.0 | How many Shards and Replicas?
• 3 nodes - 3 shards - 2 replicas
• Losing 2 nodes = all data is still available
• Tripling the storage need (1 primary + 2 replicas of every shard)
• Each shard = 1/3 of the index size
• Storage is cheap, a small price to pay for availability
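Here is the arithmetic behind this layout as a small sketch. Every shard has 1 primary plus `replicas` copies, and Elasticsearch never places two copies of the same shard on one node. So, given enough nodes, the data survives the loss of `replicas` nodes. The 30 GB index size in the example is a hypothetical figure.

```python
def replica_plan(nodes: int, primaries: int, replicas: int, index_gb: float) -> dict:
    """Storage and fault-tolerance figures for one index."""
    copies = 1 + replicas                  # primary + replicas of every shard
    total_gb = index_gb * copies           # storage across the whole cluster
    return {
        "shard_size_gb": index_gb / primaries,
        "total_storage_gb": total_gb,
        "storage_per_node_gb": total_gb / nodes,  # assumes an even spread
        # copies of a shard live on distinct nodes, so at most nodes - 1
        # node failures can ever be absorbed
        "node_failures_tolerated": min(replicas, nodes - 1),
    }


if __name__ == "__main__":
    print(replica_plan(nodes=3, primaries=3, replicas=2, index_gb=30))
```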
35. Elasticsearch 2.0 | Let’s Use It!
Full-text search, structured search, analytics, and all three in combination:
• Wikipedia uses Elasticsearch to provide full-text search with highlighted search snippets, and search-as-you-type and did-you-mean suggestions.
• The Guardian uses Elasticsearch to combine visitor logs with social-network data to provide real-time feedback to its editors about the public’s response to new articles.
• Stack Overflow combines full-text search with geolocation queries and uses more-like-this to find related questions and answers.
• GitHub uses Elasticsearch to query 130 billion lines of code!
36. Elasticsearch 2.0 | RESTful API with JSON over HTTP
• VERB: GET, POST, PUT, HEAD, or DELETE.
• PROTOCOL: http or https
• HOST: hostname of any node
• PORT: Elasticsearch HTTP service, which defaults to 9200
• PATH: API Endpoint (_count, _cluster/stats, _nodes/stats/jvm, etc.)
• QUERY_STRING: any optional query-string parameters, for example ?pretty
• BODY: a JSON-encoded request body (if the request needs one)
curl -X<VERB> '<PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>' -d '<BODY>'
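The request anatomy above maps directly onto a URL builder. This is a minimal sketch; `es_url` is a hypothetical helper, not part of any Elasticsearch client.

```python
from urllib.parse import urlencode


def es_url(path: str, host: str = "localhost", port: int = 9200,
           protocol: str = "http", **params) -> str:
    """Assemble <PROTOCOL>://<HOST>:<PORT>/<PATH>?<QUERY_STRING>."""
    url = f"{protocol}://{host}:{port}/{path.lstrip('/')}"
    if params:
        url += "?" + urlencode(params)
    return url


if __name__ == "__main__":
    print(es_url("_count", pretty="true"))  # http://localhost:9200/_count?pretty=true
    print(es_url("_nodes/stats/jvm"))       # http://localhost:9200/_nodes/stats/jvm
```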
37. Elasticsearch 2.0 | RESTful API with JSON over HTTP
curl -XGET 'http://localhost:9200/_count?pretty' -d '
{
"query": {
"match_all": {}
}
}
'
{
"count" : 0,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
}
}
38. Elasticsearch 2.0 | Clarification
Relational DB | Databases | Tables | Rows      | Columns
Elasticsearch | Indices   | Types  | Documents | Fields
• Index (noun)
• Like a database in a traditional relational database: the place to store related documents.
• Index (verb)
• To index a document is to store it in an index (noun) so that it can be retrieved and queried (like INSERT in SQL).
• Inverted index
• Relational databases add an index, such as a B-tree index, to specific columns; Elasticsearch and Lucene use a structure called an inverted index. Both improve the speed of data retrieval.
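To make "inverted index" concrete: instead of mapping each document to its words, it maps each term to the documents (and positions) that contain it. Below is a toy sketch using the two `about` texts from the examples that follow. The lowercase-and-split tokenizer is a crude stand-in for Elasticsearch's standard analyzer.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Very rough stand-in for the standard analyzer: lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def build_inverted_index(docs: dict[str, str]) -> dict[str, dict[str, list[int]]]:
    """term -> {doc_id -> [positions]}; positions are what phrase queries use."""
    index: dict[str, dict[str, list[int]]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(tokenize(text)):
            index[term].setdefault(doc_id, []).append(pos)
    return dict(index)


DOCS = {
    "1": "I love to go rock climbing",
    "2": "I like to collect rock albums",
}

if __name__ == "__main__":
    inv = build_inverted_index(DOCS)
    print(inv["rock"])      # {'1': [4], '2': [4]}
    print(inv["climbing"])  # {'1': [5]}
```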
39. Elasticsearch 2.0 | Indexing a document
PUT /cnrs/employee/1
{
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
• cnrs
• The index name
• employee
• The type name
• /1
• The ID of this particular employee
PUT /cnrs/employee/2
{
"first_name" : "Jane",
"last_name" : "Smith",
"age" : 32,
"about" : "I like to collect rock albums",
"interests": [ "music" ]
}
40. Elasticsearch 2.0 | Retrieving a document
GET /cnrs/employee/1
{
"_index" : "cnrs",
"_type" : "employee",
"_id" : "1",
"_version" : 1,
"found" : true,
"_source" : {
"first_name" : "John",
"last_name" : "Smith",
"age" : 25,
"about" : "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
}
41. Elasticsearch 2.0 | Deleting a document
DELETE /cnrs/employee/1
42. Elasticsearch 2.0 | Search
GET /cnrs/employee/_search
{
"took": 6,
"timed_out": false,
"_shards": { ... },
"hits": {
"total": 2,
"max_score": 1,
"hits": [
{
"_index": "cnrs",
"_type": "employee",
"_id": "1",
"_score": 1,
"_source": {
"first_name": "John",
"last_name": "Smith",
"age": 25,
"about": "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
},
{
"_index": "cnrs",
"_type": "employee",
"_id": "2",
"_score": 1,
"_source": {
"first_name": "Jane",
"last_name": "Smith",
"age": 32,
"about": "I like to collect rock albums",
"interests": [ "music" ]
}
}
]
}
}
GET /cnrs/employee/_search?q=last_name:Smith
{
...
"hits": {
"total": 2,
"max_score": 0.30685282,
"hits": [
{
...
"_source": {
"first_name": "John",
"last_name": "Smith",
"age": 25,
"about": "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
},
{
...
"_source": {
"first_name": "Jane",
"last_name": "Smith",
"age": 32,
"about": "I like to collect rock albums",
"interests": [ "music" ]
}
}
]
}
}
43. Elasticsearch 2.0 | Search with Query DSL
GET /cnrs/employee/_search
{
"query" : {
"match" : {
"last_name" : "Smith"
}
}
}
Elasticsearch provides a rich, flexible query language called the query DSL, which allows us to build much more complicated, robust queries.
44. Elasticsearch 2.0 | Search with Query DSL
GET /cnrs/employee/_search
{
"query" : {
"filtered" : {
"filter" : {
"range" : {
"age" : { "gt" : 30 }
}
},
"query" : {
"match" : {
"last_name" : "smith"
}
}
}
}
}
{
...
"hits": {
"total": 1,
"max_score": 0.30685282,
"hits": [
{
...
"_source": {
"first_name": "Jane",
"last_name": "Smith",
"age": 32,
"about": "I like to collect rock albums",
"interests": [ "music" ]
}
}
]
}
}
45. Elasticsearch 2.0 | Full-text Search
GET /cnrs/employee/_search
{
"query" : {
"match" : {
"about" : "rock climbing"
}
}
}
{
...
"hits": {
"total": 2,
"max_score": 0.16273327,
"hits": [
{
...
"_score": 0.16273327,
"_source": {
"first_name": "John",
"last_name": "Smith",
"age": 25,
"about": "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
},
{
...
"_score": 0.016878016,
"_source": {
"first_name": "Jane",
"last_name": "Smith",
"age": 32,
"about": "I like to collect rock albums",
"interests": [ "music" ]
}
}
]
}
}
46. Elasticsearch 2.0 | Phrase Search
GET /cnrs/employee/_search
{
"query" : {
"match_phrase" : {
"about" : "rock climbing"
}
}
}
{
...
"hits": {
"total": 1,
"max_score": 0.23013961,
"hits": [
{
...
"_score": 0.23013961,
"_source": {
"first_name": "John",
"last_name": "Smith",
"age": 25,
"about": "I love to go rock climbing",
"interests": [ "sports", "music" ]
}
}
]
}
}
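The `match` query returned both employees, but `match_phrase` only returned John. This is because `match` analyzes the text and, by default, ORs the resulting terms, while `match_phrase` also requires the terms to appear at consecutive positions. Below is a toy sketch over the same two `about` texts; it ignores scoring and uses a simplistic tokenizer.

```python
import re


def tokenize(text):
    """Crude stand-in for the standard analyzer."""
    return re.findall(r"\w+", text.lower())


DOCS = {
    "1": "I love to go rock climbing",
    "2": "I like to collect rock albums",
}


def match(query, docs):
    """Default `match` semantics: a document matches if it has ANY query term."""
    terms = set(tokenize(query))
    return sorted(d for d, text in docs.items() if terms & set(tokenize(text)))


def match_phrase(query, docs):
    """A document matches only if the query terms appear consecutively, in order."""
    q = tokenize(query)
    hits = []
    for doc_id, text in docs.items():
        toks = tokenize(text)
        if any(toks[i:i + len(q)] == q for i in range(len(toks) - len(q) + 1)):
            hits.append(doc_id)
    return sorted(hits)


if __name__ == "__main__":
    print(match("rock climbing", DOCS))         # ['1', '2']
    print(match_phrase("rock climbing", DOCS))  # ['1']
```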
47. Elasticsearch 2.0 | Highlighting Searches
GET /cnrs/employee/_search
{
"query" : {
"match_phrase" : {
"about" : "rock climbing"
}
},
"highlight": {
"fields" : {
"about" : {}
}
}
}
{
...
"hits": {
"total": 1,
"max_score": 0.23013961,
"hits": [
{
...
"_score": 0.23013961,
"_source": {
"first_name": "John",
"last_name": "Smith",
"age": 25,
"about": "I love to go rock climbing",
"interests": [ "sports", "music" ]
},
"highlight": {
"about": [
"I love to go <em>rock</em> <em>climbing</em>"
]
}
}
]
}
}
The highlighted fragment from the original text
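Conceptually, the highlighter finds the matched terms again in the stored text and wraps them in `<em>` tags, which are the default pre/post tags. Below is a toy sketch of that idea for plain terms. Real highlighters work from analyzed tokens and offsets, and they also split long fields into fragments.

```python
import re


def highlight(text: str, terms: set[str], pre: str = "<em>", post: str = "</em>") -> str:
    """Wrap every whole-word, case-insensitive occurrence of a query term."""
    lowered = {t.lower() for t in terms}

    def wrap(m):
        word = m.group(0)
        return f"{pre}{word}{post}" if word.lower() in lowered else word

    return re.sub(r"\w+", wrap, text)


if __name__ == "__main__":
    print(highlight("I love to go rock climbing", {"rock", "climbing"}))
    # I love to go <em>rock</em> <em>climbing</em>
```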
48. Elasticsearch 2.0 | Search with Query DSL
• Full text queries
• Match Query
• Multi Match Query
• Common Terms Query
• Query String Query
• Simple Query String Query
• Term level queries
• Term Query
• Terms Query
• Range Query
• Exists Query
• Missing Query
• Prefix Query
• Wildcard Query
• Regexp Query
• Fuzzy Query
• Type Query
• Ids Query
• Compound queries
• Constant Score Query
• Bool Query
• Dis Max Query
• Function Score Query
• Boosting Query
• Indices Query
• And Query
• Not Query
• Or Query
• Filtered Query
• Limit Query
• Joining queries
• Nested Query
• Has Child Query
• Has Parent Query
• Geo queries
• GeoShape Query
• Geo Bounding Box Query
• Geo Distance Query
• Geo Distance Range Query
• Geo Polygon Query
• Geohash Cell Query
• Specialized queries
• More Like This Query
• Template Query
• Script Query
54. Elasticsearch 2.0 | Aggregations
• Metrics Aggregations
• Avg Aggregation
• Cardinality Aggregation
• Extended Stats Aggregation
• Geo Bounds Aggregation
• Max Aggregation
• Min Aggregation
• Percentiles Aggregation
• Percentile Ranks Aggregation
• Scripted Metric Aggregation
• Stats Aggregation
• Sum Aggregation
• Top hits Aggregation
• Value Count Aggregation
• Bucket Aggregations
• Children Aggregation
• Date Histogram Aggregation
• Date Range Aggregation
• Filter Aggregation
• Geo Distance Aggregation
• GeoHash grid Aggregation
• Histogram Aggregation
• IPv4 Range Aggregation
• Missing Aggregation
• Nested Aggregation
• Range Aggregation
• Reverse nested Aggregation
• Sampler Aggregation
• Significant Terms Aggregation
• Terms Aggregation
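Here is an illustration of how a bucket aggregation nests a metrics aggregation. The request body groups employees by `interests` (a Terms Aggregation) and averages `age` per bucket (an Avg Aggregation). It is followed by a plain-Python computation of the buckets we would expect for the two sample employees. The aggregation names `by_interest` and `avg_age` are arbitrary. On a real 2.x cluster, a terms aggregation on an analyzed string field buckets individual tokens.

```python
from collections import defaultdict

# Body for GET /cnrs/employee/_search (aggregation names are arbitrary).
AGG_REQUEST = {
    "size": 0,
    "aggs": {
        "by_interest": {
            "terms": {"field": "interests"},
            "aggs": {"avg_age": {"avg": {"field": "age"}}},
        }
    },
}

EMPLOYEES = [
    {"first_name": "John", "age": 25, "interests": ["sports", "music"]},
    {"first_name": "Jane", "age": 32, "interests": ["music"]},
]


def terms_with_avg(docs, term_field, metric_field):
    """Emulate terms + avg: one bucket per term, biggest doc_count first."""
    values = defaultdict(list)
    for doc in docs:
        for term in doc.get(term_field, []):
            values[term].append(doc[metric_field])
    buckets = [
        {"key": k, "doc_count": len(v), "avg": sum(v) / len(v)}
        for k, v in values.items()
    ]
    return sorted(buckets, key=lambda b: (-b["doc_count"], b["key"]))


if __name__ == "__main__":
    for bucket in terms_with_avg(EMPLOYEES, "interests", "age"):
        print(bucket)
```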
55. Elasticsearch 2.0 | Plugins
Plugins enhance the core Elasticsearch functionality.
• Plugin Types
• Java plugins
• JAR files
• Must be installed on all nodes in the cluster
• Each node must be restarted
• Site plugins
• Web content: JS, HTML, CSS, etc.
• Can be installed on just one node
• Do not require a restart
• Mixed plugins
• Both JAR files and web content
56. Elasticsearch 2.0 | Plugins
• API extension Plugins
• Alerting Plugins
• Analysis Plugins
• Discovery Plugins
• Management and Site Plugins
• Mapper Plugins
• Scripting Plugins
• Security Plugins
• Snapshot/Restore Plugins
• Transport Plugins
• Integrations
57. Elasticsearch 2.0 | Plugins - kopf
Web administration tool for Elasticsearch clusters
sudo [/usr/share/elasticsearch/]bin/plugin install [plugin_name]
sudo [/usr/share/elasticsearch/]bin/plugin install lmenezes/elasticsearch-kopf
sudo [/usr/share/elasticsearch/]bin/plugin install lmenezes/elasticsearch-kopf/2.x
open http://localhost:9200/_plugin/kopf
63. Elasticsearch 2.0 | Plugins - kopf
Web administration tool for Elasticsearch clusters
• Indexing 5k docs/s: larger documents (full tweets)
• Indexing 23k docs/s: smaller documents (tweet date, text, etc.)
• Indexing 67k docs/s: just the 140 characters and the ID
65. Elasticsearch 2.0 | Plugins - kopf
Web administration tool for Elasticsearch clusters
sudo service elasticsearch stop
66. Elasticsearch 2.0 | Plugins - kopf
Web administration tool for Elasticsearch clusters
sudo service elasticsearch start
69. Elasticsearch 2.0 | ISCPIF Use Cases
Full-text search and Data Analytics
• Web of Science
• Text Mining and NLP
• Keyword Extractions
• Phrase Occurrence
• Phrase Co-Occurrence
• Keyword Analytics
• Date Histogram
• Significant Terms
• N-Grams
• etc.
75. Elasticsearch 2.0 | ISCPIF Use Cases
Full-text search and Data Analytics
• Boolean
• Query String
• Aggregation
• Date histogram
• Query cache: nope!
76. Elasticsearch 2.0 | ISCPIF Use Cases
Full-text search and Data Analytics
{
"bool": {
"must": { "match": { "title": "how to make millions" }},
"must_not": { "match": { "tag": "spam" }},
"should": [
{ "match": { "tag": "starred" }},
{ "range": { "date": { "gte": "2014-01-01" }}}
]
}
}
• The title field matches “how to make millions”
• The document is not marked as spam
• Documents that are starred or are from 2014 onward rank higher
• Documents that match both conditions rank even higher
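The scoring rules above can be mimicked in a few lines. In this toy sketch, `must` and `must_not` decide whether a document matches, and each satisfied `should` clause raises the score. The clauses are simplified to Python predicates. The "+1 per clause" score is a made-up stand-in for Lucene's relevance scoring, and the sample documents are hypothetical.

```python
from datetime import date


def bool_query(docs, must, must_not, should):
    """Return (score, doc) pairs for matching docs, best first.

    must / must_not / should are lists of predicates doc -> bool.
    """
    results = []
    for doc in docs:
        if not all(p(doc) for p in must):
            continue                       # every must clause is required
        if any(p(doc) for p in must_not):
            continue                       # any must_not clause excludes
        score = 1 + sum(1 for p in should if p(doc))  # should clauses boost
        results.append((score, doc))
    return sorted(results, key=lambda r: r[0], reverse=True)


DOCS = [
    {"id": 1, "title": "how to make millions", "tag": ["starred"], "date": date(2014, 3, 1)},
    {"id": 2, "title": "how to make millions fast", "tag": [], "date": date(2013, 5, 1)},
    {"id": 3, "title": "how to make millions", "tag": ["spam"], "date": date(2015, 1, 1)},
    {"id": 4, "title": "how to make millions", "tag": ["starred"], "date": date(2012, 1, 1)},
]

MUST = [lambda d: "millions" in d["title"]]
MUST_NOT = [lambda d: "spam" in d["tag"]]
SHOULD = [lambda d: "starred" in d["tag"],
          lambda d: d["date"] >= date(2014, 1, 1)]

if __name__ == "__main__":
    hits = bool_query(DOCS, MUST, MUST_NOT, SHOULD)
    print([(score, d["id"]) for score, d in hits])  # [(3, 1), (2, 4), (1, 2)]
```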
77. Elasticsearch 2.0 | ISCPIF Use Cases
Full-text search and Data Analytics
• Filters
• for binary yes/no searches
• for queries on exact values
• Exists
• only documents whose abstract is not null
• Query cache: true!
78. Elasticsearch 2.0 | ISCPIF Use Cases
Full-text search and Data Analytics
My bool query vs. filter: ~500 ms vs. ~50 to 120 ms
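Part of why the filter version is faster: a filter only answers yes/no (there is no scoring), and the set of matching document IDs can be cached and reused by later queries. Below is a toy sketch of that idea using `functools.lru_cache`. Elasticsearch's actual query cache works per segment with bitsets and has its own eviction policy, so treat this only as an analogy. The documents are hypothetical.

```python
from functools import lru_cache

DOCS = {
    1: {"abstract": "Complex systems..."},
    2: {"abstract": None},
    3: {"abstract": "Network science..."},
}

calls = {"evaluations": 0}


@lru_cache(maxsize=128)
def exists_filter(field: str) -> frozenset:
    """Doc IDs where `field` is present and not null (like the Exists filter)."""
    calls["evaluations"] += 1
    return frozenset(i for i, d in DOCS.items() if d.get(field) is not None)


def search(field: str, scored_candidates: dict) -> list:
    """Keep only candidates that pass the cached filter, best score first."""
    allowed = exists_filter(field)
    return sorted((doc for doc in scored_candidates if doc in allowed),
                  key=lambda doc: scored_candidates[doc], reverse=True)


if __name__ == "__main__":
    print(search("abstract", {1: 0.5, 2: 0.9, 3: 0.7}))  # [3, 1]
    print(search("abstract", {1: 0.2, 3: 0.1}))          # [1, 3]
    print("filter evaluated", calls["evaluations"], "time(s)")
```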
80. Elasticsearch 2.0 | ISCPIF Use Cases
Full-text search and Data Analytics
• Query DSL: Filters
• And Filter
• Bool Filter
• Exists Filter
• Geo Bounding Box Filter
• Geo Distance Filter
• Geo Distance Range Filter
• Geo Polygon Filter
• GeoShape Filter
• Geohash Cell Filter
• Has Child Filter
• Has Parent Filter
• Ids Filter
• Indices Filter
• Limit Filter
• Match All Filter
• Missing Filter
• Nested Filter
• Not Filter
• Or Filter
• Prefix Filter
• Query Filter
• Range Filter
• Regexp Filter
• Script Filter
• Term Filter
• Terms Filter
• Type Filter
83. Elasticsearch 2.0 | ISCPIF Use Cases
Full-text search and Data Analytics
• Significant Terms Aggregation
• JLH score
• Mutual information
• Chi square
• Google normalized distance
• Percentage
• Scripted
See Yang and Pedersen, "A Comparative Study on Feature Selection in Text Categorization" (1997), http://courses.ischool.berkeley.edu/i256/f06/papers/yang97comparative.pdf, for a study on using significant terms for feature selection in text classification.
"script_heuristic": {
"script": "_subset_freq/(_superset_freq - _subset_freq + 1)"
}
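Here are two of the significance heuristics above as plain functions. The scripted one is exactly the expression shown on the slide. For JLH we assume the definition from the Elasticsearch documentation: (foreground % minus background %) multiplied by (foreground % divided by background %), scoring 0 when the term is not more popular in the foreground. The counts in the example are made up.

```python
def scripted_score(subset_freq: int, superset_freq: int) -> float:
    """_subset_freq / (_superset_freq - _subset_freq + 1), as on the slide."""
    return subset_freq / (superset_freq - subset_freq + 1)


def jlh_score(subset_freq: int, subset_size: int,
              superset_freq: int, superset_size: int) -> float:
    """Absolute change in popularity times relative change in popularity."""
    fg = subset_freq / subset_size        # share of foreground docs with the term
    bg = superset_freq / superset_size    # share of background docs with the term
    if bg == 0 or fg <= bg:
        return 0.0                        # not more popular in the foreground
    return (fg - bg) * (fg / bg)


if __name__ == "__main__":
    # Term found in 10 of 100 foreground docs and 20 of 1,000 background docs.
    print(scripted_score(10, 20))          # 10 / 11, about 0.909
    print(jlh_score(10, 100, 20, 1000))    # (0.1 - 0.02) * (0.1 / 0.02), about 0.4
```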
85. Elasticsearch 2.0 | ISCPIF Use Cases
The Elastic Platform | Make Sense of Your Data
88. Elasticsearch 2.0 | ISCPIF Use Cases
Logs!
• Check Nginx access logs between 2015-11-23T10:23:10 and 2015-11-24T21:53:30
• Check Suricata alerts >= 2 between 2015-11-23T10:23:10 and 2015-11-24T21:53:30 with type DNS
• Now correlate the results!!!
90. Elasticsearch 2.0 | ISCPIF Use Cases
Logs!
• Old School tools
• grep/sed/awk/cut/sort
• Manually analyze the output
• Different formats
• Customized fields and details
• Not centralized
• Modern way (the right way!)
• Define endpoints (input/output)
• Correlate patterns
• Store data (searchable and visualizable)
91. Elasticsearch 2.0 | ISCPIF Use Cases
Logs!
• Symantec Security Information Manager
• Splunk
• HP / Arcsight
• Tripwire
• NetIQ
• Quest Software
• IBM/Q1 Labs
• Novell
• Graylog
• fluentd
92. Elasticsearch 2.0 | ISCPIF Use Cases
Logstash | Collect, Enrich & Transport Data
• Process Any Data, From Any Source
• Centralize data processing of all types
• Normalize varying schema and formats
• Quickly extend to custom log formats
• Easily add plugins for custom data sources
• https://github.com/elastic/logstash
93. Elasticsearch 2.0 | ISCPIF Use Cases
Logstash | Collect, Enrich & Transport Data
input { stdin { } }
filter {
grok {
match => { "message" => "%{COMBINEDAPACHELOG}" }
}
date {
match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
}
}
output {
elasticsearch { hosts => ["localhost:9200"] }
stdout { codec => rubydebug }
}
94. Elasticsearch 2.0 | ISCPIF Use Cases
Logstash | Collect, Enrich & Transport Data
127.0.0.1 - - [11/Dec/2013:00:01:45 -0800] "GET /xampp/status.php HTTP/1.1" 200 3891 "http://cadenza/xampp/navi.php" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0"
{
"message" => "127.0.0.1 - - [11/Dec/2013:00:01:45 -0800] \"GET /xampp/status.php HTTP/1.1\" 200 3891 \"http://cadenza/xampp/navi.php\" \"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0\"",
"@timestamp" => "2013-12-11T08:01:45.000Z",
"@version" => "1",
"host" => "cadenza",
"clientip" => "127.0.0.1",
"ident" => "-",
"auth" => "-",
"timestamp" => "11/Dec/2013:00:01:45 -0800",
"verb" => "GET",
"request" => "/xampp/status.php",
"httpversion" => "1.1",
"response" => "200",
"bytes" => "3891",
"referrer" => "\"http://cadenza/xampp/navi.php\"",
"agent" => "\"Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0\""
}
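What `%{COMBINEDAPACHELOG}` and the `date` filter do to that line can be approximated with a regular expression and `datetime.strptime`. This is a minimal sketch: the regex is our own simplification, not the actual grok pattern definition, and it assumes well-formed combined-format lines.

```python
import re
from datetime import datetime, timezone

# Simplified equivalent of grok's COMBINEDAPACHELOG (our own regex).
COMBINED = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[^"]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-) (?P<referrer>"[^"]*") (?P<agent>"[^"]*")'
)


def parse_line(line: str) -> dict:
    """Split one combined-format line into named fields, like the grok filter."""
    m = COMBINED.match(line)
    if m is None:
        raise ValueError("not a combined log line")
    event = m.groupdict()
    # Equivalent of: date { match => [ "timestamp", "dd/MMM/yyyy:HH:mm:ss Z" ] }
    ts = datetime.strptime(event["timestamp"], "%d/%b/%Y:%H:%M:%S %z")
    event["@timestamp"] = ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z")
    return event


LINE = ('127.0.0.1 - - [11/Dec/2013:00:01:45 -0800] "GET /xampp/status.php HTTP/1.1" '
        '200 3891 "http://cadenza/xampp/navi.php" "Mozilla/5.0 (Macintosh; Intel Mac '
        'OS X 10.9; rv:25.0) Gecko/20100101 Firefox/25.0"')

if __name__ == "__main__":
    event = parse_line(LINE)
    print(event["@timestamp"], event["verb"], event["request"], event["response"])
```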
98. Elasticsearch 2.0 | ISCPIF Use Cases
Logstash Server: syslog, auth, ufw and nginx
input {
lumberjack {
port => 5000
type => "logs"
ssl_certificate => "/etc/pki/tls/certs/logstash-forwarder.crt"
ssl_key => "/etc/pki/tls/private/logstash-forwarder.key"
}
}
99. Elasticsearch 2.0 | ISCPIF Use Cases
Logstash Server: syslog, auth, ufw and nginx
filter {
if [type] == "syslog" {
grok {
match => { "message" => "%{SYSLOGTIMESTAMP:syslog_timestamp} %{SYSLOGHOST:syslog_hostname} %{DATA:syslog_program}(?:\[%{POSINT:syslog_pid}\])?: %{GREEDYDATA:syslog_message}" }
add_field => [ "received_at", "%{@timestamp}" ]
add_field => [ "received_from", "%{host}" ]
}
syslog_pri { }
date {
match => [ "syslog_timestamp", "MMM d HH:mm:ss", "MMM dd HH:mm:ss" ]
}
}
}
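The syslog grok pattern above can be approximated as a Python regular expression so it can be tried outside Logstash. This sketch rests on assumptions: the sub-patterns for SYSLOGTIMESTAMP, SYSLOGHOST, DATA, and POSINT are simplified stand-ins for the real grok definitions, and the sample lines are invented.

```python
import re

# Rough stand-ins for the grok patterns used in the filter above.
SYSLOG = re.compile(
    r"(?P<syslog_timestamp>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}) "  # SYSLOGTIMESTAMP
    r"(?P<syslog_hostname>\S+) "                                       # SYSLOGHOST
    r"(?P<syslog_program>.*?)"                                         # DATA (non-greedy)
    r"(?:\[(?P<syslog_pid>\d+)\])?: "                                  # optional [POSINT]
    r"(?P<syslog_message>.*)"                                          # GREEDYDATA
)

if __name__ == "__main__":
    line = "Nov 23 10:23:10 web01 sshd[1234]: Accepted publickey for deploy"
    print(SYSLOG.match(line).groupdict())
```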
100. Elasticsearch 2.0 | ISCPIF Use Cases
Logstash-forwarder: syslog, auth, ufw and nginx
output {
elasticsearch {
host => "10.0.0.25"
port => "9300"
cluster => "iscpif-es"
}
}
101. Elasticsearch 2.0 | ISCPIF Use Cases
Logstash: Suricata | Open Source IDS / IPS / NSM engine
• Highly Scalable
• Suricata is multi-threaded
• Protocol Identification
• Suricata is a malware Command and Control (CnC) channel hunter.
• Finds off-port HTTP CnC channels, which normally slide right past most IDS systems
• Thanks to dedicated keywords, you can match on protocol fields ranging from the HTTP URI to an SSL certificate identifier.
• File Identification, MD5 Checksums, and File Extraction
• Identify thousands of file types crossing your network!
102. Elasticsearch 2.0 | ISCPIF Use Cases
suricata.yaml
- eve-log:
    enabled: yes
    type: file #file|syslog|unix_dgram|unix_stream
    filename: eve.json
    types:
      - alert
      - http:
          extended: yes
      - dns
      - tls:
          extended: yes # enable this for extended logging information
      - files:
          force-magic: yes # force logging magic on all logged files
          force-md5: yes # force logging of md5 checksums
      - ssh
106. Elasticsearch 2.0 | ISCPIF Use Cases
Logstash: Suricata
• Logstash daily index
• index template
• easy to retire an index
• close/delete
• 22 machines
• only 2 with a public IP
• Logs
• between 1 and 3 million per day
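Daily indices turn retirement into date arithmetic on index names. This is a minimal sketch that assumes Logstash's default `logstash-YYYY.MM.DD` naming and a hypothetical 30-day retention. Actually closing or deleting the returned indices (for example with `DELETE /<index>`) is left out.

```python
from datetime import date, datetime, timedelta


def daily_index(day: date, prefix: str = "logstash") -> str:
    """Logstash's default daily index name, e.g. logstash-2015.11.25."""
    return f"{prefix}-{day:%Y.%m.%d}"


def indices_to_retire(index_names, today: date, keep_days: int, prefix: str = "logstash"):
    """Daily indices older than `keep_days` days: candidates to close or delete."""
    cutoff = today - timedelta(days=keep_days)
    old = []
    for name in index_names:
        if not name.startswith(prefix + "-"):
            continue  # not one of our daily indices
        try:
            day = datetime.strptime(name[len(prefix) + 1:], "%Y.%m.%d").date()
        except ValueError:
            continue  # prefix matches but no date suffix
        if day < cutoff:
            old.append(name)
    return sorted(old)


if __name__ == "__main__":
    today = date(2015, 11, 25)
    names = [daily_index(today - timedelta(days=n)) for n in range(0, 50, 10)]
    print(names)
    print(indices_to_retire(names + ["kibana-int"], today, keep_days=30))
```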
107. Elasticsearch 2.0 | ISCPIF Use Cases
Logstash: Suricata
• 73 million docs in < 2 days!
• during a mongodump
• transferring remotely
• Watch out for Suricata
• Stream events!
• SURICATA STREAM Packet with invalid ack
• And lots of other stream alerts!
• I disabled them! Maybe I am wrong :)
108. Elasticsearch 2.0 | ISCPIF Use Cases
Kibana | Explore & Visualize Your Data
• Seamless Integration with Elasticsearch
• Give Shape to Your Data
• Sophisticated Analytics
• Empower More Team Members
• Flexible Interface, Easy to Share
• Easy Setup
• Visualize Data from Many Sources
• Simple Data Export
109. Elasticsearch 2.0 | ISCPIF Use Cases
Kibana 4.2.1 | Compatible with Elasticsearch 2.x
110. Elasticsearch 2.0 | ISCPIF Use Cases
The ELK Stack | Elasticsearch, Logstash and Kibana
Elasticsearch 1.7.3 (2.0.0)
Logstash 1.5.5 (2.0.0)
Kibana 3 (4.2.1)
120. Elasticsearch 2.0 | Elasticsearch Mapping
A mapping defines how a document and its fields are stored and indexed, for instance:
• Which string fields should be full-text fields.
• Which fields contain numbers, dates, or geolocations.
• The format of date values.
• Each field has a data type, which can be:
• a simple type like string, date, long, double, boolean or ip.
• a type which supports the hierarchical nature of JSON such as object or nested.
• or a specialized type like geo_point, geo_shape, or completion.
• multi-fields
• a string field could be indexed as an analyzed field for full-text search, and as a not_analyzed field for sorting or aggregations.
• Alternatively, you could index a string field with the standard analyzer, the english analyzer, and the french analyzer.
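Here is a multi-field mapping in the Elasticsearch 2.x syntax, written as the Python dict you would send as the body of `PUT /cnrs` when creating the index. This sketch rests on assumptions: the `cnrs`/`employee` names come from the earlier examples, while the `raw`, `english`, and `french` sub-field names are arbitrary choices.

```python
import json

CREATE_CNRS_INDEX = {
    "mappings": {
        "employee": {
            "properties": {
                "last_name": {
                    "type": "string",  # analyzed: full-text search
                    "fields": {
                        # exact value: sorting and aggregations
                        "raw": {"type": "string", "index": "not_analyzed"},
                    },
                },
                "about": {
                    "type": "string",
                    "analyzer": "standard",
                    "fields": {
                        "english": {"type": "string", "analyzer": "english"},
                        "french": {"type": "string", "analyzer": "french"},
                    },
                },
                "age": {"type": "integer"},
                "interests": {"type": "string", "index": "not_analyzed"},
            }
        }
    }
}

if __name__ == "__main__":
    # Body for: curl -XPUT 'localhost:9200/cnrs' -d '<this JSON>'
    print(json.dumps(CREATE_CNRS_INDEX, indent=2))
```

Queries can then target `last_name` for full-text search and `last_name.raw` for sorting.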
121. Elasticsearch 2.0 | Elasticsearch Mapping
• Dynamic mapping
• Fields and mapping types do not need to be defined before being used.
• Explicit mappings
• You can create mapping types and field mappings when you create an index.
• Updating existing mappings
• Existing field mappings cannot be changed (new fields can still be added)
• Create a new index with the correct mappings and reindex your data
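Because existing field mappings cannot be changed in place, reindexing with Elasticsearch 2.0 typically meant reading the old index with a scroll and writing every document into the new index through the _bulk API. The helper below is a minimal sketch of the second half only: it turns search hits into a bulk request body. The index names and documents are hypothetical, and sending the request (and scrolling) is left to whatever HTTP client or Elasticsearch client you use.

```python
import json

def bulk_reindex_body(hits, new_index):
    """Build a _bulk request body that re-indexes search hits into new_index.

    Each hit is expected to carry _type, _id and _source, as the hits returned
    by a search/scroll over the old index do."""
    lines = []
    for hit in hits:
        # Action line: index the document under the same type and id in the new index.
        action = {"index": {"_index": new_index, "_type": hit["_type"], "_id": hit["_id"]}}
        lines.append(json.dumps(action))
        # Source line: the document itself.
        lines.append(json.dumps(hit["_source"]))
    return "\n".join(lines) + "\n"  # the bulk body must end with a newline

# Hypothetical hits read from the old index "tweets_v1"
hits = [
    {"_index": "tweets_v1", "_type": "tweet", "_id": "1", "_source": {"text": "COP21 in Paris"}},
    {"_index": "tweets_v1", "_type": "tweet", "_id": "2", "_source": {"text": "Climate risk"}},
]
body = bulk_reindex_body(hits, "tweets_v2")
print(body)
```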
125. Elasticsearch 2.0 | Production
• Hardware
• Memory
• 64 GB of RAM is the ideal sweet spot
• Give half of the RAM to the heap (e.g. a 16 GB heap on a 32 GB machine) and leave the rest to the OS file-system cache
• And never give the heap more than 30.5 GB!
• CPU
• 2-8 cores
• If you must choose between faster CPUs and more cores, choose more cores
• Disk
• SSDs (monitor the I/O)
• Otherwise, high-performance server disks (15k RPM)
• RAID 0 is fine (Elasticsearch provides high availability through replicas)
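The memory rule on this slide reduces to a one-line formula: half of the RAM for the JVM heap, capped at about 30.5 GB so the JVM keeps using compressed object pointers. A minimal sketch, treating the cap as the rule of thumb given above rather than an exact JVM constant:

```python
def es_heap_gb(ram_gb, cap_gb=30.5):
    """Half of the RAM for the heap (the other half feeds the OS file-system
    cache that Lucene relies on), never above the compressed-pointer cap."""
    return min(ram_gb / 2, cap_gb)

for ram in (16, 32, 64, 128):
    print(f"{ram} GB RAM -> {es_heap_gb(ram)} GB heap")
```

In Elasticsearch 2.x the resulting value is usually applied through the ES_HEAP_SIZE environment variable. Note that on a 64 GB or larger machine the cap, not the 50% rule, decides the heap size.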
126. Elasticsearch 2.0 | Security
• No built-in authentication
• Do not expose Elasticsearch to the world
• Watch out for Denial of Service
• Do not let users choose index names (such as ",*")
• Keep dynamic scripting turned off (the default)
• Restrict HTTP methods (DELETE, PUT, etc.)
• Put a reverse proxy in front:
• Nginx (reverse proxy, SSL and auth)
• Apache
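The index-name and method rules can be enforced by whatever sits in front of Elasticsearch. The function below is a hypothetical sketch of such a check, not part of Nginx, Apache or Elasticsearch: it refuses non-whitelisted HTTP methods, any index name containing wildcards, commas or a leading underscore (such as _all), and any index not explicitly allowed. A real proxy would also need to inspect the endpoint path, since POST carries writes as well as searches.

```python
import re

# Hypothetical policy for a proxy in front of Elasticsearch.
ALLOWED_METHODS = {"GET", "POST"}                 # DELETE, PUT, etc. are refused
INDEX_NAME = re.compile(r"[a-z0-9][a-z0-9_-]*")   # no '*', ',', or leading '_' (e.g. _all)

def is_safe_request(method, index_name, allowed_indices):
    """Return True only for an allowed method on a single, whitelisted index."""
    if method.upper() not in ALLOWED_METHODS:
        return False
    # Reject multi-index syntax and wildcards before consulting the whitelist
    # (defense in depth if the whitelist is ever loosened to prefixes).
    if not INDEX_NAME.fullmatch(index_name):
        return False
    return index_name in allowed_indices

print(is_safe_request("GET", "tweets", {"tweets"}))
print(is_safe_request("GET", "tweets,*", {"tweets"}))
```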
127. Elasticsearch 2.0 | Lots of things!
• Analyzers and tokenizers (human language)
• Slow logs (log slow operations)
• Index settings
• refresh interval
• flush interval
• Differentiate your nodes
• Data nodes
• Master nodes
• Client nodes
• Cluster health
• Heap size
• Thread pools
• Merging time
• etc.
129. MongoDB | Key Features
• Document database
• Documents (i.e. objects)
• Embedded documents and arrays (no expensive joins!)
• Dynamic schema supports fluent polymorphism
• High availability
• Automatic failover
• Data redundancy
• Automatic scaling
• Automatic sharding
130. MongoDB | Platforms
131. MongoDB | Install MongoDB on Ubuntu 14.04 (trusty)
Import the public key used by the package management system
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv 7F0CEB10
Create a list file for MongoDB
echo "deb http://repo.mongodb.org/apt/ubuntu trusty/mongodb-org/3.0 multiverse" | sudo tee /etc/apt/sources.list.d/mongodb-org-3.0.list
Install MongoDB 3.0.7
sudo apt-get update && sudo apt-get install -y mongodb-org
Start MongoDB
sudo service mongod start
132. MongoDB | SQL to MongoDB
133. MongoDB | SQL to MongoDB
https://docs.mongodb.org/manual/reference/sql-comparison/
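One part of the SQL-to-MongoDB mapping is mechanical: an AND of simple comparisons becomes a filter document with $-operators. The helper below is a hypothetical sketch of that translation (the collection and field names are illustrative); it does not handle OR, NULL or functions.

```python
# Hypothetical helper: SQL comparison operators -> MongoDB query operators.
SQL_TO_MONGO = {
    "=": "$eq", "!=": "$ne", "<>": "$ne",
    ">": "$gt", ">=": "$gte", "<": "$lt", "<=": "$lte",
}

def where_to_filter(conditions):
    """conditions: list of (field, sql_operator, value) tuples joined by AND."""
    query = {}
    for field, sql_op, value in conditions:
        mongo_op = SQL_TO_MONGO[sql_op]  # KeyError for unsupported operators
        # Several conditions on one field merge into a single operator document,
        # e.g. tonnage > 1000 AND tonnage <= 5000.
        query.setdefault(field, {})[mongo_op] = value
    return query

# SELECT * FROM ships WHERE operator = 'Navy' AND tonnage > 1000 AND tonnage <= 5000
flt = where_to_filter([("operator", "=", "Navy"), ("tonnage", ">", 1000), ("tonnage", "<=", 5000)])
print(flt)  # usable as db.ships.find(<filter>) in the shell, or find(filter) in a driver
```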
134. MongoDB | Insert Documents
139. MongoDB | MapReduce
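MongoDB's mapReduce takes JavaScript map and reduce functions; the Python below only illustrates the contract they must follow, in memory, and is not MongoDB itself. Map emits (key, value) pairs, reduce folds a list of values for one key, and because the server may call reduce again on its own partial results, reduce must return the same shape it receives. The document shape (a hashtags array) is hypothetical.

```python
from collections import defaultdict

def map_reduce(documents, map_fn, reduce_fn):
    """In-memory illustration of the map/emit/reduce contract."""
    emitted = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):  # each yielded pair plays the role of emit(key, value)
            emitted[key].append(value)
    # Like MongoDB, reduce is not called for keys that emitted a single value.
    return {key: values[0] if len(values) == 1 else reduce_fn(key, values)
            for key, values in emitted.items()}

def map_hashtags(doc):  # hypothetical document shape: {"hashtags": [...]}
    for tag in doc.get("hashtags", []):
        yield tag.lower(), 1

def reduce_count(key, values):
    # Returns the same type as the emitted values, so re-reducing is safe.
    return sum(values)

tweets = [{"hashtags": ["COP21", "Paris"]}, {"hashtags": ["cop21"]}, {"text": "no tags"}]
print(map_reduce(tweets, map_hashtags, reduce_count))
```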
140. MongoDB | Index
• Creating an index (createIndex() replaces ensureIndex(), deprecated since 3.0)
• db.ships.createIndex({name : 1})
• Dropping an index
• db.ships.dropIndex({name : 1})
• Creating a compound index (1 = ascending, -1 = descending)
• db.ships.createIndex({name : 1, operator : 1, class : -1})
• Dropping a compound index
• db.ships.dropIndex({name : 1, operator : 1, class : -1})
141. MongoDB | Replica Set
142. MongoDB | Replica Set
• Replica set members
• Primary
• Accepts write operations
• Secondary members
• Replicate the primary's data set and can accept read operations
• Priority 0 members
• Secondaries that cannot become the primary
• Hidden members
• Invisible to applications
• Arbiter
• Votes in elections but holds no data
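A replica set can elect a primary only while a majority of its voting members can reach each other. The arithmetic below shows why an arbiter is useful: two data-bearing members alone cannot lose either one, while adding an arbiter (three voters) tolerates one failure without storing a third copy of the data.

```python
def election_math(voting_members):
    """Votes needed to elect a primary, and how many voters can be lost."""
    majority = voting_members // 2 + 1
    return majority, voting_members - majority

for n in (2, 3, 4, 5):
    majority, tolerated = election_math(n)
    print(f"{n} voting members: majority = {majority}, can lose {tolerated}")
```

Note that a fourth voter buys nothing over three (both tolerate one failure), which is why odd numbers of voting members are preferred.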
143. MongoDB | Sharding
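With range-based sharding, the shard-key space is split into chunks and each chunk lives on one shard; the router (mongos) sends a query on the shard key to the chunk whose range contains it. The lookup below is a sketch of that idea using a sorted list of chunk lower bounds; the boundaries and shard names are hypothetical, and MongoDB also offers hashed shard keys, which this does not model.

```python
import bisect

# Hypothetical chunk table on a numeric shard key: chunk i covers
# [CHUNK_LOWER_BOUNDS[i], CHUNK_LOWER_BOUNDS[i + 1]).
CHUNK_LOWER_BOUNDS = [float("-inf"), 1000, 5000, 20000]
CHUNK_SHARDS = ["shard0", "shard1", "shard0", "shard2"]

def shard_for(key):
    """Return the shard holding the chunk whose range contains key."""
    idx = bisect.bisect_right(CHUNK_LOWER_BOUNDS, key) - 1
    return CHUNK_SHARDS[idx]

for key in (0, 1000, 4999, 5000, 10**6):
    print(key, "->", shard_for(key))
```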
145. MongoDB | 3.0
• 7-10x Better Performance
• Up to 80% Less Storage
• Reduce Operational Overhead By Up to 95%
• Pluggable Storage Optimized For Your Workload
• Low Latency Across the Globe
• Enhancements That Make You More Productive
• Faster Loading and Export
• Easier Query Optimization
• Faster Debugging
• Richer Geospatial Apps
• Better Time-Series Analytics
147. MongoDB | ISCPIF Use Cases
Datacenter in France and Italy
60M/week - 8M/day - 360K/hour
148. MongoDB | ISCPIF Use Cases
Avg. usage of Twitter in Paris in October
149-150. MongoDB | ISCPIF Use Cases
24HOURS NEWS: Real-time Breaking News
• 50-100 updates/s
• Time-series queries
• GridFS
• FTS (full-text search)
• Tokenizes and stems
• Scoring
• 140 characters/small dataset!
• TTL index
• Session store
151-153. MongoDB | ISCPIF Use Cases
TIMOTHY: Real-time Dashboards
154-155. MongoDB | ISCPIF Use Cases
HIGH THROUGHPUT: Real-time aggregations, over 50K inserts/s
Figure: calculating the new graph, from the most generic to the most specific
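A common way to sustain real-time aggregations at this insert rate is pre-aggregation: instead of storing every event and counting later, each event increments counters in one document per metric per hour with an upserting $inc. The sketch below builds such a (filter, update) pair and, so it can be tested without a server, applies it to a tiny in-memory stand-in. The document layout and metric names are hypothetical; with pymongo the pair would go to update_one(filter, update, upsert=True).

```python
from datetime import datetime

def counter_update(event_time, metric):
    """(filter, update) for one event: one document per metric per hour,
    holding an hourly total and per-minute counters."""
    hour = event_time.replace(minute=0, second=0, microsecond=0)
    filt = {"metric": metric, "hour": hour}
    update = {"$inc": {"total": 1, "minutes.%d" % event_time.minute: 1}}
    return filt, update

def apply_inc(store, filt, update):
    """In-memory stand-in for an upserting $inc (testing only)."""
    doc = store.setdefault((filt["metric"], filt["hour"]), dict(filt))
    for path, amount in update["$inc"].items():
        *parents, leaf = path.split(".")
        target = doc
        for part in parents:  # walk or create embedded documents
            target = target.setdefault(part, {})
        target[leaf] = target.get(leaf, 0) + amount
    return doc

store = {}
for ts in (datetime(2015, 10, 1, 13, 5, 12), datetime(2015, 10, 1, 13, 5, 40), datetime(2015, 10, 1, 13, 40)):
    doc = apply_inc(store, *counter_update(ts, "tweets"))
print(doc)
```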
156. MongoDB | ISCPIF Use Cases
130K inserts/s
157. MongoDB | ISCPIF Use Cases
10K-80K inserts/s
158. MongoDB | ISCPIF Use Cases
9K queries/s
159. MongoDB | ISCPIF Use Cases
160. MongoDB | ISCPIF Use Cases
Gephi Streaming
161. MongoDB | Monitoring System
162. MongoDB | Monitoring System: Network and Cache
163-164. MongoDB | Monitoring System: Hardware
165-166. MongoDB | ISCPIF Use Cases
Monitoring with New Relic
167. MongoDB | Monitoring System: Objects
3.02 billion documents
64 collections
195 indexes
168. MongoDB | Monitoring System: Objects
169. Redis: “in-memory data structure store”, used as a database, cache and message broker
170. Redis | Key Features
• Data structures
• strings
• hashes
• lists
• sets
• sorted sets
• bitmaps
• HyperLogLogs
• geospatial indexes
• Built-in
• replication
• Lua scripting
• transactions
• on-disk persistence
171. Redis | Data Structures, just a little!
Redis KEYS:
> set mykey somevalue
OK
> get mykey
"somevalue"
Redis LISTS:
> rpush mylist A
(integer) 1
> rpush mylist B
(integer) 2
> lpush mylist first
(integer) 3
> lrange mylist 0 -1
1) "first"
2) "A"
3) "B"
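Two details of the list session are easy to misread: LRANGE's stop index is inclusive, and negative indexes count from the tail, so 0 -1 means "the whole list". The toy class below (hypothetical, not a Redis client) replays the session to make those semantics explicit.

```python
class MiniRedisList:
    """Toy model of a Redis list: RPUSH, LPUSH and LRANGE semantics only."""

    def __init__(self):
        self.items = []

    def rpush(self, value):
        self.items.append(value)      # append at the tail
        return len(self.items)        # Redis replies with the new length

    def lpush(self, value):
        self.items.insert(0, value)   # prepend at the head
        return len(self.items)

    def lrange(self, start, stop):
        n = len(self.items)
        if start < 0:
            start = max(n + start, 0)
        if stop < 0:
            stop = n + stop
        # Unlike Python slices, Redis includes the stop index.
        return self.items[start:stop + 1]

mylist = MiniRedisList()
print(mylist.rpush("A"), mylist.rpush("B"), mylist.lpush("first"))
print(mylist.lrange(0, -1))
```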
172. Redis | How fast is Redis?
maziyar-beautiful-MacBook$ redis-benchmark -q -n 100000
PING_INLINE: 85178.88 requests per second
PING_BULK: 80000.00 requests per second
SET: 86580.09 requests per second
GET: 83263.95 requests per second
INCR: 83963.05 requests per second
LPUSH: 86880.97 requests per second
LPOP: 90252.70 requests per second
SADD: 84388.19 requests per second
SPOP: 92936.80 requests per second
LPUSH (needed to benchmark LRANGE): 87336.24 requests per second
LRANGE_100 (first 100 elements): 25614.75 requests per second
LRANGE_300 (first 300 elements): 10455.88 requests per second
LRANGE_500 (first 450 elements): 7125.04 requests per second
LRANGE_600 (first 600 elements): 5369.13 requests per second
MSET (10 keys): 50000.00 requests per second
173. Redis | How fast is Redis? Pipelining of 16 commands
maziyar-beautiful-MacBook$ redis-benchmark -n 1000000 -P 16 -q
PING_INLINE: 735294.12 requests per second
PING_BULK: 988142.31 requests per second
SET: 681198.88 requests per second
GET: 831255.25 requests per second
INCR: 778210.12 requests per second
LPUSH: 682593.81 requests per second
LPOP: 713775.88 requests per second
SADD: 732600.75 requests per second
SPOP: 885739.62 requests per second
LPUSH (needed to benchmark LRANGE): 656598.81 requests per second
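Pipelining sends 16 commands per network round trip instead of one, so most of the per-request latency is amortized. Dividing the two runs' figures gives a rough speedup per command; the numbers below are copied from the two benchmark outputs, and the comparison is only approximate since the runs used different request counts. The gain stays below 16x because each command still costs server CPU time.

```python
# Requests per second from the two redis-benchmark runs (without / with -P 16).
without_pipelining = {"SET": 86580.09, "GET": 83263.95, "INCR": 83963.05, "LPOP": 90252.70}
with_pipelining_16 = {"SET": 681198.88, "GET": 831255.25, "INCR": 778210.12, "LPOP": 713775.88}

speedups = {cmd: with_pipelining_16[cmd] / without_pipelining[cmd] for cmd in without_pipelining}
for cmd, factor in speedups.items():
    print(f"{cmd}: {factor:.1f}x")
```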
174-176. Redis | ISCPIF Use Cases
Scientific Games
177. Redis | ISCPIF Use Cases
• Scientific Operations
• Occurrence
• Co-Occurrence
• Scientific Games
• Pub/Sub
• Rate Limiter (IP-based with TTL)
• Chat rooms
• TTL
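An IP-based rate limiter with TTL is usually built on INCR plus EXPIRE: increment a per-IP counter, set a TTL when the counter is first created, and refuse requests once the count passes the limit. The class below mimics that fixed-window pattern in memory, with an injectable clock so it can be tested; the limit, window and key layout are hypothetical, and against a real server the same two commands would be sent through a client such as redis-py.

```python
import time

class FixedWindowRateLimiter:
    """In-memory model of the INCR + EXPIRE rate-limiting pattern."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self.counters = {}  # ip -> (count, expires_at), like key "ratelimit:<ip>" with a TTL

    def allow(self, ip):
        now = self.clock()
        count, expires_at = self.counters.get(ip, (0, None))
        if expires_at is None or now >= expires_at:
            # Key missing or expired: INCR creates it at 1, then EXPIRE sets the TTL.
            count, expires_at = 0, now + self.window
        count += 1
        self.counters[ip] = (count, expires_at)
        return count <= self.limit

fake_now = [0.0]
limiter = FixedWindowRateLimiter(limit=3, window_seconds=60, clock=lambda: fake_now[0])
print([limiter.allow("192.0.2.1") for _ in range(4)])
```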
178-180. Redis | ISCPIF Use Cases
Monitoring with New Relic
181. RabbitMQ: “Messaging that just works”
(well, actually it's more than that, but OK!)
182. RabbitMQ | Feature List
A messaging broker
• Highlights
• Reliability
• Flexible Routing
• Clustering
• Federation
• Highly Available Queues
• Multi-protocol
• Many Clients
• Management UI
• Plugin System
• For what?
• Data delivery
• Non-blocking operations
• Push notifications
• Publish / subscribe
• Asynchronous processing (work queues)
183. RabbitMQ | Feature List
A messaging broker
RabbitMQ Routing
Figure: a topic exchange routing messages to queues Q1, Q2 and Q3, bound with the patterns climate.*, risk.* and news.*
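A topic exchange compares each message's routing key with every binding pattern: words are separated by dots, * matches exactly one word, and # matches zero or more words. The function below is a sketch of that matching rule, not RabbitMQ's implementation; the queue-to-pattern pairing is our reading of the figure (Q1 to climate.*, Q2 to risk.*, Q3 to news.*) and is an assumption.

```python
def topic_matches(pattern, routing_key):
    """Topic-exchange matching: '*' = exactly one word, '#' = zero or more words."""
    p = pattern.split(".")
    k = routing_key.split(".")

    def match(i, j):
        if i == len(p):
            return j == len(k)
        if p[i] == "#":
            # '#' may swallow any number of words, including none
            return any(match(i + 1, jj) for jj in range(j, len(k) + 1))
        if j == len(k):
            return False
        if p[i] == "*" or p[i] == k[j]:
            return match(i + 1, j + 1)
        return False

    return match(0, 0)

# Assumed bindings, as read from the figure.
BINDINGS = {"Q1": "climate.*", "Q2": "risk.*", "Q3": "news.*"}

def route(routing_key):
    """Queues that would receive a message published with routing_key."""
    return [queue for queue, pattern in BINDINGS.items() if topic_matches(pattern, routing_key)]

print(route("climate.cop21"))
print(route("climate.cop21.paris"))
```

Because * matches exactly one word, climate.cop21.paris reaches no queue; a binding of climate.# would catch it.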
184. RabbitMQ | ISCPIF Use Cases
Monitoring with New Relic
185-188. RabbitMQ | ISCPIF Use Cases
Monitoring with the RabbitMQ Management UI
189. RabbitMQ | ISCPIF Use Cases
• Distributed computations
• Parsing text files
• Scientific calculations
• Real-time processing
• Text mining
• NLP
• Annotation
• Keyword extraction
• Job queues
• RPC (remote procedure call)
• Topic-based routing
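The RPC pattern over a message broker works like this: the client publishes a request carrying a reply_to queue and a unique correlation_id, the server publishes its answer to that queue with the same correlation_id, and the client waits for the matching reply. The sketch below reproduces the exchange in-process with Python's queue module standing in for RabbitMQ queues (a real client such as pika would pass a queue name, not a queue object); the word-counting procedure is hypothetical.

```python
import queue
import threading
import uuid

# Stand-in for the server's request queue (a named queue in RabbitMQ).
request_queue = queue.Queue()

def rpc_server(handler):
    """Consume requests and send each reply to the requester's reply_to queue."""
    while True:
        msg = request_queue.get()
        if msg is None:  # shutdown signal used only by this sketch
            break
        result = handler(msg["body"])
        msg["reply_to"].put({"correlation_id": msg["correlation_id"], "body": result})

def rpc_call(body, timeout=5.0):
    """Publish a request, then block until the reply with our correlation id arrives."""
    reply_queue = queue.Queue()  # in AMQP: an exclusive callback queue
    corr_id = str(uuid.uuid4())
    request_queue.put({"body": body, "reply_to": reply_queue, "correlation_id": corr_id})
    while True:
        reply = reply_queue.get(timeout=timeout)  # raises queue.Empty on timeout
        if reply["correlation_id"] == corr_id:    # ignore replies meant for other calls
            return reply["body"]

def count_words(text):  # hypothetical remote procedure
    return len(text.split())

server = threading.Thread(target=rpc_server, args=(count_words,), daemon=True)
server.start()
answer = rpc_call("climate change in Paris")
print(answer)
request_queue.put(None)
server.join(timeout=5)
```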
190. RabbitMQ | ISCPIF Use Cases
• Parsing
• 225 files
• 10M-20M lines each
• Avg. total of 3.3 billion lines
• RPC
• Post-process each document
• Output
• MongoDB
• Elasticsearch
• Redis
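Parsing billions of lines this way relies on the work-queue (competing consumers) pattern: lines are published as tasks, several workers consume from the same queue, and each task is processed by exactly one worker. The sketch below shows the pattern in-process with threads and queue.Queue rather than RabbitMQ; task_done() plays a role roughly similar to acknowledging a message, and the tab-separated record format is hypothetical.

```python
import queue
import threading

def run_work_queue(lines, worker_count, process):
    """Competing consumers: each line is handled by exactly one worker."""
    tasks = queue.Queue()
    results = queue.Queue()

    def worker():
        while True:
            line = tasks.get()
            try:
                if line is None:  # stop signal, one per worker
                    return
                results.put(process(line))
            finally:
                tasks.task_done()  # roughly analogous to acking the message

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for line in lines:
        tasks.put(line)
    for _ in threads:
        tasks.put(None)
    tasks.join()
    for t in threads:
        t.join()
    return [results.get() for _ in range(results.qsize())]

def parse_line(line):  # hypothetical record format: "<id>\t<text>"
    doc_id, text = line.split("\t", 1)
    return {"id": int(doc_id), "words": len(text.split())}

lines = [f"{i}\tsample text number {i}" for i in range(1000)]
docs = run_work_queue(lines, worker_count=4, process=parse_line)
print(len(docs))
```

In the pipeline described above, each parsed document would then be post-processed and written to MongoDB, Elasticsearch or Redis.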
191. ISCPIF Big Data
• Multivac (Open Data Platform)
• ISCPIF APIs
• Science en Poche
• Climatique (COP21)
• Risk (AXA research fund)
• Scientific Dashboards
• Distributed Computing
• Nobel Game (scientific game)
• Twitter streaming (UN, France, Climate Change, Risk, etc.)
• Instagram streaming (Paris)
221. 22 VMs (distributed systems)
• 320K ops/s
• 130K inserts/s
• 64K index ops/s
• …
• 8 web servers
• 4 API servers
• Search engine cluster: 900 million documents (45%)
• Database: 2.9 billion documents (22%)
• 120 cores
• 2.5 TB RAM
• 30 TB SSD
• 8,000+ lines of code
• 14 web apps
• 4 mobile apps
222. What's next for Big Data at ISCPIF
• Starting 2016 with Couchbase in parallel
• Graph databases
• Spark Streaming / machine learning
• Clustering and categorizing in real time
• Creative hardware
• SlipStream, StratusLab and EGI Cloud
• Healthcare and wearable devices
• No, no drones! ;-)
223. “Every piece of information is worth something to somebody. And in the hands of the wrong person, that could be deadly.”
– The Blacklist: Lord Baltimore (No. 104)
Reddington: People love to decry Big Brother, the NSA, the government listening in on their most private lives, yet they all willingly go online and hand over the most intimate details of those lives to big data.
Elizabeth: Most people don't care that Google knows their search history.
Reddington: They know more than that. They know your habits, the banks you use, the pills you pop, the men or women you sleep with.