Doing magic with ElasticSearch

When an application grows large and complex, searching its models becomes a complicated task. Running searches directly against the database is slow, inefficient, and leaves little or no flexibility in how the search is performed. Enter ElasticSearch, a search engine used by companies such as GitHub, Twitter, and Foursquare to index and search literally millions of documents in real time. In this talk, I will explain when, how, and why to use ElasticSearch to easily index your models and run complex searches against them.

Published in: Technology

  1. 1. Doing magic with ElasticSearch. Pedro Franceschi, @pedroh96, pedro@pagar.me, github.com/pedrofranceschi
  2. 2. October 2010
  3. 3. Filters, full text search, sort, highlight, facets, pagination
  4. 4. You will need to search data.
  5. 5. You will need to understand data.
  6. 6. (My)SQL is not the solution. (… and neither is NoSQL)
  7. 7. What is ElasticSearch?
  8. 8. ElasticSearch • “Open Source Distributed Real Time Search & Analytics” • RESTful API for indexing/searching JSON documents (“NoSQL”) • It is NOT a database • Built on Apache Lucene • Just works (and scales) • Full text search, aggregations, scripting, etc.
  9. 9. Names? MySQL → ElasticSearch: Database → Index; Table → Type; Row → Document; Column → Field; Schema → Mapping; Partition → Shard
  10. 10. How do you use ElasticSearch?
  11. 11. PUT data (URL parts: endpoint / index / type / document ID; request body: the document)
          $ curl -XPUT 'http://localhost:9200/twitter/tweet/1' -d '{
              "user" : "pedroh96",
              "post_date" : "2009-11-15T14:12:12",
              "message" : "trying out Elasticsearch"
          }'
          {
              "_index" : "twitter",
              "_type" : "tweet",
              "_id" : "1",
              "_version" : 1,
              "created" : true
          }
  12. 12. GET data (URL parts: endpoint / index / type / document ID; response: the document in "_source")
          $ curl -XGET 'http://localhost:9200/twitter/tweet/1'
          {
              "_id": "1",
              "_index": "twitter",
              "_source": {
                  "message": "trying out Elasticsearch",
                  "post_date": "2009-11-15T14:12:12",
                  "user": "pedroh96"
              },
              "_type": "tweet",
              "_version": 1,
              "found": true
          }
  13. 13. GET data (URL parts: endpoint / index / search operator "_search"; request body: the search query)
          $ curl -XGET 'http://localhost:9200/twitter/_search' -d '{ "query": . . . }'
  14. 14. ActiveRecord
          class Tweet < ActiveRecord::Base
          end
  15. 15. ActiveRecord
          require 'elasticsearch/model'
          class Tweet < ActiveRecord::Base
            include Elasticsearch::Model
            include Elasticsearch::Model::Callbacks
          end
  16. 16. Tweet.import
  17. 17. Tweet.search("pedroh96")
  18. 18. Why use ElasticSearch?
  19. 19. DISCLAIMER
  20. 20. Just Another Query Language?
          Post.where(:author => "pedroh96")
          vs
          Post.search(query: { match: { author: "pedroh96" }})
  21. 21. 1) Full text search
  22. 22. ActiveRecord
          $ rails g scaffold Post title:string source:string
  23. 23. ActiveRecord: GET /posts/5
          Post.find(5)   :-)
  24. 24. ActiveRecord: "Amazon to Buy Video Site Twitch for More Than $1B"
          Post.where(:title => "Amazon to Buy Video Site Twitch for More Than $1B")   :-)
  25. 25. ActiveRecord: "amazon"
          Post.where(["title LIKE ?", "%Amazon%"])   ???
  26. 26. ActiveRecord: "amazon source:online.wsj.com"
          Post.where(["title LIKE ? AND source = ?", "%Amazon%", "online.wsj.com"])   ??????
  27. 27. ElasticSearch: "amazon"
          Post.search("amazon")   :-)
  28. 28. ElasticSearch: "amazon source:online.wsj.com"
          search = Post.search("amazon source:online.wsj.com")   :-)
  29. 29. ElasticSearch: "amazon source:online.wsj.com"
          search = Post.search(
            query: {
              match: {                          # full-text search
                _all: "amazon source:online.wsj.com"
              }
            }
          )
  30. 30. ElasticSearch: "amazon source:online.wsj.com"
          search = Post.search(
            query: {
              multi_match: {                    # full-text search
                query: "amazon source:online.wsj.com",
                fields: ['title^10', 'source']  # title boost
              }
            }
          )
  31. 31. ElasticSearch: "amazon source:online.wsj.com"
          search = Post.search(
            query: {
              multi_match: {                    # full-text search
                query: "amazon source:online.wsj.com",
                fields: ['title^10', 'source']  # title boost
              }
            },
            highlight: {                        # title highlight
              fields: {
                title: {}
              }
            }
          )
  32. 32. ElasticSearch: title highlight
          > search.results[0].highlight.title
          => ["Twitch officially acquired by <em>Amazon</em>"]
  33. 33. 2) Aggregations (faceting)
  34. 34. Geo distance aggregation
  35. 35. ActiveRecord
          $ rails g scaffold Coordinate latitude:decimal longitude:decimal
  36. 36. ActiveRecord
          class Coordinate < ActiveRecord::Base
          end
  37. 37. ActiveRecord
          class Coordinate < ActiveRecord::Base
            def distance_to(coordinate)
              # From http://en.wikipedia.org/wiki/Haversine_formula
              rad_per_deg = Math::PI/180  # PI / 180
              rkm = 6371                  # Earth radius in kilometers
              rm = rkm * 1000             # Radius in meters
              dlon_rad = (coordinate.longitude.to_f - self.longitude.to_f) * rad_per_deg  # Delta, converted to rad
              dlat_rad = (coordinate.latitude.to_f - self.latitude.to_f) * rad_per_deg
              lat1_rad = coordinate.latitude.to_f * rad_per_deg
              lat2_rad = self.latitude.to_f * rad_per_deg
              lon1_rad = coordinate.longitude.to_f * rad_per_deg
              lon2_rad = self.longitude.to_f * rad_per_deg
              a = Math.sin(dlat_rad/2)**2 + Math.cos(lat1_rad) * Math.cos(lat2_rad) * Math.sin(dlon_rad/2)**2
              c = 2 * Math::atan2(Math::sqrt(a), Math::sqrt(1-a))
              rm * c  # Delta in meters
            end
          end
          > c1 = Coordinate.new(:latitude => -23.5532636, :longitude => -46.6528908)
          > c2 = Coordinate.new(:latitude => -23.5538488, :longitude => -46.6530035)
          > c1.distance_to(c2)
          => 66.07749735875552
  38. 38. ActiveRecord   ??????
          origin = Coordinate.new(:latitude => -23.5532636, :longitude => -46.6528908)
          buckets = [
            { :to => 100, :coordinates => [] },
            { :from => 100, :to => 300, :coordinates => [] },
            { :from => 300, :coordinates => [] }
          ]
          Coordinate.all.each do |coordinate|
            distance = origin.distance_to(coordinate)
            buckets.each do |bucket|
              from = bucket[:from] || 0
              to = bucket[:to] || Float::INFINITY  # the last bucket has no upper bound
              bucket[:coordinates] << coordinate if distance >= from && distance < to
            end
          end
  39. 39. ElasticSearch   :-)
          query = {
            aggregations: {
              rings_around_rubyconf: {                  # aggregation name
                geo_distance: {                         # aggregation type
                  field: "location",                    # field holding the location
                  origin: "-23.5532636, -46.6528908",   # coordinates of the origin
                  ranges: [                             # buckets to aggregate into
                    { to: 100 },
                    { from: 100, to: 300 },
                    { from: 300 }
                  ]
                }
              }
            }
          }
          search = Coordinate.search(query)
  40. 40. (Extended) stats aggregation
  41. 41. ActiveRecord
          $ rails g scaffold Grade subject:string grade:decimal
  42. 42. ElasticSearch
          query = {
            aggregations: {
              grades_stats: {        # aggregation name
                extended_stats: {    # aggregation type
                  field: "grade"     # field name
                }
              }
            }
          }
          search = Grade.search(query)
  43. 43. ElasticSearch
          > search.response.aggregations.grades_stats
          => #<Hashie::Mash avg=8.03 count=3 max=10.0 min=4.6 std_deviation=2.43 sum=24.1 sum_of_squares=211.41 variance=5.93>
  44. 44. (Extended) stats aggregation + Scripting
  45. 45. ElasticSearch
          query = {
            aggregations: {
              grades_stats: {
                extended_stats: {
                  field: "grade"
                }
              }
            }
          }
  46. 46. ElasticSearch
          query = {
            aggregations: {
              grades_stats: {        # aggregation name
                extended_stats: {    # aggregation type
                  field: "grade",    # field name
                  script: "_value < 7.0 ? _value * correction : _value",  # JavaScript that computes the new grade
                  params: {
                    correction: 1.2
                  }
                }
              }
            }
          }
          search = Grade.search(query)
  47. 47. ElasticSearch
          > search.response.aggregations.grades_stats
          => #<Hashie::Mash avg=8.34 count=3 max=10.0 min=5.52 std_deviation=2.00 sum=25.02 sum_of_squares=220.72 variance=4.01>
  48. 48. Term aggregation
  49. 49. ElasticSearch
          query = {
            aggregations: {
              subjects: {            # aggregation name
                terms: {             # aggregation type
                  field: "subject"   # field name
                }
              }
            }
          }
          search = Grade.search(query)
  50. 50. ElasticSearch
          > search.response.aggregations.subjects
          => #<Hashie::Mash buckets=[
               #<Hashie::Mash doc_count=2 key="math">,
               #<Hashie::Mash doc_count=1 key="grammar">,
               #<Hashie::Mash doc_count=1 key="physics">
             ]>
  51. 51. Combined aggregations (term + stats)
  52. 52. ElasticSearch
          query = {
            aggregations: {
              subjects: {
                terms: {
                  field: "subject"
                }
              }
            }
          }
          search = Grade.search(query)
  53. 53. ElasticSearch
          query = {
            aggregations: {
              subjects: {                # parent aggregation name
                terms: {
                  field: "subject"       # field for the parent aggregation
                },
                aggregations: {
                  grade_stats: {         # child aggregation name
                    stats: {
                      field: "grade"     # field for the child aggregation
                    }
                  }
                }
              }
            }
          }
          search = Grade.search(query)
  54. 54. ElasticSearch
          > search.response.aggregations.subjects
          => #<Hashie::Mash buckets=[
               #<Hashie::Mash doc_count=2 grade_stats=#<Hashie::Mash avg=9.0 count=2 max=10.0 min=8.0 sum=18.0> key="math">,
               #<Hashie::Mash doc_count=1 grade_stats=#<Hashie::Mash avg=4.6 count=1 max=4.6 min=4.6 sum=4.6> key="grammar">,
               #<Hashie::Mash doc_count=1 grade_stats=#<Hashie::Mash avg=9.5 count=1 max=9.5 min=9.5 sum=9.5> key="physics">
             ]>
  55. 55. Top hits, more like this, histogram, scripted metrics, geo bounds, stemmer (synonyms), IPv4 ranges . . .
  56. 56. 3) Scoring
  57. 57. ActiveRecord
          $ rails g scaffold Post title:string source:string likes:integer
  58. 58. ElasticSearch: "amazon"
          search = Post.search(
            query: {
              match: {               # full-text search
                _all: "amazon"
              }
            }
          )
          > search.results.results[0]._score
          => 0.8174651
  59. 59. ElasticSearch: "amazon"
          search = Post.search(
            query: {
              custom_score: {
                query: {
                  match: {           # full-text search
                    _all: "amazon"
                  }
                },
                script: "_score * doc['likes'].value"  # likes influence the score
              }
            }
          )
          > search.results.results[0]._score
          => 31.8811388
  60. 60. Score explained: GET http://localhost:9200/post/_search?explain
          "_explanation": {
            "description": "weight(tweet:honeymoon in 0) [PerFieldSimilarity], result of:",
            "value": 0.076713204,
            "details": [
              {
                "description": "fieldWeight in 0, product of:",
                "value": 0.076713204,
                "details": [
                  {
                    "description": "tf(freq=1.0), with freq of:",
                    "value": 1,
                    "details": [
                      {
                        "description": "termFreq=1.0",
                        "value": 1
                      }
                    ]
                  },
                  {
                    "description": "idf(docFreq=1, maxDocs=1)",
                    "value": 0.30685282
                  },
                  {
                    "description": "fieldNorm(doc=0)",
                    "value": 0.25
                  }
                ]
              }
            ]
          }
  61. 61. 4) Indexing responses
  62. 62. $ rails g scaffold Post title:string source:string likes:integer
  63. 63. SELECT * FROM Posts WHERE id = params[:id]
          class PostsController < ApplicationController
            # ...
            def show
              @post = Post.find(params[:id])
              render json: @post
            end
            # ...
          end
  64. 64. GET http://localhost:9200/posts/posts/params[:id]
          class PostsController < ApplicationController
            # ...
            def show
              @post = Post.search(query: { match: { id: params[:id] }})
              render json: @post
            end
            # ...
          end
  65. 65. ActiveRecord: includes a parent in the indexed JSON
          require 'elasticsearch/model'
          class Post < ActiveRecord::Base
            include Elasticsearch::Model
            include Elasticsearch::Model::Callbacks
            belongs_to :author
            def as_indexed_json(options={})
              self.as_json(
                include: { author: { only: [:name, :bio] } }
              )
            end
          end
  66. 66. Exposing ElasticSearch
  67. 67. http://localhost:9200/pagarme/_search → https://api.pagar.me/1/search
  68. 68. Pagar.me infrastructure (diagram labels): Router (api.pagar.me); API server (Node.js), shown twice; ElasticSearch, shown twice; MySQL (transactions and relational data), shown twice; MongoDB (customer and non-relational data); test environment (customer sandbox); production environment
  69. 69. Exposing ElasticSearch • ElasticSearch endpoint -> endpoint accessed by the client… • … but be careful: the data must be restricted to the client's account (of course) • Advantage: access to the same ElasticSearch features (aggregations, statistics, scores, etc.) • Security: disable ElasticSearch scripts
  70. 70. GET /search • A single endpoint for all GETs • All data indexed and ready to use (no joins) • Complex queries built on the front end (Angular.js) • Front-end development does not depend on the back end
  71. 71. Overall…
  72. 72. 1) There is a tool for every task. 2) A hammer is always the right tool. 3) Every tool is also a hammer.
  73. 73. MySQL != NoSQL != ElasticSearch
  74. 74. Thank you! :) Pedro Franceschi, @pedroh96, pedro@pagar.me, github.com/pedrofranceschi
  75. 75. Questions? Pedro Franceschi, @pedroh96, pedro@pagar.me, github.com/pedrofranceschi
  76. 76. Doing magic with ElasticSearch. Pedro Franceschi, @pedroh96, pedro@pagar.me, github.com/pedrofranceschi
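
The search bodies on slides 29 to 31 grow by nesting plain Ruby hashes, so they can be assembled and inspected without a running cluster. The following is a minimal sketch of that idea; `build_search_body` is a hypothetical helper written for this illustration and is not part of the elasticsearch-model gem.

```ruby
require 'json'

# Hypothetical helper (an assumption for illustration, not a gem API):
# builds a multi_match body with optional per-field highlighting.
def build_search_body(text, fields:, highlight: [])
  body = {
    query: {
      multi_match: {
        query: text,
        fields: fields  # a suffix such as '^10' boosts matches in that field
      }
    }
  }
  unless highlight.empty?
    # Each highlighted field maps to an empty hash, meaning default options.
    body[:highlight] = { fields: highlight.map { |f| [f, {}] }.to_h }
  end
  body
end

body = build_search_body('amazon source:online.wsj.com',
                         fields: ['title^10', 'source'],
                         highlight: [:title])

# Print the JSON that would be sent as the request body.
puts JSON.pretty_generate(body)
```

Assuming a model set up as on slide 15, the resulting hash could be passed along as `Post.search(body)`, which is the same call shape the slides use.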
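
To make the comparison between slides 37 to 38 and slide 39 concrete, here is a rough, database-free sketch of what a distance-ring aggregation computes: a Haversine distance per point, then a count per range. It assumes each range includes its `from` and excludes its `to`, and that distances are in meters. The helper names and the last two sample points are made up for this illustration.

```ruby
EARTH_RADIUS_M = 6_371_000.0  # same Earth radius as slide 37, in meters

# Great-circle distance in meters between two latitude/longitude pairs.
def haversine_m(lat1, lon1, lat2, lon2)
  rad = Math::PI / 180
  dlat = (lat2 - lat1) * rad
  dlon = (lon2 - lon1) * rad
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1 * rad) * Math.cos(lat2 * rad) * Math.sin(dlon / 2)**2
  2 * EARTH_RADIUS_M * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a))
end

# Counts how many points fall into each ring around the origin.
# A missing :from means 0 and a missing :to means no upper bound.
def ring_counts(origin, points, ranges)
  ranges.map do |range|
    from = range.fetch(:from, 0)
    to = range.fetch(:to, Float::INFINITY)
    count = points.count do |lat, lon|
      d = haversine_m(origin[0], origin[1], lat, lon)
      d >= from && d < to
    end
    range.merge(doc_count: count)
  end
end

origin = [-23.5532636, -46.6528908]
points = [
  [-23.5538488, -46.6530035],  # second coordinate from slide 37
  [-23.5550000, -46.6528908],  # made-up sample point
  [-23.6000000, -46.6528908]   # made-up sample point
]
ranges = [{ to: 100 }, { from: 100, to: 300 }, { from: 300 }]

rings = ring_counts(origin, points, ranges)
rings.each { |ring| puts ring.inspect }
```

This version still loads every point into Ruby, which is exactly the cost the `geo_distance` aggregation on slide 39 avoids by doing the bucketing inside ElasticSearch.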
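
The statistics shown on slides 43, 47 and 54 can be checked by hand. The sketch below recomputes them in plain Ruby, assuming population variance (which matches the numbers on the slides). The function names are hypothetical, and the sample grades are inferred from the outputs on slides 43 and 54 rather than taken from the author's data.

```ruby
# Basic metrics, mirroring the fields of the "stats" output on slide 54.
def stats(values)
  values = values.map(&:to_f)
  sum = values.sum
  { count: values.size, min: values.min, max: values.max,
    avg: sum / values.size, sum: sum }
end

# Adds sum of squares, population variance and standard deviation.
def extended_stats(values)
  base = stats(values)
  sum_of_squares = values.sum { |v| v.to_f * v.to_f }
  # Clamp at zero so rounding error never yields a negative variance.
  variance = [sum_of_squares / base[:count] - base[:avg] * base[:avg], 0.0].max
  base.merge(sum_of_squares: sum_of_squares, variance: variance,
             std_deviation: Math.sqrt(variance))
end

# Groups rows by a key and computes child stats per group,
# like a terms aggregation with a nested stats aggregation.
def terms_with_stats(rows, key:, field:)
  rows.group_by { |row| row[key] }
      .map do |term, group|
        { key: term, doc_count: group.size,
          grade_stats: stats(group.map { |row| row[field] }) }
      end
      .sort_by { |bucket| -bucket[:doc_count] }
end

grades = [10.0, 9.5, 4.6]  # inferred from count=3, sum=24.1 on slide 43
plain = extended_stats(grades)

# Same rule as the script on slide 46: scale grades below 7.0 by 1.2.
corrected = extended_stats(grades.map { |g| g < 7.0 ? g * 1.2 : g })

rows = [  # inferred from the buckets on slide 54
  { subject: 'math', grade: 10.0 },
  { subject: 'math', grade: 8.0 },
  { subject: 'grammar', grade: 4.6 },
  { subject: 'physics', grade: 9.5 }
]
buckets = terms_with_stats(rows, key: :subject, field: :grade)

puts plain.inspect
puts corrected.inspect
buckets.each { |bucket| puts bucket.inspect }
```

Note that the three-grade list and the four-row list differ because the slides themselves report three documents in one output and four in the other.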
