25. Rebalancing API
• Rebalance API
– Scaling Strategy
• Auto Shard
• Redistribute
• Replace
• Scale Up
• Scale Down
• Remove Dead Nodes
– Allocation Strategy
• Least Used Node
• Unused Node
– Size Based Sharding
– Discovery Based Redistribution
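The two allocation strategies named above can be illustrated with a small sketch. This is not BloomReach's implementation: it assumes that "least used" means the node hosting the fewest cores and "unused" means a node hosting none, and the function names and the `cores_per_node` mapping are hypothetical.

```python
from typing import Dict, List


def least_used_nodes(cores_per_node: Dict[str, int], count: int) -> List[str]:
    """Pick `count` nodes with the fewest hosted cores (ties broken by name).

    Assumption: "least used" is measured by core count; a real system
    might use disk usage or load instead.
    """
    if count > len(cores_per_node):
        raise ValueError("not enough nodes in the cluster")
    ranked = sorted(cores_per_node.items(), key=lambda kv: (kv[1], kv[0]))
    return [name for name, _ in ranked[:count]]


def unused_nodes(cores_per_node: Dict[str, int]) -> List[str]:
    """Return the nodes that currently host no cores at all."""
    return sorted(name for name, n in cores_per_node.items() if n == 0)


# Hypothetical cluster state: node name -> number of hosted cores.
cluster = {"node-a": 4, "node-b": 1, "node-c": 0, "node-d": 2}
print(least_used_nodes(cluster, 2))  # ['node-c', 'node-b']
print(unused_nodes(cluster))         # ['node-c']
```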
29. Performance
#Doc  | Re-indexing | Open source Solr split shard | BloomReach Rebalance API | BloomReach Rebalance API with parallel split
~10K  | 2 - 3 min   | 35 - 40 secs                 | 30 - 35 secs             | 15 - 20 secs
~100K | 6 - 7 min   | 3 - 3.5 min                  | 2.5 - 3 mins             | 40 - 55 secs
~1M   | 35 min      | 13 - 15 mins                 | 10 - 12 mins             | 2 - 3 mins
~10M  | 1h 15 min   | 28 - 30 mins                 | 21 - 24 mins             | 3 - 4 mins
~150M | 7h~         | Timeout                      | ~ 1 hour                 | 18 - 20 mins
c.f. http://engineering.bloomreach.com/solrcloud-rebalance-api/
• Fast because no reindexing is needed
• Not only splits the index but also migrates the core configuration automatically
30. Exploring Solr
• “The Evolution of Lucene & Solr Numerics from Strings to Points”,
Steve Rowe, Lucidworks
– Looks back at how Lucene/Solr handle numeric values, from the viewpoint of how the internal data structures evolved
– Reports benchmarks for the latest Dimensional Points
45. Benchmark (1)
• McCandless benchmark & Adrien Grand re-run
– 36% faster at query time
– 71% faster at index time
– 66% less disk
– 85% less memory
... Isn't that too good to be true?
46. Benchmark
• Fixed range query
• 25M NYC taxi data
• Three kinds of long field
– Trie numerics, precision step 8
– Point fields
– Trie numerics, maximum precision step
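To make the precision step concrete, here is a minimal sketch of how a trie-encoded numeric field expands one value into several prefix terms. It follows the general scheme of Lucene's trie numerics (one term per shift of 0, step, 2 × step, ... below the value width), but it is simplified: it assumes non-negative values and skips the sortable-bits encoding that Lucene applies. The function name `trie_terms` is hypothetical.

```python
from typing import List, Tuple


def trie_terms(value: int, precision_step: int, bits: int = 64) -> List[Tuple[int, int]]:
    """Return the (shift, prefix) pairs indexed for one non-negative value.

    Each term drops `shift` low-order bits, so larger shifts stand for
    coarser ranges that a range query can match with a single term.
    """
    if precision_step < 1:
        raise ValueError("precision_step must be >= 1")
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    terms = []
    shift = 0
    while shift < bits:
        terms.append((shift, value >> shift))
        shift += precision_step
    return terms


fare = 1_234_567_890
print(len(trie_terms(fare, 8)))   # 8 terms per value with precision step 8
print(len(trie_terms(fare, 64)))  # 1 term per value when the step covers all 64 bits
```

With a step of 8, every long value is indexed as 8 terms; with a step at least as wide as the value, only the full-precision term is indexed. Fewer terms per value means less indexing work and a smaller index, at the cost of range queries having to visit more terms.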
47. Benchmark
                      | Indexing time | Index size
Points                | 31s           | 1.2GiB
Trie                  | 53s           | 1.6GiB
Single-precision Trie | 19s           | 0.7GiB
http://www.slideshare.net/lucidworks/the-evolution-of-lucene-solr-numerics-from-strings-to-points-presented-by-steve-rowe-lucidworks
• 24 fields: 6 string, 1 text, 2 long, 1 int, and 14 double.
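As a quick sanity check on the "too good to be true?" question, the table's Points and Trie rows can be turned into relative reductions. This is only arithmetic on the numbers in the table above; the helper name `reduction_pct` is made up for this sketch, and the earlier benchmark's "71% faster" may be defined differently, so the comparison is indicative only.

```python
def reduction_pct(baseline: float, candidate: float) -> float:
    """Percentage by which `candidate` is smaller than `baseline`."""
    return (baseline - candidate) / baseline * 100.0


# Points vs. Trie (precision step 8), values taken from the table.
time_saved = reduction_pct(53, 31)    # indexing time in seconds
size_saved = reduction_pct(1.6, 1.2)  # index size in GiB

print(round(time_saved, 1))  # 41.5
print(round(size_saved, 1))  # 25.0
```

On this data set Points need roughly 41.5% less indexing time and 25% less disk than Trie with precision step 8, which is more modest than the 71% and 66% quoted from the McCandless benchmark.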
49. References
• “The Evolution of Lucene & Solr Numerics from Strings to Points”, Steve Rowe, Lucidworks, http://www.slideshare.net/lucidworks/the-evolution-of-lucene-solr-numerics-from-strings-to-points-presented-by-steve-rowe-lucidworks
• “Fun with flexible indexing”, Michael McCandless, http://blog.mikemccandless.com/2010/10/fun-with-flexible-indexing.html