2. Who am I?
• Anshum Gupta, Apache Lucene/Solr committer,
Lucidworks Employee.
• Search and related stuff for 9+ years.
• Apache Lucene since 2006 and Solr since 2010,
but consistent community involvement only since
2012
• Organizations I am or have been a part of:
3. Solr - Commits and Contributors
via https://www.openhub.net/p/solr
6. Ease of Use
• Start scripts in Solr
• bin/solr start -e cloud
• Schemaless - REST APIs to manage schema
• Auto-generate a unique key in the schemaless
example
• Removed the restriction that JSON documents must
be wrapped in an array: the new path
‘/update/json/docs’ accepts them directly
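As a rough illustration of the new path, the sketch below builds (but does not send) a POST of a single bare JSON document, with no enclosing array. The host, port, and core name `collection1` are assumptions about a local example setup, and the document fields are made up.

```python
import json
from urllib.request import Request

# Assumed base URL of a local example core; adjust to your setup.
SOLR_CORE_URL = "http://localhost:8983/solr/collection1"


def build_json_doc_request(doc, base_url=SOLR_CORE_URL):
    """Build (but do not send) a POST of one bare JSON document."""
    body = json.dumps(doc).encode("utf-8")
    return Request(
        url=base_url + "/update/json/docs",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical document: note that it is not wrapped in an array.
req = build_json_doc_request({"id": "1", "title": "Solr 4.10"})
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To actually send it against a running Solr: urllib.request.urlopen(req)
```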
11. Solr Core
• Unloading/Deletion of cores that failed to initialize.
• Update request handlers are registered implicitly, no need
to define them.
• Terms Query parser for efficiently filtering documents by a
list of values.
• JSON loader now flattens nested JSON to multiple documents.
• Correctly decode special characters in managed stopwords
and synonym endpoints.
• Facet counts are no longer duplicated in response if the
request duplicates them.
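To make the Terms query parser item concrete, here is a minimal sketch that builds a filter query from a list of values. The field name `id` and the values are placeholders, and the helper `terms_filter` is hypothetical; only the `{!terms f=...}` local-params syntax comes from Solr.

```python
from urllib.parse import urlencode


def terms_filter(field, values):
    """Return an fq value such as '{!terms f=id}doc1,doc2,doc3'.

    Assumes the values contain no commas, since the comma is used
    here as the separator between terms.
    """
    return "{!terms f=%s}%s" % (field, ",".join(values))


# Hypothetical field and values, used as a filter on a match-all query.
fq = terms_filter("id", ["doc1", "doc2", "doc3"])
params = urlencode({"q": "*:*", "fq": fq})
print(fq)
print(params)
```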
12. SolrCloud - APIs
• The CLUSTERSTATUS API now tracks and returns much more than in
the previous version, e.g. roles and live nodes.
• MIGRATE Collections API
• Now works with legacyCloud=false mode
• Retries now handle a pre-existing temp
collection.
• DELETEREPLICA now removes instance and data directory by
default.
• distrib.singlePass parameter to make EXECUTE_QUERY phase
fetch all fields and skip GET_FIELDS.
• Also, other bug fixes and slightly better logging!
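The calls mentioned on this slide can be sketched as plain HTTP requests. The snippet below only builds the URLs; the host, the collection name `collection1`, the shard `shard1`, and the replica name `core_node2` are assumptions for illustration.

```python
from urllib.parse import urlencode

SOLR_URL = "http://localhost:8983/solr"  # assumed local node


def collections_api_url(action, **params):
    """Build a Collections API URL for the given action."""
    query = {"action": action, "wt": "json"}
    query.update(params)
    return SOLR_URL + "/admin/collections?" + urlencode(query)


# Cluster state for one (hypothetical) collection.
status_url = collections_api_url("CLUSTERSTATUS", collection="collection1")

# Delete one (hypothetical) replica.
delete_url = collections_api_url(
    "DELETEREPLICA", collection="collection1", shard="shard1", replica="core_node2"
)

# Distributed query that asks for a single pass (skipping GET_FIELDS).
select_url = SOLR_URL + "/collection1/select?" + urlencode(
    {"q": "*:*", "fl": "id", "distrib.singlePass": "true"}
)

for url in (status_url, delete_url, select_url):
    print(url)
```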
13. SolrCloud - Internals
• No more losing the Overseer with the
OverseerRoles enabled.
• Distributed commit and optimize are no longer
serially executed across all replicas.
• Improvements in leader initiated recovery.
• Fixed: a ZooKeeper session expiry during setup
could keep LeaderElector from joining elections.
• Schemaless concurrency improvements
14. SolrCloud - Internals
• DistributedQueue is more efficient at creating
ZK watches.
• Correctly decode special characters in managed
stopwords and synonym endpoints.
• OCP (OverseerCollectionProcessor) no longer exits
on ZK connection loss; other ZK communication
retries are improved too.
• Bug fixes in the compositeId router.
15. SolrCloud - HDFS
• Improvement in transaction log replay
performance on HDFS
• HdfsDirectoryFactory uses the supplied
Configuration when communicating with a
Kerberos-secured cluster.
• Fixed a race condition in HdfsUpdateLog that could
expose a closed HDFS FileSystem instance.
16. SolrJ, DIH and more…
• SolrJ now supports interval faceting.
• Performance improvement in ConcurrentUpdateSolrServer
(C*SS) - No more spin lock.
• DIH now has an onError event handler hook.
• Data Import cancel button in Admin UI
• Improvements to MailEntityProcessor
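For readers who have not used interval faceting, the sketch below shows the underlying request parameters as sent over HTTP, rather than the SolrJ calls themselves. The field `price` and the interval bounds are made up for illustration, and the helper `interval_facet_params` is hypothetical.

```python
from urllib.parse import urlencode


def interval_facet_params(field, intervals):
    """Return request parameters asking for interval facets on one field."""
    params = [("q", "*:*"), ("facet", "true"), ("facet.interval", field)]
    for interval in intervals:
        # One facet.interval.set entry per interval, scoped to the field.
        params.append(("f.%s.facet.interval.set" % field, interval))
    return params


# Hypothetical field and intervals: [0,10] is closed, (10,100] is half-open.
params = interval_facet_params("price", ["[0,10]", "(10,100]"])
print(urlencode(params))
```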
17. Optimizations
• Solr's schema now uses
DelegatingAnalyzerWrapper that uses less heap
for cached TokenStreamComponents because it
caches per FieldType not per Field.
• Reduced CPU usage by avoiding repeated costly
calls to Document.getField inside
DocumentBuilder.toDocument for use cases with a
large number of fields and copyFields.
• BinaryResponseWriter no longer fetches unnecessary
stored fields when only pseudo-fields are requested.
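The heap saving from caching per FieldType instead of per Field can be pictured with a toy model. This is not Lucene or Solr code: the schema, the field names, and the `cache_entries` helper are all invented to show how the number of cached entries changes with the cache key.

```python
# Toy model only: a made-up schema mapping field name -> field type.
fields = {
    "title": "text_general",
    "body": "text_general",
    "author": "text_general",
    "sku": "string",
}


def cache_entries(schema, per_field_type):
    """Count cache entries when keyed by field type or by field name."""
    cache = {}
    for field, field_type in schema.items():
        key = field_type if per_field_type else field
        # Stand-in for the cached analysis components of that key.
        cache.setdefault(key, "components for " + field_type)
    return len(cache)


print(cache_entries(fields, per_field_type=False))  # 4: one entry per field
print(cache_entries(fields, per_field_type=True))   # 2: one entry per field type
```

With many fields sharing a handful of types, the per-type cache stays small no matter how many fields are added.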
18. Solr Developer? Other changes
• CoreContainer.preRegisterInZk() and CoreContainer.register()
commands are merged into CoreContainer.create().
• CoreContainer.remove() has now been replaced with
CoreContainer.unload().
• Opened up "public" access to DataSource, DocBuilder, and
EntityProcessorWrapper in DIH.
• Added support for multiple spellcheck collations and multi-valued
field highlighting to the /browse UI.
• Improved SolrCloud cloud-dev scripts.
• Hardened tests so you can rely on this stuff even more!
19. What’s next?
• Solr 4.10.1
• Should be out anytime!
• LUCENE-5934: 4.10 broke backwards compatibility for
4.0 beta & 4.0-release indexes
• Trunk moves to Java 8 after a recent vote
• Ease of use
• Performance + benchmarking
• Stability
• Analytics