Presented by Peter Wolanin | Acquia, Inc - See conference video - http://www.lucidimagination.com/devzone/events/conferences/lucene-revolution-2012
If you have a new web project or an existing Drupal site, the combination of Drupal and Apache Solr is both powerful and easy to set up thanks to the existing integration code. The module allows for substantial customization through the administrative UI. Drupal's open architecture provides multiple opportunities for custom code to alter behavior, which facilitates further customization of the UI, indexing, and boosting. A couple of code snippets will be followed by a review of other contributed Drupal modules that further enhance the search capability.
Finally, this session will showcase some examples of Drupal sites using Solr, including Acquia's own sites and many well-known enterprise and government sites.
Things Made Easy: One Click CMS Integration with Solr & Drupal
May 10, 2012
Things Made Easy: One Click CMS Integration with Solr & Drupal
Peter M. Wolanin, Ph.D.
Momentum Specialist (principal engineer), Acquia, Inc.
Drupal contributor: drupal.org/user/49851
Co-maintainer of the Drupal Apache Solr Search Integration module
Key Questions to Be Answered
• What is Drupal?
• What Apache Solr features are integrated with Drupal?
• Why is Drupal plus Apache Solr better than starting from scratch?
• What elements of the search can you configure in the UI without code?
Why Are You Here?
• You are starting a new website project?
• You are wondering how hard it is to actually integrate Apache Solr with a website?
• You already use Drupal but not with Apache Solr?
• You like things that are easy yet powerful?
Drupal: Web Application Framework + CMS == Social Publishing Platform
Drupal “… is as much a Social Software platform as it is a web content management system.” (CMS Watch, The Web CMS Report 2009)
[Diagram: Content Mgmt Systems overlapping Social Software Tools. Labels: content, users, blogs, workflow, wikis, forums, taxonomy, comments, semantic web, RSS, social ranking, social tagging, social analytics, social networks]
Drupal + Solr Provides Immediate Access to Rich Search Features
• Dynamic content requires dynamic navigation, which is provided by an effective search
• Search facets mean no dead ends
• Solr provides better keyword relevancy in results
• Much faster searches for sites with lots of content
• By avoiding database queries, Drupal with Solr scales better
DEMO: A Drupal 7 partial copy of the conference site with Apache Solr integration
http://youtu.be/yY6kma_ViWc
Drupal Has User Accounts, Roles & Permissions
• Define custom roles
• Set granular access controls by role
• Configure user behavior:
– Registration
– Email
– Profiles
– Pictures
Drupal Modules Add Functionality
• “There’s a module for that”
• More than 4100 Drupal 7 community modules
• Often controlled by role-based permissions
• Drupal core and modules are GPL v2+, and have a huge, active community
Drupal is Written in PHP, Which Makes for Easy Customization
• The Drupal architecture encourages customization, and provides many avenues for it, by writing modules rather than patching Drupal core
• Drupal has a huge community of users. Approximately 10,000 sites report to Drupal.org that they use the Apache Solr Search Integration module.
Drupal Entities are Content + Data
• Nodes are the basic entity used for text content
• The entity system is extensible and can represent any data
• Examples of data stored within Drupal entities:
– Text
– Geographic location
– Node reference
[Diagram: grid of nodes labeled Node 1 through Node 9]
Entity Types are Enriched With User-configurable Data Fields
• Define new data fields on a node using the Field API module.
– Text, images, integers, date, reference, etc.
• Flexible and configurable in the UI
• No programming required (many existing modules)
Drupal + Solr Search for Business, Government and NGOs
http://www.mattel.com/search/apachesolr_search/
https://www.eff.org/search/site/
http://www.poly.edu/search/apachesolr_search/
http://www.whitehouse.gov/search/site/
http://opensource.com/search/apachesolr_search/
https://www.ethicshare.org/publications/
http://www.nypl.org/search/apachesolr_search/
http://www.mylifetime.com/community/search/apachesolr_search/
http://www.emporia.edu/search/site/
http://www.restorethegulf.gov/search/apachesolr_search/
http://www.hrw.org/en/search/apachesolr_search/
Drupal Has Already Solved Many Solr Integration Challenges
• The most important: content indexing.
• Facets, sorting, and highlighting of results.
• Immediate integration with the More Like This and spell-check handlers.
• An included sub-module integrates content access permissions by indexing access data to Solr and filtering Solr results based on the current user.
Easy Content Recommendation!
• Uses the MLT (More Like This) handler
• Picks fields from the currently viewed node
The Module Has a Pipeline for Indexing Drupal Content to Solr
• Drupal entities are processed into one (or more) document objects.
• Each document object is converted to XML and sent to Solr.
Node object → (Drupal functions) → Document object → XML string
Node property → document field:
  (entity type) → entity_type
  title → label
  nid → entity_id
  type → bundle
XML string:
  <doc>
    <field name="entity_type">node</field>
    <field name="label">Hello Drupal</field>
    <field name="entity_id">101</field>
    <field name="bundle">session</field>
  </doc>
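The pipeline on this slide can be sketched in a few lines. This is only an illustration in Python, not the module's PHP code: the function names `node_to_document` and `document_to_xml` are hypothetical, and only the four fields shown on the slide are mapped.

```python
import xml.etree.ElementTree as ET


def node_to_document(node):
    """Map a node-like dict to a flat document dict (field names from the slide)."""
    return {
        "entity_type": "node",
        "label": node["title"],
        "entity_id": str(node["nid"]),
        "bundle": node["type"],
    }


def document_to_xml(document):
    """Serialize a document dict to a Solr-style <doc> XML string."""
    doc = ET.Element("doc")
    for name, value in document.items():
        field = ET.SubElement(doc, "field", name=name)
        field.text = value
    return ET.tostring(doc, encoding="unicode")


# Hypothetical node matching the values on the slide.
node = {"nid": 101, "title": "Hello Drupal", "type": "session"}
xml_string = document_to_xml(node_to_document(node))
print(xml_string)
```

Keeping the intermediate document as a plain field map is what lets custom code add or change fields before anything is serialized and sent to Solr.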
Entity Meta-data Gives Automatic Facets!
• Content types
• Taxonomy terms per vocabulary
• Content authors
• Posted and modified dates
• Text and numbers selected via select list/radios/check boxes
Drupal Modules Implement Hooks to Control Indexing and Display
HOOK_apachesolr_index_document_build($document, $entity, $entity_type, $env_id)
• By creating a Drupal module (in PHP), you can implement module and theme “hooks” to extend or alter Drupal behavior.
• Change or replace the data normally indexed.
• Modify the search results and their appearance.
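The hook idea can be illustrated outside Drupal. The sketch below is Python, not the module's PHP: the dispatcher `invoke_document_build`, both module classes, the `room` property, and the `ss_room` field name are assumptions made for the example. Each module that defines the hook gets a chance to alter the same document before it is indexed.

```python
def invoke_document_build(modules, document, entity, entity_type, env_id):
    """Call each module's document-build hook, in order, on the same document."""
    for module in modules:
        hook = getattr(module, "apachesolr_index_document_build", None)
        if callable(hook):
            hook(document, entity, entity_type, env_id)
    return document


class ConferenceModule:
    """Hypothetical custom module that adds a room field for session nodes."""

    @staticmethod
    def apachesolr_index_document_build(document, entity, entity_type, env_id):
        if entity_type == "node" and entity.get("type") == "session":
            # "ss_room" is an assumed field name, used for illustration only.
            document["ss_room"] = entity.get("room", "")


class PlainModule:
    """Hypothetical module that implements no search hooks; it is skipped."""


entity = {"nid": 101, "type": "session", "title": "Hello Drupal", "room": "Ballroom A"}
document = {"entity_type": "node", "entity_id": "101", "label": "Hello Drupal"}
invoke_document_build([PlainModule, ConferenceModule], document, entity, "node", "default_env")
print(document)
```

The point of the pattern is that the indexing code never needs to know which modules exist; it only looks for the hook by name.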
Updates to an Entity or Related Meta-data Cause Reindexing
• Drupal entities are indexed during Drupal cron (typically invoked via *nix cron).
• By using a specialized tracking table, content can automatically be queued for reindex when changed, and subsets of content can potentially be sent to different Solr indexes.
• Entities include many ID-based reference fields (e.g. the User ID of the author). Changes to the referenced data are also watched.
Indexing Tracking Tables Maintain Order
+-------------+-----------+-------------+--------+------------+
| entity_type | entity_id | bundle      | status | changed    |
+-------------+-----------+-------------+--------+------------+
| node        |        36 | session     |      1 | 1336520756 |
| node        |        37 | session     |      1 | 1336510489 |
| node        |        38 | session     |      1 | 1336510456 |
| node        |        39 | session     |      1 | 1336510456 |
| node        |        40 | speaker_bio |      1 | 1336510456 |
+-------------+-----------+-------------+--------+------------+
• When a node is updated, the “changed” timestamp is updated.
• The indexing pipeline tracks the largest timestamp and entity_id which has been indexed.
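To see why both the timestamp and the entity_id are tracked, here is a minimal Python sketch of picking the next batch from the rows above. The function `next_batch` and its parameters are hypothetical; the module's actual query may differ.

```python
# Rows copied from the tracking table on the slide:
# (entity_type, entity_id, bundle, status, changed)
rows = [
    ("node", 36, "session", 1, 1336520756),
    ("node", 37, "session", 1, 1336510489),
    ("node", 38, "session", 1, 1336510456),
    ("node", 39, "session", 1, 1336510456),
    ("node", 40, "speaker_bio", 1, 1336510456),
]


def next_batch(rows, last_changed, last_entity_id, limit):
    """Return up to `limit` rows ordered by (changed, entity_id) past the last indexed position."""
    position = (last_changed, last_entity_id)
    pending = [row for row in rows if (row[4], row[1]) > position]
    pending.sort(key=lambda row: (row[4], row[1]))
    return pending[:limit]


# Suppose node 38 was the last item indexed in the previous cron run.
batch = next_batch(rows, last_changed=1336510456, last_entity_id=38, limit=2)
print([row[1] for row in batch])
```

Nodes 38, 39 and 40 share one timestamp, so a timestamp alone could not tell which of them had already been indexed; the entity_id breaks the tie.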
Example: Taxonomy Term Classifying a Node is Changed
[Diagram: term name changed, Grapefruit → Citrus fruit]
function apachesolr_taxonomy_term_update($term)
• All nodes classified with this term are queued to be re-indexed by setting the “changed” column to the current time.
• Thus you will correctly match ‘Citrus’ instead of ‘Grapefruit’ for those documents.
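The queueing step described here amounts to one UPDATE on the tracking table. The following is a sketch in Python with an in-memory SQLite database; the table names `index_tracking` and `node_terms`, the term IDs, and the function `queue_nodes_for_term` are all assumptions for illustration and are not the module's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE index_tracking (
        entity_type TEXT, entity_id INTEGER, bundle TEXT,
        status INTEGER, changed INTEGER
    );
    CREATE TABLE node_terms (nid INTEGER, tid INTEGER);
    INSERT INTO index_tracking VALUES ('node', 36, 'session', 1, 1336520756);
    INSERT INTO index_tracking VALUES ('node', 37, 'session', 1, 1336510489);
    INSERT INTO index_tracking VALUES ('node', 38, 'session', 1, 1336510456);
    -- Hypothetical classification: nodes 36 and 38 carry term 7, node 37 carries term 9.
    INSERT INTO node_terms VALUES (36, 7);
    INSERT INTO node_terms VALUES (38, 7);
    INSERT INTO node_terms VALUES (37, 9);
""")


def queue_nodes_for_term(conn, tid, now):
    """Mark every node classified with term `tid` as changed at time `now`."""
    cursor = conn.execute(
        "UPDATE index_tracking SET changed = ? "
        "WHERE entity_type = 'node' "
        "AND entity_id IN (SELECT nid FROM node_terms WHERE tid = ?)",
        (now, tid),
    )
    conn.commit()
    return cursor.rowcount


queued = queue_nodes_for_term(conn, tid=7, now=1336600000)
print(queued)
```

Because the indexer simply works through rows in order of "changed", bumping the timestamp is enough to get those nodes picked up on a later cron run.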
When Unpublished, Content is Purged
• Drupal core includes a simple editorial workflow where content may be toggled between published (visible) and unpublished (incomplete, removed, spam, etc.).
• The module immediately removes content from the index when unpublished, and also tracks it for future removal in case the Solr server is unavailable.
Search Using Dismax Query Parsing & Boosting Features
• Dynamic fields in schema.xml used to index standard and custom entity data fields
• Dismax (or EDismax) handler used for keyword searching across multiple fields and per-field boosts
• Query-time boosting options available in the UI
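As a rough illustration of per-field boosts, the Python sketch below builds the request parameters for an EDismax keyword search. `defType`, `q`, `qf` and `rows` are standard Solr parameters; the helper `build_dismax_params`, the boost values, and the choice of the `label` and `content` fields are assumptions for the example.

```python
from urllib.parse import urlencode


def build_dismax_params(keywords, field_boosts, rows=10):
    """Build Solr request parameters for an EDismax search with per-field boosts."""
    qf = " ".join(f"{field}^{boost}" for field, boost in field_boosts.items())
    return {"defType": "edismax", "q": keywords, "qf": qf, "rows": rows}


# Assumed boosts: a match in the title (label) counts five times as much as one in the body.
params = build_dismax_params("drupal search", {"label": 5.0, "content": 1.0})
query_string = urlencode(params)
print(params["qf"])
print(query_string)
```

Raising or lowering a single number in `qf` changes ranking without reindexing, which is why such boosts are practical to expose as settings in an administrative UI.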
A Query Object Is Used to Prepare and Run Searches
HOOK_apachesolr_query_prepare($query)
$query->setParam('hl.fl', $field);
$keys = $query->getParam('q');
$response = $query->search();
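The prepare-then-search flow can be sketched as follows. This is a Python stand-in, not the module's PHP class: `SearchQuery`, its method names, `highlight_prepare`, and `fake_backend` are hypothetical, while `hl` and `hl.fl` are standard Solr highlighting parameters.

```python
class SearchQuery:
    """Minimal stand-in for a query object: holds Solr parameters until the search runs."""

    def __init__(self, keys):
        self._params = {"q": keys}

    def get_param(self, name):
        return self._params.get(name)

    def set_param(self, name, value):
        self._params[name] = value

    def search(self, backend):
        # Hand a copy of the parameters to whatever actually talks to Solr.
        return backend(dict(self._params))


def highlight_prepare(query):
    """Hypothetical prepare hook: request highlighting on the content field."""
    query.set_param("hl", "true")
    query.set_param("hl.fl", "content")


def fake_backend(params):
    """Placeholder for the HTTP call to Solr; echoes the parameters it was given."""
    return {"sent_params": params, "docs": []}


query = SearchQuery("hello drupal")
for prepare in [highlight_prepare]:
    prepare(query)
response = query.search(fake_backend)
print(response["sent_params"])
```

Since every hook receives the same object before the request is sent, several modules can each adjust the search without knowing about one another.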
More Modules Available to Add More Features
A few examples:
• ApacheSolr Attachments
• Apache Solr Multisite Search
• Apache Solr Organic Groups Integration
• Apachesolr User indexing
• Apachesolr Commerce
To Wrap Up!
• Drupal has extensive Apache Solr integration already, and is highly customizable.
• The Drupal platform is widely adopted, and the Drupal community drives rapid innovation.
• Acquia provides Enterprise Drupal support and a network of partners.
• Acquia includes a secure, hosted Solr index with every support subscription.
Did I Answer These?
• What is Drupal?
• What Apache Solr features are integrated with Drupal?
• Why is Drupal plus Apache Solr better than starting from scratch?
• What elements of the search can you configure in the UI without code?
Other PHP Integration Tools
• http://www.solarium-project.org/
• http://php.net/solr http://pecl.php.net/package/solr
• http://code.google.com/p/solr-php-client/
Caveat: don’t use serialized PHP response format in a custom integration - use JSON writer.
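For a custom integration, asking Solr for the JSON writer comes down to sending `wt=json` and decoding the body with an ordinary JSON parser. The sketch below is in Python rather than PHP; the base URL, the helper names, and the sample response body are assumptions for illustration.

```python
import json
from urllib.parse import urlencode


def select_url(base_url, keywords):
    """Build a select URL that asks Solr for a JSON response (wt=json)."""
    return base_url + "/select?" + urlencode({"q": keywords, "wt": "json"})


def parse_docs(body):
    """Decode a JSON response body and return (numFound, docs)."""
    data = json.loads(body)
    return data["response"]["numFound"], data["response"]["docs"]


# Hypothetical server address and a hand-written sample body shaped like a Solr JSON response.
url = select_url("http://localhost:8983/solr", "hello drupal")
sample_body = (
    '{"responseHeader": {"status": 0}, '
    '"response": {"numFound": 1, "start": 0, '
    '"docs": [{"entity_id": "101", "label": "Hello Drupal"}]}}'
)
num_found, docs = parse_docs(sample_body)
print(url)
print(num_found, docs[0]["label"])
```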
Acquia is Hiring!
• Do you love Drupal, Solr, the LAMP stack, DevOps or anything related, and working at a fast-growing and successful startup?
• Boston and Portland area U.S. offices.
• Some remote opportunities as well.
• Come talk to me! email@example.com, pwolanin in IRC #drupal or #solr