© Copyright 2013
Intro to Search
Grant Ingersoll
CTO, LucidWorks
@gsingers
© 2013 LucidWorks
Search is dead, long live search
• Search is Everywhere!
• The Bar is Raised
- Keyword search is a commodity
• Holistic view of the data AND the users is critical
• Scalable Search, Discovery and Analytics are the key to unlocking this view of users and data
[Diagram labels: Documents, User Interaction, Access, Content Relationships]
Search is good for…
• Traditional: Fast, fuzzy text matching across a large document collection
• De-normalized data
- “light” relational
• Top N problems
- Key-value (top 1)
- Recommendations
- “Good enough” classification, clustering
• Faceting, slicing and dicing of enumerated data
• Spatial, spell checking, record linkage, highlighting
• NoSQL
Common Use Cases
• eCommerce
- Search + Recs + Analysis of users
• Knowledge Management
- Financial, transportation, pharma
• Fraud detection
• Social media
- Trend monitoring
• Information technology
- Log monitoring, analysis
• Healthcare
- DNA Analysis
http://bit.ly/get-lws
Topics
• Intros
• First 5 Minutes with LucidWorks Search (Solr++)
• Search Concepts
• Demo Deep Dive
• Level Up
• Resources
LucidWorks Overview
› Founded in 2007 to be the go-to company for Lucene/Solr expertise
› 250+ customers (many Fortune 500)
› 100% year-over-year growth
› Over 40% of the active Apache Lucene/Solr committers
› Hosts the fast-growing Lucene/Solr Revolution user conference (400+ attendees)
LucidWorks Product Suite
• [Apache Lucene/Solr]
- Description: Massively adopted open source search technology
- Version: Version 4.3 released May 2013
- LucidWorks Offering: Annual Support Subscriptions; Professional Services; Training; Inside sales model
• LucidWorks Search
- Description: Enterprise Search platform built on Lucene/Solr
- Version: Version 2.5 ships December 2012
- LucidWorks Offering: Free trial; On-prem or cloud; Inside sales model
• LucidWorks Big Data
- Description: Unified development platform for Big Data applications
- Version: GA Version 1.1 released Feb. 2013
- LucidWorks Offering: Free trial; On-prem or cloud; Enterprise sales model
5 Minutes to Search
1. Install LWS
   1. Unpack, double click to launch Installer
   2. Launch, wait for startup
2. http://localhost:8989/
3. Choose “Quick Start”
4. Choose a Data Source
   1. For me: /Users/grantingersoll/Desktop/reading
5. Quick Search
6. Search with Flare
   1. http://localhost:8989/flare/catalog/quickstart
7. Quick Changes:
   1. Add a Facet
   2. Change Display Results
Prepare Deep Dive Demo
1. https://github.com/LucidWorks/lws-financial-demo/blob/master/README.md
2. cd src/main/python
3. python setup.py -n setup -a TWITTER_ACCESS_TOKEN -c TWITTER_CONSUMER_KEY -s TWITTER_CONSUMER_SECRET -t TWITTER_ACCESS_TOKEN_SECRET -p ../../../data/sp500List-30.txt -A -l Finance --data_dir ../../../data
4. python python.py
• Java APIs for building search applications
• Fast, efficient, flexible
• Modules to add functionality:
- Lang. Analysis
- Faceting
- Highlighting, spell checking
- Much more
• Lucene best practices
• HTTP-based service
- Many client bindings
• Faceting
• Distributed, fault-tolerant
• Many No-SQL features
LucidWorks Search Goals
• IT Ready Open Source
- Installation, provisioning, monitoring, administration, integration
• Enterprise Grade
- A robust connector framework
» Including a wide assortment of prebuilt connectors to popular data sources
- Enterprise security framework
» Leverages SSL, LDAP, Active Directory
» Document-level access control
• Business Friendly
- Rich graphical administration console
» Speeds up search application development, deployment and management
- Expressive Business Logic
» Processing information through filters for better, more accurate results
- Relevancy Work Bench
• Full power of Apache Lucene and Solr
Reference Architecture
[Diagram components, in the order they appear in the transcript]
• Shards: 1, 2, 3 … N
• Search View
- Documents
- Users
- Logs
• Document Store
• Analytic Services
- View into numeric/historic data
• Classification, Recommendation
• Personalization & Machine Learning Services
• Classification Models
• In memory, replicated, multi-tenant
• Discovery & Enrichment
- Clustering, classification, NLP, topic identification, search log analysis, user behavior
• Content Acquisition
- ETL, batch or near real-time
• Access APIs
• Data
- LucidWorks Search connectors
- Push
Basic Vocab
• Documents
- Fields
» Tokens
▪ Payloads
• Query
- Many different kinds: term, phrase, regex, spatial, function
• Facets & Filters
• Collection
- Index
» Shard
▪ Segment
Search Concepts: Indexing
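The diagram for this slide did not survive in the transcript. As a rough illustration only, the sketch below shows the general idea behind indexing: text is split into tokens, and an inverted index maps each token to the documents that contain it. The `tokenize` and `build_index` helpers and the sample documents are hypothetical; Lucene's real analysis chain and index format are far more involved.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each token to the sorted list of document ids that contain it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            postings[token].add(doc_id)
    return {token: sorted(ids) for token, ids in postings.items()}


# Hypothetical sample documents, keyed by document id.
docs = {
    1: "Search is everywhere",
    2: "Keyword search is a commodity",
    3: "Faceting slices and dices data",
}
index = build_index(docs)
print(index["search"])     # [1, 2]
print(index["commodity"])  # [2]
```

Looking up a term is then a single dictionary access, which is what makes the "look up documents containing term" step of a query fast.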
Search Concepts: Ranking
• Search is optimized for solving top N problems
• Hand-waving algorithm:
- Parse query
- For each term
» Look up documents containing term
- Rank documents according to similarity
- Return top X
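The steps above can be sketched in a few lines of Python. This is a toy version under stated assumptions: the similarity is a simple TF-IDF sum chosen for illustration (not the scoring Lucene or Solr actually uses), and `tokenize`, `build_postings`, `search`, and the sample documents are all hypothetical names and data.

```python
import heapq
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_postings(docs):
    """Map each term to {doc_id: term frequency}."""
    postings = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            postings[term][doc_id] = tf
    return postings


def search(query, docs, postings, top_x=2):
    """Return the top_x (doc_id, score) pairs for the query."""
    scores = defaultdict(float)
    for term in tokenize(query):              # parse query
        matches = postings.get(term, {})      # look up documents containing term
        if not matches:
            continue
        # Rarer terms get a higher weight (illustrative IDF, an assumption).
        idf = math.log(1 + len(docs) / len(matches))
        for doc_id, tf in matches.items():    # accumulate similarity per document
            scores[doc_id] += tf * idf
    # Return top X without sorting every scored document.
    return heapq.nlargest(top_x, scores.items(), key=lambda item: item[1])


# Hypothetical sample documents.
docs = {
    "a": "solr search server",
    "b": "search is everywhere",
    "c": "stock prices and tweets",
}
postings = build_postings(docs)
results = search("solr search", docs, postings)
print([doc_id for doc_id, _ in results])  # ['a', 'b']
```

Using `heapq.nlargest` reflects the top N framing: only the best X results are kept, so the full result set never needs a complete sort.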
Search Concepts: Faceting
• Dynamically slice and dice query results in a variety of ways:
- Term
- Range (date and numeric)
- Pivot
- Function
- Multi-select
• Gather Stats
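To make term and range faceting concrete, here is a minimal sketch that counts values over an in-memory result set. It is not how Solr computes facets; `term_facet`, `range_facet`, and the sample records (symbols, industries, prices) are invented for illustration.

```python
from collections import Counter


def term_facet(results, field):
    """Count how many results carry each distinct value of a field."""
    return Counter(doc[field] for doc in results if field in doc)


def range_facet(results, field, start, end, gap):
    """Count results per [low, low + gap) bucket between start and end."""
    counts = {low: 0 for low in range(start, end, gap)}
    for doc in results:
        value = doc.get(field)
        if value is None or not (start <= value < end):
            continue
        low = start + ((value - start) // gap) * gap
        counts[low] += 1
    return counts


# Hypothetical query results; none of these values come from the demo data.
results = [
    {"symbol": "AAA", "industry": "Technology", "price": 12},
    {"symbol": "BBB", "industry": "Energy", "price": 48},
    {"symbol": "CCC", "industry": "Technology", "price": 51},
    {"symbol": "DDD", "industry": "Finance", "price": 99},
]
print(term_facet(results, "industry")["Technology"])  # 2
print(range_facet(results, "price", 0, 100, 50))      # {0: 2, 50: 2}
```

A UI would typically render these counts next to each value so that users can narrow the result set with a filter on the chosen bucket.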
Demo Deep Dive
• Application:
- Stock Insights
- Twitter Bootstrap + Python Flask + LWS
- http://localhost:5000
• Goals:
- Explore data sources, scheduling, other features
- Automate setup via script and LWS APIs
• Data:
- Company Info (Symbol, Company, Industry, City, State)
- Twitter, websites
- Historical Stock Prices from Y! Finance
• http://github.com/lucidworks/lws-financial-demo
- README covers setup
Level Up
• Explore our APIs:
- http://bit.ly/lws-apis
• Build your own UI or extend ours
• Write a custom connector
• Customize Solr!
• Scale with SolrCloud
• Explore Solr Marketplace:
- http://bit.ly/solr-market
Where to Next?
• http://www.lucidworks.com
• http://lucene.apache.org/solr
• Training: http://bit.ly/lws-training
• LWS more info: http://bit.ly/lws-more-info
• LWS Documentation: http://bit.ly/lws-docs
• Twitter: @gsingers, @LucidWorks
• Taming Text: http://www.manning.com/ingersoll

What Are The Drone Anti-jamming Systems Technology?
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 

Intro to Search

  • 1. © Copyright 2013 Intro to Search Grant Ingersoll CTO, LucidWorks @gsingers
  • 2. © 2013 LucidWorks • Search is Everywhere! • The Bar is Raised - Keyword search is a commodity • Holistic view of the data AND the users is critical • Scalable Search, Discovery and Analytics are the key to unlocking this view of users and data Search is dead, long live search Documents User Interaction Access Content Relationships
  • 3. © 2013 LucidWorks 3 Search is good for… • Traditional: Fast, fuzzy text matching across a large document collection • De-normalized data - “light” relational • Top N problems - Key-value (top 1) - Recommendations - “Good enough” classification, clustering • Faceting, slicing and dicing of enumerated data • Spatial, spell checking, record linkage, highlighting • NoSQL
  • 4. © 2013 LucidWorks 4 Common Use Cases • eCommerce - Search + Recs + Analysis of users • Knowledge Management - Financial, transportation, pharma • Fraud detection • Social media - Trend monitoring • Information technology - Log monitoring, analysis • Healthcare - DNA Analysis
  • 6. © 2013 LucidWorks 6 Topics • Intros • First 5 Minutes with LucidWorks Search (Solr++) • Search Concepts • Demo Deep Dive • Level Up • Resources
  • 7. © 2013 LucidWorks 7 › Founded in 2007 to be the go-to-company for Lucene/Solr expertise › 250+ customers (many Fortune 500) › 100% y-y growth › Over 40% of the active Apache Lucene/Solr Committers › Host fast-growing Lucene/Solr Revolution User Conference (400+ attendees) LucidWorks Overview
  • 8. © 2013 LucidWorks 8 LucidWorks Product Suite PRODUCT LucidWorks Search LucidWorks Big Data Description Massively adopted open source search technology Enterprise Search platform built on Lucene/Solr Unified development platform for Big Data applications Version Version 4.3 released May 2013 Version 2.5 ships December 2012 GA Version 1.1 released Feb. 2013 LucidWorks Offering › Annual Support Subscriptions › Professional Services › Training › Inside Sales Model › Free trial › On-prem or cloud › Inside sales model › Free Trial › On-prem or cloud › Enterprise sales model
  • 9. © 2013 LucidWorks 9 5 Minutes to Search 1. Install LWS 1. Unpack, double click to launch Installer 2. Launch, wait for startup 2. http://localhost:8989/ 3. Choose “Quick Start” 4. Choose a Data Source 1. For me: /Users/grantingersoll/Desktop/reading 5. Quick Search 6. Search with Flare 1. http://localhost:8989/flare/catalog/quickstart 7. Quick Changes: 1. Add a Facet 2. Change Display Results
  • 10. © 2013 LucidWorks 10 Prepare Deep Dive Demo 1. https://github.com/LucidWorks/lws-financial-demo/blob/master/README.md 2. cd src/main/python 3. python setup.py -n setup -a TWITTER_ACCESS_TOKEN -c TWITTER_CONSUMER_KEY -s TWITTER_CONSUMER_SECRET -t TWITTER_ACCESS_TOKEN_SECRET -p ../../../data/sp500List-30.txt -A -l Finance --data_dir ../../../data 4. python python.py
  • 11. © 2013 LucidWorks • Java APIs for building search applications • Fast, efficient, flexible • Modules to add functionality: - Lang. Analysis - Faceting - Highlighting, spell checking - Much more • Lucene best practices • HTTP-based service - Many client bindings • Faceting • Distributed, fault-tolerant • Many No-SQL features 11
  • 12. © 2013 LucidWorks 12 • IT Ready Open Source - Installation, provisioning, monitoring, administration, integration • Enterprise Grade - A robust connector framework » Including a wide assortment of prebuilt connectors to popular data sources - Enterprise security framework » Leverages SSL, LDAP, Active Directory » Document level access control • Business Friendly - Rich graphical administration console » speeds up search application development, deployment and management - Expressive Business Logic » Processing information thru filters for better more accurate results - Relevancy Work Bench • Full power of Apache Lucene and Solr LucidWorks Search Goals
  • 13. © 2013 LucidWorks Shards 1 2 3 N Search View • Documents • Users • Logs Document Store Analytic Services View into numeric/historic data Classification Recommendation Personalization & Machine Learning Services Classification Models In memory Replicated Multi-tenant Discovery & Enrichment Clustering, classification, NLP, topic identification, search log analysis, user behavior Content Acquisition ETL, batch or near real-time Access APIs Data • LucidWorks Search connectors • Push Reference Architecture
  • 14. © 2013 LucidWorks 14 Basic Vocab •Documents - Fields »Tokens ▪ Payloads • Query - Many diff. kinds: term, phrase, regex, spatial, function •Facets & Filters •Collection - Index »Shard ▪ Segment
  • 15. © 2013 LucidWorks 15 Search Concepts: Indexing
  • 16. © 2013 LucidWorks 16 Search Concepts: Ranking • Search is optimized for solving top N problems • Hand Waving Algo: - Parse query - For Each Term » Look up documents containing term - Rank documents according to similarity - Return top X
  • 17. © 2013 LucidWorks 17 Search Concepts: Faceting • Dynamically slice and dice query results in a variety of ways: - Term - Range (date and numeric) - Pivot - Function - Multi-select • Gather Stats
  • 18. © 2013 LucidWorks 18 Demo Deep Dive • Application: - Stock Insights - Twitter Bootstrap + Python Flask + LWS - http://localhost:5000 • Goals: - Explore data sources, scheduling, other features - Automate setup via script and LWS APIs • Data: - Company Info (Symbol, Company, Industry, City, State) - Twitter, websites - Historical Stock Prices from Y! Finance • http://github.com/lucidworks/lws-financial-demo - README covers setup
  • 19. © 2013 LucidWorks 19 Level Up • Explore our APIs: - http://bit.ly/lws-apis • Build your own UI or extend ours • Write a custom connector • Customize Solr! • Scale with SolrCloud • Explore Solr Marketplace: • http://bit.ly/solr-market
  • 20. © 2013 LucidWorks 20 Where to Next? • http://www.lucidworks.com • http://lucene.apache.org/solr • Training: http://bit.ly/lws-training • LWS more info: http://bit.ly/lws-more-info • LWS Documentation: http://bit.ly/lws-docs • Twitter: @gsingers, @LucidWorks • Taming Text: http://www.manning.com/ingersoll
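
The "Hand Waving Algo" on slide 16 (parse query, look up documents for each term, rank by similarity, return top X) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the whitespace tokenizer, the simplified TF-IDF score, and the names `build_index` and `search` are hypothetical and are not the Lucene/Solr API or its actual scoring formula.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Assumed analysis step: lowercase and split on whitespace."""
    return text.lower().split()


def build_index(docs):
    """Build an inverted index: term -> {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def search(index, num_docs, query, top_x=10):
    """Return the top_x (doc_id, score) pairs for the query."""
    scores = defaultdict(float)
    for term in tokenize(query):            # parse query
        postings = index.get(term, {})      # look up documents containing term
        if not postings:
            continue
        # Rarer terms weigh more (simplified inverse document frequency).
        idf = math.log(1 + num_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf      # accumulate similarity
    # Rank by descending score, breaking ties by doc id, then keep the top X.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_x]


# Toy collection (made-up documents for illustration only).
docs = {
    "d1": "solr search server",
    "d2": "lucene search library search",
    "d3": "stock prices",
}
index = build_index(docs)
results = search(index, len(docs), "lucene search")
print([doc_id for doc_id, _ in results])
```

Only documents that share at least one term with the query are ever scored, which is why an inverted index suits top N problems: the work scales with the size of the matching postings lists rather than with the whole collection.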

Editor's Notes

  1. The bar is raised: when we first started Lucid, the problems were all around standing up Lucene or Solr or dealing with performance issues. Now the large majority of them are around taking search to the next level: better relevance, personalization, recommendations, and so on.
  2. What is Lucene? What is Solr?
  3. Service-Oriented Architecture: stateless; failover/fault tolerant; lightweight coordination and messaging; smart about updates. Document store is distributed and scalable. Analysis: batch and near real-time.