Confidential © Copyright 2012
Crowd Sourced Intelligence
Built into Search and Hadoop
Grant Ingersoll, LucidWorks, @gsingers
Credits to: Ted Dunning, MapR, @ted_dunning
Confidential and Proprietary
© 2012 LucidWorks
Is Search Enough?
• Keyword search is a commodity
• A holistic view of the data and of user interactions with
that data is critical
• Search, Discovery and Analytics are the key to
unlocking this view of users and data
Agenda
• Intro
• Search (R)evolution
• Reflected Intelligence Use Cases
• Building a Next Generation Search and Discovery
Platform
- LucidWorks
• Easy Technical Wins
• 1+1=3
User Interactions With Big Data
[Diagram: three data stores (DFS, key-value store, index), three access methods (command line, query language, keyword search), and three user types (system administrator, engineer, end user).]
User Interactions With Big Data
[Diagram: the same data stores (DFS, key-value store, index), access methods (command line, query language, keyword search), and user types (system administrator, engineer, end user), with Reflected Intelligence added.]
Search (R)evolution
• Search use leads to search abuse
- denormalization frees your mind
- scoring is just a sparse matrix multiply
• Lucene/Solr evolution
- non-free-text uses abound
- many DB-like features
- NoSQL before NoSQL was cool
- flexible indexing
- Finite State Transducers FTW!
• Scale
• “This ain’t your father’s relevance anymore”
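A minimal sketch of the "scoring is just a sparse matrix multiply" bullet, in Python. It assumes raw term frequencies as document weights and a hand-set weight per query term; real Lucene/Solr scoring adds normalization and other factors that are not modeled here. The names build_term_doc_matrix and score are illustrative and not part of any library.

```python
from collections import defaultdict


def build_term_doc_matrix(docs):
    """Build a sparse term-by-document matrix as {term: {doc_id: weight}}.

    Assumption: the weight is the raw term frequency.
    """
    matrix = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            matrix[term][doc_id] = matrix[term].get(doc_id, 0) + 1
    return matrix


def score(matrix, query_weights):
    """Multiply the sparse query vector by the sparse matrix.

    scores[d] = sum over query terms t of query_weights[t] * matrix[t][d].
    Only the non-zero entries (the postings of the query terms) are visited.
    """
    scores = defaultdict(float)
    for term, q_weight in query_weights.items():
        for doc_id, d_weight in matrix.get(term, {}).items():
            scores[doc_id] += q_weight * d_weight
    return dict(scores)


docs = {
    "d1": "solr search search",
    "d2": "hadoop search",
    "d3": "hadoop mapreduce",
}
matrix = build_term_doc_matrix(docs)
scores = score(matrix, {"search": 1.0, "hadoop": 2.0})
```

Seen this way, query terms do not have to be words: any sparse feature (a click, a tag, a behavior) can be a row of the matrix.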
Search, Discovery and Analytics
• Large-scale analysis is key to reflected intelligence
- correlation analysis
» based on queries, clicks, mouse tracks,
even explicit feedback
» produce clusters, trends, topics, SIPs (Statistically Interesting Phrases)
- start with engineered knowledge,
refine with user feedback
• Large-scale discovery features
encourage experimentation
• Always test, always enrich!
[Diagram: Search, Discovery, Analytics]
Social Media Analysis in Telecom
• Goal
- Detect flash-mob traffic events
- Provision additional resources before failures
• Method: Correlate mobile traffic analysis with social
media analysis
- events cause traffic micro-bursts
- participants tweet the events ahead of time
- tweet locations converge on burst location
• Deploy operations faster to predict outages and better
handle emergency situations
- high cost bandwidth augmentation can be marshaled as the
traffic appears
- anticipation beats reaction
Provenance is 80% of Value
• Problem
- Broadcasters don’t know what audiences really like at a micro
level
• Method:
- Analysis of social media to determine advertising reach and
response
- Time resolution of social traffic can provide detailed response
metrics
• Results:
- In one case, untargeted advertising was worth 5x more when accompanied by
supporting response data
Claims Analysis
• Goal
- Insurance claims processing and analysis
- fraud analysis
• Method
- Combine free text search with metadata analysis to identify high
risk activities across the country
- Integrate with corporate workflows to detect and fix outliers in
customer relations
• Results
- Questions that took 24-48 hours now take seconds to answer
Can Search Catch the Bad Guys?
• Online drug counterfeit detection
• Identify commonly used language indicating counterfeits
- you know it when you see it
- and you know you have seen it
• Leverage:
- Statistically Interesting Phrases
- Clustering
- Other Analysts
• Feed to analyst via search-driven application
- enrich based on analysts' feedback
Learn to Rank
• Go beyond TF/IDF by leveraging user votes
• Log all clicks per query
• Periodically process the logs to determine most
popular items per query
• “Update” the Lucene index under the hood with query
X boost factors
- Alternatively: train a classifier to learn rankings
- Beware of self-fulfilling results!
• Profit!
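One way the log-processing step above could look, as a hedged Python sketch. It assumes the click log is a list of (query, doc_id) pairs and turns each query's click counts into click shares that could serve as boost factors. click_boosts and top_n are hypothetical names, and how the boosts reach the index is left out.

```python
from collections import Counter, defaultdict


def click_boosts(click_log, top_n=3):
    """Return {query: [(doc_id, click_share), ...]} for the most clicked docs.

    Assumption: queries are matched after trimming and lowercasing only.
    """
    counts = defaultdict(Counter)
    for query, doc_id in click_log:
        counts[query.strip().lower()][doc_id] += 1

    boosts = {}
    for query, counter in counts.items():
        total = sum(counter.values())
        # Share of this query's clicks that went to each document.
        boosts[query] = [
            (doc_id, n / total) for doc_id, n in counter.most_common(top_n)
        ]
    return boosts


log = [("Solr", "d1"), ("solr", "d1"), ("solr ", "d2"), ("hadoop", "d3")]
boosts = click_boosts(log)
```

Because boosted items attract more clicks, shares computed this way can reinforce themselves, which is the self-fulfilling effect the slide warns about.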
Via ParallelReader
• Pros:
- All click data (e.g. searchable labels) can be added
• Cons:
- Complicated and fragile (rebuild on every update)
» Though only the click index needs a rebuild
- No tools to manage this parallel index in Solr
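The fragility noted above comes from the requirement that parallel indexes list their documents in the same internal order. The following Python sketch models that constraint with plain lists rather than the Lucene API; build_click_index and read_parallel are hypothetical helpers, and the document order D4, D2, D6, D1, D3, D5 is the one shown on the slide.

```python
def build_click_index(main_doc_order, clicks_by_doc):
    """Lay out click data in the same internal order as the main index.

    Documents without clicks get an empty entry so positions stay aligned.
    """
    return [clicks_by_doc.get(doc_id, []) for doc_id in main_doc_order]


def read_parallel(main_rows, click_rows):
    """Combine the two indexes position by position into one logical view."""
    if len(main_rows) != len(click_rows):
        raise ValueError("parallel indexes must hold the same number of documents")
    return [
        {**main, "clicks": clicks} for main, clicks in zip(main_rows, click_rows)
    ]


main_order = ["D4", "D2", "D6", "D1", "D3", "D5"]
main_rows = [{"id": doc_id} for doc_id in main_order]
clicks = {"D1": ["solr"], "D4": ["hadoop", "mapr"]}

click_rows = build_click_index(main_order, clicks)
merged = read_parallel(main_rows, click_rows)
```

Any change to the order of the main index invalidates the alignment, so the click index has to be rebuilt to match.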
[Diagram: the main index holds documents in internal order D4, D2, D6, D1, D3, D5 (positions 1 to 6) with fields f1, f2, ...; the click data holds fields c1, c2, ... for documents D1 to D6. Read in parallel, each position 1 to 6 exposes both the f fields and the c fields of the same document.]
Virginia Tech - Help the World
• Grab data around crisis
- Crowd sourced from Twitter, etc.
• Search immediately
• Large-scale analysis enriches data to find ways to
improve responses and understanding
• http://www.ctrnet.net
Veoh - Cross Recommendations
• Cross recommendation as search
- with search used to build cross recommendation!
• Recommend content to people who exhibit certain
behaviors (clicks, query terms, other)
• (Ab)use of a search engine
- but not as a search engine for content
- more like a search engine for behavior
Recommendation Basics
• See Ted’s talk from this morning on Multi-modal
Recommendation Algorithms
• Go get Mahout/Myrrix or just do it in y(our) search
engine
Search Engine for Reflected Intelligence
• Map-reduce “big data” part
- Logs record user + item occurrence
- Group by user to get rows of occurrence matrix
- Self-join to get co-occurrence
- Log-likelihood test to find anomalies
• Search part
- Anomalous cooccurrences are indicators
(or use statistical scores to provide fancy boosts)
- Indicator fields and other meta-data are indexed
- Recommendation implemented using a single search
- Boosts, functions, similarity also can reflect learned behavior
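The pipeline above can be sketched in a few lines of Python, here in memory rather than as map-reduce jobs. The log-likelihood ratio follows the entropy-based formulation used for 2x2 contingency tables; the threshold, the (user, item) log format, and the names indicators and recommend are assumptions made for illustration. The recommend function stands in for the single search: the user's recent items act as query terms against each item's indicator field.

```python
from collections import defaultdict
from itertools import combinations
from math import log


def x_log_x(x):
    return x * log(x) if x > 0 else 0.0


def entropy(*counts):
    """Unnormalized Shannon entropy (N * H) of a set of counts."""
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 co-occurrence table."""
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))


def indicators(log_records, threshold):
    """Return {item: [anomalously co-occurring items]} from (user, item) logs."""
    items_by_user = defaultdict(set)  # group by user: rows of occurrence matrix
    for user, item in log_records:
        items_by_user[user].add(item)

    n_users = len(items_by_user)
    item_count = defaultdict(int)
    pair_count = defaultdict(int)  # self-join: co-occurrence counts
    for items in items_by_user.values():
        for item in items:
            item_count[item] += 1
        for a, b in combinations(sorted(items), 2):
            pair_count[(a, b)] += 1

    result = defaultdict(list)
    for (a, b), k11 in pair_count.items():
        k12 = item_count[a] - k11  # users with a but not b
        k21 = item_count[b] - k11  # users with b but not a
        k22 = n_users - k11 - k12 - k21  # users with neither
        if llr(k11, k12, k21, k22) >= threshold:
            result[a].append(b)
            result[b].append(a)
    return {item: sorted(others) for item, others in result.items()}


def recommend(indicator_fields, history):
    """Rank unseen items by how many of their indicators appear in history."""
    seen = set(history)
    scores = {
        item: len(seen & set(inds)) for item, inds in indicator_fields.items()
    }
    hits = [i for i, s in scores.items() if s > 0 and i not in seen]
    return sorted(hits, key=lambda i: (-scores[i], i))


logs = [(f"u{i}", item) for i in range(4) for item in ("A", "B")]
logs += [(f"u{i}", "C") for i in range(4, 8)]
fields = indicators(logs, threshold=5.0)
recs = recommend(fields, ["A"])
```

In a search engine, the indicator lists would be indexed as a field on each item document, and the overlap count would be replaced by the engine's own scoring.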
What Platform Do You Need?
• Fast, efficient, scalable search
- bulk and near real-time indexing
- handle billions of records with sub-second search and faceting
• Large scale, cost effective storage and processing capabilities
• NLP and machine learning tools that scale to enhance discovery and
analysis
• Integrated log analysis workflows that close the loop between the raw
data and user interactions
• Easy API access, with support for the user's programming language of
choice
• Content acquisition across a variety of enterprise, Internet and social
connectors
Reference Architecture
[Diagram components:]
• Search view over shards 1, 2, 3 ... N
- Documents
- Users
- Logs
• Document store
• Analytic services
- view into numeric/historic data
• Classification, recommendation, personalization and machine learning services
• Classification models
- in memory, replicated, multi-tenant
• Discovery and enrichment
- clustering, classification, NLP, topic identification, search log analysis, user behavior
• Content acquisition
- ETL, batch or near real-time
• Access APIs
• Data
- LucidWorks Search connectors
- Push
LucidWorks
• LucidWorks provides the leading packaging of Apache
Lucene and Solr
- build your own, we support
- founded by many prominent Lucene/Solr experts
• LucidWorks Search
- “Solr++”
» UI, REST API, MapR connectors, relevance tools, much more
• LucidWorks Big Data
- Big Data as a Service
- Integrated LucidWorks Search, Hadoop, machine learning with
prebuilt workflows for many of these tasks
LucidWorks Big Data
[Architecture diagram with the labels: Inputs; API; Mgmt; Search, Discovery, Analytics; Processing & Storage; Analytics Service; Document Service; Big Data; LucidWorks; Web; HDFS; Admin; Service Mgmt; Data Mgmt; Provisioning, Monitoring & Configuration.]
Easy Technical Wins
• Analyze application logs stored in Hadoop/MapR
• Seamlessly store search indexes in Hadoop/MapR
- and feed to Pig, Mahout and others
- use mirrors + NFS to directly deploy indexes
• LucidWorks 2.5 easily connects with Hadoop/MapR
- Click ranking, other log analysis built in
- Classification as service
- Offline Enrichment
1 + 1 = 3
Learn More
• Talk to Grant
@gsingers
grant@lucidworks.com
• LucidWorks
- http://www.lucidworks.com
- Hash Tags
» #lucene #solr #lucidworks
• Additional credits to:
- Ted Dunning (MapR, @ted_dunning) for participation in prior
talks!

 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 

Recently uploaded (20)

How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 

Crowd Sourced Reflected Intelligence for Solr and Hadoop

  • 1. Confidential © Copyright 2012 Crowd Sourced Intelligence Built into Search and Hadoop Grant Ingersoll, LucidWorks, @gsingers Credits to: Ted Dunning, MapR, @ted_dunning
  • 2. Is Search Enough?
  • 3. Is Search Enough? • Keyword search is a commodity • A holistic view of the data and the user interactions with that data is critical • Search, Discovery and Analytics are the key to unlocking this view of users and data
  • 4. Agenda • Intro • Search (R)evolution • Reflected Intelligence Use Cases • Building a Next Generation Search and Discovery Platform - LucidWorks • Easy Technical Wins • 1+1=3
  • 5. User Interactions With Big Data [Diagram labels: Data (x3); DFS, Key Value Store, Index; Command Line, Query Language, Keyword Search; System Administrator, Engineer, End User]
  • 6. User Interactions With Big Data [Diagram labels: Data (x3); DFS, Key Value Store, Index; Command Line, Query Language, Keyword Search; System Administrator, Engineer, End User; Reflected Intelligence]
  • 7. Search (R)evolution • Search use leads to search abuse - denormalization frees your mind - scoring is just a sparse matrix multiply • Lucene/Solr evolution - non-free-text usages abound - many DB-like features - noSQL before NoSQL was cool - flexible indexing - Finite State Transducers FTW! • Scale • “This ain’t your father’s relevance anymore”
  • 8. Search, Discovery and Analytics • Large-scale analysis is key to reflected intelligence - correlation analysis » based on queries, clicks, mouse tracks, even explicit feedback » produce clusters, trends, topics, SIPs - start with engineered knowledge, refine with user feedback • Large-scale discovery features encourage experimentation • Always test, always enrich! [Diagram labels: Search, Discovery, Analytics]
  • 9. Social Media Analysis in Telecom • Goal - Detect flash-mob traffic events - Provision additional resources before failures • Method: Correlate mobile traffic analysis with social media analysis - events cause traffic micro-bursts - participants tweet the events ahead of time - tweet locations converge on burst location • Deploy operations faster to predict outages and better handle emergency situations - high-cost bandwidth augmentation can be marshaled as the traffic appears - anticipation beats reaction
  • 10. Provenance is 80% of Value • Problem - Broadcasters don’t know what audiences really like at a micro level • Method: - Analysis of social media to determine advertising reach and response - Time resolution of social traffic can provide detailed response metrics • Results: - In one case, untargeted advertising was worth 5x more when accompanied by supporting response data
  • 11. Claims Analysis • Goal - Insurance claims processing and analysis - fraud analysis • Method - Combine free text search with metadata analysis to identify high-risk activities across the country - Integrate with corporate workflows to detect and fix outliers in customer relations • Results - Questions that took 24-48 hours now take seconds to answer
  • 12. Can Search Catch the Bad Guys? • Online drug counterfeit detection • Identify commonly used language indicating counterfeits - you know it when you see it - and you know you have seen it • Leverage: - Statistically Interesting Phrases - Clustering - Other Analysts • Feed to analyst via search-driven application - enrich based on analyst feedback
  • 13. Learn to Rank • Go beyond TF/IDF by leveraging user votes • Log all clicks per query • Periodically process the logs to determine most popular items per query • “Update” Lucene index under the hood with query X boost factors - Alternatively: train a classifier to learn rankings - Beware of self-fulfilling results! • Profit!
  • 14. Via ParallelReader • Pros: - All click data (e.g. searchable labels) can be added • Cons: - Complicated and fragile (rebuild on every update) » Though only the click index needs a rebuild - No tools to manage this parallel index in Solr [Figure: a main index (documents D4, D2, D6, D1, D3, D5 at positions 1-6, fields f1, f2, ...) and a click-data index (fields c1, c2, ...) aligned to the same positions and read together as one combined index]
  • 15. Virginia Tech - Help the World • Grab data around crisis - Crowd sourced from Twitter, etc. • Search immediately • Large-scale analysis enriches data to find ways to improve responses and understanding • http://www.ctrnet.net
  • 16. Veoh - Cross Recommendations • Cross recommendation as search - with search used to build cross recommendation! • Recommend content to people who exhibit certain behaviors (clicks, query terms, other) • (Ab)use of a search engine - but not as a search engine for content - more like a search engine for behavior
  • 17. Recommendation Basics • See Ted’s talk from this morning on Multi-modal Recommendation Algorithms • Go get Mahout/Myrrix or just do it in y(our) search engine
  • 18. Search Engine for Reflected Intelligence • Map-reduce “big data” part - Logs record user + item occurrence - Group by user to get rows of occurrence matrix - Self-join to get co-occurrence - Log-likelihood test to find anomalies • Search part - Anomalous co-occurrences are indicators (or use statistical scores to provide fancy boosts) - Indicator fields and other meta-data are indexed - Recommendation implemented using a single search - Boosts, functions, similarity also can reflect learned behavior
  • 19. What Platform Do You Need? • Fast, efficient, scalable search - bulk and near real-time indexing - handle billions of records with sub-second search and faceting • Large-scale, cost-effective storage and processing capabilities • NLP and machine learning tools that scale to enhance discovery and analysis • Integrated log analysis workflows that close the loop between the raw data and user interactions • Easy API access with support for programming language of their choice • Content acquisition across a variety of enterprise, Internet and social connectors
  • 20. Reference Architecture [Diagram labels: Search View (shards 1, 2, 3, ... N; documents, users, logs) • Document Store • Analytic Services (view into numeric/historic data) • Personalization & Machine Learning Services (classification, recommendation) • Classification Models (in memory, replicated, multi-tenant) • Discovery & Enrichment (clustering, classification, NLP, topic identification, search log analysis, user behavior) • Content Acquisition (ETL, batch or near real-time; LucidWorks Search connectors; push) • Access APIs • Data]
  • 21. LucidWorks • LucidWorks provides the leading packaging of Apache Lucene and Solr - build your own, we support - founded by many prominent Lucene/Solr experts • LucidWorks Search - “Solr++” » UI, REST API, MapR connectors, relevance tools, much more • LucidWorks Big Data - Big Data as a Service - Integrated LucidWorks Search, Hadoop, machine learning with prebuilt workflows for many of these tasks
  • 22. LucidWorks Big Data [Diagram labels: Inputs; API; Mgmt; Search, Discovery, Analytics; Processing & Storage; Analytics Service; Document Service; LucidWorks Big Data; Web HDFS; Admin Service; Mgmt; Data Mgmt; Provisioning, Monitoring & Configuration]
  • 23. Easy Technical Wins • Analyze logs from application stored in Hadoop/MapR • Seamlessly store search indexes in Hadoop/MapR - and feed to Pig, Mahout and others - use mirrors + NFS to directly deploy indexes • LucidWorks 2.5 easily connects with Hadoop/MapR - Click ranking, other log analysis built in - Classification as service - Offline Enrichment
  • 24. 1 + 1 = 3
  • 25. Learn More • Talk to Grant @gsingers grant@lucidworks.com • LucidWorks - http://www.lucidworks.com - Hash Tags » #lucene #solr #lucidworks • Additional credits to: - Ted Dunning (MapR, @ted_dunning) for participation in prior talks!
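
The "Search Engine for Reflected Intelligence" slide (slide 18) describes a pipeline: group the log by user, self-join to count co-occurrences, then apply a log-likelihood test to keep only the anomalous pairs as indicators. The following is a minimal in-memory sketch of that idea, not the speakers' implementation. The function names (`llr`, `indicators`), the `threshold` parameter, and the extra "more often than chance" filter are assumptions made for illustration; a real deployment would run the grouping and self-join as map-reduce jobs rather than in a single process.

```python
from collections import defaultdict
from itertools import combinations
from math import log


def x_log_x(x):
    # Convention: 0 * log(0) is treated as 0.
    return 0.0 if x == 0 else x * log(x)


def entropy(*counts):
    # Unnormalized entropy of a set of counts.
    return x_log_x(sum(counts)) - sum(x_log_x(c) for c in counts)


def llr(k11, k12, k21, k22):
    """Log-likelihood ratio for a 2x2 contingency table.

    k11: users with both items, k12: first item only,
    k21: second item only,      k22: neither item.
    """
    row = entropy(k11 + k12, k21 + k22)
    col = entropy(k11 + k21, k12 + k22)
    mat = entropy(k11, k12, k21, k22)
    return max(0.0, 2.0 * (row + col - mat))  # clamp rounding noise


def indicators(log_rows, threshold=3.0):
    """Map each item to the items that co-occur with it anomalously often.

    log_rows: iterable of (user, item) pairs.
    threshold: hypothetical cut-off on the LLR score; tune on real data.
    """
    # "Group by user to get rows of the occurrence matrix."
    by_user = defaultdict(set)
    for user, item in log_rows:
        by_user[user].add(item)

    # "Self-join to get co-occurrence": count users per item and per pair.
    item_count = defaultdict(int)
    pair_count = defaultdict(int)
    for items in by_user.values():
        for item in items:
            item_count[item] += 1
        for a, b in combinations(sorted(items), 2):
            pair_count[(a, b)] += 1

    n_users = len(by_user)
    result = defaultdict(list)
    for (a, b), k11 in pair_count.items():
        k12 = item_count[a] - k11
        k21 = item_count[b] - k11
        k22 = n_users - k11 - k12 - k21
        # Assumption: LLR is also large for pairs seen *less* often than
        # chance, so keep only pairs observed more often than expected.
        more_than_chance = k11 * n_users > item_count[a] * item_count[b]
        if more_than_chance and llr(k11, k12, k21, k22) >= threshold:
            result[a].append(b)
            result[b].append(a)
    return {item: sorted(others) for item, others in result.items()}


if __name__ == "__main__":
    sample = [("u1", "a"), ("u1", "b"), ("u2", "a"), ("u2", "b"),
              ("u3", "a"), ("u3", "b"), ("u4", "c"), ("u5", "c"),
              ("u6", "c"), ("u6", "a")]
    print(indicators(sample))
```

Each item's indicator list would then be indexed as an extra field on that item's document. A recommendation becomes a single search: the user's recent items form the query, and it is matched against the indicator field.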

Editor's Notes

  1. A search system like Solr/Lucene (or any other) only gets you so far. You will spend most of your time on the quality of results at scale. One way to shorten that time is through reflected intelligence.
  2. TED: I think that the agenda needs to go here because it otherwise breaks up some key flow
  3. Search abuse: can discuss how I started out just doing free text, but then a curious thing happened. I started to see people using the engine for things like key/value stores, denormalized DBs, browsing engines, plagiarism detection, teaching languages, record linkage and much, much more. Search has added more DB features over the years. TED: We need to introduce the idea of *REVOLUTION* somewhere in here.
  4. All that revolution is good, but what the heck does this have to do w/ Big Data?
  5. GSI: needs a bit more meat
  6. Service-Oriented Architecture: Stateless; Failover/Fault Tolerant; Lightweight Coordination and Messaging; Smart about Updates. Document store is: Distributed; Scalable. Analysis: Batch; Near Real-Time.
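
The batch side of the "Learn to Rank" slide (slide 13), logging all clicks per query and periodically processing the logs to find the most popular items per query, can be sketched as below. This is an illustration under stated assumptions, not the LucidWorks click-ranking feature: the function name `click_boosts`, the parameters `top_n` and `min_clicks`, and the boost formula (1 plus the document's share of clicks for that query) are all hypothetical.

```python
from collections import Counter, defaultdict


def click_boosts(click_log, top_n=3, min_clicks=2):
    """Turn a click log into per-query boost factors.

    click_log: iterable of (query, doc_id) pairs, one per click.
    top_n: keep at most this many documents per query (assumption).
    min_clicks: ignore documents with fewer clicks, to damp noise (assumption).
    """
    # Count clicks per (normalized query, document).
    counts = defaultdict(Counter)
    for query, doc_id in click_log:
        counts[query.strip().lower()][doc_id] += 1

    boosts = {}
    for query, per_doc in counts.items():
        total = sum(per_doc.values())
        popular = [(doc, n) for doc, n in per_doc.most_common(top_n)
                   if n >= min_clicks]
        if popular:
            # Hypothetical formula: 1.0 means "no boost"; a document that
            # received every click for the query gets a boost of 2.0.
            boosts[query] = {doc: 1.0 + n / total for doc, n in popular}
    return boosts


if __name__ == "__main__":
    log = [("Solr", "d1"), ("solr", "d1"), ("solr", "d2"), ("hadoop", "d3")]
    print(click_boosts(log))
```

The resulting query-to-boost table is what would be fed back into the index or applied at query time. As the slide warns, results that are already ranked first collect more clicks, so boosts computed this way can reinforce themselves unless the loop is tested.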