Slides from Trey's opening presentation for the South Big Data Hub's Text Data Analysis Panel on December 8th, 2016. Trey provided a quick introduction to Apache Solr, described how companies are using Solr to power relevant search in industry, and provided a glimpse on where the industry is heading with regard to implementing more intelligent and relevant semantic search.
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
South Big Data Hub: Text Data Analysis Panel
1. Text Data Analysis Panel: South Big Data Hub
Trey Grainger
SVP of Engineering, Lucidworks
2. Trey Grainger
SVP of Engineering
• Previously Director of Engineering @ CareerBuilder
• MBA, Management of Technology – Georgia Tech
• BA, Computer Science, Business, & Philosophy – Furman University
• Information Retrieval & Web Search - Stanford University
Other fun projects:
• Co-author of Solr in Action, plus numerous research papers
• Frequent conference speaker
• Founder of Celiaccess.com, the gluten-free search engine
• Lucene/Solr contributor
About Me
6. Lucidworks enables Search-Driven Everything
Data Acquisition
Indexing & Streaming
Smart Access API
Recommendations &
Alerts
Analytics & InsightsExtreme Relevancy
CUSTOMER
SERVICE
RESEARCH
PORTAL
DIGITAL
CONTENT
CUSTOMER
INSIGHTS
FRAUD
SURVEILLANCE
ONLINE
RETAIL
• Access all your data in a
number of ways from one
place.
• Secure storage and
processing from Solr and
Spark.
• Acquire data from any source
with pre-built connectors and
adapters.
Machine learning and
advanced analytics turn all
of your apps into intelligent
data-driven applications.
21. The Three C’s
Content:
Keywords and other features in your documents
Collaboration:
How other’s have chosen to interact with your system
Context:
Available information about your users and their intent
Reflected Intelligence
“Leveraging previous data and interactions to improve how
new data and interactions should be interpreted”
23. ● Recommendation Algorithms
● Building user profiles from past searches, clicks, and other actions
● Identifying correlations between keywords/phrases
● Building out automatically-generated ontologies from content and queries
● Determining relevancy judgements (precision, recall, nDCG, etc.) from click
logs
● Learning to Rank - using relevancy judgements and machine learning to train
a relevance model
● Discovering misspellings, synonyms, acronyms, and related keywords
● Disambiguation of keyword phrases with multiple meanings
● Learning what’s important in your content
Examples of Reflected Intelligence