See conference video - http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011
Metadata is widely understood to be a critical element of search, discovery and classification. But with the preponderance of unstructured data addressed by search technology, consistent native metadata is often in short supply. Organizations often find that the quality and depth of contextual metadata (what documents are about) can make or break search relevancy, precision and recall.
Semaphore is an enterprise semantic platform that uniquely captures an organization's subjects and topics into a taxonomy or ontology (model), in a manner that adds context for enhanced navigation and findability. Semaphore augments traditional information management systems like Solr search by adding advanced content classification, metadata and navigation capabilities to deliver a more complete, higher quality enterprise information management experience. This talk will focus on the following:
Deep dive into the technical integration of Semaphore with Apache Solr (including the connection points between Semaphore and Solr)
Discuss the Semaphore modules (Ontology Manager, Classification Server, Semantic Enhancement Server and Search Application Server) and how they provide better findability
Share a demonstration of Solr in action
Present a client case study (Nordyske).
More Powerful Solr Search with Semaphore - Jeremy Bentley
1. Smartlogic™
Apache Lucene Eurocon
Jeremy Bentley, CEO
2. 1st degree of order
Filing management
• 80% of enterprise information is unstructured
• Doubling every 19 months and accelerating [Gartner]
• Increasing burden of compliance
• Enterprise 2.0 additions
3. 2nd degree of order
Index management
• File plans and metadata schema
• Mono-hierarchical, standardised taxonomies
• Manually applied classification
• Low level of consistency and quality
5. A 10-Year Flatline Expectation Gap
• 2001, IDC, “Quantifying Enterprise Search”: searchers are successful in finding what they seek 50% of the time or less
• 2011, MindMetre/Smartlogic: more than half (52%) cannot find the information they need using their enterprise search system
6. The explosion of information
Chart: terabytes of data, 1993-2001 (4 TB) vs. 2001-2009 (80 TB), a 20 times increase in information volume
Source: the National Archives
8. Different vocabulary and ambiguity
Missing results (you say / I say):
• Moon Buggy / Lunar Roving Vehicle / Manned Lunar Surface Vehicle
• Swine Flu / Swine Influenza Virus / H1N1
• Touchscreen / Touch screen / Multi-touch
Too many results (you say / what do you mean?):
• Apple: a fruit? Fiona Apple, a singer/songwriter? An electronics company?
• Rights: employment rights? Equal rights? Right of way?
• Ford: Ford Motor? Forward Industrials (ticker=FORD)? A shallow river crossing?
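The "missing results" half of this slide is the classic synonym problem. The sketch below is a minimal illustration, assuming a small hand-built table that maps each concept's preferred label to its alternative labels (the table, `normalise` and `expand_query` are hypothetical, not Semaphore's or Solr's API). A query for any variant is expanded to every label of the same concept, so "Moon Buggy" can also match documents that only say "Lunar Roving Vehicle".

```python
from __future__ import annotations

# Hypothetical concept table: preferred label -> alternative labels,
# taken from the "you say / I say" examples on the slide.
CONCEPTS: dict[str, list[str]] = {
    "Lunar Roving Vehicle": ["Moon Buggy", "Manned Lunar Surface Vehicle"],
    "Swine Influenza Virus": ["Swine Flu", "H1N1"],
    "Touchscreen": ["Touch screen", "Multi-touch"],
}


def normalise(term: str) -> str:
    """Lower-case and collapse whitespace so label matching is forgiving."""
    return " ".join(term.lower().split())


# Reverse lookup: any label (preferred or alternative) -> preferred label.
LABEL_TO_PREFERRED: dict[str, str] = {
    normalise(label): preferred
    for preferred, alternatives in CONCEPTS.items()
    for label in [preferred, *alternatives]
}


def expand_query(term: str) -> list[str]:
    """Return every label of the concept that `term` names.

    Terms that are not in the table are returned unchanged, so the
    search still runs on the user's original wording.
    """
    preferred = LABEL_TO_PREFERRED.get(normalise(term))
    if preferred is None:
        return [term]
    return [preferred, *CONCEPTS[preferred]]


if __name__ == "__main__":
    for query in ["moon buggy", "H1N1", "Apple"]:
        print(query, "->", expand_query(query))
```

Expansion only addresses missing results. The "too many results" half (Apple, Rights, Ford) needs the opposite move: a term that maps to several concepts has to be disambiguated from context, or by letting the user choose a concept, for example through a facet.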
9. Conventional Search - Ineffective, Frustrating, and Inadequate
Drawbacks
Apparent
1 Needle in the Haystack
2 Multiple search terms
3 Irrelevant results
4 Out of date results
5 Multiple media forms
6 Unrestricted geography
7 Inappropriate ads
Not So Apparent
8 Can’t filter, select subset
9 No related topics
10 Missing results
11 No context or guidance
12 Best resource not clear
✓ Time consuming
✓ Inefficient
✓ Ineffective
11. Paradox of Effort
Metadata is to search what pistons are to a petrol engine.
                             Web    Enterprise
Metadata effort              High   Low
Result quality requirement   Low    High
12. How do I structure it?
Metadata categories: Information, Process, Structural
Example fields: Subject, Creation Date, Location, Modified Date, Project, Author, Function (IT, HR, Finance), Format (PDF, DOC, XLS), Protective Marker, Expiry, Publisher, Expert, Retention, Site
13. 3rd degree content universe
Diagram: Enterprise Search, Content Management, Portal Infrastructure, Document Management, Social collaboration, Records Management, Publishing Systems, Process Management & Workflow, Digital Asset Management, eDiscovery
14. 4th degree of order
Diagram: Content Intelligence together with Enterprise Search, Content Management, Portal Infrastructure, Document Management, Social collaboration, Records Management, Publishing Systems, Process Management & Workflow, Digital Asset Management, eDiscovery
15. 4th degree of order
Content Intelligence: Content Intelligence Platform and Solr
16. Semaphore
Diagram: Business Vocabulary, Expose, Apply, Classification, User Decision/Action, Inform
Copyright © 2011 Smartlogic Semaphore Limited
17. Semaphore
Diagram: Business Vocabulary, Semantic models, Expose, Apply, Metadata, Semantic Software, Classification, User Decision/Action, Inform, Contextual User Experience
19. Metadata
Today vs. with Content Intelligence
• Manual process vs. automatic process
• Multiple approaches for various domains/audiences vs. a single, unified ‘one size fits all’ approach
• Long time to craft & build, manually applied vs. short time to build & deploy, automatically applied
• Low quality tags vs. high quality tags
• High cost to apply vs. low cost to apply
20. Semantic Models
Organising, Contextualising, Harnessing
Example concept: Ford Motor Company
Preferred term (Agreed Label): Ford Motor Company
Parent topics
– Automotive sector
– Bond issuers
Also known as
– Ford Motor
– F (Bloomberg)
– FoMoCo
– blue oval
Subsidiaries
– Ford Motor Credit Company
– Mazda
Products
– Focus
– Ka
– MX5
Key competitors
– BMW
– Daimler Chrysler
– General Motors
– Toyota
– Volkswagen
Influenced by
– Credit ratings on Ford Motor Credit Company
– European and US economies
– Changes in consumer demand
Covered by
– Bob Smith
Content-types available
– Flashnotes
– Research reports
– Trade ideas
Analytics available
– Current bond price
– Relative bond spreads
Location of
– Ford fundamental data
– Earnings estimates
– Historic sales and profits
Harnessing
Automate compliance and distribution tasks
– ‘Watch list’ lookup
– Distribution according to preset rules
– Automated mapping to create aggregator metadata
User Experience
– Conceptual relevance
– Related topics
– Links to analytics
Search engine enhancement
– Search results
– Email alerts
Unstructured content integration
– Published reports
– Related topics
– Links to analytics
– Search results
– Email alerts
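One way to picture the organising part of this slide is as one record per concept. The sketch below is a simplified, hypothetical data structure (the `Concept` class, its fields and the string-matching `classify` function are illustrative assumptions, not Semaphore's data model or classification method): a document that mentions any label of a concept is tagged with the concept's preferred term plus its parent topics, which is how the model adds context the text itself does not state.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Concept:
    """Hypothetical concept record in a semantic model."""

    preferred: str  # preferred term (agreed label)
    also_known_as: list[str] = field(default_factory=list)
    parents: list[str] = field(default_factory=list)  # parent topics
    related: dict[str, list[str]] = field(default_factory=dict)  # relation -> targets

    def labels(self) -> list[str]:
        return [self.preferred, *self.also_known_as]


# Built from the Ford example on the slide. The ticker "F" is left out on
# purpose: a one-letter label would match far too many documents.
FORD = Concept(
    preferred="Ford Motor Company",
    also_known_as=["Ford Motor", "FoMoCo", "blue oval"],
    parents=["Automotive sector", "Bond issuers"],
    related={
        "subsidiaries": ["Ford Motor Credit Company", "Mazda"],
        "key competitors": ["BMW", "Daimler Chrysler", "General Motors", "Toyota", "Volkswagen"],
    },
)


def classify(text: str, concepts: list[Concept]) -> list[str]:
    """Tag text with each matching concept's preferred term and parent topics.

    Matching is whole-phrase and case-insensitive. This is deliberately
    naive; real classifiers weigh evidence rather than match strings.
    """
    tags: list[str] = []
    for concept in concepts:
        if any(
            re.search(r"\b" + re.escape(label) + r"\b", text, re.IGNORECASE)
            for label in concept.labels()
        ):
            for tag in [concept.preferred, *concept.parents]:
                if tag not in tags:
                    tags.append(tag)
    return tags


if __name__ == "__main__":
    print(classify("FoMoCo reported higher sales of the Focus.", [FORD]))
```

Note that "Ford Motor" also matches inside "Ford Motor Credit Company", so a fuller model would need to prefer the longest matching label or weigh the relationship between the two concepts.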
21. Contextual User Experience
Key Features
1 Taxonomy enables discovery, related searches
2 Related topics and content
3 Facets enable filtering results by:
4 – Source
5 – Numerous topics
6 – Date
7 Best Bets
8 Automated document tagging
9 A-Z
✓ More relevant results
✓ Fewer “bad hits”
✓ Powerful navigation
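Facets such as source, topic and date are counts over document metadata, which is why automated tagging matters for them: a facet can only be as good as the tags behind it. A minimal in-memory sketch, assuming the documents already carry tags (the documents, field names and helper functions are invented for illustration):

```python
from __future__ import annotations

from collections import Counter

# Hypothetical documents whose "topics" were added by automatic tagging.
DOCS = [
    {"id": "1", "source": "Research", "topics": ["Ford Motor Company", "Automotive sector"], "date": "2011-03"},
    {"id": "2", "source": "Flashnote", "topics": ["Ford Motor Company", "Bond issuers"], "date": "2011-05"},
    {"id": "3", "source": "Research", "topics": ["Toyota", "Automotive sector"], "date": "2011-05"},
]


def values_of(doc: dict, field: str) -> list:
    """Treat single-valued and multi-valued fields uniformly."""
    value = doc.get(field)
    if value is None:
        return []
    return value if isinstance(value, list) else [value]


def facet_counts(docs: list[dict], field: str) -> Counter:
    """Count how many documents carry each value of `field`."""
    counts: Counter = Counter()
    for doc in docs:
        counts.update(set(values_of(doc, field)))  # count each doc once per value
    return counts


def filter_by(docs: list[dict], field: str, value: str) -> list[dict]:
    """Keep only documents whose `field` contains `value` (a facet click)."""
    return [doc for doc in docs if value in values_of(doc, field)]


if __name__ == "__main__":
    ford = filter_by(DOCS, "topics", "Ford Motor Company")
    print(facet_counts(ford, "source"))
    print(facet_counts(ford, "date"))
```

In Solr the same idea is handled server-side: faceting is requested with parameters such as `facet=true` and `facet.field`, and a selected facet value is typically applied as a filter query (`fq`).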
24. Semaphore Search Integration
Architecture diagram labels: Ontology Manager, Classification Server, Search Enhancement Server, Classification Rules, Local Term Index, Text Miner, Web Services API, XML API, Ontology Information, Document “Tags”, Extracted Text, Sample Interface Code, User Requests, Query, Index, Collector/Normalizer, Search Application Framework, Portal, Search Engine, Corpus
Legend: Semaphore core module; Semaphore optional module
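The diagram shows tags flowing from the classification side into the search engine's index, and user requests passing through query-side enhancement before reaching the engine. The sketch below is a hedged illustration of those two connection points for Solr, assuming a schema with a multi-valued string field for tags (`topic_ss`, chosen to match the `*_ss` dynamic field pattern in Solr's default configset) and using stand-in functions where a real deployment would call the classification and enhancement services. It only builds request bodies; no Semaphore API is shown or implied, and all function names are hypothetical.

```python
from __future__ import annotations

import json
from typing import Callable

TAG_FIELD = "topic_ss"  # assumed multi-valued string field in the Solr schema


def build_update_body(docs: list[dict], classify: Callable[[str], list[str]]) -> str:
    """Index side: add classification tags to each document.

    Returns a JSON array of documents, the shape Solr's JSON update
    handler accepts (e.g. POSTed to /solr/<collection>/update).
    """
    enriched = []
    for doc in docs:
        new_doc = dict(doc)  # do not mutate the caller's document
        tags = classify(doc["text"])
        if tags:  # omit the field entirely when nothing matched
            new_doc[TAG_FIELD] = tags
        enriched.append(new_doc)
    return json.dumps(enriched)


def build_query_params(user_query: str, expand: Callable[[str], list[str]]) -> dict[str, str]:
    """Query side: expand the user's term and ask for tag facets."""
    labels = expand(user_query)
    # Phrase-quote each label; embedded double quotes are backslash-escaped.
    quoted = ['"' + label.replace('"', '\\"') + '"' for label in labels]
    return {"q": " OR ".join(quoted), "facet": "true", "facet.field": TAG_FIELD}


# Hypothetical stand-ins for the classification and query-enhancement services.
def stub_classify(text: str) -> list[str]:
    return ["Ford Motor Company"] if "ford" in text.lower() else []


def stub_expand(term: str) -> list[str]:
    return ["Ford Motor Company", "FoMoCo"] if term.lower() == "ford" else [term]


if __name__ == "__main__":
    print(build_update_body([{"id": "a", "text": "Ford Focus sales rise"}], stub_classify))
    print(build_query_params("ford", stub_expand))
```

Passing the classifier and expander in as parameters keeps the Solr-facing code independent of how tags and expansions are produced, so the stubs can be swapped for real service calls without changing it.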
26. Content Intelligence
Diagram labels: Information, Manufacturing, Monetisation, Knowledge, Metadata, Recovery, Data Loss Prevention, Risk & Compliance, Content Analytics