Lucidworks Senior Systems Engineer Allan Syiek discusses simple querying vs. data mining and intelligent search, and how Lucidworks Fusion can help you turn raw data into insight.
Webinar: Fusion for Business Intelligence
1.
2. Fusion for Business Intelligence
Allan Syiek
Senior Sales Engineer
September 14, 2016
3. Session Objectives
By the end of this session, you will:
– Have a high-level awareness of the variety of search and discovery functionality available
– Select the right product for a particular use case
– Know why this baby is so happy
4. Agenda
• The Beer and Diaper Legend
• DIKW Pyramid
• What is Enterprise Search
• Indexing 101
• Statistics vs. Data Mining vs. Machine Learning
• What is Business Intelligence
• Where does Fusion Fit?
5. Parable of the Beer and the Diapers
Illustrates the difference between querying and data mining, already firmly enshrined in BI mythology
7. What is Enterprise Search
Q. What do you do with a mountain of data located everywhere?
A. Depends… What do you need it for?
8. Aspects of Enterprise Search
• Crawling, Parsing, Indexing, Searching
• Advanced Searches
• Searching Structured Data
• Searching Unstructured Data
• Metadata
• Ranking
• Results
• Access Control
• UI
• Tuning
• Reporting
• Scale and Performance
9.
10. Index Pipeline
Documents
• Tika Parser
• Exclusion Filter
• Field Mapper
• HTML Transform Stage
• XML Transform Stage
• OpenNLP Entity Extraction
• Gazetteer Extraction
• Regular Expression
• Aggregating
• Javascript (custom scripts)
• …and others…
Search Collection
Search UI
Query Pipeline
• Search Fields/Parameters
• Facets
• Landing Pages
• Boost Documents
• Block Documents
• Security Trimming
• Recommendation Boosting
• Rollup Aggregator
• Sub Query
• Javascript (custom scripts)
• …and others…
11. Indexing 101
A system used to make finding information easier.
Every word is converted into a wordID by using an in-memory hash table, the lexicon.
Occurrences in the current document are translated into hit lists and are written into the forward “barrels”.
Inverted Barrels have been sorted.
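The indexing steps above can be sketched in a few lines of Python. This is a minimal illustration only, not how Fusion or any particular engine implements it: the function names `build_index` and `search`, the whitespace tokenizer, and the sample documents are all assumptions made for the example.

```python
from collections import defaultdict


def build_index(docs):
    """Build a lexicon, forward barrels, and an inverted index from {doc_id: text}."""
    lexicon = {}   # word -> wordID (the in-memory hash table)
    forward = {}   # doc_id -> hit list of (wordID, position)
    for doc_id, text in docs.items():
        hits = []
        for position, word in enumerate(text.lower().split()):
            # Assign the next free wordID the first time a word is seen.
            word_id = lexicon.setdefault(word, len(lexicon))
            hits.append((word_id, position))
        forward[doc_id] = hits

    # Invert: regroup the hits by wordID instead of by document.
    inverted = defaultdict(list)   # wordID -> list of (doc_id, position)
    for doc_id, hits in forward.items():
        for word_id, position in hits:
            inverted[word_id].append((doc_id, position))
    # Sort each posting list so it is ordered by document, then position.
    for postings in inverted.values():
        postings.sort()
    return lexicon, forward, dict(inverted)


def search(word, lexicon, inverted):
    """Return the sorted doc_ids that contain `word`."""
    word_id = lexicon.get(word.lower())
    if word_id is None:
        return []
    return sorted({doc_id for doc_id, _ in inverted[word_id]})


# Hypothetical sample documents.
docs = {1: "beer and diapers", 2: "diapers on sale", 3: "craft beer"}
lexicon, forward, inverted = build_index(docs)
beer_docs = search("beer", lexicon, inverted)
```

A lookup now touches only the posting list for one wordID instead of scanning every document, which is the point of inverting the forward barrels.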
12. Indexing 101 - Ranking
• Score Results for Presentation
– Weighted by Term Frequency-Inverse Document Frequency (TF-IDF)
– Clustering
– Complex proprietary algorithms
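As a rough illustration of TF-IDF weighting, the sketch below scores documents with the textbook formula: term frequency within the document multiplied by the log of the inverse document frequency. Production engines use more elaborate variants, so treat this as an assumption-laden example; the function name `tf_idf_scores` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf_scores(query, docs):
    """Rank documents in {doc_id: text} against a whitespace-tokenized query."""
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)

    # Document frequency: the number of documents containing each term.
    df = Counter()
    for tokens in tokenized.values():
        df.update(set(tokens))

    scores = {}
    for doc_id, tokens in tokenized.items():
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if df[term] == 0:
                continue  # term appears nowhere, so it contributes nothing
            tf = counts[term] / len(tokens) if tokens else 0.0
            idf = math.log(n_docs / df[term])
            score += tf * idf
        scores[doc_id] = score
    # Highest score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample documents.
docs = {
    "a": "beer beer diapers",
    "b": "diapers wipes",
    "c": "wine cheese",
}
ranked = tf_idf_scores("beer", docs)
```

A term that is frequent in one document but rare across the collection scores highest, so document `a` ranks first for the query `beer` here.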
14. Statistics vs. Data Mining vs. Machine Learning
– Statistics quantifies numbers
– Data Mining explains patterns
– Machine Learning predicts with models
– Artificial Intelligence behaves and reasons
15. What is Business Intelligence
• BI technologies provide historical, current and predictive views of business operations
• Business intelligence is made up of an increasing number of components including:
– Multidimensional aggregation and allocation (OLAP, Online Analytical Processing)
– Denormalization, tagging and standardization (relational database)
– Real time reporting with analytical alert
– A method of interfacing with unstructured data sources (data mining)
– Group consolidation, budgeting and rolling forecasts
– Statistical inference and probabilistic simulation
– Key performance indicators optimization
– Version control and process management
– Open item management
16. Why Fusion for Log Analytics?
• Secure access to dashboards
• ETL of logs using index pipelines
• Spark-run analysis models for logs, leveraged with the ML index pipeline
• Time series index management
17. Massive-scale log analytics
• Index billions of log events per day, in real time
• Recent event and historical analysis: analyze logs over time (today, recent, past week, past 30 days, …)
• Easy-to-use dashboards to visualize common questions and allow for ad hoc analysis
• Ability to scale linearly as business grows … with sub-linear growth in costs!
• Easy to set up, easy to manage, easy to use
18. Signals & Recommendations
Fusion can capture, store, and aggregate signals from a variety of sources to drive predictive search capabilities and continuous relevancy tuning
Signals can include:
• Clicks and queries
• Add-to-cart and purchase behavior
• Geo-location
• User behavior and preferences
• User history and past orders
• Device
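The idea of aggregating signals to tune relevancy can be sketched generically as follows. This is not Fusion's API: the event shape, the `SIGNAL_WEIGHTS` values, and the functions `aggregate_signals` and `boost_results` are all hypothetical, chosen only to show raw events being rolled up into per-query, per-document weights that then adjust a base ranking.

```python
from collections import defaultdict

# Hypothetical weights per signal type; a real deployment would tune these.
SIGNAL_WEIGHTS = {"click": 1.0, "add_to_cart": 3.0, "purchase": 5.0}


def aggregate_signals(events):
    """Sum weighted signals per (query, doc_id) pair."""
    totals = defaultdict(float)
    for event in events:
        # Unknown signal types carry no weight.
        weight = SIGNAL_WEIGHTS.get(event["type"], 0.0)
        totals[(event["query"], event["doc_id"])] += weight
    return dict(totals)


def boost_results(query, results, totals):
    """Re-rank (doc_id, base_score) pairs by adding the aggregated signal weight."""
    boosted = [
        (doc_id, score + totals.get((query, doc_id), 0.0))
        for doc_id, score in results
    ]
    return sorted(boosted, key=lambda item: item[1], reverse=True)


# Hypothetical raw signal events.
events = [
    {"query": "diapers", "doc_id": "sku-1", "type": "click"},
    {"query": "diapers", "doc_id": "sku-2", "type": "purchase"},
    {"query": "diapers", "doc_id": "sku-2", "type": "click"},
]
totals = aggregate_signals(events)
ranked = boost_results("diapers", [("sku-1", 2.0), ("sku-2", 1.5)], totals)
```

In this example `sku-2` starts with the lower base score but moves to the top because a purchase and a click outweigh a single click on `sku-1`.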
19. Visualization & Insight with SILK
SILK Dashboards provide a rich visual interface for users to search, inspect and visualize event/log data
Gives users the power to perform ad hoc search and analysis on massive amounts of multi-structured and time series data
Real-time insights and trends for on-the-fly decision making using the most accurate and up-to-date data
Users can share visualizations and dashboards