Using Neo4j and Machine Learning to Create a Decision Engine, CluedIn
1. Taking the "Magic" Out of Machine Learning
Building a decision engine with Neo4j and Machine Learning Techniques
2. Tim Ward
Engineer at CluedIn
@jerrong / tiw@cluedin.com
Using Neo4j for 5+ years
Started on 1.6
WHO AM I?
3. WHAT DO WE DO?
We help our customers achieve the connected enterprise
The average company uses 30+ SaaS tools
We connect them, and the data in them, automatically (SaaS)
8. DISCOVERED
The thing you learn through all of this is that machine
learning techniques are good at solving certain problems,
but they are not a magic bullet for all problems.
9. THE SIMPLE IDEA
Have a weighted decision engine that can persist
Have the ability to fork graph decisions asynchronously
Does not need to be super fast or real-time
Get something from nothing
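As a minimal sketch of what such a weighted decision engine might look like (all names, weights, and thresholds here are illustrative assumptions, not CluedIn's actual implementation):

```python
# Hypothetical sketch of a weighted decision engine: each piece of
# evidence contributes a small weight toward the decision that two
# records refer to the same entity. Decisions are plain objects that
# can be persisted and re-evaluated later (no real-time requirement).
from dataclasses import dataclass, field

@dataclass
class Decision:
    subject: str                                  # e.g. "crm:person/42 == erp:contact/7"
    weights: list = field(default_factory=list)   # (evidence, weight) pairs

    def add_evidence(self, evidence: str, weight: float) -> None:
        self.weights.append((evidence, weight))

    @property
    def confidence(self) -> float:
        # Naive capped sum; a real engine would combine weights
        # statistically rather than just adding them up.
        return min(1.0, sum(w for _, w in self.weights))

d = Decision("crm:person/42 == erp:contact/7")
d.add_evidence("same email domain", 0.02)
d.add_evidence("same phone number", 0.45)
d.add_evidence("shared employer node in graph", 0.30)
print(round(d.confidence, 2))  # 0.77 -> borderline, worth surfacing to a human
```

Because each decision keeps its evidence list, it can be re-scored or forked later when a new clue arrives, which is the "does not need to be real-time" part of the idea.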
10. WHY
To separate the noise from valuable data
To reverse-engineer how two things are related
To connect the enterprise... automatically
11. THE SIMPLE APPROACHES CAN GET YOU VERY FAR!
We combine the best parts of the graph with the backing of
a neural network to learn from its decisions.
Pattern matching combined with statistical models.
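One way to read "pattern matching combined with statistical models" is sketched below (an illustrative toy, not the actual CluedIn pipeline): a structural signal from the graph (shared neighbours) is blended with a statistical signal (string similarity) into a single matching score.

```python
# Illustrative sketch: combine a graph signal (shared neighbours in a
# toy adjacency map) with a statistical signal (string similarity)
# into one score. Names, data, and the 50/50 weighting are assumptions.
from difflib import SequenceMatcher

graph = {
    "tim ward":  {"CluedIn", "Neo4j", "Copenhagen"},
    "t. ward":   {"CluedIn", "Neo4j"},
    "tina wong": {"AcmeCorp"},
}

def shared_neighbour_score(a: str, b: str) -> float:
    na, nb = graph[a], graph[b]
    return len(na & nb) / len(na | nb)   # Jaccard overlap of neighbours

def name_similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()

def match_score(a: str, b: str) -> float:
    # Equal weighting of the two signals is an arbitrary choice here.
    return 0.5 * shared_neighbour_score(a, b) + 0.5 * name_similarity(a, b)

print(match_score("tim ward", "t. ward") > match_score("tim ward", "tina wong"))  # True
```

The point of the combination: either signal alone can be fooled (similar names for different people, or coincidental shared neighbours), but agreement between the structural and statistical views adds much more weight.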
14. PRE-PROCESSING PIPELINE
15. Martin Hyldahl, CTO
“The graph is the new secret in machine learning as most
models are dots on a chart or rows in a model. Besides
clustering algorithms there are not a lot of algorithms
where the dots are related in a strong and meaningful way.
Although this typically requires a lot more processing, we
found that this tapers off over time. The pre-processing
that we do to get data into a connected graph before we
make the decision tree allows our engine to be statistically
correct more than any known approach today.”
27. WHY FOR THE BUSINESS?
Talk to Amalie (ale@cluedin.com)
Right to be forgotten
Data Privacy Act
cluedin.com/sales
Editor's Notes
Important that you know a little bit about what we do, to understand why we built the solution we are about to show you.
Why? Because when you have something, you know the results will be great.
TensorFlow started using a graph backend recently.
Optimization Process.
The output is fed into a data model for a neural net.
All of these steps add a preliminary weight, e.g. 2%, 0.8%, etc.
Some things are less volatile, e.g. a company doesn't change its industry often and people don't change their email often, but the relationships to those objects do change: moving jobs, changing tasks, changing positions.
Where does the machine learning part come in? It comes in two ways.
Manual observations are placed on the graph through human interaction, i.e. decisions that hover between 70-90% confidence are surfaced to the user to self-correct. Once again, this only adds a weight; it doesn't make the decision right. Clustering algorithms like k-means allow you to add observations, much as you might predict "if I toss two coins, I think both will be heads". The observations we make are simply: which features of these two things led to the decision that they are the same thing? An observation of "something else" is also an observation (it tells our decision tree to try new things), e.g. "I know this is a TV show that John likes, but it isn't a photo of him".
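The surfacing rule described in this note might be sketched as follows (the 70-90% band comes from the note; the routing names and the observation weight are illustrative assumptions): mid-confidence decisions are queued for a person, and the human answer is recorded as one more weighted observation rather than a hard override.

```python
# Sketch of the human-in-the-loop rule: decisions between 0.70 and
# 0.90 confidence go to a human; the answer adds a weight, it does
# not force the decision. HUMAN_WEIGHT is an illustrative value.
REVIEW_LOW, REVIEW_HIGH = 0.70, 0.90
HUMAN_WEIGHT = 0.15  # a human answer nudges the score, it doesn't decide it

def route(confidence: float) -> str:
    if confidence >= REVIEW_HIGH:
        return "accept"
    if confidence >= REVIEW_LOW:
        return "ask-human"
    return "reject"

def apply_observation(confidence: float, human_says_same: bool) -> float:
    delta = HUMAN_WEIGHT if human_says_same else -HUMAN_WEIGHT
    return max(0.0, min(1.0, confidence + delta))

print(route(0.95))                                 # accept
print(route(0.78))                                 # ask-human
print(round(apply_observation(0.78, True), 2))     # 0.93
```

Note that even after a positive human observation the score only moves toward certainty; a "something else" answer would simply be a negative-weight observation in the same mechanism.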
The solution is to do as much as possible in memory, which requires you to build a graph model, and that takes a little time. The good thing is that we don't need this to be real-time.
e.g. imagine someone sitting next to you saying, "The sky is green, no wait, it's red, wait, it's black, no, got it now, it's definitely blue." You would rather have someone say, "The sky is blue, it is a well-qualified fact, I am 100% sure of it, but I took two seconds more to tell you."
So all the graph work in the demo is done in memory and then persisted to the graph. i.e. have you ever left state in your brain until, two hours later, you get an extra clue, it loads back into main memory, and you answer it?
Imagine plugging this engine into your enterprise, this is what we do e.g. Your CRM.
There is no doubt that we miss so much at work. You have experienced the wow moment of the default Neo4j Movie sample when you run your first ShortestPath query; now you get to apply this at work. All of this is to make sure the right graph is built up. With the Panama Papers example, if you had a weak link in the chain, it would have led to some weird results.
A reverse-engineering machine, i.e. given a problem and a solution: how did I get there?