For a content collaboration platform, Neo4j enabled faster development and better performance than a relational database. The graph database reduces complexity, improves query performance, and shortens development cycles. Visualizing the graph data provides valuable analytics and insight into user behavior that can further improve the product.
GraphConnect Europe 2016 - Faster Lap Times with Neo4j - Srinivas Suravarapu
1. FASTER LAP TIMES
WITH NEO4J
Srinivas Suravarapu
Chief Architect – Scribestar Ltd
@srinivas_s
2. Scribestar
• A content collaboration platform for the legal community.
The solution is targeted at lawyers and how they draft
legal content.
• From an information systems point of view, it is a collaboration platform concerned with helping related users work effectively on related content.
• We are about 20 people, and the platform is built on .NET and now uses Neo4j as its primary store.
3. Relational Stores
Looking back at some content management systems:
• They tend to be built on a relational DBMS and serve content from BLOBs.
As content grows, a monolithic store quickly starts affecting users' ability to perform functions that have nothing to do with the content itself. BLOBs are fine for a few pieces of content, but when all you have is content, you have to go back to the drawing board on where you store it.
4. NoSQL Stores
Some have managed to use document-oriented databases, which can serve large content to the web quickly.
When you combine content in some ML (markup language) format with the fragments of relational data inevitably present in every system, the result relies heavily on how you model your aggregates.
Working with multiple aggregates sitting behind service boundaries tends to bring up consistency issues, and we tend to offload a lot of implementation complexity to the application tier.
5. Polyglot Persistence
Using multiple storage technologies to store information is inevitable. The type of data, and how it's consumed by parts of your application, should drive your choice of data store.
We store our relationship information in a graph and our content in a file store. A user's ability to collaborate effectively on the content is isolated from, and unaffected by, users collaborating on the metadata.
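A minimal sketch of that split, assuming Python and in-memory stand-ins: `FileContentStore` and `MetadataGraph` are hypothetical names, a local directory plays the file store and an adjacency map plays the graph. Each store has its own lock, so a metadata write never waits on a content write, which is the isolation described above.

```python
import tempfile
import threading
from collections import defaultdict
from pathlib import Path


class FileContentStore:
    """Hypothetical file store: one file per content id."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)
        self._lock = threading.Lock()  # serialises content writes only

    def put(self, content_id: str, body: bytes) -> None:
        # Assumes content ids are safe file names (e.g. generated GUIDs).
        with self._lock:
            (self.root / content_id).write_bytes(body)

    def get(self, content_id: str) -> bytes:
        return (self.root / content_id).read_bytes()


class MetadataGraph:
    """In-memory stand-in for the graph store: directed, typed relationships."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # serialises metadata writes only
        self._out = defaultdict(list)  # node id -> [(rel_type, target id)]

    def relate(self, source: str, rel_type: str, target: str) -> None:
        with self._lock:
            self._out[source].append((rel_type, target))

    def targets(self, source: str, rel_type: str) -> list:
        with self._lock:
            return [t for r, t in self._out.get(source, []) if r == rel_type]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        content = FileContentStore(Path(tmp))
        graph = MetadataGraph()
        content.put("doc-1", b"1. Definitions ...")   # content write
        graph.relate("user:alice", "EDITS", "doc-1")   # metadata write
        graph.relate("user:bob", "REVIEWS", "doc-1")
        print(graph.targets("user:alice", "EDITS"))
        print(content.get("doc-1"))
```

The graph only ever holds the content id, so it stays small and relationship-heavy while large bodies live where they can be served cheaply.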
7. Wish list
• We are constantly compared to the capabilities of desktop publishing tools
• It should be fast and secure
• We cannot lose content
• Corruption of content is not an option
• Everyone should be able to change the same piece of content at the same time (you only think you need it)
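Not losing content when several people edit the same piece at once is the classic lost-update problem. One common way to handle it, offered here as an illustration rather than as Scribestar's approach, is optimistic concurrency: each save states the version it was based on and is rejected if someone else saved in between. The sketch assumes Python and an in-memory store; `VersionedContent`, `Revision` and `ConcurrencyConflict` are hypothetical names.

```python
import threading
from dataclasses import dataclass


class ConcurrencyConflict(Exception):
    """Raised when a save is based on a stale version of the content."""


@dataclass(frozen=True)
class Revision:
    version: int
    body: str


class VersionedContent:
    """Optimistic concurrency: every save must name the version it started from."""

    def __init__(self, body: str = "") -> None:
        self._lock = threading.Lock()
        self._current = Revision(version=0, body=body)

    def read(self) -> Revision:
        with self._lock:
            return self._current

    def save(self, based_on: int, new_body: str) -> Revision:
        with self._lock:
            if self._current.version != based_on:
                # Someone else saved first: refuse instead of silently
                # overwriting their edit (a "lost update").
                raise ConcurrencyConflict(
                    f"based on v{based_on}, but current is v{self._current.version}"
                )
            self._current = Revision(self._current.version + 1, new_body)
            return self._current


if __name__ == "__main__":
    clause = VersionedContent("Payment is due in 30 days.")
    alice, bob = clause.read(), clause.read()  # both start from v0
    clause.save(alice.version, "Payment is due in 14 days.")  # becomes v1
    try:
        clause.save(bob.version, "Payment is due in 60 days.")
    except ConcurrencyConflict as err:
        print("Bob must merge first:", err)
```

Nobody is blocked while editing; the conflict only surfaces at save time, which suits the "you only think you need it" stance on truly simultaneous editing.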
12. Modeling the domain
• Stay as close as possible to the domain, and let the graph reflect the users' actions in the system over time
• Bounded contexts still apply, and the rules for sharing information between two aggregates remain; think about how you can have multiple smaller graphs
• Keeping the graph directed and acyclic keeps it simple; however, whether that fits depends entirely on the context of your problem
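One way to keep a graph acyclic is to check each new relationship before creating it: adding `source -> target` closes a cycle exactly when `source` is already reachable from `target`. A minimal in-memory sketch in Python, using a hypothetical `AcyclicGraph` class; against Neo4j the same check could be phrased as a path query run before the write.

```python
from collections import defaultdict


class AcyclicGraph:
    """Directed graph that rejects any edge which would introduce a cycle."""

    def __init__(self) -> None:
        self._out = defaultdict(set)  # node -> set of successor nodes

    def _reachable(self, start: str, goal: str) -> bool:
        # Iterative depth-first search from start, looking for goal.
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == goal:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self._out.get(node, ()))
        return False

    def add_edge(self, source: str, target: str) -> bool:
        # source -> target closes a cycle iff source is reachable from
        # target; this also rejects self-loops. Returns True if added.
        if self._reachable(target, source):
            return False
        self._out[source].add(target)
        return True


if __name__ == "__main__":
    history = AcyclicGraph()  # e.g. revision v1 is followed by v2, then v3
    print(history.add_edge("v1", "v2"))
    print(history.add_edge("v2", "v3"))
    print(history.add_edge("v3", "v1"))  # rejected: would form a cycle
```

The check costs one traversal per write, which stays cheap when, as suggested above, the graphs are kept small and bounded.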
14. Code Tips
• The principles of interacting with a database haven't changed
• Parameters, indexes and constraints still apply
• Don't read the same information repeatedly if it doesn't change
• For writes, the principles of concurrency haven't changed
• Watch out for reused queries; it's easier to write separate queries for separate concerns, and duplication is fine
• The unexpected side effects of reusing a query for different concerns turn out to be a killer
• Use the PROFILE and EXPLAIN options to analyse your queries
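Several of these tips can be sketched together. Assuming Python, a hypothetical `QueryClient` that takes any query-running callable, and hypothetical labels (`Document`, `User`, `PracticeArea`), the sketch keeps one parameterized query per concern and caches only reads whose results never change. Newer Cypher versions write parameters as `$name`; older ones used `{name}`.

```python
from typing import Any, Callable, Dict, List, Mapping, Tuple

# Hypothetical labels and properties. One query per concern: keeping them
# separate means changing one screen's needs cannot break another. Values go
# in as parameters, never via string concatenation, so the server can reuse
# the query plan and injection is avoided.
QUERY_DOCUMENT_EDITORS = (
    "MATCH (d:Document {id: $documentId})<-[:EDITED]-(u:User) "
    "RETURN u.name AS name"
)
QUERY_PRACTICE_AREA_NAME = (
    "MATCH (p:PracticeArea {code: $code}) RETURN p.name AS name"
)

Runner = Callable[[str, Mapping[str, Any]], List[dict]]


class QueryClient:
    """Wraps a query runner; caches only reads whose results never change."""

    def __init__(self, run: Runner) -> None:
        self._run = run
        self._static: Dict[Tuple[str, Tuple], List[dict]] = {}

    def fetch(self, cypher: str, **params: Any) -> List[dict]:
        # Always goes to the database: for data that can change.
        return self._run(cypher, params)

    def fetch_static(self, cypher: str, **params: Any) -> List[dict]:
        # Served from memory after the first call: only for unchanging data.
        # Parameter values must be hashable to form the cache key.
        key = (cypher, tuple(sorted(params.items())))
        if key not in self._static:
            self._static[key] = self._run(cypher, params)
        return self._static[key]


if __name__ == "__main__":
    calls = []

    def fake_run(cypher: str, params: Mapping[str, Any]) -> List[dict]:
        calls.append(cypher)  # stands in for a real database round trip
        return [{"name": "Corporate"}]

    client = QueryClient(fake_run)
    client.fetch_static(QUERY_PRACTICE_AREA_NAME, code="CORP")
    client.fetch_static(QUERY_PRACTICE_AREA_NAME, code="CORP")
    print(len(calls))  # the second read came from the cache
```

With the official Neo4j Python driver, `run` could wrap `session.run(cypher, params)` and convert each record with `record.data()`; prefixing the query text with `PROFILE` or `EXPLAIN` shows its plan.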
15. Cypher Tuning
• In Neo4j, switch query logging on to capture slow-running queries; set the threshold to whatever suits you
• Output Neo4j metrics to Graphite or to CSV files
• Have a suite of tests that run regularly and exercise concurrency and load
• Use the feedback to tweak any slow-running queries
• Repeat the exercise until no queries are being written to the query log; that should ensure your queries are fast
• Always get to the node you are interested in first; as with SELECT on SELECT in SQL, MATCH on MATCH is effective
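The tuning loop above can be mirrored on the client side. A minimal sketch, assuming Python and that each query is wrapped in a zero-argument callable (for example one that calls the database driver): `find_slow_queries` is a hypothetical helper that runs every query repeatedly from several threads and reports those whose slowest run exceeded the threshold. It complements, rather than replaces, Neo4j's own query log.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


def find_slow_queries(
    queries: Dict[str, Callable[[], object]],
    threshold_ms: float,
    concurrency: int = 8,
    repetitions: int = 20,
) -> List[Tuple[str, float]]:
    """Run each named query `repetitions` times across `concurrency` threads;
    return (name, worst latency in ms) for those over threshold, slowest first."""

    def timed(name: str) -> Tuple[str, float]:
        start = time.perf_counter()
        queries[name]()
        return name, (time.perf_counter() - start) * 1000.0

    jobs = [name for name in queries for _ in range(repetitions)]
    worst: Dict[str, float] = {}
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        for name, elapsed in pool.map(timed, jobs):
            worst[name] = max(worst.get(name, 0.0), elapsed)
    slow = [(n, ms) for n, ms in worst.items() if ms > threshold_ms]
    return sorted(slow, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    report = find_slow_queries(
        {
            # Hypothetical stand-ins for real queries.
            "editors_by_document": lambda: time.sleep(0.001),
            "full_history_scan": lambda: time.sleep(0.05),
        },
        threshold_ms=20,
    )
    for name, ms in report:
        print(f"{name}: {ms:.1f} ms")
```

Each name in the report is a candidate for PROFILE; fix it and rerun until the report comes back empty.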
16. The business benefits
• We built our new solution in 8 months, compared with more than 2 years for the former one
• We did this with a team half the size of the original
• The system is at least X times quicker than it used to be, where X is a two-digit number :)
• The complexity of work has reduced. Indicators:
• The team hasn't said "I don't know how big this is" in a while
• Even as the definition of cycle time has widened, cycle times have dropped for the same complexity
17. The business benefits
• The complexity of work has reduced
• Even as the definition of cycle time has widened, cycle times for the same level of complexity have reduced significantly
• Being able to visualize the data in real time provides valuable analytics for the user
• Cypher is extremely powerful at getting you to where you need to be on the graph
• To some degree, it reduces the need to understand the implementation immediately
18. Moving into the future
• Visualizing the information with tools to gain insight into user behavior, which will help us evolve the product
• Some of the principles we have used should help us scale out without contention (famous last words; that remains to be seen)
• Who knows, we may be able to store large files using a hybrid of Neo4j and something else, such as Riak or a self-hosted S3-style product
• It would be great to have a lightweight Linkurious plugin on the Neo4j dashboard
• Precedents and taxonomy: subject to research
19. The agility you gain from using a graph is great: changing the underlying model is nowhere near as painful or dreadful, and the value of visualization simply exceeds any cost involved in the transition