The document discusses how graph databases and GPU acceleration can help analyze large datasets with billions of edges and nodes. It provides examples of precision medicine and cyber defense applications that involve very large graph problems. The document also summarizes Blazegraph's GPU acceleration technology, which can provide 200-300x faster performance for graph analytics compared to Apache Spark. Blazegraph uses its DASL language to simplify programming graph algorithms for multi-GPU execution.
4. http://blazegraph.com/
Precision Medicine and Hospitals
• 5,686 Hospitals in the United States
• Top 10% have 100B+ Edge Graph Problems
• 256 K-80 GPUs for a 100B Edge Graph Integration Application
• Syapse Precision Medicine Video
4
3/10/15
5-10 B Edge Graph Problems
5. http://blazegraph.com/
Uncovering influence links in molecular knowledge networks to streamline personalized medicine | Shin, Dmitriy et al.Journal of Biomedical
Informatics , Volume 52 , 394 - 405"
Finding the Next Cure for Cancer is a
Billion+ Edge Graph Challenge
22. http://blazegraph.com/
Case Study - NetFlow Data Analysis
• What is NetFlow data?
• NetFlow is a network protocol developed by Cisco for
collecting IP traffic information and monitoring network
traffic. http://www.solarwinds.com/what-is-netflow.aspx
• Sample NetFlow record
{"start_time":"2007-08-01 14:31:02.946”,”duration":0.000,
“src_addr":"122.166.71.110", “dst_addr":"9.155.118.136",
"src_port":10822,"dst_port":13567,"protocol":"UDP
“,"tcp_flags":"......","input_packets":1,"input_bytes":46}
• How is NetFlow collected?
• Generated by routers and forwarded to collection point
• Processed out of pcap records using tools such as nfcapd/
nfdump
22
23. http://blazegraph.com/
Graph Algorithms Applicable to NetFlow
• Path algorithms (BFS, SSSP, APSP,
CC) and their extensions (closeness,
betweenness, etc)
• Ranking algorithms (cardinality,
PageRank, BadRank, SecureRank)
• Clustering algorithms (canopy, k-
means, Jaccard, community detection,
etc)
• Many others!
23
24. http://blazegraph.com/
Challenges of Network Flow Analysis
24
• The volume of NetFlow packets could not previously be efficiently analyzed or
visualized to identify threats in real time
– One hour of collection on a modest network can easily generate more than 100M NetFlow
records
– Visualizing this data to identify points of interest is nearly impossible due to the “hairball
effect”, even after time windowing the data
– These issues largely led to attempts at batch processing the data on large clusters of
machines using Hadoop and/or Spark
– Pushing the problem into the batch space puts organizations in the unenviable position of
hoping to identify why and by whom their network was attacked last month rather than being
able to identify an attack as it is occurring.
30. http://blazegraph.com/
Users
Original
data
sources
FrontendBackend
Knowledge Graph Creation
Museum visitor
Museums and other sources
• Data crawling
• Data transformation (CIDOC-CRM)
• Data Interlinking
• Data enrichment / Information
extraction
Cards
Social networks
DBpedia BriTsh Museum Data User Data
Mobile App
• HTML5 Templates + CSS for mobile
devices
• Google Glass App
• QR Code recognition
• Pattern / image recognition
• Context reasoning
Knowledge Graph Exploration
• Templates for visualization (CIDOC-CRM
and external data)
• Timeline, Maps
• PivotViewer for visual exploration
• Semantic Search
• Graph Analytics (GAS)
Website visitorData Expert
Research Space / British Museum Project Architecture
and Use Cases
31. http://blazegraph.com/
System will either have
their own terminology to
qualify place
Or
More commonly it will be
implicit and not explicit.
Relationships both
support and bypass this
heterogeneity at the same
time
33. http://blazegraph.com/
FROM means
• Thing has been
located at a place
or
• Thing was created at a
place
or
• Thing was created by
a person from a place
or
• Thing was part of an
event that happened at
a place
or
• Thing was acquired at
a place
Research Space / British Museum Video
35. http://blazegraph.com/
This subset of FROM is just “refers to” India – These
objects may not be in India, or from India, but make
some reference to India. India is a subject.