Sparksee Graph Database
Technology overview
April 2014
Sparsity Technologies: Powering Extreme Data (sparsity-technologies.com)
Index
Graph Databases
Introduction to Sparksee
Sparksee Internals
Performance analysis
High scalability
HPC-SGAB Benchmark
Graph Databases
Graphs are everywhere

— Increasing number of huge networks, such as the Web, social networks, biological systems, GPS…
— Very large graphs
— Interest in analyzing the interrelations between the entities in these networks

Classical graph representation

— Adjacency matrix
   - Very large N×N sparse matrix: no labels, no multigraphs, no attributes
— Adjacency list
   - No labels, no attributes; still space-consuming
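The limits of the two classical representations can be made concrete with a minimal Python sketch. The function names and the four-vertex example are hypothetical and chosen only for illustration. The 0/1 matrix always needs N×N cells and collapses parallel edges. The list stores one entry per edge. Neither structure has anywhere to keep labels or attributes.

```python
from collections import defaultdict


def adjacency_matrix(n, edges):
    """N x N 0/1 matrix: one cell per vertex pair, so parallel edges collapse."""
    matrix = [[0] * n for _ in range(n)]
    for tail, head in edges:
        matrix[tail][head] = 1
    return matrix


def adjacency_list(edges):
    """One neighbor entry per edge; parallel edges are kept as duplicates."""
    adj = defaultdict(list)
    for tail, head in edges:
        adj[tail].append(head)
    return dict(adj)


# Vertices 0..3; the edge (3, 2) appears twice, as in a multigraph.
edges = [(0, 2), (1, 2), (3, 2), (3, 2)]
matrix = adjacency_matrix(4, edges)
adj = adjacency_list(edges)

cells = sum(len(row) for row in matrix)           # storage grows as N * N
stored_edges = sum(map(sum, matrix))              # the duplicate edge is lost
list_entries = sum(len(v) for v in adj.values())  # one entry per edge
print(cells, stored_edges, list_entries)  # 16 3 4
```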
Classical graph storage
— Relational database
   - Predefined schema, or very large tables for nodes and edges; not suitable for path traversals and graph exploration
— XML
   - XML data is stored in the form of trees
   - Much work has been done on finding exact or approximate patterns (subtrees)
   - Not designed for complex graph queries
— RDF
   - Widely adopted standard for manipulating graph-like data
   - Broad support from large vendors
   - SPARQL has become a de facto standard
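Path traversals are awkward in a relational store, and Python's built-in `sqlite3` module can illustrate why. The sketch assumes a single hypothetical `edges(tail, head)` table. Each additional hop costs one more self-join. Unbounded reachability needs a recursive common table expression, which joins the edge table again at every iteration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (tail INTEGER, head INTEGER)")
conn.executemany(
    "INSERT INTO edges VALUES (?, ?)",
    [(1, 2), (2, 3), (3, 4), (1, 5)],
)

# Fixed 2-hop neighborhood of vertex 1: one self-join per extra hop.
two_hops = [
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT e2.head FROM edges e1 "
        "JOIN edges e2 ON e1.head = e2.tail WHERE e1.tail = ?",
        (1,),
    )
]

# Everything reachable from vertex 1: a recursive CTE that joins
# the edge table again at every iteration.
reachable = [
    row[0]
    for row in conn.execute(
        """
        WITH RECURSIVE reach(node) AS (
            SELECT head FROM edges WHERE tail = ?
            UNION
            SELECT e.head FROM edges e JOIN reach r ON e.tail = r.node
        )
        SELECT node FROM reach ORDER BY node
        """,
        (1,),
    )
]
print(two_hops, reachable)  # [3] [2, 3, 4, 5]
```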
New approaches to graph analysis

— Complex analysis computations on very large distributed graphs
   - Map-reduce (Pegasus)
   - Vertex-centric computation model (Pregel)

— Graph databases: database functionality to store and query graph-like data
   - Graph storage in the file system of a single computer node, with a buffer pool (Neo4j, HypergraphDB, OrientDB, InfiniteGraph)
   - Multiple servers accessible through a load balancer (Neo4j HA)
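The vertex-centric model can be illustrated with a single-process Python sketch of Pregel-style supersteps that compute hop distances from a source vertex. This is not Pregel's actual API: `pregel_hops` and its message dictionaries are hypothetical. In each superstep, only the vertices that received messages do any work. The computation halts once no messages remain in flight.

```python
def pregel_hops(adj, source, max_supersteps=50):
    """Hop distance from `source` computed in Pregel-like supersteps.

    adj: {vertex: [out-neighbors]}; every vertex must appear as a key.
    Messages sent in superstep k are delivered in superstep k + 1.
    """
    value = {v: float("inf") for v in adj}
    inbox = {source: [0]}            # initial message activates the source
    for _ in range(max_supersteps):
        if not inbox:                # no messages in flight: all vertices halt
            break
        outbox = {}
        for v, msgs in inbox.items():
            best = min(msgs)         # combine incoming messages
            if best < value[v]:      # only improved vertices send messages
                value[v] = best
                for w in adj[v]:
                    outbox.setdefault(w, []).append(best + 1)
        inbox = outbox               # synchronization barrier between supersteps
    return value


graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "e": ["a"]}
dist = pregel_hops(graph, "a")
print(dist)  # {'a': 0, 'b': 1, 'c': 1, 'd': 2, 'e': inf}
```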
Requirements for graph databases

— Data and schema represented as a graph
— Data operations based on graph operations
— Graph-based integrity restrictions
— Multigraphs
— Attributes attached to both vertices and edges
— Graph queries combining edge traversals with attribute accesses
— Diversity of workloads
— Efficient secondary-memory management
Introduction to Sparksee
Sparksee

IS a high-performance, out-of-core graph database management system

FOR large-scale labeled and attributed multigraphs

BASED ON vertical partitioning and collections of object identifiers stored as bitmaps
Sparksee — Characteristics

— Graph split into small structures
   - Only the relevant parts are moved to main memory (caching)
— Object identifiers (oids) instead of complex objects
   - Reduced memory requirements
— Specific structures to improve traversals
   - Index the edges and the neighbors of each node
— Attribute indices
   - Improve queries based on value filters
— Implemented in C++
   - Different APIs (Java, .NET, etc.) through wrappers

Sparksee — Capabilities

Efficiency: very compact representation using bitmaps; highly compressible data structures
Capacity: more than 100 billion vertices and edges in a single multicore computer
Performance: subsecond response in recommendation queries
Scalability: high throughput for concurrent queries
Consistency: partial transactional support with recovery
Multiplatform: Linux, Windows, Mac OS X, mobile
Logical graph model

Labeled: a label (type) for each vertex and edge
Directed: edges can have a fixed direction, from tail to head
Attributed: a variable list of attributes for each vertex and edge
Multigraph: multiple edges between two vertices
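Here is a minimal in-memory model of a labeled, directed, attributed multigraph, written as a hypothetical Python class. It is not Sparksee's API. Vertices and edges draw their identifiers from one shared counter. Every object carries a label and an attribute dictionary. Each edge records its own tail and head, so two edges between the same pair of vertices remain distinct objects.

```python
from itertools import count


class MultiGraph:
    """Hypothetical sketch of the logical model: labeled, directed,
    attributed multigraph. Not the Sparksee API."""

    def __init__(self):
        self._oids = count(1)   # shared oid counter for vertices and edges
        self.label = {}         # oid -> label (type)
        self.attrs = {}         # oid -> {attribute name: value}
        self.tail = {}          # edge oid -> tail vertex oid
        self.head = {}          # edge oid -> head vertex oid

    def _new_object(self, label, attributes):
        oid = next(self._oids)
        self.label[oid] = label
        self.attrs[oid] = dict(attributes)
        return oid

    def add_vertex(self, label, **attributes):
        return self._new_object(label, attributes)

    def add_edge(self, label, tail, head, **attributes):
        oid = self._new_object(label, attributes)
        self.tail[oid] = tail   # edges are directed: tail -> head
        self.head[oid] = head
        return oid

    def edges_between(self, tail, head):
        return [e for e in self.tail
                if self.tail[e] == tail and self.head[e] == head]


g = MultiGraph()
barcelona = g.add_vertex("ARTICLE", title="Barcelona", nlc="en")
europe = g.add_vertex("ARTICLE", title="Europe", nlc="en")
# Two parallel REF edges: allowed because this is a multigraph.
r1 = g.add_edge("REF", barcelona, europe, tag="continent")
r2 = g.add_edge("REF", barcelona, europe)
print(g.edges_between(barcelona, europe))  # [3, 4]
```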
Sparksee — Architecture

Figure: components shown are Java, .NET, C++, Python and mobile applications; the SparkseeJava, SparkseeNet and SparkseePython APIs, built with SWIG; SparkseeCpp with graph algorithms; and the DEXCORE engine with its GDB, GRAPH, DATA STRUCTURES, BUFFERPOOL and PLATFORM components.
Sparksee internals
Graph representation

We define a graph G = (V, E, L, T, H, A1, …, Ap) as:

LABELS      L  = {(o, l) | o ∈ (V ∪ E) ∧ l ∈ string}
TAILS       T  = {(e, t) | e ∈ E ∧ t ∈ V}
HEADS       H  = {(e, h) | e ∈ E ∧ h ∈ V}
ATTRIBUTES  Ai = {(o, c) | o ∈ (V ∪ E) ∧ c ∈ {int, string, ...}}

With this representation:
— the graph is split into multiple lists of pairs
— the first element of each pair is always a vertex or an edge
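The decomposition can be sketched in Python. The sketch starts from a small, hypothetical object table whose oids follow the slides' naming (`v3`, `e4`, ...). L, T, H and each attribute Ai then become separate lists of pairs, and the first element of every pair is a vertex or an edge. The input layout is an assumption made for illustration.

```python
def to_pairs(vertices, edges):
    """vertices: {oid: (label, {attr: value})}
    edges: {oid: (label, tail, head, {attr: value})}
    Returns L, T, H and one list of pairs per attribute name."""
    labels, tails, heads, attributes = [], [], [], {}
    for oid, (label, attrs) in vertices.items():
        labels.append((oid, label))
        for name, value in attrs.items():
            attributes.setdefault(name, []).append((oid, value))
    for oid, (label, tail, head, attrs) in edges.items():
        labels.append((oid, label))
        tails.append((oid, tail))
        heads.append((oid, head))
        for name, value in attrs.items():
            attributes.setdefault(name, []).append((oid, value))
    return labels, tails, heads, attributes


vertices = {
    "v3": ("ARTICLE", {"id": 3, "title": "Europe", "nlc": "en"}),
    "v4": ("ARTICLE", {"id": 4, "title": "Barcelona", "nlc": "en"}),
}
edges = {"e4": ("REF", "v4", "v3", {"tag": "continent"})}

L, T, H, A = to_pairs(vertices, edges)
print(L)         # [('v3', 'ARTICLE'), ('v4', 'ARTICLE'), ('e4', 'REF')]
print(T, H)      # [('e4', 'v4')] [('e4', 'v3')]
print(A["tag"])  # [('e4', 'continent')]
```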
Value sets

A value set groups all pairs of the original set that share the same value into a single pair, formed by that value and the set of objects having it.
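Building a value set is essentially a group-by on the second element of each pair. The following minimal Python sketch applies it to the LABELS pairs of the example graph. `value_set` is a hypothetical helper, and the values are sorted only to give stable output.

```python
from collections import defaultdict


def value_set(pairs):
    """Turn (object, value) pairs into sorted (value, {objects}) pairs."""
    groups = defaultdict(set)
    for obj, value in pairs:
        groups[value].add(obj)
    return sorted(groups.items())


labels = [("v1", "ARTICLE"), ("v2", "ARTICLE"), ("v3", "ARTICLE"),
          ("v4", "ARTICLE"), ("v5", "IMAGE"), ("v6", "IMAGE"),
          ("e1", "BABEL"), ("e2", "BABEL"), ("e3", "REF"), ("e4", "REF"),
          ("e5", "CONTAINS"), ("e6", "CONTAINS"), ("e7", "CONTAINS")]

for value, objects in value_set(labels):
    print(value, sorted(objects))
# ARTICLE ['v1', 'v2', 'v3', 'v4']
# BABEL ['e1', 'e2']
# CONTAINS ['e5', 'e6', 'e7']
# IMAGE ['v5', 'v6']
# REF ['e3', 'e4']
```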
Set          Pairs, then → value set
L            (v1, ARTICLE), (v2, ARTICLE), (v3, ARTICLE), (v4, ARTICLE), (v5, IMAGE), (v6, IMAGE), (e1, BABEL), (e2, BABEL), (e3, REF), (e4, REF), (e5, CONTAINS), (e6, CONTAINS), (e7, CONTAINS)
             → (ARTICLE, {v1, v2, v3, v4}), (BABEL, {e1, e2}), (CONTAINS, {e5, e6, e7}), (IMAGE, {v5, v6}), (REF, {e3, e4})
T            (e1, v1), (e2, v2), (e3, v4), (e4, v4), (e5, v3), (e6, v3), (e7, v4)
             → (v1, {e1}), (v2, {e2}), (v3, {e5, e6}), (v4, {e3, e4, e7})
H            (e1, v3), (e2, v3), (e3, v3), (e4, v3), (e5, v5), (e6, v6), (e7, v6)
             → (v3, {e1, e2, e3, e4}), (v5, {e5}), (v6, {e6, e7})
A_id         (v1, 1), (v2, 2), (v3, 3), (v4, 4), (v5, 1), (v6, 2)
             → (1, {v1, v5}), (2, {v2, v6}), (3, {v3}), (4, {v4})
A_title      (v1, Europa), (v2, Europe), (v3, Europe), (v4, Barcelona)
             → (Barcelona, {v4}), (Europa, {v1}), (Europe, {v2, v3})
A_nlc        (v1, ca), (v2, fr), (v3, en), (v4, en), (e1, en), (e2, en)
             → (ca, {v1}), (en, {v3, v4, e1, e2}), (fr, {v2})
A_filename   (v5, europe.png), (v6, bcn.jpg)
             → (bcn.jpg, {v6}), (europe.png, {v5})
A_tag        (e4, continent)
             → (continent, {e4})
Bitmap representation

— Each vertex and edge is identified by a unique and immutable oid (object identifier)

— Each vertex or edge set is stored in a bitmap structure:
   - Each position in a bitmap corresponds to the oid of an object
   - Reduced amount of space (compression techniques)
   - Very efficient binary logic operations
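Python's arbitrary-precision integers are enough to sketch the bitmap idea, assuming integer oids. Bit i is set when the object with oid i belongs to the set, so intersection, union and difference are each a single bitwise operation. This is an uncompressed illustration. Sparksee's compressed format is not modeled, and the oids below are hypothetical.

```python
def to_bitmap(oids):
    """Set bit `oid` for each object identifier."""
    bits = 0
    for oid in oids:
        bits |= 1 << oid
    return bits


def to_oids(bits):
    """Positions of the set bits, in increasing oid order."""
    oids, pos = [], 0
    while bits:
        if bits & 1:
            oids.append(pos)
        bits >>= 1
        pos += 1
    return oids


articles = to_bitmap([1, 2, 3, 4])   # e.g. objects labeled ARTICLE
english = to_bitmap([3, 4, 7, 8])    # e.g. objects with nlc = en

print(to_oids(articles & english))   # [3, 4]  intersection
print(to_oids(articles | english))   # [1, 2, 3, 4, 7, 8]  union
print(to_oids(articles & ~english))  # [1, 2]  difference
print(bin(articles & english).count("1"))  # 2  cardinality
```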
Value set representation

— A value set is represented as two maps:
   - One maps each distinct value to a vertex or edge set (a bitmap)
   - The other maps each vertex or edge to the oid of its value
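The two-map layout can be sketched in Python. Several parts are assumptions made for illustration: the interning of values into value oids, the use of plain dictionaries as maps, and Python integers as bitmaps. The function name is hypothetical.

```python
def build_value_set(pairs):
    """Build the two maps of a value set from (object oid, value) pairs.

    value_oid:  value -> value oid (hypothetical interning step)
    objects_of: value oid -> bitmap (int) of object oids with that value
    value_of:   object oid -> value oid
    """
    value_oid, objects_of, value_of = {}, {}, {}
    for oid, value in pairs:
        vid = value_oid.setdefault(value, len(value_oid))
        objects_of[vid] = objects_of.get(vid, 0) | (1 << oid)
        value_of[oid] = vid
    return value_oid, objects_of, value_of


# TITLE attribute of the example graph, with v1..v4 mapped to oids 1..4.
titles = [(1, "Europa"), (2, "Europe"), (3, "Europe"), (4, "Barcelona")]
value_oid, objects_of, value_of = build_value_set(titles)

print(bin(objects_of[value_oid["Europe"]]))   # 0b1100 -> oids 2 and 3
print(value_of[4] == value_oid["Barcelona"])  # True
```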
Figure: example of a bitmap-based representation
Figure: integrity rules
Value set operations

domain    returns the set of distinct values
objects   returns the set of vertices or edges associated with a value
lookup    returns the set of values associated with a set of objects
insert    adds a vertex or edge to the collection of objects of a value
remove    removes a vertex or edge from the collection of objects of a value
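The five operations map directly onto the two maps of a value set. The following minimal Python sketch again assumes integer oids, Python integers as bitmaps, and one value per object. The class and method names mirror the operation names in the list but are otherwise hypothetical. Here `objects` accepts one or more values and returns the union of their bitmaps.

```python
class ValueSet:
    """Hypothetical value set: value -> bitmap and object oid -> value."""

    def __init__(self):
        self.bitmap_of = {}  # value -> int bitmap of object oids
        self.value_of = {}   # object oid -> value (one value per object)

    def domain(self):
        """Set of distinct values that currently have objects."""
        return {v for v, bits in self.bitmap_of.items() if bits}

    def objects(self, *values):
        """Bitmap of the objects holding any of the given values."""
        bits = 0
        for value in values:
            bits |= self.bitmap_of.get(value, 0)
        return bits

    def lookup(self, bits):
        """Set of values associated with the objects in a bitmap."""
        values, oid = set(), 0
        while bits:
            if bits & 1 and oid in self.value_of:
                values.add(self.value_of[oid])
            bits >>= 1
            oid += 1
        return values

    def insert(self, oid, value):
        """Add an object to the collection of a value, replacing any old value."""
        old = self.value_of.get(oid)
        if old is not None:
            self.remove(oid, old)
        self.value_of[oid] = value
        self.bitmap_of[value] = self.bitmap_of.get(value, 0) | (1 << oid)

    def remove(self, oid, value):
        """Remove an object from the collection of a value."""
        if self.value_of.get(oid) == value:
            del self.value_of[oid]
            self.bitmap_of[value] &= ~(1 << oid)


nlc = ValueSet()
for oid, lang in [(1, "ca"), (2, "fr"), (3, "en"), (4, "en")]:
    nlc.insert(oid, lang)

print(sorted(nlc.domain()))       # ['ca', 'en', 'fr']
print(bin(nlc.objects("en")))     # 0b11000 -> oids 3 and 4
print(sorted(nlc.lookup(0b110)))  # ['ca', 'fr']
nlc.remove(2, "fr")
print(sorted(nlc.domain()))       # ['ca', 'en']
```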
Graph query examples
— Number of articles
   |objects(LABELS, 'ARTICLE')|
— Out-degree of the English article 'Europe'
   |objects(TAILS, objects(TITLE, 'Europe') ∩ objects(NLC, 'en') ∩ objects(LABELS, 'ARTICLE'))|
— Articles with references to the image with filename 'bcn.jpg'
   {lookup(TAILS, x) | x ∈ objects(HEADS, objects(FILENAME, 'bcn.jpg') ∩ objects(LABELS, 'IMAGE'))}
— Count the articles of each language
   {(x, y) | x ∈ domain(NLC) ∧ y = |objects(NLC, x) ∩ objects(LABELS, 'ARTICLE')|}
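The four example queries can be checked against the example graph from the value-set slides. In this Python sketch, each value set is a plain dictionary from value to a set of objects, standing in for bitmaps. LABELS is trimmed to the vertex labels the queries need. `objects`, `lookup` and `domain` are hypothetical helper functions. `objects` takes a collection of values and returns the union of their object sets. The set-builder result of the third query is flattened into a single set of articles.

```python
def value_set(pairs):
    vs = {}
    for obj, value in pairs:
        vs.setdefault(value, set()).add(obj)
    return vs


def objects(vs, values):
    """Union of the object sets of all given values."""
    return set().union(*(vs.get(v, set()) for v in values))


def lookup(vs, objs):
    """Values associated with any of the given objects."""
    return {value for value, members in vs.items() if members & objs}


def domain(vs):
    return set(vs)


LABELS = value_set([("v1", "ARTICLE"), ("v2", "ARTICLE"), ("v3", "ARTICLE"),
                    ("v4", "ARTICLE"), ("v5", "IMAGE"), ("v6", "IMAGE")])
TAILS = value_set([("e1", "v1"), ("e2", "v2"), ("e3", "v4"), ("e4", "v4"),
                   ("e5", "v3"), ("e6", "v3"), ("e7", "v4")])
HEADS = value_set([("e1", "v3"), ("e2", "v3"), ("e3", "v3"), ("e4", "v3"),
                   ("e5", "v5"), ("e6", "v6"), ("e7", "v6")])
TITLE = value_set([("v1", "Europa"), ("v2", "Europe"), ("v3", "Europe"),
                   ("v4", "Barcelona")])
NLC = value_set([("v1", "ca"), ("v2", "fr"), ("v3", "en"), ("v4", "en"),
                 ("e1", "en"), ("e2", "en")])
FILENAME = value_set([("v5", "europe.png"), ("v6", "bcn.jpg")])

articles = objects(LABELS, ["ARTICLE"])
n_articles = len(articles)

english_europe = objects(TITLE, ["Europe"]) & objects(NLC, ["en"]) & articles
out_degree = len(objects(TAILS, english_europe))

bcn = objects(FILENAME, ["bcn.jpg"]) & objects(LABELS, ["IMAGE"])
referencing = set().union(*(lookup(TAILS, {x}) for x in objects(HEADS, bcn)))

per_language = {x: len(objects(NLC, [x]) & articles) for x in domain(NLC)}

print(n_articles, out_degree, sorted(referencing), sorted(per_language.items()))
# 4 2 ['v3', 'v4'] [('ca', 1), ('en', 2), ('fr', 1)]
```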
Implementation details
— Bitmaps are compressed by grouping the bits into clusters of 32 consecutive bits (up to 137 billion objects per graph)
— Locality is improved by generating consecutive oids for the vertices or edges that share the same label
— Bitmap clusters are kept in a sorted tree structure to speed up insert, remove and binary logic operations
— Maps are implemented using B+ trees
— The tail, head and attribute value sets are split into specific value sets for each label
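Here is a simplified sketch of cluster-based compression, assuming the scheme works roughly as the first and third bullets describe. Bits are grouped into 32-bit clusters, only non-empty clusters are stored, and clusters are visited in cluster-index order, so binary operations work cluster by cluster. A plain dictionary sorted on demand stands in for the sorted tree, and the class name is hypothetical. If cluster indices were also 32-bit numbers (an assumption), the addressable range would be 2^32 × 32 = 2^37 ≈ 137.4 billion oids. That matches the limit quoted in the first bullet.

```python
CLUSTER_BITS = 32


class ClusteredBitmap:
    """Hypothetical compressed bitmap: cluster index -> 32-bit word.
    Empty clusters are never stored, so sparse sets stay small."""

    def __init__(self, oids=()):
        self.clusters = {}
        for oid in oids:
            self.add(oid)

    def add(self, oid):
        idx, bit = divmod(oid, CLUSTER_BITS)
        self.clusters[idx] = self.clusters.get(idx, 0) | (1 << bit)

    def discard(self, oid):
        idx, bit = divmod(oid, CLUSTER_BITS)
        word = self.clusters.get(idx, 0) & ~(1 << bit)
        if word:
            self.clusters[idx] = word
        else:
            self.clusters.pop(idx, None)  # drop clusters that become empty

    def __and__(self, other):
        result = ClusteredBitmap()
        # Only clusters present in both operands can produce set bits.
        for idx in sorted(self.clusters.keys() & other.clusters.keys()):
            word = self.clusters[idx] & other.clusters[idx]
            if word:
                result.clusters[idx] = word
        return result

    def oids(self):
        out = []
        for idx in sorted(self.clusters):
            word = self.clusters[idx]
            for bit in range(CLUSTER_BITS):
                if (word >> bit) & 1:
                    out.append(idx * CLUSTER_BITS + bit)
        return out


a = ClusteredBitmap([3, 40, 1_000_000])
b = ClusteredBitmap([3, 41, 1_000_000, 5_000_000])
print((a & b).oids())        # [3, 1000000]
print(len(a.clusters))       # 3 (uncompressed up to oid 1,000,000: 31,251 words)
print(CLUSTER_BITS * 2**32)  # 137438953472
```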
Performance analysis
Queries

Q1: Find the article with the largest out-degree and traverse its shortest-path tree
Q2: Recommend articles related to the most popular one
Q3: Find new images for articles from translations in other languages
Q4: Find, for each language, the number of articles and images referenced
Q5: For each article with images, materialize the count of images
Q6: Remove all articles without images

                                     Q1  Q2  Q3  Q4  Q5  Q6
k-hops and path traversals           +   +
graph pattern matching                       +
aggregations and edge connectivity               +
graph transformation                                 +   +
Performance out-of-core

Wikipedia benchmark, out-of-core, 1 GB buffer pool.

                  MonetDB    MySQL      Neo4j (⋆)  SPARKSEE
Graph size (GB)   12.00      15.72      42.00      16.98
Load (h)          Error      1.36       8.99       2.89
Q1 (s)            4,801.6    > 12 h     > 12 h     120.5
Q2 (s)            3,788.4    13,841.6   > 12 h     205.4
Q3 (s)            458.9      33.0       481.0      10.8
Q4 (s)            279.3      45.0       > 12 h     144.9
Q5 (s)            267.4      930.3      > 12 h     140.9
Q6 (s)            Error      10,707.0   > 12 h     25,791.6

(⋆) Java VM with 45 GB
Query statistics

Query  Results     Edge traversals  Edge traversals/s  Memory (MB)  Bitmaps
Q1     624,525     236,387,207      1,987,616.30       832.19       42.97%
Q2     5           261,735,954      1,270,747.94       2,974.50     48.59%
Q3     51,780      1,536,698        143,885.58         320.81       48.00%
Q4     254         4,987,879        33,984.32          245.13       77.67%
Q5     2,401,597   5,934,724        42,072.39          319.00       80.64%
Q6     52,380,949  281,433,106      37,434.27          11,583.88    67.76%
Bitmap memory usage (MB)

                 Size      Q1        Q2        Q3        Q4        Q5      Q6
LABELS           13.56     11.60     11.60     11.60     11.60     11.60   1.51
TAILS            1,272.32  1,030.90  857.09    229.67    164.79    164.79  90.18
HEADS            633.98    -         506.98    -         -         -       47.09
Attr. ID         122.77    -         -         -         -         -       0.85
Attr. TITLE      835.92    -         -         -         -         -       10.87
Attr. NLC        3,618.49  -         -         791.29    833.64    -       617.15
Attr. FILENAME   769.79    -         -         -         -         -       -
Attr. TAG        31.94     -         -         -         -         -       2.29
TOTAL            7,298.77  1,042.50  1,375.67  1,032.56  1,010.03  176.39  769.94
Figure: analysis of bitmap usage
Figure: bitmap size distribution
Figure: out-of-core stress test
Figure: RMAT/Query 1 scalability test. Scale 2^28 (about 2 billion edges) runs out-of-core.
Figure: SNA benchmark, queries Q1, Q6, Q9 and Q12
High scalability

Figure: high-scalability test with mirror servers in Amazon Elastic behind a load balancer
HPC-SGAB Benchmark
Definition
— HPC-SGAB: Bader et al. 2009
   - Measured in TEPS: traversed edges per second
— Graph
   - Synthetic (R-MAT)
   - Power-law distribution
   - Average: 8 edges per node
— Operations
   - Kernel 1: load the graph and create indexes
   - Kernel 2: find the edge(s) with maximum weight
   - Kernel 3: k-hops
   - Kernel 4: betweenness centrality (Brandes algorithm)
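Kernel 4 relies on Brandes' algorithm. Below is a compact Python sketch of the textbook version for unweighted, directed graphs. It runs one breadth-first search per source, then accumulates dependencies in reverse BFS order. It is not the benchmark's reference implementation, and the adjacency-dictionary input format is an assumption.

```python
from collections import deque


def brandes(adj):
    """Betweenness centrality for an unweighted directed graph.

    adj: {vertex: [out-neighbors]}; every vertex must appear as a key.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Phase 1: BFS from s, counting shortest paths (sigma) and predecessors.
        stack, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:              # first time w is discovered
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Phase 2: accumulate dependencies in order of decreasing distance.
        delta = dict.fromkeys(adj, 0.0)
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc


diamond = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(brandes(diamond))  # {'a': 0.0, 'b': 0.5, 'c': 0.5, 'd': 0.0}
```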
Experimental setup
— Systems tested
   - Sparksee (formerly DEX)
   - Neo4j
   - HypergraphDB
   - Jena (RDF)
— Platform
   - Single computer with two quad-core Xeon E5410 processors
   - 11 GB RAM
   - LFF 2.25 TB disk
   - Single-threaded
— Default benchmark configuration
Figure: summary of results
Figure: Kernel 1, load time
Figure: Kernel 4, betweenness centrality
Bibliography

R. Angles, A. Prat, D. Dominguez, J. L. Larriba, Benchmarking database systems for social network applications (GRADES 2013)

N. Martínez, V. Muntés, S. Gómez, M. A. Águila, D. Dominguez, J. L. Larriba, Efficient Graph Management Based On Bitmap Indices (IDEAS 2012)

N. Martínez, S. Gómez, F. Escalé, DEX: a High-Performance Graph Database Management System (GDM 2011)

D. Dominguez, P. Urbón, A. Giménez, S. Gómez, N. Martínez, J. L. Larriba, Survey of Graph Database Performance on the HPC Scalable Graph Analysis Benchmark (IWDG 2010)

N. Martínez, V. Muntés, S. Gómez, J. Nin, M. A. Sánchez, J. L. Larriba, Dex: High-performance Exploration on Large Graphs for Information Retrieval (CIKM 2007)
Thanks!
Q&A

Jump Start into Apache® Spark™ and Databricks
 
Atomate: a high-level interface to generate, execute, and analyze computation...
Atomate: a high-level interface to generate, execute, and analyze computation...Atomate: a high-level interface to generate, execute, and analyze computation...
Atomate: a high-level interface to generate, execute, and analyze computation...
 
Vectorized R Execution in Apache Spark
Vectorized R Execution in Apache SparkVectorized R Execution in Apache Spark
Vectorized R Execution in Apache Spark
 
Spark cassandra connector.API, Best Practices and Use-Cases
Spark cassandra connector.API, Best Practices and Use-CasesSpark cassandra connector.API, Best Practices and Use-Cases
Spark cassandra connector.API, Best Practices and Use-Cases
 

Recently uploaded

Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksdeepakthakur548787
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsVICTOR MAESTRE RAMIREZ
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 217djon017
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data VisualizationKianJazayeri1
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataTecnoIncentive
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Boston Institute of Analytics
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxHaritikaChhatwal1
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBoston Institute of Analytics
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Seán Kennedy
 
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...KarteekMane1
 
INTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processingINTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processingsocarem879
 
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...Milind Agarwal
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Cathrine Wilhelmsen
 

Recently uploaded (20)

Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
Digital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing worksDigital Marketing Plan, how digital marketing works
Digital Marketing Plan, how digital marketing works
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
Advanced Machine Learning for Business Professionals
Advanced Machine Learning for Business ProfessionalsAdvanced Machine Learning for Business Professionals
Advanced Machine Learning for Business Professionals
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2Easter Eggs From Star Wars and in cars 1 and 2
Easter Eggs From Star Wars and in cars 1 and 2
 
Principles and Practices of Data Visualization
Principles and Practices of Data VisualizationPrinciples and Practices of Data Visualization
Principles and Practices of Data Visualization
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Cyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded dataCyber awareness ppt on the recorded data
Cyber awareness ppt on the recorded data
 
Data Analysis Project: Stroke Prediction
Data Analysis Project: Stroke PredictionData Analysis Project: Stroke Prediction
Data Analysis Project: Stroke Prediction
 
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
Data Analysis Project Presentation: Unveiling Your Ideal Customer, Bank Custo...
 
SMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptxSMOTE and K-Fold Cross Validation-Presentation.pptx
SMOTE and K-Fold Cross Validation-Presentation.pptx
 
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis ProjectBank Loan Approval Analysis: A Comprehensive Data Analysis Project
Bank Loan Approval Analysis: A Comprehensive Data Analysis Project
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
wepik-insightful-infographics-a-data-visualization-overview-20240401133220kwr...
 
INTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processingINTRODUCTION TO Natural language processing
INTRODUCTION TO Natural language processing
 
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
Unveiling the Role of Social Media Suspect Investigators in Preventing Online...
 
Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)Data Factory in Microsoft Fabric (MsBIP #82)
Data Factory in Microsoft Fabric (MsBIP #82)
 

Sparksee Technology overview

  • 1. Sparksee Graph Database: Technology overview. April 2014. Sparsity Technologies: Powering Extreme Data (sparsity-technologies.com)
  • 2. Index: Graph Databases; Introduction to Sparksee; Sparksee Internals; Performance analysis; High scalability; HPC-SGAB Benchmark.
  • 3. Index: Graph Databases
  • 4. Graph Databases. Graphs are everywhere: there is an increasing number of huge networks, such as the Web, social networks, biological systems and GPS data. These graphs are very large, and there is growing interest in analyzing the interrelations between the entities in these networks.
  • 5. Classical graph representation. Adjacency matrix: a very large N×N sparse matrix with no labels, no multigraph support and no attributes. Adjacency list: no labels and no attributes, and still space-consuming.
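The two classical representations on slide 5 can be contrasted with a minimal Python sketch, assuming a small unlabeled directed graph whose vertices are the hypothetical integers 0..N-1. The matrix needs N×N cells no matter how few edges exist and can only record one unlabeled edge per ordered pair; the list stores just the existing edges but still has nowhere to put labels or attributes.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]

def adjacency_matrix(n: int, edges: List[Edge]) -> List[List[int]]:
    """N x N matrix: one cell per ordered vertex pair, 1 if an edge exists."""
    matrix = [[0] * n for _ in range(n)]
    for tail, head in edges:
        # A second edge between the same pair just overwrites the cell,
        # so the matrix cannot represent a multigraph or edge labels.
        matrix[tail][head] = 1
    return matrix

def adjacency_list(n: int, edges: List[Edge]) -> Dict[int, List[int]]:
    """One neighbor list per vertex: memory grows with the number of edges."""
    adj: Dict[int, List[int]] = {v: [] for v in range(n)}
    for tail, head in edges:
        # Parallel edges are kept, but there is still no slot for a label
        # or for attributes of the edge.
        adj[tail].append(head)
    return adj

if __name__ == "__main__":
    n = 1000
    edges = [(0, 1), (0, 1), (1, 2)]                 # note the duplicated edge
    m = adjacency_matrix(n, edges)
    a = adjacency_list(n, edges)
    print(sum(len(row) for row in m))                # 1000000 cells for 3 edges
    print(sum(m[0]))                                 # 1: the duplicate is lost
    print(sum(len(ns) for ns in a.values()))         # 3 stored entries
```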
  • 6. Classical graph storage. Relational database: either a predefined schema or a very large table for nodes and edges; not suitable for path traversals and graph exploration. XML: data is stored as trees; much work has been done on finding exact or approximate patterns (subtrees), but XML was not designed for complex graph queries. RDF: a widely adopted standard for manipulating graph-like data, with broad support from large vendors; SPARQL has become a de facto standard.
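To illustrate why slide 6 calls the relational layout unsuitable for path traversals, here is a hedged sketch using Python's standard-library `sqlite3` module and a hypothetical single `edges(tail, head)` table. Collecting the vertices within k hops requires a recursive common table expression that joins the whole edge table again at every hop.

```python
import sqlite3

def k_hop_neighbors(conn: sqlite3.Connection, start: int, k: int) -> set:
    """Vertices reachable from `start` in 1..k hops over the edges table."""
    query = """
        WITH RECURSIVE reach(node, depth) AS (
            SELECT head, 1 FROM edges WHERE tail = ?
            UNION
            -- each recursion step is another join against the edge table
            SELECT e.head, r.depth + 1
            FROM edges AS e JOIN reach AS r ON e.tail = r.node
            WHERE r.depth < ?
        )
        SELECT DISTINCT node FROM reach
    """
    return {row[0] for row in conn.execute(query, (start, k))}

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edges (tail INTEGER, head INTEGER)")
    conn.executemany("INSERT INTO edges VALUES (?, ?)",
                     [(1, 2), (2, 3), (3, 4), (1, 5)])
    print(sorted(k_hop_neighbors(conn, 1, 2)))       # [2, 3, 5]
```

The depth column bounds the recursion, so the query also terminates on graphs with cycles.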
  • 7. New approaches to graph analysis. Complex analysis computations on very large distributed graphs: map-reduce (Pegasus) and the vertex-centric computation model (Pregel). Graph databases, which provide database functionality to store and query graph-like data: graph storage in the file system of a single computer node with a buffer pool (Neo4j, HyperGraphDB, OrientDB, InfiniteGraph), or multiple servers accessible through a load balancer (Neo4j HA).
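The vertex-centric model named on slide 7 can be illustrated with a single-process Python simulation. This is not Pregel's actual API, only a sketch of its superstep structure: in each superstep every vertex that received messages processes them, may update its value and sends messages along its out-edges, and the run halts when no messages remain. Here it computes hop distances from a hypothetical source vertex.

```python
from collections import defaultdict
from typing import Dict, List

INF = float("inf")

def pregel_hop_distance(adj: Dict[str, List[str]], source: str) -> Dict[str, float]:
    """Single-source hop distances computed superstep by superstep."""
    value = {v: INF for v in adj}
    inbox: Dict[str, List[float]] = {source: [0]}    # message for superstep 0
    while inbox:                                     # halt when no messages
        outbox: Dict[str, List[float]] = defaultdict(list)
        for vertex, messages in inbox.items():       # only vertices with mail run
            best = min(messages)
            if best < value[vertex]:                 # improved: update, notify
                value[vertex] = best
                for neighbor in adj[vertex]:
                    outbox[neighbor].append(best + 1)
        inbox = outbox
    return value

if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": [], "e": []}
    print(pregel_hop_distance(graph, "a"))
    # {'a': 0, 'b': 1, 'c': 1, 'd': 2, 'e': inf}
```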
  • 8. Requirements for graph databases: data and schema represented as a graph; data operations based on graph operations; graph-based integrity restrictions; multigraphs; attributes attached to both vertices and edges; graph queries combining edge traversals with attribute accesses; diversity of workloads; efficient secondary-memory management.
  • 9. Index: Introduction to Sparksee
  • 10. Introduction to Sparksee. Sparksee IS a high-performance, out-of-core graph database management system FOR large-scale labeled and attributed multigraphs, BASED ON vertical partitioning and collections of object identifiers stored as bitmaps.
  • 11. Sparksee characteristics: the graph is split into small structures, so only the significant parts need to be moved to main memory (caching); object identifiers (oids) are used instead of complex objects, which reduces memory requirements; specific structures improve traversals by indexing the edges and the neighbors of each node; attribute indices improve queries based on value filters; the core is implemented in C++, with different APIs (Java, .NET, etc.) provided through wrappers.
  • 12. Sparksee capabilities. Efficiency: very compact representation using bitmaps and highly compressible data structures. Capacity: more than 100 billion vertices and edges in a single multicore computer. Performance: subsecond response in recommendation queries. Scalability: high throughput for concurrent queries. Consistency: partial transactional support with recovery. Multiplatform: Linux, Windows, Mac OS X, mobile.
  • 13. Logical graph model. Labeled: a label (type) for each vertex and edge. Directed: edges can have a fixed direction, from tail to head. Attributed: a variable list of attributes for each vertex and edge. Multigraph: multiple edges between two vertices.
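The logical model on slide 13 can be expressed as a small Python sketch. The class and method names are hypothetical and are not Sparksee's API: every vertex and edge has a label and a free-form attribute dictionary, each edge records its tail and head, and parallel edges are allowed because every edge gets its own oid.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Any, Dict, List

@dataclass
class Vertex:
    oid: int
    label: str                                       # type of the vertex
    attrs: Dict[str, Any] = field(default_factory=dict)

@dataclass
class Edge:
    oid: int
    label: str                                       # type of the edge
    tail: int                                        # oid of the source vertex
    head: int                                        # oid of the target vertex
    attrs: Dict[str, Any] = field(default_factory=dict)

class MultiGraph:
    """Labeled, directed, attributed multigraph (illustrative only)."""

    def __init__(self) -> None:
        self._ids = count(1)                         # one oid sequence for all
        self.vertices: Dict[int, Vertex] = {}
        self.edges: Dict[int, Edge] = {}

    def add_vertex(self, label: str, **attrs: Any) -> int:
        oid = next(self._ids)
        self.vertices[oid] = Vertex(oid, label, attrs)
        return oid

    def add_edge(self, label: str, tail: int, head: int, **attrs: Any) -> int:
        # No uniqueness check on (tail, head): parallel edges are allowed.
        oid = next(self._ids)
        self.edges[oid] = Edge(oid, label, tail, head, attrs)
        return oid

    def edges_between(self, tail: int, head: int) -> List[Edge]:
        return [e for e in self.edges.values() if e.tail == tail and e.head == head]

if __name__ == "__main__":
    g = MultiGraph()
    bcn = g.add_vertex("ARTICLE", title="Barcelona", nlc="en")
    eur = g.add_vertex("ARTICLE", title="Europe", nlc="en")
    g.add_edge("REF", bcn, eur, tag="continent")
    g.add_edge("REF", bcn, eur)
    print(len(g.edges_between(bcn, eur)))            # 2
```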
  • 14. Sparksee architecture (figure). Components shown: C++, Java, .NET, Python and mobile applications; the SparkseeCpp (graph algorithms), SparkseeJava, SparkseeNet and SparkseePython APIs, with SWIG; DEXCORE, the GDB graph data structures, the buffer pool and the platform layer.
  • 15. Index: Sparksee internals
  • 16. Graph representation. We define a graph G = (V, E, L, T, H, A1, …, Ap) as:
    LABELS      L = {(o, l) | o ∈ (V ∪ E) ∧ l ∈ string}
    TAILS       T = {(e, t) | e ∈ E ∧ t ∈ V}
    HEADS       H = {(e, h) | e ∈ E ∧ h ∈ V}
    ATTRIBUTES  Ai = {(o, c) | o ∈ (V ∪ E) ∧ c ∈ {int, string, ...}}
    With this representation, the graph is split into multiple lists of pairs, and the first element of each pair is always a vertex or an edge.
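The definition on slide 16 can be made concrete with a short Python sketch that decomposes a tiny graph into the L, T and H lists plus one list per attribute, keeping the oid of a vertex or edge as the first element of every pair. The input format and the `to_pair_lists` name are illustrative assumptions; the sample objects are taken from the Wikipedia example used in these slides.

```python
from typing import Any, Dict, List, Tuple

Pairs = List[Tuple[str, Any]]

def to_pair_lists(vertices: Dict[str, dict], edges: Dict[str, dict]) -> Dict[str, Pairs]:
    """Split a graph into L, T, H and one 'A<name>' list per attribute.

    Each object is given as oid -> {"label": ..., attributes...}; edges also
    carry "tail" and "head" (hypothetical input format).
    """
    lists: Dict[str, Pairs] = {"L": [], "T": [], "H": []}
    for oid, obj in {**vertices, **edges}.items():
        lists["L"].append((oid, obj["label"]))                          # (o, l)
        for name, value in obj.items():
            if name not in ("label", "tail", "head"):
                lists.setdefault("A" + name, []).append((oid, value))   # (o, c)
    for oid, obj in edges.items():
        lists["T"].append((oid, obj["tail"]))                           # (e, t)
        lists["H"].append((oid, obj["head"]))                           # (e, h)
    return lists

if __name__ == "__main__":
    vertices = {"v3": {"label": "ARTICLE", "title": "Europe", "nlc": "en"},
                "v4": {"label": "ARTICLE", "title": "Barcelona", "nlc": "en"}}
    edges = {"e4": {"label": "REF", "tail": "v4", "head": "v3", "tag": "continent"}}
    for name, pairs in to_pair_lists(vertices, edges).items():
        print(name, pairs)
```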
  • 17. Graph representation (example). The example graph (Wikipedia articles and images) is written as the lists of pairs L, T, H, Aid, Atitle, Anlc, Afilename and Atag; for instance, L starts with (v1, ARTICLE), (v2, ARTICLE), T starts with (e1, v1), (e2, v2), (e3, v4), and Atag is (e4, continent). The complete lists appear on the next slide.
  • 18. Value sets. A value set groups all pairs of the original set that share the same value into a single pair between that value and the set of objects with that value (pairs → value set):
    L: (v1, ARTICLE), (v2, ARTICLE), (v3, ARTICLE), (v4, ARTICLE), (v5, IMAGE), (v6, IMAGE), (e1, BABEL), (e2, BABEL), (e3, REF), (e4, REF), (e5, CONTAINS), (e6, CONTAINS), (e7, CONTAINS) → (ARTICLE, {v1, v2, v3, v4}), (BABEL, {e1, e2}), (CONTAINS, {e5, e6, e7}), (IMAGE, {v5, v6}), (REF, {e3, e4})
    T: (e1, v1), (e2, v2), (e3, v4), (e4, v4), (e5, v3), (e6, v3), (e7, v4) → (v1, {e1}), (v2, {e2}), (v3, {e5, e6}), (v4, {e3, e4, e7})
    H: (e1, v3), (e2, v3), (e3, v3), (e4, v3), (e5, v5), (e6, v6), (e7, v6) → (v3, {e1, e2, e3, e4}), (v5, {e5}), (v6, {e6, e7})
    Aid: (v1, 1), (v2, 2), (v3, 3), (v4, 4), (v5, 1), (v6, 2) → (1, {v1, v5}), (2, {v2, v6}), (3, {v3}), (4, {v4})
    Atitle: (v1, Europa), (v2, Europe), (v3, Europe), (v4, Barcelona) → (Barcelona, {v4}), (Europa, {v1}), (Europe, {v2, v3})
    Anlc: (v1, ca), (v2, fr), (v3, en), (v4, en), (e1, en), (e2, en) → (ca, {v1}), (en, {v3, v4, e1, e2}), (fr, {v2})
    Afilename: (v5, europe.png), (v6, bcn.jpg) → (bcn.jpg, {v6}), (europe.png, {v5})
    Atag: (e4, continent) → (continent, {e4})
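Building a value set from a list of pairs, as described on slide 18, is a simple inversion. The Python sketch below uses a hypothetical `to_value_set` helper and reproduces the LABELS and TAILS value sets of the example from the pair lists listed on that slide.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple

def to_value_set(pairs: Iterable[Tuple[str, Hashable]]) -> Dict[Hashable, Set[str]]:
    """Turn (object, value) pairs into value -> {objects with that value}."""
    groups: Dict[Hashable, Set[str]] = defaultdict(set)
    for obj, value in pairs:
        groups[value].add(obj)
    return dict(groups)

# Pair lists of the example graph (slide 18).
LABELS = [("v1", "ARTICLE"), ("v2", "ARTICLE"), ("v3", "ARTICLE"), ("v4", "ARTICLE"),
          ("v5", "IMAGE"), ("v6", "IMAGE"), ("e1", "BABEL"), ("e2", "BABEL"),
          ("e3", "REF"), ("e4", "REF"), ("e5", "CONTAINS"), ("e6", "CONTAINS"),
          ("e7", "CONTAINS")]
TAILS = [("e1", "v1"), ("e2", "v2"), ("e3", "v4"), ("e4", "v4"),
         ("e5", "v3"), ("e6", "v3"), ("e7", "v4")]

if __name__ == "__main__":
    print(sorted(to_value_set(LABELS)["ARTICLE"]))   # ['v1', 'v2', 'v3', 'v4']
    print(sorted(to_value_set(TAILS)["v4"]))         # ['e3', 'e4', 'e7']
```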
  • 19. Bitmap representation. Each vertex and edge is identified by a unique and immutable oid (object identifier). Each vertex or edge set is stored in a bitmap structure: each position in a bitmap corresponds to the oid of an object; compression techniques reduce the amount of space; and binary logic operations are very efficient.
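A minimal illustration of the bitmap idea on slide 19, assuming Python's arbitrary-precision integers as uncompressed bitmaps (Sparksee's compressed structures are not reproduced here): bit i is set when the object with oid i belongs to the set, so intersection and union become single bitwise AND and OR operations. The oids in the example are hypothetical.

```python
from typing import Iterable, List

def to_bitmap(oids: Iterable[int]) -> int:
    """Set bit `oid` for every object in the collection."""
    bitmap = 0
    for oid in oids:
        bitmap |= 1 << oid
    return bitmap

def to_oids(bitmap: int) -> List[int]:
    """Decode a bitmap back into the sorted list of oids it contains."""
    oids, position = [], 0
    while bitmap:
        if bitmap & 1:
            oids.append(position)
        bitmap >>= 1
        position += 1
    return oids

if __name__ == "__main__":
    articles = to_bitmap([1, 2, 3, 4])           # hypothetical ARTICLE oids
    english = to_bitmap([3, 4, 8, 9])            # hypothetical oids with nlc = en
    print(to_oids(articles & english))           # [3, 4]: intersection via AND
    print(to_oids(articles | english))           # [1, 2, 3, 4, 8, 9]: union via OR
    print(bin(articles & english).count("1"))    # 2: cardinality by popcount
```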
  • 20. Value set representation. A value set is represented as two maps: one maps each distinct value to a vertex or edge set; the other maps each vertex or edge to a value oid.
  • 21. Example of a bitmap-based representation (figure).
  • 22. Integrity rules (figure).
  • 23. Value set operations: domain returns the set of distinct values; objects returns the set of vertices or edges associated with a value; lookup returns the set of values associated with a set of objects; insert adds a vertex or edge to the collection of objects of a value; remove removes a vertex or edge from the collection of objects of a value.
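The two-map layout of slide 20 and the five operations of slide 23 can be sketched together in Python. This is a hedged illustration, not Sparksee's implementation: Python dicts stand in for the B+ trees, Python integers for the compressed bitmaps, and the `ValueSet` class simply mirrors the slide's operation names. It also assumes each object holds at most one value per value set, as in the example graph.

```python
from typing import Dict, Hashable, Iterable, Set

class ValueSet:
    """Value set kept as two maps (dicts stand in for B+ trees)."""

    def __init__(self) -> None:
        self._value_oid: Dict[Hashable, int] = {}    # value -> value oid
        self._values: Dict[int, Hashable] = {}       # value oid -> value
        self._objects: Dict[int, int] = {}           # value oid -> bitmap of oids
        self._object_value: Dict[int, int] = {}      # object oid -> value oid

    def insert(self, obj: int, value: Hashable) -> None:
        """Add `obj` to the objects of `value` (replacing a previous value)."""
        old = self._object_value.get(obj)
        if old is not None:
            self.remove(obj, self._values[old])
        void = self._value_oid.setdefault(value, len(self._value_oid))
        self._values[void] = value
        self._objects[void] = self._objects.get(void, 0) | (1 << obj)
        self._object_value[obj] = void

    def remove(self, obj: int, value: Hashable) -> None:
        """Remove `obj` from the collection of objects of `value`."""
        void = self._value_oid.get(value)
        if void is None or self._object_value.get(obj) != void:
            return
        self._objects[void] &= ~(1 << obj)
        del self._object_value[obj]

    def domain(self) -> Set[Hashable]:
        """Distinct values that currently have at least one object."""
        return {self._values[v] for v, bits in self._objects.items() if bits}

    def objects(self, value: Hashable) -> int:
        """Bitmap of the objects associated with `value` (0 if none)."""
        void = self._value_oid.get(value)
        return self._objects.get(void, 0) if void is not None else 0

    def lookup(self, objs: Iterable[int]) -> Set[Hashable]:
        """Values associated with a collection of object oids."""
        return {self._values[self._object_value[o]]
                for o in objs if o in self._object_value}

if __name__ == "__main__":
    nlc = ValueSet()
    for oid, lang in [(1, "ca"), (2, "fr"), (3, "en"), (4, "en")]:
        nlc.insert(oid, lang)
    print(sorted(nlc.domain()))                      # ['ca', 'en', 'fr']
    print(bin(nlc.objects("en")))                    # 0b11000 (oids 3 and 4)
    print(sorted(nlc.lookup([1, 3])))                # ['ca', 'en']
```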
  • 24. Graph query examples:
    Number of articles: |objects(LABELS, 'ARTICLE')|
    Out-degree of the English article 'Europe': |objects(TAILS, objects(TITLE, 'Europe') ∩ objects(NLC, 'en') ∩ objects(LABELS, 'ARTICLE'))|
    Articles with references to the image with filename 'bcn.jpg': {lookup(TAILS, x) | x ∈ objects(HEADS, objects(FILENAME, 'bcn.jpg') ∩ objects(LABELS, 'IMAGE'))}
    Number of articles in each language: {(x, y) | x ∈ domain(NLC) ∧ y = |objects(NLC, x) ∩ objects(LABELS, 'ARTICLE')|}
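The four queries on slide 24 can be checked against the example graph. The following Python sketch evaluates them over the value sets listed on slide 18, with plain Python sets standing in for bitmaps. Two interpretations are assumptions: `objects` accepts either one value or a set of values (returning the union), and the third query collects all tails into a single set.

```python
from typing import Dict, Hashable, Set

ValueSets = Dict[Hashable, Set[str]]

# Value sets of the example graph, copied from slide 18.
LABELS: ValueSets = {"ARTICLE": {"v1", "v2", "v3", "v4"}, "BABEL": {"e1", "e2"},
                     "CONTAINS": {"e5", "e6", "e7"}, "IMAGE": {"v5", "v6"},
                     "REF": {"e3", "e4"}}
TAILS: ValueSets = {"v1": {"e1"}, "v2": {"e2"}, "v3": {"e5", "e6"},
                    "v4": {"e3", "e4", "e7"}}
HEADS: ValueSets = {"v3": {"e1", "e2", "e3", "e4"}, "v5": {"e5"}, "v6": {"e6", "e7"}}
TITLE: ValueSets = {"Barcelona": {"v4"}, "Europa": {"v1"}, "Europe": {"v2", "v3"}}
NLC: ValueSets = {"ca": {"v1"}, "en": {"v3", "v4", "e1", "e2"}, "fr": {"v2"}}
FILENAME: ValueSets = {"bcn.jpg": {"v6"}, "europe.png": {"v5"}}

def objects(vs: ValueSets, values) -> Set[str]:
    """Objects with the given value, or the union over a set of values."""
    if isinstance(values, (set, frozenset, list)):
        return set().union(*(vs.get(v, set()) for v in values))
    return set(vs.get(values, set()))

def lookup(vs: ValueSets, objs) -> Set[Hashable]:
    """Values associated with a set of objects."""
    objs = set(objs)
    return {value for value, members in vs.items() if members & objs}

def domain(vs: ValueSets) -> Set[Hashable]:
    return set(vs)

def count_articles() -> int:
    return len(objects(LABELS, "ARTICLE"))

def out_degree_english_europe() -> int:
    article = objects(TITLE, "Europe") & objects(NLC, "en") & objects(LABELS, "ARTICLE")
    return len(objects(TAILS, article))

def articles_referencing_bcn() -> Set[Hashable]:
    image = objects(FILENAME, "bcn.jpg") & objects(LABELS, "IMAGE")
    # Union of lookup(TAILS, x) over every edge x whose head is the image.
    return lookup(TAILS, objects(HEADS, image))

def articles_per_language() -> Dict[Hashable, int]:
    return {x: len(objects(NLC, x) & objects(LABELS, "ARTICLE")) for x in domain(NLC)}

if __name__ == "__main__":
    print(count_articles())                              # 4
    print(out_degree_english_europe())                   # 2
    print(sorted(articles_referencing_bcn()))            # ['v3', 'v4']
    print(sorted(articles_per_language().items()))       # [('ca', 1), ('en', 2), ('fr', 1)]
```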
  • 25. Implementation details: bitmaps are compressed by grouping the bits into clusters of 32 consecutive bits (up to 137 billion objects per graph); locality is improved by generating consecutive oids for the vertices or edges of each distinct label; a sorted tree structure of bitmap clusters speeds up insert, remove and binary logic operations; maps are implemented using B+ trees; the tail, head and attribute value sets are split into specific value sets for each label.
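The clustering idea on slide 25 can be sketched in Python as a sparse map from cluster index to a 32-bit word, so empty regions of the oid space cost nothing and an AND only produces clusters present in both operands. This layout is an illustrative assumption, not Sparksee's code; a sorted key list stands in for the sorted tree of clusters. If the cluster index is itself a 32-bit number, the addressable range is 2^32 × 32 = 2^37 = 137,438,953,472 oids, which is consistent with the quoted limit of about 137 billion objects.

```python
from bisect import insort
from typing import Dict, List

CLUSTER_BITS = 32

class ClusteredBitmap:
    """Bitmap stored as {cluster index: 32-bit word}, non-empty clusters only."""

    def __init__(self) -> None:
        self.clusters: Dict[int, int] = {}
        self.keys: List[int] = []                    # sorted cluster indices

    def add(self, oid: int) -> None:
        index, offset = divmod(oid, CLUSTER_BITS)
        if index not in self.clusters:
            self.clusters[index] = 0
            insort(self.keys, index)
        self.clusters[index] |= 1 << offset

    def remove(self, oid: int) -> None:
        index, offset = divmod(oid, CLUSTER_BITS)
        if index in self.clusters:
            self.clusters[index] &= ~(1 << offset)
            if self.clusters[index] == 0:            # drop empty clusters
                del self.clusters[index]
                self.keys.remove(index)

    def __and__(self, other: "ClusteredBitmap") -> "ClusteredBitmap":
        result = ClusteredBitmap()
        for index in self.keys:                      # clusters missing from
            word = self.clusters[index] & other.clusters.get(index, 0)
            if word:                                 # either side yield nothing
                result.clusters[index] = word
                result.keys.append(index)            # self.keys is already sorted
        return result

    def oids(self) -> List[int]:
        out = []
        for index in self.keys:
            word = self.clusters[index]
            for offset in range(CLUSTER_BITS):
                if (word >> offset) & 1:
                    out.append(index * CLUSTER_BITS + offset)
        return out

if __name__ == "__main__":
    a, b = ClusteredBitmap(), ClusteredBitmap()
    for oid in (3, 40, 1_000_000):
        a.add(oid)
    for oid in (40, 1_000_000, 7):
        b.add(oid)
    print((a & b).oids())                            # [40, 1000000]
    print(len(a.clusters))                           # 3 clusters stored
    print(2**32 * CLUSTER_BITS)                      # 137438953472
```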
  • 26. Index: Performance analysis
  • 27. Queries:
    Q1: Find the article with the largest out-degree and traverse its shortest-path tree.
    Q2: Recommend articles related to the most popular one.
    Q3: Find new images for articles from translations in other languages.
    Q4: Find, for each language, the number of articles and images referenced.
    Q5: For each article with images, materialize the count of images.
    Q6: Remove all articles without images.
    Operation types covered: k-hops and path traversals; graph pattern matching; aggregations and edge connectivity; graph transformation.
  • 28. Performance out-of-core: Wikipedia benchmark, out-of-core, 1 GB buffer pool.
                        MonetDB     MySQL       Neo4j (⋆)   Sparksee
    Graph size (GB)     12.00       15.72       42.00       16.98
    Load (h)            Error       1.36        8.99        2.89
    Q1 (s)              4,801.6     > 12 h      > 12 h      120.5
    Q2 (s)              3,788.4     13,841.6    > 12 h      205.4
    Q3 (s)              458.9       33.0        481.0       10.8
    Q4 (s)              279.3       45.0        > 12 h      144.9
    Q5 (s)              267.4       930.3       > 12 h      140.9
    Q6 (s)              Error       10,707.0    > 12 h      25,791.6
    (⋆) Java VM with 45 GB.
  • 29. Query statistics:
    Query   Results      Edge trav.     Edge trav./s    Mem (MB)    Bitmaps
    Q1      624,525      236,387,207    1,987,616.30    832.19      42.97%
    Q2      5            261,735,954    1,270,747.94    2,974.50    48.59%
    Q3      51,780       1,536,698      143,885.58      320.81      48.00%
    Q4      254          4,987,879      33,984.32       245.13      77.67%
    Q5      2,401,597    5,934,724      42,072.39       319.00      80.64%
    Q6      52,380,949   281,433,106    37,434.27       11,583.88   67.76%
  • 30. Bitmap memory usage (MB):
                      Size       Q1         Q2         Q3         Q4         Q5       Q6
    LABELS            13.56      11.60      11.60      11.60      11.60      11.60    1.51
    TAILS             1,272.32   1,030.90   857.09     229.67     164.79     164.79   90.18
    HEADS             633.98     -          506.98     -          -          -        47.09
    Attr. ID          122.77     -          -          -          -          -        0.85
    Attr. TITLE       835.92     -          -          -          -          -        10.87
    Attr. NLC         3,618.49   -          -          791.29     833.64     -        617.15
    Attr. FILENAME    769.79     -          -          -          -          -        -
    Attr. TAG         31.94      -          -          -          -          -        2.29
    TOTAL             7,298.77   1,042.50   1,375.67   1,032.56   1,010.03   176.39   769.94
  • 31. Analysis of bitmap usage (figure).
  • 32. Bitmap size distribution (figure).
  • 33. Out-of-core stress test (figure).
  • 34. RMAT/Query 1 scalability test (figure). Scale 2^28 is out-of-core (2 billion edges).
  • 35. SNA Benchmark: Q1, Q6, Q9 and Q12 (figure).
  • 36. Index: High scalability
  • 37. High scalability test: mirror servers in Amazon Elastic with a load balancer (figure).
  • 38. Index: HPC-SGAB Benchmark
  • 39. Definition. HPC-SGAB: Bader et al. 2009, measured in TEPS (traversed edges per second). Graph: synthetic (R-MAT), with a power-law distribution and 8 edges per node on average. Operations: Kernel 1: load the graph and create indexes; Kernel 2: find the edge(s) with maximum weight; Kernel 3: k-hops; Kernel 4: betweenness centrality (Brandes algorithm).
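Kernel 4 on slide 39 can be illustrated with a compact Python sketch of Brandes' algorithm for unweighted, directed graphs: one BFS per source that counts shortest paths, then dependency accumulation in reverse BFS order. It is a textbook version, not the benchmark's reference code. The adjacency-dict input and the `teps` helper, which simply divides traversed edges by elapsed seconds, are assumptions for illustration. On an undirected graph stored with edges in both directions, each score counts every vertex pair twice, so it is commonly halved.

```python
from collections import deque
from typing import Dict, List

def betweenness(adj: Dict[str, List[str]]) -> Dict[str, float]:
    """Brandes betweenness centrality for an unweighted directed graph."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack: List[str] = []
        preds: Dict[str, List[str]] = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}                  # shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                                 # BFS counting paths
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                                 # farthest vertices first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc

def teps(edges_traversed: int, seconds: float) -> float:
    """Traversed edges per second."""
    return edges_traversed / seconds

if __name__ == "__main__":
    path = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}  # undirected a - b - c
    print(betweenness(path))                          # {'a': 0.0, 'b': 2.0, 'c': 0.0}
```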
  • 40. Experimental setup. Systems tested: Sparksee (formerly DEX), Neo4j, HyperGraphDB, Jena (RDF). Platform: a single computer with two quad-core Xeon E5410 processors, 11 GB RAM and an LFF 2.25 TB disk, running single-threaded. Default benchmark configuration.
  • 41. Summary of results (figure).
  • 42. Kernel 1: load time (figure).
  • 43. Kernel 4: betweenness centrality (figure).
  • 44. Bibliography:
    R. Angles, A. Prat, D. Dominguez, J. L. Larriba, Benchmarking database systems for social network applications (GRADES 2013).
    N. Martínez, V. Muntés, S. Gómez, M. A. Águila, D. Dominguez, J. L. Larriba, Efficient Graph Management Based On Bitmap Indices (IDEAS 2012).
    N. Martínez, S. Gómez, F. Escalé, DEX: a High-Performance Graph Database Management System (GDM 2011).
    D. Dominguez, P. Urbón, A. Giménez, S. Gómez, N. Martínez, J. L. Larriba, Survey of Graph Database Performance on the HPC Scalable Graph Analysis Benchmark (IWDG 2010).
    N. Martínez, V. Muntés, S. Gómez, J. Nin, M. A. Sánchez, J. Larriba, Dex: High-performance Exploration on Large Graphs for Information Retrieval (CIKM 2007).
  • 45. Thanks! Q&A