WIFI SSID:SparkAISummit | Password: UnifiedAnalytics
Morpheus
https://db-engines.com/en/ranking_categories
Node
● Represents an entity within the graph
● Can have labels
Relationship
● Connects a start node with an end node
● Has one type
Property
● Describes a node/relationship: e.g. name, age, weight etc
● Key-value pair: String key; typed value (string, number, bool, list, ...)
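The three building blocks above can be sketched as plain data classes. This is only an illustration of the model, not how Morpheus or Neo4j store graphs; the class names, field names and sample data are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Set


@dataclass
class Node:
    """An entity in the graph; it can carry zero or more labels."""
    id: int
    labels: Set[str] = field(default_factory=set)
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    """Connects a start node with an end node and has exactly one type."""
    id: int
    start: int  # id of the start node
    end: int    # id of the end node
    type: str
    properties: Dict[str, Any] = field(default_factory=dict)


# Hypothetical sample data: properties are key-value pairs with typed values.
alice = Node(0, {"Person"}, {"name": "Alice", "age": 42})
acme = Node(1, {"Company"}, {"name": "ACME"})
works_at = Relationship(0, alice.id, acme.id, "WORKS_AT", {"since": 2015})
```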
Property graph view of data mirrors conceptual view
○ Entities and relationships, with attributes
○ Nodes and relationships, with properties
Graph queries are concise and visual (ASCII Art)
MATCH (c:Customer)-[:BOUGHT]-(p:Product)
RETURN c.id, p.id
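To make the pattern concrete, here is a small sketch of what the query above computes, written over an in-memory graph. The sample data is invented, and each node carries a single label to keep the code short; a real engine plans this very differently.

```python
# Nodes: id -> (label, properties). Relationships: (start, type, end).
nodes = {
    1: ("Customer", {"id": "c1"}),
    2: ("Customer", {"id": "c2"}),
    3: ("Product", {"id": "p1"}),
    4: ("Product", {"id": "p2"}),
}
relationships = [
    (1, "BOUGHT", 3),
    (1, "BOUGHT", 4),
    (2, "BOUGHT", 4),
    (2, "VIEWED", 3),
]


def customers_and_products(nodes, relationships):
    """Rows for MATCH (c:Customer)-[:BOUGHT]-(p:Product) RETURN c.id, p.id."""
    rows = []
    for start, rel_type, end in relationships:
        if rel_type != "BOUGHT":
            continue
        # The pattern has no arrow, so try the relationship in both directions.
        for c, p in ((start, end), (end, start)):
            if nodes[c][0] == "Customer" and nodes[p][0] == "Product":
                rows.append((nodes[c][1]["id"], nodes[p][1]["id"]))
    return rows


rows = customers_and_products(nodes, relationships)
```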
Network algorithms run over graphs
→ Graphs enhance data engineering and science
                                Tables                           Graphs
Transactional                   PostgreSQL, Oracle, SQL Server   Neo4j
Data Integration & Analytics    Spark SQL                        Morpheus
Spark is an immutable data processing engine
○ Spark graphs are compositions of tables (DFs)
○ Spark graphs can be transformed and combined
○ Functions (including queries) over multiple graphs
○ Cypher query plans mapped to Catalyst
Neo4j is a native transactional CRUD database
○ Neo4j graphs use a native graph data representation
○ Neo4j has optimized in-process MT graph algos
○ Morpheus helps move data in and out of Neo4j
Graphs and tables are both useful data models
○ Finding paths and subgraphs, and transforming graphs
○ Viewing, aggregating and ordering values
The Morpheus project parallels Spark SQL
○ PropertyGraph type (composed of DataFrames)
○ Catalog of graph data sources, named graphs and views
○ Cypher query language
A CypherSession adds graphs to a SparkSession
● Data integration
○ Integrate (non-)graphy data from multiple, heterogeneous data sources into one or more property graphs
● Distributed Cypher execution
○ OLAP-style graph analytics
● Data science
○ Integration with other Spark libraries
○ Feature extraction using Neo4j Graph Algorithms
● Pathfinding & Search: finds optimal paths or evaluates route availability and quality
● Centrality / Importance: determines the importance of distinct nodes in the network
● Community Detection: detects group clustering or partition options
● Similarity: evaluates how alike nodes are
● Link Prediction: estimates the likelihood of nodes forming a future relationship
[Figure: Morpheus data flow]
● SOURCES: tables (Hive, DataFrame, JDBC) and subgraphs (file-system snapshots)
● Morpheus composes DataFrames into a PROPERTY GRAPH
● A Cypher QUERY over a Property Graph returns a DataFrame (table result) or a Property Graph (graph result)
● A DataFrame can be passed to a Cypher QUERY as a driving table
● DataFrames and Property Graphs feed GRAPH ALGOS and ANALYSIS toolsets
● STORE: Morpheus writes a SUBGRAPH as a file-system snapshot
Cypher 9 is the latest full version of openCypher
○ Implemented in Neo4j 3.5
○ Includes date/time types and functions
○ Implemented in whole/part by six other vendors
○ Several other partial and research implementations
○ Cypher for Gremlin is another openCypher project
Cypher is a full CRUD language ← OLTP database
○ RETURNs only tabular results: not composable
○ Results can include graph elements (paths, relationships, nodes) or property values
Morpheus implements most of read-only Cypher
○ No MERGE or DELETE
○ Spark immutable data + transformations
Cypher 10 proposes Multiple Graph features
○ Multiple Graph CIP: https://git.io/fjmrx
Allows for Cypher Query composition
○ Similar to chaining transformations on DataFrames
Support Graph Catalog for managing Graphs
○ Analogous to Spark SQL catalog
Query support for Graph Construction
Input: a property graph
Output: a table
FROM GRAPH socialNetwork
MATCH ({name: 'Dan'})-[:FRIEND*2]->(foaf)
RETURN toUpper(foaf.name) AS name
ORDER BY name DESC
Language features available in Morpheus
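As a rough sketch of what the friend-of-a-friend query above returns, the following walks two FRIEND hops over an invented edge list. It assumes names identify nodes uniquely, and it yields one row per matched path (so a name can appear more than once), since the query has no DISTINCT.

```python
# Directed FRIEND relationships as (start name, end name); sample data is invented.
friends = [
    ("Dan", "Ann"),
    ("Ann", "Bob"),
    ("Ann", "Cid"),
    ("Dan", "Eve"),
    ("Eve", "Bob"),
]


def friends_of_friends(edges, start_name):
    """One upper-cased name per two-hop FRIEND path, ordered descending."""
    rows = []
    for i, (a, b) in enumerate(edges):
        if a != start_name:
            continue
        for j, (c, foaf) in enumerate(edges):
            # The second hop continues from b and must use a different relationship.
            if c == b and i != j:
                rows.append(foaf.upper())
    return sorted(rows, reverse=True)


names = friends_of_friends(friends, "Dan")
```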
Input: a property graph
Output: a property graph
FROM GRAPH socialNetwork
MATCH (p:Person)-[:FRIEND*2]->(foaf)
WHERE NOT (p)-[:FRIEND]->(foaf)
CONSTRUCT
CREATE (p)-[:POSSIBLE_FRIEND]->(foaf)
RETURN GRAPH
Language features available in Morpheus
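The graph-returning query above can be approximated as a function from one edge list to another. This sketch assumes every node is a Person and that one POSSIBLE_FRIEND relationship is created per matched row; both are simplifications for illustration, not a statement about Morpheus internals.

```python
# Directed FRIEND relationships as (start name, end name); sample data is invented.
friend_edges = [
    ("Dan", "Ann"),
    ("Ann", "Bob"),
    ("Dan", "Bob"),
    ("Ann", "Cid"),
]


def possible_friends(edges):
    """Relationships of the constructed graph: two hops away, not yet direct friends."""
    direct = set(edges)
    constructed = []
    for i, (p, mid) in enumerate(edges):
        for j, (src, foaf) in enumerate(edges):
            # Two-hop path p -> mid -> foaf, filtered by WHERE NOT (p)-[:FRIEND]->(foaf).
            if src == mid and i != j and (p, foaf) not in direct:
                constructed.append((p, "POSSIBLE_FRIEND", foaf))
    return constructed


new_graph = possible_friends(friend_edges)
```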
Input: property graphs
Output: a property graph
FROM GRAPH socialNetwork
MATCH (p:Person)
FROM GRAPH products
MATCH (c:Customer)
WHERE p.email = c.email
CONSTRUCT ON socialNetwork, products
CREATE (p)-[:IS]->(c)
RETURN GRAPH
Language features available in Morpheus
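A sketch of the multi-graph query above: match people in one graph and customers in another, join them on email, and return a graph that keeps both inputs and adds the new IS relationships. The dictionary layout and sample records are assumptions made for the example.

```python
social_network = {
    "nodes": [
        {"id": "p1", "label": "Person", "email": "alice@example.com"},
        {"id": "p2", "label": "Person", "email": "bob@example.com"},
    ],
    "rels": [],
}
products = {
    "nodes": [
        {"id": "c1", "label": "Customer", "email": "alice@example.com"},
        {"id": "c2", "label": "Customer", "email": "carol@example.com"},
    ],
    "rels": [],
}


def link_same_people(social, prods):
    """Union of both graphs plus (p)-[:IS]->(c) wherever the emails are equal."""
    customers_by_email = {}
    for node in prods["nodes"]:
        if node["label"] == "Customer":
            customers_by_email.setdefault(node["email"], []).append(node)
    new_rels = [
        (p["id"], "IS", c["id"])
        for p in social["nodes"]
        if p["label"] == "Person"
        for c in customers_by_email.get(p["email"], [])
    ]
    # CONSTRUCT ON socialNetwork, products: keep everything from both inputs.
    return {
        "nodes": social["nodes"] + prods["nodes"],
        "rels": social["rels"] + prods["rels"] + new_rels,
    }


integrated = link_same_people(social_network, products)
```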
Input: property graphs
Output: a property graph
CATALOG CREATE VIEW youngFriends($inGraph){
FROM GRAPH $inGraph
MATCH (p1:Person)-[r]->(p2:Person)
WHERE p1.age < 25 AND p2.age < 25
CONSTRUCT
CREATE (p1)-[COPY OF r]->(p2)
RETURN GRAPH
}
Language features available in Morpheus
Input: property graphs
Output: table or graph
FROM youngFriends(socialNetwork)
MATCH (p:Person)-[r]->(o)
RETURN p, r, o
// and views over views
FROM youngFriends(europe(socialNetwork))
MATCH ...
Language features available in Morpheus
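Because a view takes a graph and returns a graph, views compose like ordinary functions. The sketch below models youngFriends that way; `europe` is a hypothetical second view (the slides only name it), and the `region` property and sample data are invented for the example.

```python
# A graph is (nodes, rels): nodes map a name to properties, rels are (start, type, end).
social_network = (
    {
        "Ann": {"age": 22, "region": "EU"},
        "Bob": {"age": 24, "region": "EU"},
        "Cid": {"age": 31, "region": "EU"},
        "Dee": {"age": 19, "region": "US"},
    },
    [("Ann", "FRIEND", "Bob"), ("Bob", "FRIEND", "Cid"), ("Ann", "FRIEND", "Dee")],
)


def young_friends(graph):
    """Keep relationships whose two endpoints are both younger than 25."""
    nodes, rels = graph
    kept = [
        (a, t, b)
        for a, t, b in rels
        if nodes[a]["age"] < 25 and nodes[b]["age"] < 25
    ]
    # Only nodes that take part in a kept relationship end up in the result.
    return {n: nodes[n] for a, _, b in kept for n in (a, b)}, kept


def europe(graph):
    """Hypothetical view: restrict the graph to nodes whose region is 'EU'."""
    nodes, rels = graph
    eu = {n: props for n, props in nodes.items() if props["region"] == "EU"}
    return eu, [(a, t, b) for a, t, b in rels if a in eu and b in eu]


young = young_friends(social_network)
young_europeans = young_friends(europe(social_network))  # a view over a view
```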
Morpheus
Query Engine
Property Graph Data Sources
Property Graph Catalog
Scala API
SQL JDBC
Spark Core
● Distributed execution
Spark SQL
● Rule- and Cost-based query optimization via Catalyst
MATCH (c:Captain)-[:COMMANDS]->(s:Ship)
WHERE c.name = 'Morpheus'
RETURN c.name, s.name
openCypher Frontend
● Parsing, Rewriting, Normalization
● Semantic Analysis (Scoping, Typing, etc.)
Morpheus
● Data Import and Export
● Schema and Type handling
● Query translation to Spark operations
Intermediate Language, Logical Planning, Relational Planning, Spark Backend
● Translation into Logical Operators
● Basic Logical Optimization
● Backend Agnostic Query Representation
● Conversion and typing of Frontend expressions
● Translation into Relational Operations on abstract tables
● Column layout computation
● Spark-specific table implementation
● In Morpheus, PropertyGraphs are represented by
○ Node Tables and Relationship Tables
● Tables are represented by DataFrames
○ Require a fixed schema
● Property Graphs have a Graph Type
○ Node and relationship types that occur in the graph
○ Node and relationship properties and their data type
Property Graph
(:Captain:Person {name: 'Morpheus'})-[:COMMANDS]->(:Ship {name: 'Nebuchadnezzar'})
Node Tables
:Captain:Person
id  name
0   Morpheus
:Ship
id  name
1   Nebuchadnezzar
Rel. Tables
:COMMANDS
id  source  target
0   0       1
Graph Type
Graph Type {
  :Captain:Person (
    name: STRING
  ),
  :Ship (
    name: STRING
  ),
  :COMMANDS
}
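[Figure: Morpheus relational planning. The query below is planned over the Property Graph as projections (π) of node and relationship tables combined by joins (⋈).]
MATCH (c:Captain)-[:COMMANDS]->(s:Ship)
WHERE c.name = 'Morpheus'
RETURN c.name, s.name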
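The captain-and-ship query can be evaluated with nothing but selections, joins and a projection over the node and relationship tables shown earlier. The sketch below does that over lists of dictionaries; it is a hand-written illustration of the idea, not the plan Morpheus actually produces.

```python
# Node and relationship tables from the Captain/Ship example.
captains = [{"id": 0, "name": "Morpheus"}]
ships = [{"id": 1, "name": "Nebuchadnezzar"}]
commands = [{"id": 0, "source": 0, "target": 1}]


def captains_and_ships(captains, commands, ships, captain_name):
    """Select captains by name, join via COMMANDS to ships, project both names."""
    selected = [c for c in captains if c["name"] == captain_name]  # WHERE
    ships_by_id = {s["id"]: s for s in ships}
    rows = []
    for c in selected:
        for r in commands:
            # First join: captain.id = rel.source; second join: rel.target = ship.id.
            if r["source"] == c["id"] and r["target"] in ships_by_id:
                rows.append((c["name"], ships_by_id[r["target"]]["name"]))  # RETURN
    return rows


result = captains_and_ships(captains, commands, ships, "Morpheus")
```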
Part 1: From JSON to Graph
Create persistent Property Graph from raw Yelp dataset
● Read Yelp data from JSON into DataFrames
● Create Property Graph from DataFrames
● Store Property Graph using Parquet
Part 2: A library of Graphs
Create a library of graph projections
● Read Property Graph from Parquet
● Create subgraph for a specific city
● Project and persist city subgraph
Part 3: Federated queries
Integrate reviews with social network data
● Define Graph Type and Mapping with Graph DDL
● Load data from Hive and H2
● Run analytical query on the integrated graph
Part 4: Neo4j Integration I
Find trending businesses
● Load graph projections from library
● Write graphs to Neo4j and run PageRank
● Combine graphs in Morpheus and select trending businesses
Part 5: Neo4j Integration II
Recommend businesses to users
● Load graph projections from library
● Write graphs to Neo4j, run Louvain + Jaccard
● Run analytical query in Morpheus to find recommendations
https://git.io/fjZ2b
● Yelp is a search service based on crowd-sourced reviews about local businesses
● The Yelp Open Dataset is part of the Yelp Dataset Challenge
○ Yelp's effort to encourage researchers to explore the dataset
○ ~150K businesses, 10M users, 5M reviews, 35M friendships
https://www.yelp.com
https://www.yelp.com/dataset
https://www.yelp.com/dataset/challenge
:Business
name : ACME
address : 123 ACME Rd.
city : San Jose
state : CA
:User
name : Alice
since : 2013
elite : [2014, 2016]
:User
name : Bob
since : 2014
elite : null
:REVIEWS
stars : 5
date : 2014-02-03
:REVIEWS
stars : 4
date : 2014-08-03
business.json, user.json, review.json
→ Create Node and Relationship Tables
→ Create Property Graph
→ Store Property Graph
https://git.io/fjZ2N
// (:User)
val userDataFrame = spark.read.json(...).select(...)
// Map DataFrame columns to a node table: "id" identifies the node,
// every row gets the label User, and three columns become properties.
val userNodeTable = CAPSEntityTable.create(
  NodeMappingBuilder.on("id")
    .withImpliedLabel("User")
    .withPropertyKey("name")
    .withPropertyKey("yelping_since")
    .withPropertyKey("elite")
    .build,
  userDataFrame)

id  name   yelping_since  elite
0   Alice  2013           [2014, 2016]
1   Bob    2014           null
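What the node mapping expresses can be sketched without Spark: pick an id column, attach an implied label, and keep only the listed property columns. The function name and the extra `fans` column are hypothetical, added to show that unmapped columns are dropped in this sketch.

```python
def build_node_table(rows, id_key, implied_label, property_keys):
    """Turn raw rows into node records with an id, a label and selected properties."""
    table = []
    for row in rows:
        table.append({
            "id": row[id_key],
            "labels": {implied_label},
            # Only the mapped columns become properties; missing values stay None.
            "properties": {key: row.get(key) for key in property_keys},
        })
    return table


user_rows = [
    {"id": 0, "name": "Alice", "yelping_since": 2013, "elite": [2014, 2016], "fans": 3},
    {"id": 1, "name": "Bob", "yelping_since": 2014, "elite": None, "fans": 0},
]
user_nodes = build_node_table(user_rows, "id", "User", ["name", "yelping_since", "elite"])
```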
● Property Graphs are managed within a catalog
Cypher Session
Property Graph Catalog
Property Graph Data Source <namespace>
Property Graph <name>
QualifiedGraphName = <namespace>.<name>
● API to operate with the query engine and the catalog
trait CypherSession {
  def cypher(
    query: String,
    parameters: CypherMap = CypherMap.empty,
    drivingTable: Option[CypherRecords] = None
  ): Result

  def catalog: PropertyGraphCatalog
}
● API to manage multiple Property Graphs
● Catalog functions can be executed via Cypher or Scala API
trait PropertyGraphCatalog {
  def register(namespace: Namespace, dataSource: PropertyGraphDataSource): Unit
  def store(qualifiedGraphName: QualifiedGraphName, graph: PropertyGraph): Unit
  def graph(qualifiedGraphName: QualifiedGraphName): PropertyGraph
  def drop(qualifiedGraphName: QualifiedGraphName): Unit
  // additional methods for managing views, listing namespaces and graphs
}
● API for loading and saving property graphs
trait PropertyGraphDataSource {
  def hasGraph(name: GraphName): Boolean
  def graph(name: GraphName): PropertyGraph
  def schema(name: GraphName): Option[Schema]
  // additional methods for storing, deleting, listing graphs
}
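How a qualified graph name resolves through the catalog to a data source can be sketched in a few lines. The classes below borrow the method names from the traits above but are an in-memory stand-in written for illustration; they are not the Morpheus implementation, and the stored graph is just a placeholder dictionary.

```python
class InMemoryDataSource:
    """Stand-in for a Property Graph Data Source that keeps graphs in a dict."""

    def __init__(self):
        self._graphs = {}

    def has_graph(self, name):
        return name in self._graphs

    def graph(self, name):
        return self._graphs[name]

    def store(self, name, graph):
        self._graphs[name] = graph

    def delete(self, name):
        self._graphs.pop(name, None)


class Catalog:
    """Maps namespaces to data sources and resolves <namespace>.<name>."""

    def __init__(self):
        self._sources = {}

    def register(self, namespace, data_source):
        self._sources[namespace] = data_source

    def _resolve(self, qualified_name):
        # Split at the first dot: namespace on the left, graph name on the right.
        namespace, _, name = qualified_name.partition(".")
        return self._sources[namespace], name

    def store(self, qualified_name, graph):
        source, name = self._resolve(qualified_name)
        source.store(name, graph)

    def graph(self, qualified_name):
        source, name = self._resolve(qualified_name)
        return source.graph(name)

    def drop(self, qualified_name):
        source, name = self._resolve(qualified_name)
        source.delete(name)


catalog = Catalog()
social_net = InMemoryDataSource()
catalog.register("social-net", social_net)
catalog.store("social-net.US", {"nodes": ["Alice"], "rels": []})
```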
PGDS                                              Multiple graphs  Read graphs  Write graphs
File-based (Parquet, ORC, CSV; HDFS, local, S3)   Yes              Yes          Yes
SQL (Hive, JDBC)                                  Yes              Yes          No
Neo4j Bolt                                        Yes              Yes          Yes
Neo4j Bulk Import                                 No               No           Yes
Cypher Session
Property Graph Catalog
“social-net” (Neo4j PGDS)
“US” (Property Graph)
FROM social-net.US
MATCH (p:Person)
RETURN p
Cypher Session
Property Graph Catalog
“social-net” (Neo4j PGDS)
“US”
“EU”
“products” (SQL PGDS)
“2018”
“2017”
FROM social-net.US
MATCH (p:Person)
FROM products.2018
MATCH (c:Customer)
WHERE p.email = c.email
RETURN p, c
CATALOG CREATE GRAPH social-net.US_new {
FROM social-net.US
MATCH (p:Person)
FROM products.2018
MATCH (c:Customer)
WHERE p.email = c.email
CONSTRUCT ON social-net.US
CREATE (p)-[:SAME_AS]->(c)
RETURN GRAPH
}
Cypher Session
Property Graph Catalog
“social-net” (Neo4j PGDS)
“US”
“EU”
“products” (SQL PGDS)
“2018”
“2017”
“US_new”
Cypher Session
Property Graph Catalog
“social-net” (Neo4j PGDS)
“US”
“EU”
...
CATALOG CREATE VIEW youngPeople($sn) {
FROM $sn
MATCH (p:Person)-[r]->(n)
WHERE p.age < 21
CONSTRUCT
CREATE (p)-[COPY OF r]->(n)
RETURN GRAPH
}
FROM youngPeople(social-net.US)
MATCH (p:Person)
RETURN p
“youngPeople”
Views
Part 2: A library of Graphs
:Business
name : ACME
address : 123 ACME Rd.
city : San Jose
state : CA
:User
name : Alice
since : 2013
elite : [2014, 2016]
:User
name : Bob
since : 2014
elite : null
:REVIEWS
stars : 5
date : 2014-02-03
:REVIEWS
stars : 4
date : 2014-08-03
2015 - 2018
https://git.io/fjZ25
Boulder City
(:User)-[:CO_REVIEWS]->(:User)
(:User)-[:REVIEWS]->(:Business)
(:User)-[:CO_REVIEWS]->(:User)
Construct graphs for each year
Extract Yelp
subgraph for
specific city
(:Business)-[:CO_REVIEWED]->(:Business)
[Figure: SQL Property Graph Data Source]
● Spark SQL Data Sources (JDBC, Hive, Oracle, SQL Server, ORC, Parquet) expose SQL tables and views
● Graph DDL maps SQL Tables to Property Graphs
○ Graph Type: element types, node types, relationship types
○ Graph Instance: table mappings
● The resulting Property Graph consists of Node Tables, Rel. Tables and a Graph Type
Part 3: Federated queries
:Business
name : ACME
address : 123 ACME Rd.
city : San Jose
state : CA
:User
name : Alice
since : 2013
elite : [2014, 2016]
email : alice@yelp.com
:User
name : Bob
since : 2014
elite : null
email : bob@yelp.com
:REVIEWS
stars : 5
date : 2014-02-03
:REVIEWS
stars : 4
date : 2014-08-03
:User
email: alice@yelp.com
:User
email : bob@yelp.com
:FRIEND
Yelp Reviews
Yelp Book
Graph DDL
+
SQL PGDS
(:User)-[:REVIEWS]->(:Business)
(:User)-[:FRIEND]->(:User)
https://git.io/fjZ2p
CREATE GRAPH TYPE yelp (
  -- Element types (concepts used to describe a graph)
  User ( name STRING, since DATE ),
  Business ( name STRING, city STRING ),
  REVIEWS ( stars INTEGER, date LOCALDATETIME ),
  FRIEND,
  -- Node types
  (User),
  (Business),
  -- Relationship types
  (User)-[REVIEWS]->(Business),
  (User)-[FRIEND]->(User)
)
CREATE GRAPH yelp_and_yelpBook OF yelp (
  -- Node type mappings
  (User) FROM HIVE.yelp.user,
  (Business) FROM HIVE.yelp.business,
  -- Relationship type mappings
  (User)-[REVIEWS]->(Business) FROM HIVE.yelp.review e
    START NODES (User) FROM HIVE.yelp.user n JOIN e.user_email = n.email
    END NODES (Business) FROM HIVE.yelp.business n JOIN e.business_id = n.business_id,
  (User)-[FRIEND]->(User) FROM H2.yelpbook.friend e
    START NODES (User) FROM HIVE.yelp.user n JOIN e.user1_email = n.email
    END NODES (User) FROM HIVE.yelp.user n JOIN e.user2_email = n.email
)
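The START NODES and END NODES clauses amount to joining each relationship row to the node tables on the stated columns. A minimal sketch of that resolution follows, assuming the join columns are unique in the node tables; the function name and sample rows are invented, and rows without a matching node are simply dropped here.

```python
users = [
    {"email": "alice@yelp.com", "name": "Alice"},
    {"email": "bob@yelp.com", "name": "Bob"},
]
businesses = [{"business_id": "b1", "name": "ACME"}]
reviews = [
    {"user_email": "alice@yelp.com", "business_id": "b1", "stars": 5},
    {"user_email": "bob@yelp.com", "business_id": "b1", "stars": 4},
    {"user_email": "carol@yelp.com", "business_id": "b1", "stars": 3},  # no such user
]


def map_relationships(edge_rows, start_rows, start_join, end_rows, end_join,
                      rel_type, property_keys):
    """Resolve each edge row to its start and end node via the given join columns."""
    edge_start_col, node_start_col = start_join
    edge_end_col, node_end_col = end_join
    start_index = {row[node_start_col]: row for row in start_rows}
    end_index = {row[node_end_col]: row for row in end_rows}
    rels = []
    for edge in edge_rows:
        start = start_index.get(edge[edge_start_col])
        end = end_index.get(edge[edge_end_col])
        if start is None or end is None:
            continue  # inner-join behaviour: skip rows with a dangling endpoint
        rels.append({
            "start": start,
            "type": rel_type,
            "end": end,
            "properties": {key: edge[key] for key in property_keys},
        })
    return rels


review_rels = map_relationships(
    reviews,
    users, ("user_email", "email"),
    businesses, ("business_id", "business_id"),
    "REVIEWS", ["stars"],
)
```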
● Morpheus and Neo4j Graph Algorithms
● Spark Graph SPIP sneak peek
● SQL/Cypher/GQL
https://theoatmeal.com/comics/sneak_peek
Coming in Spark 3.0
Neo4j
● Native Graph Database
● Analytics Integrations
● Cypher Query Language
● Wide Range of APOC Procedures
● Native Graph Algorithms
Pathfinding & Search
• Parallel Breadth First Search*
• Parallel Depth First Search
• Shortest Path*
• Single-Source Shortest Path
• All Pairs Shortest Path
• Minimum Spanning Tree
• A* Shortest Path
• Yen’s K Shortest Path
• K-Spanning Tree (MST)
• Random Walk
Centrality / Importance
• Degree Centrality
• Closeness Centrality
• CC Variations: Harmonic, Dangalchev, Wasserman & Faust
• Betweenness Centrality
• Approximate Betweenness Centrality
• PageRank*
• Personalized PageRank
• ArticleRank
• Eigenvector Centrality
Community Detection
• Triangle Count*
• Clustering Coefficients
• Connected Components (Union Find)*
• Strongly Connected Components*
• Label Propagation*
• Louvain Modularity – 1 Step & Multi-Step
• Balanced Triad (identification)
Similarity
• Euclidean Distance
• Cosine Similarity
• Jaccard Similarity
• Overlap Similarity
• Pearson Similarity
Link Prediction
• Adamic Adar
• Common Neighbors
• Preferential Attachment
• Resource Allocations
• Same Community
• Total Neighbors
* Available in GraphFrames
neo4j.com/docs/graph-algorithms/current/
Free O’Reilly Book
neo4j.com/graph-algorithms-book
• Spark & Neo4j Examples
• Machine Learning Chapter
Part 4: Neo4j Integration I
PageRank
● Use when
○ You are looking for broad influence over a network
○ Many domain-specific variations exist for differing analyses, e.g. Personalized PageRank for personalized recommendations
● Examples:
○ Twitter Recommendations
○ Fraud Detection
Graphs for 2017 and 2018: (:Business)-[:CO_REVIEWED]->(:Business)
call algo.pagerank on each graph
⋈ join the 2017 and 2018 results
trendRank = pageRank_2018 - pageRank_2017
https://git.io/fjZ2j
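The trendRank step is a join of the two PageRank results followed by a subtraction. A small sketch, assuming the scores are already available as dictionaries keyed by business (the names and values are invented) and that only businesses present in both years are kept:

```python
def trend_rank(rank_2017, rank_2018):
    """Join both years on business and order by the change in PageRank."""
    shared = rank_2017.keys() & rank_2018.keys()  # inner join on business
    trends = {b: rank_2018[b] - rank_2017[b] for b in shared}
    return sorted(trends.items(), key=lambda item: item[1], reverse=True)


page_rank_2017 = {"ACME": 0.5, "Bistro": 1.5, "Cafe": 1.0}
page_rank_2018 = {"ACME": 2.0, "Bistro": 1.0, "Diner": 0.75}
trending = trend_rank(page_rank_2017, page_rank_2018)
```

The business with the largest positive difference comes first; businesses that appear in only one year are left out by the join.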
Part 5: Neo4j Integration II
Louvain
● Use when
○ Community Detection in large networks
○ Uncover hierarchical structures in data
● Examples
○ Money Laundering
○ Protein-Protein-Interactions
Jaccard Similarity
● Use when
○ Computing pair-wise similarities
○ Accommodates vectors of different lengths
● Examples
○ Recommendations
○ Disambiguation
2017
(:User)-[:CO_REVIEWS]->(:User)
call algo.louvain: compute communities based on co-reviews
(:User)-[:REVIEWS]->(:Business)
call algo.jaccard for each community: compute similarity based on overlapping reviewed businesses (:IS_SIMILAR)
Recommend businesses that similar users have reviewed
https://git.io/fjZaU
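Jaccard similarity between two users is the size of the intersection of their reviewed businesses divided by the size of the union. The sketch below computes it pairwise and emits IS_SIMILAR pairs; the `threshold` parameter and the sample data are assumptions for the example, and it does not model the per-community grouping.

```python
from itertools import combinations


def jaccard(a, b):
    """|a ∩ b| / |a ∪ b|, defined as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


reviewed = {
    "Alice": {"ACME", "Bistro", "Cafe"},
    "Bob": {"ACME", "Bistro"},
    "Carol": {"Diner"},
}


def similar_users(reviewed, threshold):
    """Pairs of users whose reviewed businesses overlap at least `threshold`."""
    pairs = []
    for u, v in combinations(sorted(reviewed), 2):
        score = jaccard(reviewed[u], reviewed[v])
        if score >= threshold:
            pairs.append((u, "IS_SIMILAR", v, score))
    return pairs


similar = similar_users(reviewed, 0.5)
```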
Coming in Spark 3.0
● SPARK-25994 Spark Graph for Apache Spark 3.0
○ Property Graphs, Cypher Queries, and Algorithms
● Defines a Cypher-compatible Property Graph type based on DataFrames
● Replaces GraphFrames querying with Cypher
● Reimplements GraphFrames/GraphX algos on the Property Graph type
● “Spark Cypher”
○ Run a Cypher 9 query on a Property Graph returning a tabular result
● Migrate GraphFrames to Spark Graph
● Implementation is based on Spark SQL
○ Property Graphs are composed of one or more DFs
● Provide Scala, Python and Java APIs
● Addresses the Cypher Property Graph Model
○ Does not deal with variants of that model (e.g. RDF)
● No Cypher 10 multiple graph features
○ API is flexible to support this in future iterations
● No Property Graph Catalog
○ Also no Property Graph Data Sources
[SPARK-27299][GRAPH][WIP] Spark Graph API design proposal (GraphExamplesSuite.scala)
test("create PropertyGraph from Node- and RelationshipFrames") {
  // Plain DataFrames holding the node and relationship data
  val nodeData: DataFrame =
    spark.createDataFrame(Seq(0 -> "Alice", 1 -> "Bob")).toDF("id", "name")
  val relationshipData: DataFrame =
    spark.createDataFrame(Seq((0, 0, 1))).toDF("id", "source", "target")

  // Wrap them with the id columns, labels and relationship type
  val nodeFrame: NodeFrame = NodeFrame(nodeData, "id", Set("Person"))
  val relationshipFrame: RelationshipFrame =
    RelationshipFrame(relationshipData, "id", "source", "target", "KNOWS")

  val graph: PropertyGraph =
    cypherSession.createGraph(Seq(nodeFrame), Seq(relationshipFrame))
  val result: CypherResult = graph.cypher(
    """
      |MATCH (a:Person)-[r:KNOWS]->(:Person)
      |RETURN a, r""".stripMargin)
  result.df.show()
}
https://git.io/fjqp6
[Figure: module dependencies]
SPIP modules: spark-graph-api, spark-cypher, spark-sql
openCypher modules: okapi (Cypher to relational operators compiler), morpheus, spark-sql
Spark SQL and “Spark GQL”
Two models, two languages
A common core of datatypes and expressions
GQL as the focal point of graph programming
Graph languages with a shared graph type system
Q&A
Thanks for listening

GraphX: Graph analytics for insights about developer communitiesPaco Nathan
 
Building a GraphQL API in PHP
Building a GraphQL API in PHPBuilding a GraphQL API in PHP
Building a GraphQL API in PHPAndrew Rota
 
Getting started with Apache Spark in Python - PyLadies Toronto 2016
Getting started with Apache Spark in Python - PyLadies Toronto 2016Getting started with Apache Spark in Python - PyLadies Toronto 2016
Getting started with Apache Spark in Python - PyLadies Toronto 2016Holden Karau
 
The openCypher Project - An Open Graph Query Language
The openCypher Project - An Open Graph Query LanguageThe openCypher Project - An Open Graph Query Language
The openCypher Project - An Open Graph Query LanguageNeo4j
 
Intro to Spark and Spark SQL
Intro to Spark and Spark SQLIntro to Spark and Spark SQL
Intro to Spark and Spark SQLjeykottalam
 
Modern APIs with GraphQL
Modern APIs with GraphQLModern APIs with GraphQL
Modern APIs with GraphQLTaikai
 

Similar to WIFI SSID and Password for Spark Summit (20)

Morpheus SQL and Cypher® in Apache® Spark - Big Data Meetup Munich
Morpheus SQL and Cypher® in Apache® Spark - Big Data Meetup MunichMorpheus SQL and Cypher® in Apache® Spark - Big Data Meetup Munich
Morpheus SQL and Cypher® in Apache® Spark - Big Data Meetup Munich
 
Morpheus - SQL and Cypher in Apache Spark
Morpheus - SQL and Cypher in Apache SparkMorpheus - SQL and Cypher in Apache Spark
Morpheus - SQL and Cypher in Apache Spark
 
Netflix Machine Learning Infra for Recommendations - 2018
Netflix Machine Learning Infra for Recommendations - 2018Netflix Machine Learning Infra for Recommendations - 2018
Netflix Machine Learning Infra for Recommendations - 2018
 
ML Infra for Netflix Recommendations - AI NEXTCon talk
ML Infra for Netflix Recommendations - AI NEXTCon talkML Infra for Netflix Recommendations - AI NEXTCon talk
ML Infra for Netflix Recommendations - AI NEXTCon talk
 
The 2nd graph database in sv meetup
The 2nd graph database in sv meetupThe 2nd graph database in sv meetup
The 2nd graph database in sv meetup
 
PGQL: A Language for Graphs
PGQL: A Language for GraphsPGQL: A Language for Graphs
PGQL: A Language for Graphs
 
Introduction to Spark Datasets - Functional and relational together at last
Introduction to Spark Datasets - Functional and relational together at lastIntroduction to Spark Datasets - Functional and relational together at last
Introduction to Spark Datasets - Functional and relational together at last
 
GraphQL & DGraph with Go
GraphQL & DGraph with GoGraphQL & DGraph with Go
GraphQL & DGraph with Go
 
Multiple graphs in openCypher
Multiple graphs in openCypherMultiple graphs in openCypher
Multiple graphs in openCypher
 
Composable Parallel Processing in Apache Spark and Weld
Composable Parallel Processing in Apache Spark and WeldComposable Parallel Processing in Apache Spark and Weld
Composable Parallel Processing in Apache Spark and Weld
 
GraphQL + relay
GraphQL + relayGraphQL + relay
GraphQL + relay
 
GraphX: Graph analytics for insights about developer communities
GraphX: Graph analytics for insights about developer communitiesGraphX: Graph analytics for insights about developer communities
GraphX: Graph analytics for insights about developer communities
 
Building a GraphQL API in PHP
Building a GraphQL API in PHPBuilding a GraphQL API in PHP
Building a GraphQL API in PHP
 
Neo4j graph database
Neo4j graph databaseNeo4j graph database
Neo4j graph database
 
Neo4j: Graph-like power
Neo4j: Graph-like powerNeo4j: Graph-like power
Neo4j: Graph-like power
 
Getting started with Apache Spark in Python - PyLadies Toronto 2016
Getting started with Apache Spark in Python - PyLadies Toronto 2016Getting started with Apache Spark in Python - PyLadies Toronto 2016
Getting started with Apache Spark in Python - PyLadies Toronto 2016
 
HyperGraphQL
HyperGraphQLHyperGraphQL
HyperGraphQL
 
The openCypher Project - An Open Graph Query Language
The openCypher Project - An Open Graph Query LanguageThe openCypher Project - An Open Graph Query Language
The openCypher Project - An Open Graph Query Language
 
Intro to Spark and Spark SQL
Intro to Spark and Spark SQLIntro to Spark and Spark SQL
Intro to Spark and Spark SQL
 
Modern APIs with GraphQL
Modern APIs with GraphQLModern APIs with GraphQL
Modern APIs with GraphQL
 

WIFI SSID and Password for Spark Summit

  • 1. WIFI SSID:SparkAISummit | Password: UnifiedAnalytics
  • 10. Node
        ● Represents an entity within the graph
        ● Can have labels
        Relationship
        ● Connects a start node with an end node
        ● Has one type
        Property
        ● Describes a node or relationship, e.g. name, age, weight
        ● Key-value pair: String key; typed value (string, number, bool, list, ...)
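The node, relationship, and property definitions above can be sketched with a few plain Scala types. This is only an illustration: `PropertyGraphModelSketch`, `Node`, `Relationship`, the `Map[String, Any]` property encoding, and the sample values are assumptions made for this sketch, not Morpheus types or data from the talk.

```scala
object PropertyGraphModelSketch {
  // A property is a key-value pair: String key, typed value.
  // Map[String, Any] is a simplification chosen for this sketch.
  type Properties = Map[String, Any]

  // A node represents an entity and can carry any number of labels.
  final case class Node(id: Long, labels: Set[String], properties: Properties = Map.empty)

  // A relationship connects a start node with an end node and has exactly one type.
  final case class Relationship(
      id: Long,
      startId: Long,
      endId: Long,
      relType: String,
      properties: Properties = Map.empty
  )

  // Made-up sample data.
  val alice: Node = Node(0L, Set("Customer"), Map("name" -> "Alice", "age" -> 42))
  val laptop: Node = Node(1L, Set("Product"), Map("name" -> "Laptop"))
  val bought: Relationship = Relationship(0L, alice.id, laptop.id, "BOUGHT")

  // Labels of the nodes at the start and end of a relationship.
  def endpointLabels(rel: Relationship, nodes: Seq[Node]): (Set[String], Set[String]) = {
    val labelsById = nodes.map(n => n.id -> n.labels).toMap
    (labelsById(rel.startId), labelsById(rel.endId))
  }
}
```

Keeping the single relationship type as a plain `String` and the node labels as a `Set` mirrors the "one type" versus "can have labels" distinction on the slide.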
  • 12. Property graph view of data mirrors conceptual view
        ○ Entities and relationships, with attributes
        ○ Nodes and relationships, with properties
        Graph queries are concise and visual (ASCII art)
            MATCH (c:Customer)-[:BOUGHT]-(p:Product)
            RETURN c.id, p.id
        Network algorithms run over graphs
        → Graphs enhance data engineering and science
  • 14. Spark is an immutable data processing engine
        ○ Spark graphs are compositions of tables (DFs)
        ○ Spark graphs can be transformed and combined
        ○ Functions (including queries) over multiple graphs
        ○ Cypher query plans mapped to Catalyst
        Neo4j is a native transactional CRUD database
        ○ Neo4j graphs use a native graph data representation
        ○ Neo4j has optimized in-process MT graph algos
        ○ Morpheus helps move data in and out of Neo4j
  • 15. Graphs and tables are both useful data models
        ○ Finding paths and subgraphs, and transforming graphs
        ○ Viewing, aggregating and ordering values
        The Morpheus project parallels Spark SQL
        ○ PropertyGraph type (composed of DataFrames)
        ○ Catalog of graph data sources, named graphs, views
        ○ Cypher query language
        A CypherSession adds graphs to a SparkSession
  • 16. ● Data integration
          ○ Integrate (non-)graphy data from multiple, heterogeneous data sources into one or more property graphs
        ● Distributed Cypher execution
          ○ OLAP-style graph analytics
        ● Data science
          ○ Integration with other Spark libraries
          ○ Feature extraction using Neo4j Graph Algorithms
  • 17. Pathfinding & Search: Finds optimal paths or evaluates route availability and quality
        Centrality / Importance: Determines the importance of distinct nodes in the network
        Community Detection: Detects group clustering or partition options
        Similarity: Evaluates how alike nodes are
        Link Prediction: Estimates the likelihood of nodes forming a future relationship
  • 19. [Diagram: a Cypher query runs over a property graph, optionally with a DataFrame as driving table, and returns either a table result (DataFrame) or a property graph result]
  • 23. Cypher 9 is the latest full version of openCypher
        ○ Implemented in Neo4j 3.5
        ○ Includes date/time types and functions
        ○ Implemented in whole/part by six other vendors
        ○ Several other partial and research implementations
        ○ Cypher for Gremlin is another openCypher project
  • 24. Cypher is a full CRUD language ← OLTP database
        ○ RETURNs only tabular results: not composable
        ○ Results can include graph elements (paths, relationships, nodes) or property values
        Morpheus implements most of read-only Cypher
        ○ No MERGE or DELETE
        ○ Spark immutable data + transformations
  • 25. Cypher 10 proposes Multiple Graph features
        ○ Multiple Graph CIP: https://git.io/fjmrx
        Allows for Cypher query composition
        ○ Similar to chaining transformations on DataFrames
        Supports a Graph Catalog for managing graphs
        ○ Analogous to Spark SQL catalog
        Query support for graph construction
  • 26. Language features available in Morpheus
        Input: a property graph
        Output: a table
            FROM GRAPH socialNetwork
            MATCH ({name: 'Dan'})-[:FRIEND*2]->(foaf)
            RETURN toUpper(foaf.name) AS name
            ORDER BY name DESC
  • 27. Language features available in Morpheus
        Input: a property graph
        Output: a property graph
            FROM GRAPH socialNetwork
            MATCH (p:Person)-[:FRIEND*2]->(foaf)
            WHERE NOT (p)-[:FRIEND]->(foaf)
            CONSTRUCT
              CREATE (p)-[:POSSIBLE_FRIEND]->(foaf)
            RETURN GRAPH
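To make the graph-to-graph query above more concrete, here is a simplified, set-based reading of it in plain Scala. It is a sketch under stated assumptions: the edge data and the names `PossibleFriendsSketch`, `friend`, `twoHops`, and `possibleFriend` are invented for illustration, the `:Person` label check is ignored, and using sets removes duplicate matches that Cypher would evaluate row by row.

```scala
object PossibleFriendsSketch {
  // Directed FRIEND edges as (source, target) pairs; the names are made up.
  val friend: Set[(String, String)] = Set(
    "Dan" -> "Ann",
    "Ann" -> "Bob",
    "Dan" -> "Bob",
    "Bob" -> "Cat"
  )

  // (p)-[:FRIEND*2]->(foaf): follow two FRIEND edges in a row.
  val twoHops: Set[(String, String)] =
    for {
      (p, mid) <- friend
      (mid2, foaf) <- friend
      if mid == mid2
    } yield (p, foaf)

  // WHERE NOT (p)-[:FRIEND]->(foaf): drop pairs that are already direct friends.
  // The remaining pairs correspond to the POSSIBLE_FRIEND relationships to construct.
  val possibleFriend: Set[(String, String)] = twoHops.diff(friend)
}
```

In this sample, Dan reaches Bob in two hops but is already Bob's direct friend, so only the pairs ending in Cat remain.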
  • 28. Language features available in Morpheus
        Input: property graphs
        Output: a property graph
            FROM GRAPH socialNetwork
            MATCH (p:Person)
            FROM GRAPH products
            MATCH (c:Customer)
            WHERE p.email = c.email
            CONSTRUCT ON socialNetwork, products
              CREATE (p)-[:IS]->(c)
            RETURN GRAPH
  • 29. Language features available in Morpheus
        Input: property graphs
        Output: a property graph
            CATALOG CREATE VIEW youngFriends($inGraph) {
              FROM GRAPH $inGraph
              MATCH (p1:Person)-[r]->(p2:Person)
              WHERE p1.age < 25 AND p2.age < 25
              CONSTRUCT
                CREATE (p1)-[COPY OF r]->(p2)
              RETURN GRAPH
            }
  • 30. Language features available in Morpheus
        Input: property graphs
        Output: table or graph
            FROM youngFriends(socialNetwork)
            MATCH (p:Person)-[r]->(o)
            RETURN p, r, o

            // and views over views
            FROM youngFriends(europe(socialNetwork))
            MATCH ...
  • 32. [Architecture diagram. Components: Scala API, SQL, JDBC, Morpheus Query Engine, Property Graph Catalog, Property Graph Data Sources]
  • 33. Query
            MATCH (c:Captain)-[:COMMANDS]->(s:Ship)
            WHERE c.name = 'Morpheus'
            RETURN c.name, s.name
        openCypher Frontend
        ● Parsing, Rewriting, Normalization
        ● Semantic Analysis (Scoping, Typing, etc.)
        Morpheus
        ● Data Import and Export
        ● Schema and Type handling
        ● Query translation to Spark operations
          Intermediate Language
          ● Backend Agnostic Query Representation
          ● Conversion and typing of Frontend expressions
          Logical Planning
          ● Translation into Logical Operators
          ● Basic Logical Optimization
          Relational Planning
          ● Translation into Relational Operations on abstract tables
          ● Column layout computation
          Spark Backend
          ● Spark-specific table implementation
        Spark SQL
        ● Rule- and Cost-based query optimization via Catalyst
        Spark Core
        ● Distributed execution
  • 34. ● In Morpheus, PropertyGraphs are represented by
          ○ Node Tables and Relationship Tables
        ● Tables are represented by DataFrames
          ○ Require a fixed schema
        ● Property Graphs have a Graph Type
          ○ Node and relationship types that occur in the graph
          ○ Node and relationship properties and their data type
        [Diagram: Property Graph = Node Tables + Rel. Tables + Graph Type]
  • 35. Graph
            (:Captain:Person {name: Morpheus})-[:COMMANDS]->(:Ship {name: Nebuchadnezzar})
        Node table :Captain:Person
            id   name
            0    Morpheus
        Node table :Ship
            id   name
            1    Nebuchadnezzar
        Relationship table :COMMANDS
            id   source   target
            0    0        1
        Graph Type {
          :Captain:Person ( name: STRING ),
          :Ship ( name: STRING ),
          :COMMANDS
        }
  • 36. Morpheus Relational Planning
            MATCH (c:Captain)-[:COMMANDS]->(s:Ship)
            WHERE c.name = 'Morpheus'
            RETURN c.name, s.name
        [Diagram: the query is planned as projections (π) and joins (⋈) over the tables of the property graph]
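The idea of planning a pattern match as joins and projections can be illustrated on the Captain/Ship tables with ordinary Scala collections. This is a rough sketch, not how Morpheus plans queries: in-memory sequences stand in for DataFrames, and `RelationalPlanSketch`, `NodeRow`, and `RelRow` are names assumed for this example.

```scala
object RelationalPlanSketch {
  // Rows of the node and relationship tables from the Captain/Ship example.
  final case class NodeRow(id: Long, name: String)
  final case class RelRow(id: Long, source: Long, target: Long)

  val captains: Seq[NodeRow] = Seq(NodeRow(0L, "Morpheus"))
  val ships: Seq[NodeRow] = Seq(NodeRow(1L, "Nebuchadnezzar"))
  val commands: Seq[RelRow] = Seq(RelRow(0L, 0L, 1L))

  // MATCH (c:Captain)-[:COMMANDS]->(s:Ship)
  // WHERE c.name = 'Morpheus'
  // RETURN c.name, s.name
  val result: Seq[(String, String)] =
    for {
      c <- captains if c.name == "Morpheus" // selection on the captain node table
      r <- commands if r.source == c.id     // join: captain.id = rel.source
      s <- ships if r.target == s.id        // join: rel.target = ship.id
    } yield (c.name, s.name)                // projection of the returned columns
}
```

Each hop in the pattern costs one join against the relationship table and one against the target node table, which is why the plan for a single-hop pattern contains two joins.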
  • 38. Part 1: From JSON to Graph
        Create persistent Property Graph from raw Yelp dataset
        ○ Read Yelp data from JSON into DataFrames
        ○ Create Property Graph from DataFrames
        ○ Store Property Graph using Parquet
        Part 2: A library of Graphs
        Create a library of graph projections
        ○ Read Property Graph from Parquet
        ○ Create subgraph for a specific city
        ○ Project and persist city subgraph
        Part 3: Federated queries
        Integrate reviews with social network data
        ○ Define Graph Type and Mapping with Graph DDL
        ○ Load data from Hive and H2
        ○ Run analytical query on the integrated graph
        Part 4: Neo4j Integration I
        Find trending businesses
        ○ Load graph projections from library
        ○ Write graphs to Neo4j and run PageRank
        ○ Combine graphs in Morpheus and select trending businesses
        Part 5: Neo4j Integration II
        Recommend businesses to users
        ○ Load graph projections from library
        ○ Write graphs to Neo4j, run Louvain + Jaccard
        ○ Run analytical query in Morpheus to find recommendations
        https://git.io/fjZ2b
  • 39. ● Yelp is a search service based on crowd-sourced reviews about local businesses
        ● The Yelp Open Dataset is part of the Yelp Dataset Challenge
          ○ Yelp's effort to encourage researchers to explore the dataset
          ○ ~150K businesses, 10M users, 5M reviews, 35M friendships
        https://www.yelp.com
        https://www.yelp.com/dataset
        https://www.yelp.com/dataset/challenge
  • 40. :Business
          name: ACME
          address: 123 ACME Rd.
          city: San Jose
          state: CA
        :User
          name: Alice
          since: 2013
          elite: [2014, 2016]
        :User
          name: Bob
          since: 2014
          elite: null
        :REVIEWS
          stars: 5
          date: 2014-02-03
        :REVIEWS
          stars: 4
          date: 2014-08-03
  • 41. business.json, user.json, review.json
        → Create Node and Relationship Tables
        → Create Property Graph
        → Store Property Graph
        https://git.io/fjZ2N
  • 42.     // (:User)
            val userDataFrame = spark.read.json(...).select(...)
            val userNodeTable = CAPSEntityTable.create(NodeMappingBuilder.on("id")
              .withImpliedLabel("User")
              .withPropertyKey("name")
              .withPropertyKey("yelping_since")
              .withPropertyKey("elite")
              .build, userDataFrame)

            id   name    yelping_since   elite
            0    Alice   2013            [2014, 2016]
            1    Bob     2014            null
  • 44. ● Property Graphs are managed within a catalog
      Cypher Session
        → Property Graph Catalog
          → Property Graph Data Source <namespace>
            → Property Graph <name>
      QualifiedGraphName = <namespace>.<name>
  • 45. ● API to operate with the query engine and the catalog
      trait CypherSession {
        def cypher(
          query: String,
          parameters: CypherMap = CypherMap.empty,
          drivingTable: Option[CypherRecords] = None
        ): Result
        def catalog: PropertyGraphCatalog
      }
  • 46. ● API to manage multiple Property Graphs
      ● Catalog functions can be executed via Cypher or Scala API
      trait PropertyGraphCatalog {
        def register(namespace: Namespace, dataSource: PropertyGraphDataSource): Unit
        def store(qualifiedGraphName: QualifiedGraphName, graph: PropertyGraph): Unit
        def graph(qualifiedGraphName: QualifiedGraphName): PropertyGraph
        def drop(qualifiedGraphName: QualifiedGraphName): Unit
        // additional methods for managing views, listing namespaces and graphs
      }
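To make the namespace-based lookup concrete, here is a minimal in-memory sketch in plain Python. It mirrors the four operations of the trait above but is not the Morpheus implementation; `InMemoryDataSource`, `Catalog`, and the convention of splitting the qualified name at the first dot are assumptions made for illustration.

```python
class InMemoryDataSource:
    """Hypothetical data source that keeps graphs in a dict."""

    def __init__(self):
        self._graphs = {}

    def has_graph(self, name):
        return name in self._graphs

    def graph(self, name):
        return self._graphs[name]

    def store(self, name, graph):
        self._graphs[name] = graph

    def delete(self, name):
        self._graphs.pop(name, None)


class Catalog:
    """Hypothetical catalog: routes <namespace>.<name> to a data source."""

    def __init__(self):
        self._sources = {}

    def register(self, namespace, data_source):
        self._sources[namespace] = data_source

    def _resolve(self, qualified_name):
        # Assumption: the namespace ends at the first dot.
        namespace, _, name = qualified_name.partition(".")
        return self._sources[namespace], name

    def store(self, qualified_name, graph):
        source, name = self._resolve(qualified_name)
        source.store(name, graph)

    def graph(self, qualified_name):
        source, name = self._resolve(qualified_name)
        return source.graph(name)

    def drop(self, qualified_name):
        source, name = self._resolve(qualified_name)
        source.delete(name)


catalog = Catalog()
social_net = InMemoryDataSource()
catalog.register("social-net", social_net)
catalog.store("social-net.US", {"nodes": [], "rels": []})
print(catalog.graph("social-net.US"))
```

The catalog itself holds no graphs; it only delegates to whichever data source is registered under the namespace.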
  • 47. ● API for loading and saving property graphs
      trait PropertyGraphDataSource {
        def hasGraph(name: GraphName): Boolean
        def graph(name: GraphName): PropertyGraph
        def schema(name: GraphName): Option[Schema]
        // additional methods for storing, deleting, listing graphs
      }
  • 48. PGDS                                              Multiple graphs  Read graphs  Write graphs
        File-based (Parquet, ORC, CSV; HDFS, local, S3)   Yes              Yes          Yes
        SQL (Hive, JDBC)                                  Yes              Yes          No
        Neo4j Bolt                                        Yes              Yes          Yes
        Neo4j Bulk Import                                 No               No           Yes
  • 50. Cypher Session → Property Graph Catalog
        “social-net” (Neo4j PGDS): “US” (Property Graph)
      FROM social-net.US
      MATCH (p:Person)
      RETURN p
  • 51. Cypher Session → Property Graph Catalog
        “social-net” (Neo4j PGDS): “US”, “EU”
        “products” (SQL PGDS): “2018”, “2017”
      FROM social-net.US
      MATCH (p:Person)
      FROM products.2018
      MATCH (c:Customer)
      WHERE p.email = c.email
      RETURN p, c
  • 52. Cypher Session → Property Graph Catalog
        “social-net” (Neo4j PGDS): “US”, “EU”
        “products” (SQL PGDS): “2018”, “2017”
      CATALOG CREATE GRAPH social-net.US_new {
        FROM social-net.US
        MATCH (p:Person)
        FROM products.2018
        MATCH (c:Customer)
        WHERE p.email = c.email
        CONSTRUCT ON social-net.US
          CREATE (p)-[:SAME_AS]->(c)
        RETURN GRAPH
      }
  • 53. CATALOG CREATE GRAPH social-net.US_new {
        FROM social-net.US
        MATCH (p:Person)
        FROM products.2018
        MATCH (c:Customer)
        WHERE p.email = c.email
        CONSTRUCT ON social-net.US
          CREATE (p)-[:SAME_AS]->(c)
        RETURN GRAPH
      }
      Cypher Session → Property Graph Catalog
        “social-net” (Neo4j PGDS): “US”, “EU”, “US_new”
        “products” (SQL PGDS): “2018”, “2017”
  • 54. Cypher Session → Property Graph Catalog
        “social-net” (Neo4j PGDS): “US”, “EU”, ...
        Views: “youngPeople”
      CATALOG CREATE VIEW youngPeople($sn) {
        FROM $sn
        MATCH (p:Person)-[r]->(n)
        WHERE p.age < 21
        CONSTRUCT
          CREATE (p)-[COPY OF r]->(n)
        RETURN GRAPH
      }
      FROM youngPeople(social-net.US)
      MATCH (p:Person)
      RETURN p
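A view behaves like a function from a graph to a graph. The following plain-Python sketch imitates what the `youngPeople` view selects, under assumptions: a graph is a dict with `nodes` and `rels`, the sample data is invented, and a missing `age` is treated as not matching.

```python
def young_people(graph):
    """Keep relationships that start at a Person younger than 21,
    together with the nodes at both ends of those relationships."""
    nodes, rels = graph["nodes"], graph["rels"]
    kept_rels = [
        (start, rel_type, end)
        for (start, rel_type, end) in rels
        if "Person" in nodes[start]["labels"]
        and nodes[start].get("age", float("inf")) < 21
    ]
    kept_ids = {node_id for (start, _, end) in kept_rels for node_id in (start, end)}
    return {"nodes": {i: nodes[i] for i in kept_ids}, "rels": kept_rels}


# Invented sample graph.
sample_graph = {
    "nodes": {
        1: {"labels": {"Person"}, "name": "Alice", "age": 19},
        2: {"labels": {"Person"}, "name": "Bob", "age": 30},
        3: {"labels": {"Business"}, "name": "ACME"},
    },
    "rels": [(1, "KNOWS", 2), (2, "KNOWS", 1), (1, "REVIEWS", 3)],
}
view = young_people(sample_graph)
print(sorted(view["nodes"]), view["rels"])
```

Because the view takes the source graph as a parameter, the same definition can be applied to any graph in the catalog.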
  • 57. Extract Yelp subgraph for a specific city (Boulder City)
        (:User)-[:REVIEWS]->(:Business)
      Construct graphs for each year, 2015 - 2018
        (:User)-[:CO_REVIEWS]->(:User)
        (:Business)-[:CO_REVIEWED]->(:Business)
      https://git.io/fjZ25
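The co-review projections can be sketched in plain Python as a co-occurrence computation. This is only an illustration under assumptions: `co_occurrence` is a hypothetical helper, the review pairs are invented, and each pair is emitted once and undirected, whereas the slide draws directed relationships.

```python
from collections import defaultdict
from itertools import combinations


def co_occurrence(pairs):
    """For (key, item) pairs, return sorted item pairs that share a key."""
    items_by_key = defaultdict(set)
    for key, item in pairs:
        items_by_key[key].add(item)
    result = set()
    for items in items_by_key.values():
        result.update(combinations(sorted(items), 2))
    return sorted(result)


# Invented (user, business) review pairs for one city and year.
reviews = [
    ("alice", "acme"), ("alice", "bistro"),
    ("bob", "acme"), ("bob", "cafe"),
    ("carol", "cafe"),
]

# Businesses reviewed by the same user: CO_REVIEWED.
co_reviewed = co_occurrence(reviews)
# Users who reviewed the same business: CO_REVIEWS.
co_reviews = co_occurrence([(business, user) for user, business in reviews])
print(co_reviewed)
print(co_reviews)
```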
  • 59. SQL Property Graph Data Source (diagram)
      ● Spark SQL Data Sources (JDBC, Hive, Oracle, SQL Server, Orc, Parquet, ...) expose SQL Tables (Table/View)
      ● Graph DDL
        ○ Graph Type: element types, node types, relationship types
        ○ Graph Instance: table mappings
      ● Property Graphs: a Property Graph consists of a Graph Type, Node Tables and Rel. Tables
  • 61. Nodes
        (:Business {name: ACME, address: 123 ACME Rd., city: San Jose, state: CA})
        (:User {name: Alice, since: 2013, elite: [2014, 2016], email: alice@yelp.com})
        (:User {name: Bob, since: 2014, elite: null, email: bob@yelp.com})
      Relationships
        [:REVIEWS {stars: 5, date: 2014-02-03}]
        [:REVIEWS {stars: 4, date: 2014-08-03}]
  • 63. Yelp Reviews: (:User)-[:REVIEWS]->(:Business)
      Yelp Book: (:User)-[:FRIEND]->(:User)
      Graph DDL + SQL PGDS
      https://git.io/fjZ2p
  • 64. CREATE GRAPH TYPE yelp (
        -- Element types (concepts used to describe a graph)
        User ( name STRING, since DATE ),
        Business ( name STRING, city STRING ),
        REVIEWS ( stars INTEGER, date LOCALDATETIME ),
        FRIEND,
        -- Node types
        (User),
        (Business),
        -- Relationship types
        (User)-[REVIEWS]->(Business),
        (User)-[FRIEND]->(User)
      )
  • 65. CREATE GRAPH yelp_and_yelpBook OF yelp (
        -- Node type mappings
        (User) FROM HIVE.yelp.user,
        (Business) FROM HIVE.yelp.business,
        -- Relationship type mappings
        (User)-[REVIEWS]->(Business) FROM HIVE.yelp.review e
          START NODES (User) FROM HIVE.yelp.user n JOIN e.user_email = n.email
          END NODES (Business) FROM HIVE.yelp.business n JOIN e.business_id = n.business_id,
        (User)-[FRIEND]->(User) FROM H2.yelpbook.friend e
          START NODES (User) FROM HIVE.yelp.user n JOIN e.user1_email = n.email
          END NODES (User) FROM HIVE.yelp.user n JOIN e.user2_email = n.email
      )
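The START NODES / END NODES clauses resolve each relationship row to its endpoints by joining on a key column. A plain-Python sketch of that idea for the FRIEND mapping follows. It is not how Morpheus executes Graph DDL: `map_relationships` is a hypothetical helper, the rows are invented, and dropping rows with an unknown endpoint (inner-join behaviour) is an assumption.

```python
def map_relationships(rel_rows, rel_type, start_key, end_key, node_rows, node_key):
    """Resolve start and end nodes of each relationship row via a key join."""
    index = {row[node_key]: row for row in node_rows}
    relationships = []
    for row in rel_rows:
        start = index.get(row[start_key])
        end = index.get(row[end_key])
        # Assumption: rows whose endpoints cannot be resolved are dropped.
        if start is not None and end is not None:
            relationships.append((start["name"], rel_type, end["name"]))
    return relationships


# Invented rows standing in for HIVE.yelp.user and H2.yelpbook.friend.
user_table = [
    {"email": "alice@yelp.com", "name": "Alice"},
    {"email": "bob@yelp.com", "name": "Bob"},
]
friend_table = [
    {"user1_email": "alice@yelp.com", "user2_email": "bob@yelp.com"},
    {"user1_email": "bob@yelp.com", "user2_email": "unknown@example.com"},
]
friends = map_relationships(
    friend_table, "FRIEND", "user1_email", "user2_email", user_table, "email"
)
print(friends)
```

The relationship table and the node table come from different systems here, which is what makes the integrated graph a federated one.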
  • 66. ● Morpheus and Neo4j Graph Algorithms
      ● Spark Graph SPIP sneak peek
      ● SQL/Cypher/GQL
      https://theoatmeal.com/comics/sneak_peek
  • 71. Cypher queries (diagram)
      ● Property Graph → Cypher query → table result (DataFrame)
      ● Property Graph → Cypher query → Property Graph result
      ● Property Graph + driving table (DataFrame) → Cypher query → result
  • 77. Neo4j Native Graph Database
      ● Analytics Integrations
      ● Cypher Query Language
      ● Wide Range of APOC Procedures
      ● Native Graph Algorithms
  • 78. Pathfinding & Search
        • Parallel Breadth First Search*
        • Parallel Depth First Search
        • Shortest Path*
        • Single-Source Shortest Path
        • All Pairs Shortest Path
        • Minimum Spanning Tree
        • A* Shortest Path
        • Yen’s K Shortest Path
        • K-Spanning Tree (MST)
        • Random Walk
      Centrality / Importance
        • Degree Centrality
        • Closeness Centrality
        • CC Variations: Harmonic, Dangalchev, Wasserman & Faust
        • Betweenness Centrality
        • Approximate Betweenness Centrality
        • PageRank*
        • Personalized PageRank
        • ArticleRank
        • Eigenvector Centrality
      Community Detection
        • Triangle Count*
        • Clustering Coefficients
        • Connected Components (Union Find)*
        • Strongly Connected Components*
        • Label Propagation*
        • Louvain Modularity – 1 Step & Multi-Step
        • Balanced Triad (identification)
      Similarity
        • Euclidean Distance
        • Cosine Similarity
        • Jaccard Similarity
        • Overlap Similarity
        • Pearson Similarity
      Link Prediction
        • Adamic Adar
        • Common Neighbors
        • Preferential Attachment
        • Resource Allocations
        • Same Community
        • Total Neighbors
      * Available in GraphFrames
      neo4j.com/docs/graph-algorithms/current/
  • 79. Free O’Reilly Book: neo4j.com/graph-algorithms-book
      • Spark & Neo4j Examples
      • Machine Learning Chapter
  • 82. PageRank
      ● Use when
        ○ You are looking for broad influence over a network
        ○ Many domain-specific variations exist for differing analyses, e.g. Personalized PageRank for personalized recommendations
      ● Examples
        ○ Twitter Recommendations
        ○ Fraud Detection
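For readers unfamiliar with the algorithm, here is a textbook power-iteration PageRank in plain Python. It is a sketch, not the Neo4j `algo.pagerank` procedure: the damping factor of 0.85, the fixed iteration count, the even redistribution of rank from nodes without outgoing links, and the tiny edge list are all assumptions chosen for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration over a directed edge list."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    count = len(nodes)
    rank = {node: 1.0 / count for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        base = (1.0 - damping) / count + damping * dangling / count
        new_rank = {node: base for node in nodes}
        for source in nodes:
            for target in out_links[source]:
                new_rank[target] += damping * rank[source] / len(out_links[source])
        rank = new_rank
    return rank


# Invented graph: a and b both point to c, c points back to a.
ranks = pagerank([("a", "c"), ("b", "c"), ("c", "a")])
print(max(ranks, key=ranks.get))
```

In this example `c` receives links from both other nodes and ends up with the highest score, while `b`, which nothing points to, ends up with the lowest.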
  • 83. Trending businesses, 2017 to 2018
      ● (:Business)-[:CO_REVIEWED]->(:Business) graphs for 2017 and 2018
      ● call algo.pagerank on each graph
      ● Join (⋈) the results: trendRank = pageRank_2018 - pageRank_2017
      https://git.io/fjZ2j
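The trendRank formula amounts to joining the two yearly score tables on the business and subtracting. A plain-Python sketch, assuming an inner join (businesses present in only one year are dropped) and using invented scores:

```python
def trend_rank(rank_2017, rank_2018):
    """Join both score tables on business id and subtract the 2017 score."""
    shared = rank_2017.keys() & rank_2018.keys()
    return {business: rank_2018[business] - rank_2017[business] for business in shared}


# Invented PageRank scores per business and year.
scores_2017 = {"acme": 0.20, "bistro": 0.50, "cafe": 0.30}
scores_2018 = {"acme": 0.45, "bistro": 0.35, "diner": 0.20}

trend = trend_rank(scores_2017, scores_2018)
# Businesses ordered from most to least trending.
trending = sorted(trend, key=trend.get, reverse=True)
print(trending)
```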
  • 85. Louvain
      ● Use when
        ○ Detecting communities in large networks
        ○ Uncovering hierarchical structures in data
      ● Examples
        ○ Money Laundering
        ○ Protein-Protein Interactions
  • 86. Jaccard
      ● Use when
        ○ Computing pair-wise similarities
        ○ Accommodates vectors of different lengths
      ● Examples
        ○ Recommendations
        ○ Disambiguation
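Jaccard similarity is the size of the intersection of two sets divided by the size of their union, which is why inputs of different lengths are no obstacle. A minimal plain-Python sketch with invented review sets follows; returning 0.0 for two empty sets is a convention assumed here, not something the slides specify.

```python
def jaccard(first, second):
    """Jaccard similarity: |intersection| / |union| of two collections."""
    first, second = set(first), set(second)
    union = first | second
    # Assumed convention: two empty sets have similarity 0.0.
    return len(first & second) / len(union) if union else 0.0


# Invented sets of businesses reviewed by two users.
alice_reviewed = {"acme", "bistro", "cafe"}
bob_reviewed = {"acme", "cafe"}
similarity = jaccard(alice_reviewed, bob_reviewed)
print(similarity)
```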
  • 87. Recommend businesses similar users have reviewed (2017)
      ● (:User)-[:CO_REVIEWS]->(:User): call algo.louvain
        ○ Compute communities based on co-reviews
      ● (:User)-[:REVIEWS]->(:Business): call algo.jaccard for each community
        ○ Compute similarity based on overlapping reviewed businesses
        ○ :IS_SIMILAR
      https://git.io/fjZaU
  • 90. ● SPARK-25994 Spark Graph for Apache Spark 3.0
        ○ Property Graphs, Cypher Queries, and Algorithms
      ● Defines a Cypher-compatible Property Graph type based on DataFrames
      ● Replaces GraphFrames querying with Cypher
      ● Reimplements GraphFrames/GraphX algos on the Property Graph type
  • 91. ● “Spark Cypher”
        ○ Run a Cypher 9 query on a Property Graph returning a tabular result
      ● Migrate GraphFrames to Spark Graph
      ● Implementation is based on Spark SQL
        ○ Property Graphs are composed of one or more DFs
      ● Provide Scala, Python and Java APIs
  • 92. ● Addresses the Cypher Property Graph Model
        ○ Does not deal with variants of that model (e.g. RDF)
      ● No Cypher 10 multiple graph features
        ○ API is flexible to support this in future iterations
      ● No Property Graph Catalog
        ○ Also no Property Graph Data Sources
  • 93. [SPARK-27299][GRAPH][WIP] Spark Graph API design proposal (GraphExamplesSuite.scala)
      test("create PropertyGraph from Node- and RelationshipFrames") {
        val nodeData: DataFrame =
          spark.createDataFrame(Seq(0 -> "Alice", 1 -> "Bob")).toDF("id", "name")
        val relationshipData: DataFrame =
          spark.createDataFrame(Seq((0, 0, 1))).toDF("id", "source", "target")
        val nodeFrame: NodeFrame = NodeFrame(nodeData, "id", Set("Person"))
        val relationshipFrame: RelationshipFrame =
          RelationshipFrame(relationshipData, "id", "source", "target", "KNOWS")
        val graph: PropertyGraph =
          cypherSession.createGraph(Seq(nodeFrame), Seq(relationshipFrame))
        val result: CypherResult = graph.cypher(
          """
            |MATCH (a:Person)-[r:KNOWS]->(:Person)
            |RETURN a, r""".stripMargin)
        result.df.show()
      }
      https://git.io/fjqp6
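Since a Property Graph is composed of node and relationship tables, a pattern such as the one in this test can be read as a join between those tables. The plain-Python sketch below imitates that for the same two nodes and one relationship. It is not the proposed Spark Graph API; `match_knows` and the dict-based rows are assumptions made for illustration.

```python
# Same data as the test: two Person nodes and one KNOWS relationship.
node_rows = [
    {"id": 0, "name": "Alice", "labels": {"Person"}},
    {"id": 1, "name": "Bob", "labels": {"Person"}},
]
relationship_rows = [{"id": 0, "source": 0, "target": 1, "type": "KNOWS"}]


def match_knows(nodes, relationships):
    """Imitate MATCH (a:Person)-[r:KNOWS]->(:Person) RETURN a, r as a join."""
    persons = {node["id"]: node for node in nodes if "Person" in node["labels"]}
    return [
        (persons[rel["source"]], rel)
        for rel in relationships
        if rel["type"] == "KNOWS"
        and rel["source"] in persons
        and rel["target"] in persons
    ]


matches = match_knows(node_rows, relationship_rows)
for person, rel in matches:
    print(person["name"], rel["type"])
```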
  • 101. Spark SQL and “Spark GQL”
      ● Two models, two languages
      ● A common core of datatypes and expressions
      ● GQL as the focal point of graph programming
      ● Graph languages with a shared graph type system