14. Data resources
Beyond the data warehouse
> 6,000 Superset charts and dashboards
> 5,000 Experiments and metrics
> 4,000 Tableau dashboards and workbooks
> 1,000 Knowledge posts
20. Portland
San Francisco
Los Angeles
Toronto
New York
Miami
Sao Paulo
Dublin
London
Paris
Barcelona
Berlin
Milan
Copenhagen
New Delhi
Seoul
Beijing
Tokyo
Sydney
Singapore
Washington, DC
> 20
Offices around the world
41. Databases: 5
APIs: 3
Airflow DAG: 1
We leverage all these data resources to build a graph comprising
nodes and relationships
The Airflow DAG is run every day and the output is stored in Hive
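To make the "nodes and relationships" idea concrete, here is a minimal sketch in plain Python. It is not the Dataportal code: the labels (`User`, `Table`, `Dashboard`), the relationship types, and the sample rows are all hypothetical, and the rows simply stand in for whatever one daily run produces.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    label: str  # hypothetical labels, e.g. "Table", "Dashboard", "User"
    key: str    # identifier unique within the label


@dataclass
class Graph:
    nodes: set = field(default_factory=set)
    relationships: set = field(default_factory=set)  # (source, type, target)

    def relate(self, source, rel_type, target):
        """Add both endpoints and the typed relationship between them."""
        self.nodes.update((source, target))
        self.relationships.add((source, rel_type, target))

    def neighbors(self, node):
        """Return nodes directly connected to `node`, in either direction."""
        outgoing = {t for s, _, t in self.relationships if s == node}
        incoming = {s for s, _, t in self.relationships if t == node}
        return outgoing | incoming


def build_graph(rows):
    """Build a graph from (source, rel_type, target) rows."""
    graph = Graph()
    for source, rel_type, target in rows:
        graph.relate(source, rel_type, target)
    return graph


# Hypothetical rows standing in for one day's output.
user = Node("User", "jane")
table = Node("Table", "core.bookings")
dashboard = Node("Dashboard", "weekly-bookings")
graph = build_graph([
    (user, "CREATED", dashboard),
    (dashboard, "CONSUMES", table),
])
```

Because every resource, person, and table is a node, a question like "what is connected to this dashboard?" becomes a single `neighbors` lookup rather than a join across several source systems.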
43. We gather over 10,000 thumbnails from the Tableau API,
Knowledge Repo database, and Superset screenshots
44. The winding data path
Airflow: Data transfer
Python: Graph datastore
py2neo: Python Neo4j driver
Neo4j: Graph database
GraphAware: Neo4j/Elasticsearch plugin
Elasticsearch: Search engine
Flask: Python web framework
Hive: Data warehouse
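One step on this path is loading rows into Neo4j. The sketch below only builds a batched Cypher upsert statement and its parameter rows; it does not call py2neo or any other driver, so it runs without a database. The function names, the `Table` label, and the sample records are assumptions for illustration, not the pipeline's actual code.

```python
import re

_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def merge_nodes_statement(label, key):
    """Return a Cypher statement that upserts nodes of one label.

    Labels and property names cannot be passed as Cypher parameters, so they
    are validated and interpolated; the row values travel as $rows.
    """
    for name in (label, key):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return (
        "UNWIND $rows AS row "
        f"MERGE (n:{label} {{{key}: row.{key}}}) "
        "SET n += row"
    )


def to_rows(records, key):
    """Drop records that lack the key; MERGE rejects a null key property."""
    return [dict(r) for r in records if r.get(key) is not None]


# Hypothetical records, e.g. one batch read from the warehouse.
statement = merge_nodes_statement("Table", "name")
rows = to_rows(
    [{"name": "core.bookings", "owner": "jane"}, {"owner": "nobody"}],
    "name",
)
```

The resulting `statement` and `rows` could then be handed to whichever Neo4j driver is in use as the query text and the `rows` parameter. Using `MERGE` rather than `CREATE` keeps a daily reload from duplicating nodes that already exist.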
63. Why we chose Neo4j for our database
The main reasons
Logical: Given that our data is represented as a graph, it is logical to use a graph database to store the data
Nimble: Graph databases outperform relational databases when dealing with connected data
Popular: It is the world’s leading graph database and the community edition is free
Integrative: It integrates well with Python and Elasticsearch
66. The Neo4j and Elasticsearch symbiotic relationship
Courtesy of two GraphAware plugins
Neo4j plugin
Provides bi-directional integration that transparently and asynchronously replicates data from
Neo4j to Elasticsearch
Elasticsearch plugin
Enables Elasticsearch to consult the Neo4j database during a search query to enrich the
search rankings by leveraging the graph topology
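As an illustration of what "enriching search rankings with graph topology" can mean, the sketch below re-ranks text-search hits by boosting nodes that have more relationships. This is not the GraphAware plugin's algorithm or API; the degree-based boost, the `weight` parameter, and the sample ids are assumptions chosen to keep the example small.

```python
from collections import Counter


def degree_by_node(relationships):
    """Count how many relationships touch each node id."""
    degree = Counter()
    for source, target in relationships:
        degree[source] += 1
        degree[target] += 1
    return degree


def rerank(hits, degree, weight=0.5):
    """Blend each hit's text score with a graph-connectivity boost.

    `hits` is a list of (node_id, text_score). The boost is the node's degree
    normalised by the largest degree among the hits, scaled by `weight`.
    """
    max_degree = max((degree[node_id] for node_id, _ in hits), default=0) or 1
    scored = [
        (node_id, text_score * (1 + weight * degree[node_id] / max_degree))
        for node_id, text_score in hits
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical graph: tbl_1 is consumed by three dashboards, tbl_2 by one.
relationships = [
    ("dash_a", "tbl_1"),
    ("dash_b", "tbl_1"),
    ("dash_c", "tbl_1"),
    ("dash_a", "tbl_2"),
]
hits = [("tbl_2", 1.0), ("tbl_1", 0.9)]  # text-only ranking puts tbl_2 first
ranked = rerank(hits, degree_by_node(relationships))
```

Here the heavily used table overtakes the slightly better text match, which is the kind of signal a plain full-text index cannot see on its own.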
80. Efficient data retrieval and uniqueness
Restrictions and workarounds with the Neo4j schema
Indexes
Neo4j provides indexes for efficient data retrieval, similar to an RDBMS; however, each index is
defined for a single label only
Uniqueness constraints
Ensure that property values are unique across all nodes with a specific single label
GraphAware UUID plugin
Transparently assigns a globally unique UUID property to newly created elements; the property
cannot be changed or deleted
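The difference between a per-label uniqueness constraint and a globally unique identifier can be shown with a toy in-memory store. This is a simulation in plain Python, not Neo4j or the GraphAware plugin; the class, its method names, and the sample labels are hypothetical.

```python
import uuid


class NodeStore:
    """Toy in-memory store; not Neo4j, just the two rules described above."""

    def __init__(self):
        self.nodes = {}    # uuid -> (label, properties)
        self._unique = {}  # (label, property) -> set of values already used

    def add_unique_constraint(self, label, prop):
        """Register a uniqueness constraint scoped to a single label."""
        self._unique.setdefault((label, prop), set())

    def create(self, label, **props):
        """Create a node, enforcing per-label constraints and assigning a UUID."""
        if "uuid" in props:
            raise ValueError("uuid is assigned automatically")
        constrained = [
            (prop, used)
            for (c_label, prop), used in self._unique.items()
            if c_label == label and prop in props
        ]
        for prop, used in constrained:
            if props[prop] in used:
                raise ValueError(f"{label}.{prop}={props[prop]!r} already exists")
        for prop, used in constrained:
            used.add(props[prop])
        node_id = str(uuid.uuid4())
        self.nodes[node_id] = (label, {**props, "uuid": node_id})
        return node_id

    def set_property(self, node_id, prop, value):
        """Update a property; the uuid is immutable.

        Constraint re-checks on update are omitted to keep the sketch short.
        """
        if prop == "uuid":
            raise ValueError("uuid cannot be changed")
        self.nodes[node_id][1][prop] = value


store = NodeStore()
store.add_unique_constraint("Table", "name")
store.add_unique_constraint("Dashboard", "name")
table_id = store.create("Table", name="bookings")
# Allowed: the same name under a different label violates neither constraint.
dashboard_id = store.create("Dashboard", name="bookings")
```

Two nodes with the same name can coexist under different labels, so a name alone cannot identify a node across the whole graph; the automatically assigned UUID can.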
84. Designing the user experience and interface of
a data tool should not be an afterthought
86. User personas
Daphne Data: Technical data power user; the epitome of a tribal knowledge holder
Manager Mel: Less data literate; needs to keep tabs on her team’s resources
Nathan New: New employee or new team; has no idea what’s going on
87. Designing for data exploration, discovery, and trust
Company data
Search
Resource details & meta-data
User data
Group data
97. Surface relationships; everything’s a link to promote exploration
Meta-data & consumption
Description, external link, social
98. Column details & value distributions
Table lineage
Enrich meta-data on the fly
122. The challenges
Complex dependencies: An umbrella data tool is vulnerable to changes in upstream resource dependencies
Data-dense design: Balancing simplicity and functionality is hard; most internal design resources are not made for data-rich apps
Graph flickering: Transient relationships should not create “flickering” artifacts
Graph merging: Non-trivial Git-like merging of (daily or real-time) graph updates
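The merging and flickering challenges can be sketched with two small pieces: a set diff between snapshots, and a filter that hides a relationship only after it has been missing for several consecutive snapshots. This is one possible approach under stated assumptions, not the team's solution; the `grace` threshold and the sample relationships are hypothetical.

```python
def diff_relationships(previous, current):
    """Return (added, removed) relationship sets between two snapshots."""
    return current - previous, previous - current


class FlickerFilter:
    """Keep a relationship visible until it is absent for `grace` snapshots."""

    def __init__(self, grace=2):
        self.grace = grace
        self._missing = {}  # relationship -> consecutive snapshots absent
        self.visible = set()

    def apply(self, snapshot):
        """Fold one snapshot into the visible set and return a copy of it."""
        # Anything present now is visible and has its absence counter reset.
        for rel in snapshot:
            self._missing.pop(rel, None)
        self.visible |= snapshot
        # Count absences; drop relationships that have been absent long enough.
        for rel in list(self.visible - snapshot):
            self._missing[rel] = self._missing.get(rel, 0) + 1
            if self._missing[rel] >= self.grace:
                self.visible.discard(rel)
                del self._missing[rel]
        return set(self.visible)


# Hypothetical relationships: `b` disappears for one day, then for good.
a = ("user:jane", "CREATED", "dash:1")
b = ("dash:1", "CONSUMES", "table:bookings")
snapshots = [{a, b}, {a}, {a, b}, {a}, {a}]

added, removed = diff_relationships(snapshots[0], snapshots[1])
flicker_filter = FlickerFilter(grace=2)
history = [flicker_filter.apply(snapshot) for snapshot in snapshots]
```

With `grace=2`, the one-day gap never reaches the user, while the permanent removal shows up one snapshot late. That delay is the trade-off: a larger `grace` hides more transient noise but keeps stale relationships visible for longer.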
128. The future
New resource types: A/B tests, logging schemas, SQL queries, etc.
Certified content: Use certification to build trust and enable users to filter through a sea of stale content
Gamification: Provide content producers with a sense of value
Alerts & recommendations: Move from active exploration to delivering relevant updates and content suggestions
130. The Dataportal team
Analytics & Experimentation Products
John Bodley, Software Engineer
Eli Brumbaugh, Experience Designer
Jeff Feng, Product Manager
Michelle Thomas, Software Engineer
Chris Williams, Data Visualization