1. SAN FRANCISCO | 10.22.2014 THE BUSINESS GRAPH
The Business Graph
(Why we chose Neo4j to rebuild CrunchBase)
2. Who Am I?
Kurt Freytag
Head of Product, CrunchBase
kurt@crunchbase.com
415.891.7761
@kfreytag
5’10”, 155 lbs.
Coding since 1977
3. What Am I Talking About?
• Concise History of CrunchBase
• Our Vision
• Why Neo4j?
• Building w/ Neo4j & The Web
• Q&A
4. History of CrunchBase: In One Slide
• Started in 2007 by Michael Arrington
• Zero dedicated staff from 2007 to 2013
• Organically became the source of truth for the startup ecosystem
• Millions of monthly users
• Ran on two crappy AWS servers
MySQL 5.0 / Rails 2.0
5. The Vision of CrunchBase
• The Complete Graph of the Connected Business World
  • Entities: people, products, companies
  • Activities: fundings, acquisitions, job changes
  • Connections: how everything relates
  • Time: the lifecycle of every element
• World’s Most Powerful Startup Community
  • Open to all
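The "Time" element implies that every connection carries its own lifecycle, so the graph can be asked what it looked like on a given date. The following is a minimal Ruby sketch that assumes each connection stores hypothetical `started_on` / `ended_on` dates. It is not CrunchBase's actual schema, and the people and companies are made up.

```ruby
require "date"

# Hypothetical connection type: each relationship carries its own lifecycle.
Connection = Struct.new(:from, :type, :to, :started_on, :ended_on) do
  # Active on +date+ if it had started and had not yet ended (nil = still active).
  def active_on?(date)
    started_on <= date && (ended_on.nil? || date <= ended_on)
  end
end

# The slice of the graph that existed on a given date.
def graph_as_of(connections, date)
  connections.select { |c| c.active_on?(date) }
end

# Illustrative data only (hypothetical people and companies).
connections = [
  Connection.new("Alice", :works_at, "Acme", Date.new(2008, 1, 1), Date.new(2011, 6, 30)),
  Connection.new("Alice", :works_at, "Initech", Date.new(2011, 7, 1), nil),
  Connection.new("Bigco", :acquired, "Acme", Date.new(2012, 3, 15), nil)
]

graph_as_of(connections, Date.new(2010, 1, 1)).map { |c| [c.from, c.type, c.to] }
# => [["Alice", :works_at, "Acme"]]
```

A job change then becomes one connection ending and another starting. Nothing is overwritten, so history stays queryable.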
6. Why Neo4j?
• A natural way of modeling data
[Diagram: an example business graph. Nodes: Emil Eifrem (Founder), Neotechnologies (Company), Neo4j Enterprise Edition (Product), Seed Round (Funding), Sunstone Capital (Investor), Connor Venture Partners (Investor), Lars Nordwall (COO), Philip Rathle (VP of Products), GraphConnect 2014 (Event), Kurt Freytag (Speaker)]
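The point of this slide is that business facts are already nodes with typed connections between them. Below is a minimal in-memory Ruby sketch of part of that graph. The slide shows only node names and labels, so the relationship types (`:founded`, `:raised`, `:invested_in`) and their directions are assumptions.

```ruby
# A tiny labeled property graph: nodes carry a label, edges carry a type.
class Graph
  Node = Struct.new(:name, :label)
  Edge = Struct.new(:from, :type, :to)

  def initialize
    @nodes = {}
    @edges = []
  end

  def add_node(name, label)
    @nodes[name] = Node.new(name, label)
  end

  # Raises KeyError if either endpoint was never added.
  def connect(from, type, to)
    @edges << Edge.new(@nodes.fetch(from), type, @nodes.fetch(to))
  end

  # Names reached from +name+ by following edges of +type+.
  def outgoing(name, type)
    @edges.select { |e| e.from.name == name && e.type == type }.map { |e| e.to.name }
  end

  # Names that point at +name+ through edges of +type+.
  def incoming(name, type)
    @edges.select { |e| e.to.name == name && e.type == type }.map { |e| e.from.name }
  end
end

g = Graph.new
g.add_node("Neotechnologies", :Company)
g.add_node("Seed Round", :Funding)
g.add_node("Sunstone Capital", :Investor)
g.add_node("Connor Venture Partners", :Investor)
g.add_node("Emil Eifrem", :Founder)

# Assumed relationship types and directions (not shown on the slide).
g.connect("Emil Eifrem", :founded, "Neotechnologies")
g.connect("Neotechnologies", :raised, "Seed Round")
g.connect("Sunstone Capital", :invested_in, "Seed Round")
g.connect("Connor Venture Partners", :invested_in, "Seed Round")

# "Who invested in the rounds this company raised?" is a two-hop walk.
investors = g.outgoing("Neotechnologies", :raised).flat_map { |r| g.incoming(r, :invested_in) }
# => ["Sunstone Capital", "Connor Venture Partners"]
```

The query reads like the question itself. In a relational schema, the same question would need a join through a funding-rounds table and an investments table.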
7. Why Neo4j?
• A natural way of modeling data
• Adapts easily to changing requirements
[Diagram: the funding part of the graph extended with new node types. Nodes: Neotechnologies (Company), Seed Round (Funding), Sunstone Capital (Investor), Connor Venture Partners (Investor), two Investment nodes, and two John Smith (Lead Investor) nodes]
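Adapting to a new requirement, such as recording who led each investment, means adding nodes and connection types rather than migrating tables. This is a minimal schemaless Ruby sketch. It assumes each investment becomes its own `Investment` node. The `:made`, `:in_round` and `:led_by` connection types are hypothetical, and "John Smith" is the slide's placeholder name.

```ruby
# Schemaless sketch: nodes are hashes keyed by name; edges are [from, type, to] triples.
nodes = {
  "Seed Round"              => { label: :Funding },
  "Sunstone Capital"        => { label: :Investor },
  "Connor Venture Partners" => { label: :Investor }
}
edges = [
  ["Sunstone Capital", :invested_in, "Seed Round"],
  ["Connor Venture Partners", :invested_in, "Seed Round"]
]

# New requirement: record who led each investment.
# Add a new node label and new edge types; nothing existing is migrated.
nodes["John Smith"] = { label: :LeadInvestor }
["Sunstone Capital", "Connor Venture Partners"].each_with_index do |investor, i|
  investment = "Investment #{i + 1}"
  nodes[investment] = { label: :Investment }
  edges << [investor, :made, investment]
  edges << [investment, :in_round, "Seed Round"]
  edges << [investment, :led_by, "John Smith"]
end

# The old question still works, because the original edges were left in place...
old_answer = edges.select { |_from, type, to| type == :invested_in && to == "Seed Round" }.map(&:first)
# ...and the new question is just another traversal.
led_by_smith = edges.select { |_from, type, to| type == :led_by && to == "John Smith" }.map(&:first)

old_answer   # => ["Sunstone Capital", "Connor Venture Partners"]
led_by_smith # => ["Investment 1", "Investment 2"]
```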
8. Why Neo4j?
• A natural way of modeling data
• Adapts easily to changing requirements
• Built-In Business Intelligence
  • Very specific or very general questions
  • We don’t know the questions in advance
select
  if(tg.described_count > 1, 'complex', 'basic') dup…
  o.normalized_name,
  concat('=hyperlink("http://www.crunchbase.com', o.p…
  ifnull(o.domain, '') domain,
  ifnull(o.homepage_url, '') homepage_url,
  if(o.status = 'unknown', '', o.status) status,
  o.permalink,
  ifnull(o.investment_rounds, '') investment_rounds,
  ifnull(o.funding_rounds, '') funding_rounds,
  ifnull(o.relationships, '') relationships,
  ifnull(o.milestones, '') milestones,
  if(o.logo_url is null, '', 'Yes') has_logo,
  length(ifnull(o.overview, '')) overview_length,
  ifnull(o.created_by, '') created_by,
  date_format(o.created_at, '%Y-%m-%d %H:%i:%s') crea…
  UNIX_TIMESTAMP(o.created_at) ts,
  ( ifnull(o.investment_rounds, 0)*20 +
    ifnull(o.funding_rounds, 0)*20 +
    ifnull(o.relationships, 0)*10 +
    ifnull(o.milestones, 0) +
    length(ifnull(o.overview, '')) +
    if(o.logo_url is null, 0, 50) ) entity_rank,
  o.entity_type,
  o.entity_id
from cb_objects o
join t_duplicate_objects td on td.object_id = o.id
join t_duplicate_groups tg on tg.id = td.duplicate_…
[Image: EXPLAIN PLAN output for this query]
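The `entity_rank` expression in the query above is a weighted sum over a row's counts. Here it is restated in Ruby for readability. The weights come straight from the SQL. `Org` is a hypothetical stand-in for a `cb_objects` row, with `nil` playing the role of SQL `NULL`.

```ruby
# Hypothetical stand-in for a cb_objects row; nil mirrors SQL NULL.
Org = Struct.new(:investment_rounds, :funding_rounds, :relationships,
                 :milestones, :overview, :logo_url, keyword_init: true)

# ifnull(x, 0) becomes (x || 0). MySQL's length() counts bytes,
# so length(ifnull(overview, '')) becomes overview.to_s.bytesize.
def entity_rank(o)
  (o.investment_rounds || 0) * 20 +
    (o.funding_rounds || 0) * 20 +
    (o.relationships || 0) * 10 +
    (o.milestones || 0) +
    o.overview.to_s.bytesize +
    (o.logo_url.nil? ? 0 : 50)
end

entity_rank(Org.new(investment_rounds: 1, funding_rounds: 2, relationships: 3,
                    milestones: 4, overview: "abc", logo_url: "logo.png"))
# => 147  (20 + 40 + 30 + 4 + 3 + 50)
```

The formula itself is simple. The cost lies in every new question of this kind needing another hand-tuned query over the same tables.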
9. Why Neo4j?
• A natural way of modeling data
• Adapts easily to changing requirements
• Built-In Business Intelligence
  • Very specific or very general questions
  • We don’t know the questions in advance
• Directly maps to our OO thinking
class Organization < BaseEntity
  relationship :has_funding_round,
  relationship :has_customer,
  relationship :sponsors_event,
  ...
end

class FundingRound < BaseActivity
  attribute :announced_on,
  attribute :closed_on,
  attribute :funding_type,
  attribute :series,
  attribute :money_raised,
  attribute :post_money_valuation,
  ...
end

class HasFundingRound < BaseRelationship
  relationship :has_funding_round,
  relationship :has_customer,
  relationship :sponsors_event,
  ...
end
[Diagram: each class maps onto the graph: Organization to the Neotechnologies (Company) node, FundingRound to the Seed Round (Funding) node, HasFundingRound to the has_funding_round connection between them]
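The `attribute` and `relationship` declarations above are Ruby class macros. The slides do not show how they are implemented. The following is a hypothetical sketch of how such macros can record a schema on a base class and generate accessors. `BaseEntity` and `BaseActivity` here are stand-ins, not the real classes.

```ruby
# Hypothetical base class; not the actual implementation behind the slides.
class BaseEntity
  # Each subclass keeps its own declared schema in class-level instance variables.
  def self.attributes
    @attributes ||= []
  end

  def self.relationships
    @relationships ||= []
  end

  # `attribute :x` records the property name and defines x / x= accessors.
  def self.attribute(name)
    attributes << name
    attr_accessor name
  end

  # `relationship :r` records the edge type and defines r, returning related objects.
  def self.relationship(name)
    relationships << name
    define_method(name) do
      @related ||= Hash.new { |hash, key| hash[key] = [] }
      @related[name]
    end
  end
end

class BaseActivity < BaseEntity; end

class FundingRound < BaseActivity
  attribute :funding_type
  attribute :money_raised
end

class Organization < BaseEntity
  attribute :name
  relationship :has_funding_round
end

org = Organization.new
org.name = "Neotechnologies"
round = FundingRound.new
round.funding_type = "seed"
org.has_funding_round << round

Organization.relationships # => [:has_funding_round]
FundingRound.attributes    # => [:funding_type, :money_raised]
org.has_funding_round.size # => 1
```

Because the declared schema lives on each class, a persistence layer could walk `attributes` to build node properties and `relationships` to build edges.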
10. Why Neo4j?
• A natural way of modeling data
• Adapts easily to changing requirements
• Built-In Business Intelligence
  • Very specific or very general questions
  • We don’t know the questions in advance
• Directly maps to our OO thinking
• We move faster
  • Just launched CrunchBase Events @ TC Disrupt London
  • Design, development, QA, and release took 2 weeks
11. Okay, if Neo’s so awesome, why doesn’t everybody use it?
12. Databases & the Web: A Brief History
• CGI
  • design a data model
  • roll-your-own database connection
  • manually write all your queries
• ORM (Hibernate, Doctrine)
  • design a data model
  • build the objects
  • map ’em through configuration
13. Database as a Commodity
• Today’s languages and frameworks treat datastores as dumb repositories
  • Generate schemas from code
  • Isolate developers from writing queries
  • Focus on business logic, not data
• A couple of problems
  • The DBA role existed for a reason
  • Data modeling is the foundation of a scalable architecture
  • Generated queries can easily be 1,000x less efficient
  • Quick development can lead to slow applications
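One common way generated queries end up far less efficient than hand-written ones is the "N+1" pattern. An ORM loads a list, then lazily issues one more query for each row. The sketch below only counts queries against a hypothetical in-memory `FakeDB` to show the pattern. It does not model any particular ORM and does not reproduce the 1,000x figure.

```ruby
# Hypothetical in-memory "database" that counts the queries it receives.
class FakeDB
  attr_reader :query_count

  def initialize(orgs, rounds)
    @orgs = orgs       # array of { id:, name: }
    @rounds = rounds   # array of { org_id:, amount: }
    @query_count = 0
  end

  def all_orgs
    @query_count += 1
    @orgs
  end

  def rounds_for(org_id)
    @query_count += 1
    @rounds.select { |r| r[:org_id] == org_id }
  end

  # One query for many parents at once, like WHERE org_id IN (...).
  def rounds_for_many(org_ids)
    @query_count += 1
    @rounds.select { |r| org_ids.include?(r[:org_id]) }
  end
end

orgs = (1..100).map { |i| { id: i, name: "Org #{i}" } }
rounds = orgs.map { |o| { org_id: o[:id], amount: 1_000 } }

# Lazy, per-row loading: 1 query for the list + 1 per organization.
naive = FakeDB.new(orgs, rounds)
naive.all_orgs.each { |o| naive.rounds_for(o[:id]) }

# Batched loading: 1 query for the list + 1 for all of their rounds.
batched = FakeDB.new(orgs, rounds)
ids = batched.all_orgs.map { |o| o[:id] }
batched.rounds_for_many(ids)

[naive.query_count, batched.query_count] # => [101, 2]
```

The gap grows linearly with the number of rows, which is why a data layer that hides its queries can quietly slow an application down.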
14. Means that…
• Neo4j is tough to adopt
  • Languages don’t support it out of the box
  • The tools and drivers that exist are immature
  • Neo4j is not plug-n-play
• However…
  • Neo4j is ideal for object-oriented development
  • Graphs are a natural fit for many use cases
  • We need to make Neo4j as easy to choose as MySQL
15. “Deja”
• ActiveRecord for Neo4j
• Implements much of ActiveModel
  • Validations
  • Serialization
  • Callbacks
• Handles all marshalling / unmarshalling
• “Feels” like ActiveRecord
• Makes Neo4j plug-n-play for Rails
• We will open-source it
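The slides do not show Deja's API. As a hypothetical illustration of what "handles marshalling / unmarshalling" with validations and callbacks involves, the following plain-Ruby model converts between an object and the flat property hash a graph node stores. The class, method and property names are all assumptions. A Rails-oriented layer would build on ActiveModel rather than the hand-rolled hooks shown here.

```ruby
require "date"

# Hypothetical model: Ruby objects in, flat primitive node properties out, and back.
class Company
  attr_accessor :name, :founded_on, :permalink

  def initialize(name: nil, founded_on: nil, permalink: nil)
    @name = name
    @founded_on = founded_on
    @permalink = permalink
  end

  # Validation: collect messages rather than raising immediately.
  def errors
    name.to_s.strip.empty? ? ["name can't be blank"] : []
  end

  def valid?
    errors.empty?
  end

  # Callback: derive a permalink from the name if none was given.
  def before_save
    self.permalink ||= name.downcase.gsub(/[^a-z0-9]+/, "-")
  end

  # Marshal: validate, run the callback, then emit primitives only
  # (the Date becomes an ISO-8601 string).
  def to_node_properties
    raise ArgumentError, errors.join(", ") unless valid?

    before_save
    { "name" => name, "founded_on" => founded_on&.iso8601, "permalink" => permalink }
  end

  # Unmarshal: rebuild the object, parsing the date string back into a Date.
  def self.from_node_properties(props)
    new(name: props["name"],
        founded_on: props["founded_on"] && Date.iso8601(props["founded_on"]),
        permalink: props["permalink"])
  end
end

company = Company.new(name: "Acme Widgets", founded_on: Date.new(2010, 5, 1))
props = company.to_node_properties
# props holds only strings: "Acme Widgets", "2010-05-01", "acme-widgets"
Company.from_node_properties(props).founded_on == company.founded_on # => true
```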