Moving Graphs to Production At Scale

Ian Robinson, Engineer at Neo4j, talks about how you productionize your Neo4j-based application. In this talk from GraphConnect San Francisco 2015, he looks at some of the most important considerations around designing, building and operating a Neo4j app.

Topics include:
* Where Neo4j fits in your application architecture, in both its embedded and server modes
* How to configure its clustering for high availability and high read throughput
* Backup strategies
* The new monitoring capabilities in Neo4j 2.3


Moving Graphs to Production At Scale

  1. Moving Graphs to Production At Scale
     Ian Robinson
  2. Overview
     • Deployment Options
     • Hardware/Software Requirements
     • HA Architecture
     • Backups
     • Monitoring
     • Testing
     • Performance Tips
  3. Deployment Options
     • Embedded
     • Server
     • Server with Extensions
  4. Deployment Options
     Embedded
     • Host Neo4j in Java process
     • Access to Neo4j’s Java APIs
     Server
     Server with Extensions
     [Diagram labels: Java APIs, Application]
  5. Deployment Options
     Embedded
     Server
     • Server wraps embedded instance
     • HTTP/JSON interface
     • Transactional endpoint
     Server with Extensions
     [Diagram labels: REST API, REST API, REST API, Driver, Application, Load balancer]
  6. Deployment Options
     Embedded
     Server
     Server with Extensions
     • JAX-RS RESTful resources
     • Execute complex logic on server
     • Close to the data
     • Multiple operations per request
     • Integrate with backend systems
     • Control HTTP request/response format, headers
     [Diagram labels: REST API, REST API, REST API, Driver, Application, Load balancer, REST API, Extensions]
  7. Hardware
     CPU
     • Intel Core i3 (minimum)
     • Intel Core i7 (recommended)
     • Neo4j scales with the number of cores
     • Requires Enterprise to scale beyond 4 cores
     Disk
     • SLC (single-level cell) SSD w/SATA
     • ext4 (recommended), ZFS
     • Increase permitted number of open files to 40,000
     Memory
     • Lots of RAM (for heap + page cache)
     • 8-12 GB heap (up to 24 GB)
     • Explicitly set page cache to (store size + 10% + headroom)
       – Otherwise defaults to 75% of RAM-minus-heap (50% in 2.3)
     neo4j.properties:
       dbms.pagecache.memory=10g
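Slide 7's page cache rule can be checked with a few lines of arithmetic. The following is a minimal sketch, not Neo4j tooling: the function names are hypothetical, sizes are in MiB, and the headroom figure is something you choose yourself, since the slide does not give one.

```python
def recommended_page_cache_mb(store_size_mb, headroom_mb):
    """Slide 7's rule: store size + 10% + headroom (all values in MiB)."""
    ten_percent = -(-store_size_mb // 10)  # integer ceiling of store_size_mb / 10
    return store_size_mb + ten_percent + headroom_mb


def default_page_cache_mb(ram_mb, heap_mb, neo4j_2_3=True):
    """Default when nothing is set: 50% of RAM-minus-heap in 2.3, 75% before."""
    remaining = ram_mb - heap_mb
    return remaining // 2 if neo4j_2_3 else remaining * 3 // 4


if __name__ == "__main__":
    # Hypothetical machine: 32 GiB RAM, 12 GiB heap, 8 GiB store, 1 GiB headroom.
    print(recommended_page_cache_mb(8192, 1024))
    print(default_page_cache_mb(32768, 12288))
```

Comparing the two numbers shows whether the default would over- or under-allocate for a given store, which is the reason the slide recommends setting dbms.pagecache.memory explicitly.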
  8. Software
     Java
     • OpenJDK 8 (preferred) or 7 or Oracle Java 8 (preferred) or 7
     • IBM JDK on POWER8
     • G1 garbage collector
       • Default in 2.3
       • JDK 1.7.0_71 or later
     Operating System
     • Linux
     • HP UX
     • Windows 2012
     neo4j-wrapper.conf:
       wrapper.java.additional=-XX:+UseG1GC
  9. Instances
     • HVM (hardware virtual machine) over PV (paravirtual)
     • EBS-optimized
       • Dedicated throughput to EBS
     • C3 or C4 (compute-optimized)
       • E.g. c4.2xlarge (15 GiB RAM, 8 vCPU, 1000 Mbps EBS throughput)
     • R3 (memory-optimized)
       • E.g. r3.xlarge (30.5 GiB RAM, 4 vCPU)
       • Not EBS-optimized by default
     Volumes
     • Provisioned IOPS (io1) for predictable performance
       • For I/O intensive workloads
       • Up to 30 IOPS per GiB
         – E.g. 300 GiB volume, 9000 IOPS
  10. HA Architecture
      [Diagram: three Neo4j HA instances (Instance 1, Instance 2, Instance 3), each containing Database, Transaction Propagation and Cluster Management; one instance is labeled Master]
  11. Cluster Configuration
      Joining Cluster
      • ha.initial_hosts (neo4j.properties)
        • List of servers to contact when joining cluster
        • All hosts must be available when starting instance
        • For large clusters, supply only a small number of hosts, e.g. 3
      Pull and Push Transactions
      • ha.pull_interval=10s (off by default)
      • ha.tx_push_factor=1 (default, but best efforts only)
      Tuning
      • ha.heartbeat_timeout=11s (default)
        • Heartbeats sent, by default, every 5s
        • Increase timeouts if pauses cause heartbeats to be delayed
        • Warning: it will take longer to discover an instance has failed
      • ha.state_switch_timeout=120s (default)
        • Increase if new instances time out while catching up with master on startup
  12. HA Endpoints – Useful for Load Balancing

      Endpoint                          State    Status Code    Body
      /db/manage/server/ha/master       Master   200 OK         true
                                        Slave    404 Not Found  false
                                        Unknown  404 Not Found  UNKNOWN
      /db/manage/server/ha/slave        Master   404 Not Found  false
                                        Slave    200 OK         true
                                        Unknown  404 Not Found  UNKNOWN
      /db/manage/server/ha/available    Master   200 OK         master
                                        Slave    200 OK         slave
                                        Unknown  404 Not Found  UNKNOWN

      From 2.3 onwards (neo4j.properties):
        dbms.security.ha_status_auth_enabled=false
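The table on slide 12 maps directly onto a small health probe. The sketch below uses only the Python standard library; the function names and the base URL in the usage note are assumptions made for illustration, and the probe presumes the endpoint is reachable without credentials (which, per the slide, from 2.3 onwards means setting dbms.security.ha_status_auth_enabled=false).

```python
import urllib.error
import urllib.request


def role_from_available(status_code, body):
    """Interpret a /db/manage/server/ha/available response per the slide's table."""
    text = body.strip().lower()
    if status_code == 200 and text in ("master", "slave"):
        return text
    return "unknown"  # 404 with body UNKNOWN, or anything unexpected


def fetch_role(base_url, timeout=2.0):
    """Ask one instance for its HA role; an unreachable instance counts as unknown."""
    url = base_url.rstrip("/") + "/db/manage/server/ha/available"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return role_from_available(resp.status, resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # urllib raises on 404, so the "unknown" row of the table arrives here.
        return role_from_available(err.code, err.read().decode("utf-8"))
    except (urllib.error.URLError, OSError):
        return "unknown"
```

For example, `fetch_role("http://10.0.1.10:7474")` (address borrowed from the HAProxy slides) would return "master", "slave" or "unknown", which is enough to decide whether an instance belongs in a read or write pool.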
  13. HA JMX Endpoint
      JSON Response
      • Alive?
      • Role
      • Last committed transaction ID
      • Instances in cluster
        • Role
        • Instance ID
        • Available?
        • URI
      Identify slaves falling behind
      Does everyone agree on composition of cluster?
      /db/manage/server/jmx/domain/org.neo4j/instance%3Dkernel%230%2Cname%3DHigh%20Availability
  14. Cross-DC Clusters
      • Same subnet (consider using a VPN)
      • Bandwidth between DCs aligned with write throughput
      • Common practice: instances in secondary run as slave-only
        • Restricts master election to the primary
        • When failing over, reconfigure instances in secondary
      neo4j.properties:
        ha.slave_only=true
      neo4j.properties:
        ha.slave_only=false
  15. Multi-Region Clusters in AWS
      All Versions
      • Recommended: Amazon VPC (Virtual Private Cloud)
      2.3 Enterprise
      • 2.3 supports multi-region clusters with no additional infrastructure
      • Use public DNS names rather than IP addresses in ha.initial_hosts, ha.server and ha.cluster_server
      • Warning: uses public internet
  16. Scale Horizontally For High Read Throughput
      [Diagram: Application, Load Balancer, then Master, Slave, Slave]
      Load balancer options: HAProxy, ELB, NGINX
  17. Scale Horizontally For High Read Throughput
      [Diagram: Application, Read Load Balancer and Write Load Balancer, then Master, Slave, Slave]
  18. HAProxy Configuration
      http://blog.armbruster-it.de/2015/08/neo4j-and-haproxy-some-best-practices-and-tricks/
  19. Configure HAProxy as Read Load Balancer

          global
              daemon
              maxconn 256

          defaults
              mode http
              timeout connect 5000ms
              timeout client 50000ms
              timeout server 50000ms

          frontend http-in
              bind *:80
              default_backend neo4j-slaves

          backend neo4j-slaves
              option httpchk GET /db/manage/server/ha/slave
              server s1 10.0.1.10:7474 maxconn 32 check
              server s2 10.0.1.11:7474 maxconn 32 check
              server s3 10.0.1.12:7474 maxconn 32 check

          listen admin
              bind *:8080
              stats enable
  20. Configure HAProxy as Read Load Balancer
      Same configuration as slide 19, annotated with the responses the /db/manage/server/ha/slave health check receives:
      • Master: 404 Not Found, false
      • Slave: 200 OK, true
      • Unknown: 404 Not Found, UNKNOWN
  21. Improve Read Performance with Cache Sharding
      [Diagram: Application, Load Balancer, then instances 1, 2, 3]
      MATCH (c:Country{name:'Australia'})...
      MATCH (c:Country{name:'Zambia'})...
      MATCH (c:Country{name:'Norway'})...
  22. Cache Sharding Using Consistent Routing
      [Diagram: Application, Load Balancer, then instances 1, 2, 3]
      Incoming queries:
      MATCH (c:Country{name:'Australia'})...
      MATCH (c:Country{name:'Zambia'})...
      MATCH (c:Country{name:'Norway'})...
      Routing:
      • A-I: instance 1
      • J-R: instance 2
      • S-Z: instance 3
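The routing rule on slide 22 fits in a few lines. This is a minimal sketch of the slide's A-I / J-R / S-Z illustration only; the function name is hypothetical, and a real load balancer (such as the HAProxy `balance url_param` setup on the following slides) would hash the parameter rather than split the alphabet by hand.

```python
def shard_for_country(name):
    """Return the instance number (1, 2 or 3) that should serve this country."""
    first = name.strip()[:1].upper()
    if "A" <= first <= "I":
        return 1
    if "J" <= first <= "R":
        return 2
    if "S" <= first <= "Z":
        return 3
    raise ValueError("no shard defined for country name: %r" % name)


if __name__ == "__main__":
    for country in ("Australia", "Zambia", "Norway"):
        print(country, "->", shard_for_country(country))
```

Because the same name always lands on the same instance, each instance's cache only has to hold the part of the graph its share of the queries touches.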
  23. Configure HAProxy for Cache Sharding

          global
              daemon
              maxconn 256

          defaults
              mode http
              timeout connect 5000ms
              timeout client 50000ms
              timeout server 50000ms

          frontend http-in
              bind *:80
              default_backend neo4j-slaves

          backend neo4j-slaves
              balance url_param country_code
              server s1 10.0.1.10:7474 maxconn 32
              server s2 10.0.1.11:7474 maxconn 32
              server s3 10.0.1.12:7474 maxconn 32

          listen admin
              bind *:8080
              stats enable
  24. Configure HAProxy for Cache Sharding
      Same configuration as slide 23.
  25. Backups
      Modes
      • Full
      • Incremental
        • On top of a previous backup
        • Uses logical logs to apply changes, so logs must be kept at least 2 x backup interval
      Consistency Check
      • Evaluate store health
      • Part of backup and standalone tool
      • -verify false to disable in backup
      neo4j.properties:
        keep_logical_logs=7 days
  26. Backup Strategies
      • Local or remote backups
        • If backing up to remote machine, consistency check takes place offline with respect to the database
      • Backup from a dedicated slave or round robin
      • Choose a schedule:
        • Full once per day, incremental every hour
      • To restore from backup:
        • Stop instance
        • Replace graph.db with backup
        • Start instance
  27. Backup Strategies
      [Diagram labels: Backup Server, A, B, C]
      A – full, consistency check
      B – full, consistency check
      C – full, consistency check
      A – incremental
      B – incremental
      C – incremental
      …
      A – incremental
      B – incremental
      C – incremental
      A – full, consistency check
      B – full, consistency check
      C – full, consistency check

      bin/neo4j-backup -from single://neo4j.example.org:20000 -to /backups/201510151318263/graph.db -verify true|false
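The cycle on slide 27 (full backups with a consistency check, incrementals without) can be sketched as a helper that assembles the neo4j-backup arguments. The flags are exactly those shown on the slide; the function name, the default port and the choice to key the -verify flag off the backup type are assumptions for illustration, and the sketch only builds the command rather than running it.

```python
def backup_command(host, target_dir, full, port=20000):
    """Build the neo4j-backup argument list for one instance.

    Full backups run the consistency check (-verify true); incrementals
    skip it (-verify false), mirroring the rotation shown on the slide.
    """
    verify = "true" if full else "false"
    return [
        "bin/neo4j-backup",
        "-from", "single://%s:%d" % (host, port),
        "-to", target_dir,
        "-verify", verify,
    ]


if __name__ == "__main__":
    cmd = backup_command(
        "neo4j.example.org", "/backups/201510151318263/graph.db", full=True
    )
    print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run` from a scheduler that follows slide 26's suggestion of one full backup per day and an incremental every hour.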
  28. Monitoring
      Pull
      • Metrics available via JMX and HTTP and in browser
      Push
      • Metrics publishing included in 2.3 (Enterprise)
        • Node, relationship, property counts
        • HA network usage
        • Transactions (active, started, committed, rolled back, etc)
        • Neo4j page cache (page faults, evictions, flushes, exceptions)
        • JVM
      • Published to:
        • Graphite
        • Ganglia
        • CSV
      neo4j.properties:
        metrics.graphite.enabled=true
        metrics.graphite.server=52.29.63.174:2003
        metrics.prefix=neo4j-1
  29. Collate Internal and External Views of the System
      System
      • collectd
      Database
      • Metrics
      • Tail messages.log
      HA Endpoints
      • /db/manage/server/ha/master
      • /db/manage/server/ha/slave
      Server Latencies
      • http.log
      Cypher Queries
      • dbms.querylog.enabled=true
      • dbms.querylog.threshold=2s
      Application metrics
      • End-to-end latencies
  30. Test at Scale
      Soak Tests
      • Representative dataset and queries
      • Peak load and above
      Verify
      • Correctness
      • Performance
        • Latency
        • Throughput
      • Stability
      Operations
      • Backup
      • Disaster recovery
      • Replace instances
  31. Performance Tips – Use the Cypher Query Planner
      [Two query plans compared: 8,386,880 hits and 59,272 hits]
      CREATE INDEX ON :Crime(description)
  32. Performance Tips – JVM
      • Look for GC pauses in messages.log
        • grep blocked data/graph.db/messages.log
      • Caused by
        • Heap too small
        • New/survivor space too small
        • Badly written Cypher query or unmanaged extension
  33. Enable GC Logging
      Log will be written to data/log/neo4j-gc.log
      neo4j-wrapper.conf:
        wrapper.java.additional=-Xloggc:data/log/neo4j-gc.log
        wrapper.java.additional=-XX:+PrintGCDetails
        wrapper.java.additional=-XX:+PrintGCDateStamps
        wrapper.java.additional=-XX:+PrintGCApplicationStoppedTime
        wrapper.java.additional=-XX:+PrintTenuringDistribution
        wrapper.java.additional=-XX:+PrintGCCause
  34. Performance Tips – Unmanaged Extensions
      • Single request, many operations
        • Reduce network latencies
      • Multiple implementation options
        • Cypher
        • Traversal Framework
        • Graph Algo Package
        • Core API
      • Control Request/Response Format
        • JSON, CSV, protobuf, etc
        • Domain-specific representations
        • Compact
          • Conserve bandwidth
        • HTTP Headers
      [Diagram labels: Extension, Application]
  35. Performance Tips – Write Requests
      • Align the number of concurrent write requests with the number of Neo4j server threads on the master
        • By default, number of server threads = number of CPUs reported available by the JVM
        • Configure the number of threads in neo4j-server.properties using org.neo4j.server.webserver.maxthreads
      • Service requests from a thread pool in your application
        • Use the thread pool queue depth to apply back pressure
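Slide 35's advice (a fixed-size thread pool whose queue depth applies back pressure) might look like the sketch below on the application side. Python's ThreadPoolExecutor has an unbounded internal queue, so the sketch bounds it with a semaphore; the class name, its parameters and the `write_fn` callback are all hypothetical, and `max_threads` is meant to be set to match the master's server thread count.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class BoundedWriter:
    """Thread pool that refuses new writes once its queue is full."""

    def __init__(self, write_fn, max_threads, queue_depth):
        self._write_fn = write_fn
        self._pool = ThreadPoolExecutor(max_workers=max_threads)
        # One slot per running write plus one per queued write.
        self._slots = threading.BoundedSemaphore(max_threads + queue_depth)

    def submit(self, payload, timeout=None):
        """Queue a write; raise RuntimeError if no slot frees up in time."""
        if not self._slots.acquire(timeout=timeout):
            raise RuntimeError("write queue full, apply back pressure upstream")
        try:
            future = self._pool.submit(self._write_fn, payload)
        except BaseException:
            self._slots.release()
            raise
        # Free the slot when the write finishes, whether it succeeded or failed.
        future.add_done_callback(lambda _f: self._slots.release())
        return future

    def shutdown(self):
        self._pool.shutdown(wait=True)
```

A caller that catches the RuntimeError can return an HTTP 503 or slow its producers, so that overload shows up at the edge of the application instead of as a pile of requests waiting on the master.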
  36. Performance Tips – Batch Writes Using a Queue
      [Diagram: several Write requests feed a Queue; a Single Thread drains it as a Batch]
      http://maxdemarzi.com/2013/09/05/scaling-writes/
      http://maxdemarzi.com/2014/07/01/scaling-concurrent-writes-in-neo4j/
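The queue-and-single-writer pattern on slide 36 can be sketched with the Python standard library. Everything here is illustrative: the names are hypothetical, `write_batch` stands in for whatever commits one batch in a single transaction, and a production version would also flush on a timer so a partly filled batch does not wait indefinitely.

```python
import queue
import threading

_STOP = object()  # sentinel telling the writer thread to finish


def batch_writer(work_queue, write_batch, batch_size):
    """Drain the queue on a single thread, handing items over in batches."""
    batch = []
    while True:
        item = work_queue.get()
        if item is _STOP:
            break
        batch.append(item)
        if len(batch) >= batch_size:
            write_batch(batch)
            batch = []
    if batch:  # flush whatever is left when the stream ends
        write_batch(batch)


def run_demo(items, batch_size):
    """Push items through a bounded queue and return the batches written."""
    work_queue = queue.Queue(maxsize=100)  # bounded, so producers block when full
    written = []
    worker = threading.Thread(
        target=batch_writer, args=(work_queue, written.append, batch_size)
    )
    worker.start()
    for item in items:
        work_queue.put(item)
    work_queue.put(_STOP)
    worker.join()
    return written


if __name__ == "__main__":
    print(run_demo(range(10), 4))
```

Many request threads can call `work_queue.put` concurrently while only the one writer thread touches the database, which is the point of the diagram.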
  37. Thank You
