Building a Scalable Platform for Sharing 500 Million Photos
Wouter Crooy & Ruben Heusinkveld
Solution Architect & Technical Lead, Albumprinter
Wouter Crooy
Solution Architect, Albumprinter
@wcrooy
Ruben Heusinkveld
Technical Lead, Albumprinter
@rheusinkveld
Who are we
• Wouter Crooy – Solution Architect
• Ruben Heusinkveld – Technical Lead
• Neo4j Certified Professionals
The photo organizer
• Deliver well-organized, easy-to-use and secure storage for all your images
• Ease the process of selecting photos for creating photo products
• Started as part of an R&D ‘Skunk works’ project
The photo organizer
The photo organizer
The photo organizer
The photo organizer – from photos to products
The photo organizer – demo
https://minnebanken.no
The challenge
The challenge
• Replace legacy system with the new photo organizer
• Move 1.3 PB of photos from on-premises to cloud storage
• Analyze & organize all photos (511 million)
• Data cleansing while importing
• Using the same technology / architecture during import and after
• Ability to add features while importing
• The core of the system is built in .NET
The import
• Hard deadline
• The factory housing the data center with all photos was closing
• Started 1st of April
• Minimum processing of 150 images / second
• ~500 queries / second to Neo4j
• Up to 700 EC2 instances on AWS
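The figures above imply a rough lower bound on how long the import takes. The sketch below only does that arithmetic; it assumes a constant rate with no downtime, which the slides do not claim, and the function names are illustrative.

```python
# Back-of-the-envelope arithmetic using the numbers quoted on the slides.
TOTAL_PHOTOS = 511_000_000      # photos to analyze and organize
MIN_IMAGES_PER_SECOND = 150     # minimum processing rate
NEO4J_QUERIES_PER_SECOND = 500  # approximate query rate against Neo4j

SECONDS_PER_DAY = 24 * 60 * 60


def import_duration_days(total_photos: int, images_per_second: float) -> float:
    """Days needed to process every photo at a constant rate (assumption)."""
    return total_photos / images_per_second / SECONDS_PER_DAY


def queries_per_image(queries_per_second: float, images_per_second: float) -> float:
    """Average number of Neo4j queries issued per processed image."""
    return queries_per_second / images_per_second


if __name__ == "__main__":
    days = import_duration_days(TOTAL_PHOTOS, MIN_IMAGES_PER_SECOND)
    per_image = queries_per_image(NEO4J_QUERIES_PER_SECOND, MIN_IMAGES_PER_SECOND)
    print(f"{days:.1f} days at the minimum rate")
    print(f"{per_image:.1f} queries per image")
```

At the minimum rate this works out to a little over 39 days of uninterrupted processing, with roughly three Neo4j queries per image.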
How we did it
• Microservices
• Command Query Responsibility Segregation (CQRS)
• Cluster
• Multiple write nodes
• Single master read only nodes
• HAProxy
• Cypher only via REST interface
• .NET Neo4jClient
Architecture
[Diagram labels: Frontend, Query, Command, HAProxy, Neo4j cluster, Amazon ElastiCache, Photo processors, Preview generation, Storage, Other database (clusters), Other services (Notifications, Authentication, ...)]
Why we chose Neo4j
• Close to domain model
• Not an ordinary (relational) database
• Looking for relations between photos/users
• Scalable
• Flexible schema
• Natural / fluent queries
• ACID / data consistency
The design
Graph model
[Diagram labels: User, Photo, Event, Raw, Exif; relationships: BelongsTo, Contains, HasEvent, HasExif]
Graph model
[Diagram labels: User, Photo, Timeline, Year, Month, Day; relationships: BelongsTo, HasTimeline, DateTaken]
Graph model
[Diagram labels: User, Photo, Collection; relationships: BelongsTo, OwnsCollection, HasItem, IsSharedWith]
Our Neo4j database
• More than 1 billion nodes
• 4.1 billion properties
• 2.6 billion relations
• Total store size of 863 GB
Command Query Responsibility Segregation
• Separation between writing and reading data
• Different model between Query and Command API
• Independent scaling
[Diagram labels: UI, Query, Command, Component (two), Cache, DB, Update, Publish, Write]
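A minimal sketch of the split described on this slide, assuming an in-memory store and a plain dictionary as cache. The class names are hypothetical and the store only stands in for the database; the command side flushes the cached read model on write, which is one possible rule and not necessarily the one used in production.

```python
from typing import Dict, List


class PhotoStore:
    """Hypothetical stand-in for the database behind both APIs."""

    def __init__(self) -> None:
        self._photos_by_user: Dict[str, List[str]] = {}

    def add_photo(self, user_id: str, photo_id: str) -> None:
        self._photos_by_user.setdefault(user_id, []).append(photo_id)

    def photos_of(self, user_id: str) -> List[str]:
        return list(self._photos_by_user.get(user_id, []))


class CommandApi:
    """Write side: changes the store and flushes the affected cache entry."""

    def __init__(self, store: PhotoStore, cache: Dict[str, List[str]]) -> None:
        self._store = store
        self._cache = cache

    def add_photo(self, user_id: str, photo_id: str) -> None:
        self._store.add_photo(user_id, photo_id)
        self._cache.pop(user_id, None)  # flush rule: drop the stale read model


class QueryApi:
    """Read side: serves from the cache and falls back to the store."""

    def __init__(self, store: PhotoStore, cache: Dict[str, List[str]]) -> None:
        self._store = store
        self._cache = cache

    def list_photos(self, user_id: str) -> List[str]:
        cached = self._cache.get(user_id)
        if cached is None:
            cached = self._store.photos_of(user_id)
            self._cache[user_id] = cached
        return list(cached)
```

Because the two APIs share nothing but the store and the cache, each side can be scaled independently.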
Bumps and Solutions
CQRS: Separate Reads & Writes
• No active event publishing in place
• Specific scenarios for updating / writing data
• Ability to create separate models for read and write
• Updates (pieces of) the user graph
• Requires reliable and consistent read
• Scale out -> overloading locking of (user) graph
• After import
• Low performance scenarios -> cache with lower update priority
Read after write consistency
• All reads should contain the very latest and most accurate data
• Replication delay between servers
• Split on consistency
• Article by Aseem Kishore:
• https://neo4j.com/blog/advanced-neo4j-fiftythree-reading-writing-scaling/
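One way to split on consistency is to remember when each user last wrote and to send that user's reads to the master until replicas have had time to catch up. The sketch below assumes a fixed maximum replication lag; the class name, the lag value and the "master"/"replica" labels are all illustrative and not taken from the talk.

```python
import time
from typing import Callable, Dict


class ConsistencyRouter:
    """Chooses a read target per user based on that user's last write time."""

    def __init__(
        self,
        max_replica_lag_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_lag = max_replica_lag_seconds  # assumed upper bound on lag
        self._clock = clock
        self._last_write: Dict[str, float] = {}

    def record_write(self, user_id: str) -> None:
        """Call after every successful write for this user."""
        self._last_write[user_id] = self._clock()

    def target_for_read(self, user_id: str) -> str:
        """Return "master" while a replica may still miss the user's write."""
        last = self._last_write.get(user_id)
        if last is not None and self._clock() - last < self._max_lag:
            return "master"
        return "replica"
```

Users who have not written recently keep reading from replicas, so only a small share of reads adds load to the master.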
Graph locking
• Concurrency challenge
• Scale-out => more images from the same user
• Manage the input
• High spread of user/image combination
• Prevent concurrent analysis of multiple images from the same user
• :GET /db/manage/server/jmx/domain/org.neo4j/instance%3Dkernel%230%2Cname%3DLocking
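Preventing concurrent analysis of images from the same user can be approached by always handing a given user's images to the same worker. The sketch below shows that idea with a stable hash; it is an assumption about how the input could be managed, not a description of the actual importer, and the function names are made up.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Image = Tuple[str, str]  # (user_id, image_id)


def worker_for_user(user_id: str, worker_count: int) -> int:
    """Map a user to one worker with a hash that is stable across processes."""
    return zlib.crc32(user_id.encode("utf-8")) % worker_count


def partition_by_user(images: Iterable[Image], worker_count: int) -> Dict[int, List[Image]]:
    """Group images into per-worker queues so one user never spans two workers."""
    queues: Dict[int, List[Image]] = defaultdict(list)
    for user_id, image_id in images:
        queues[worker_for_user(user_id, worker_count)].append((user_id, image_id))
    return dict(queues)
```

Each worker then processes its queue sequentially, so two images of one user never compete for locks on the same user graph, while different users still spread across workers.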
Batch insert vs single insert
• Cypher CSV import per 1000 records
• Prevent locking caused by concurrency issues
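Grouping records into chunks of 1000 before handing them to an import step can be sketched as below. Only the chunking is shown; writing each chunk to CSV and running the Cypher import is left out, and the helper name is illustrative.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batches(records: Iterable[T], size: int = 1000) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` records."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch
```

One import statement per batch replaces a thousand single inserts, which means far fewer transactions competing for the same locks.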
No infinite scale out
• Find the sweet spot for the number of cluster nodes
• +1 node => more replication updates => higher load on write master
Timeline
• We’re looking for photos that belong together based on date taken.
• Moving from a full property scan to walking the graph via the timeline.
• For large collections: 75% fewer DB hits
• Walk the timeline when looking for photos within a certain timeframe
• Fewer photos to evaluate in the property scan (SecondsSinceEpoch)
• Works perfectly for year, month, day selections
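The timeline idea can be illustrated outside the database with a year -> month -> day tree: a range query visits only the days in the range and compares timestamps for just those photos. This is an in-memory analogy under the assumption that all datetimes share one timezone, not the Cypher used in production; the class and method names are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

Entry = Tuple[int, str]  # (seconds_since_epoch, photo_id)


class Timeline:
    """Year -> month -> day tree holding the photos taken on each day."""

    def __init__(self) -> None:
        self._years: Dict[int, Dict[int, Dict[int, List[Entry]]]] = {}

    def add(self, photo_id: str, taken: datetime) -> None:
        days = self._years.setdefault(taken.year, {}).setdefault(taken.month, {})
        days.setdefault(taken.day, []).append((int(taken.timestamp()), photo_id))

    def _entries(self, year: int, month: int, day: int) -> List[Entry]:
        return self._years.get(year, {}).get(month, {}).get(day, [])

    def photos_on_day(self, year: int, month: int, day: int) -> List[str]:
        """Day selection: no timestamp comparison needed at all."""
        return [photo_id for _, photo_id in sorted(self._entries(year, month, day))]

    def photos_between(self, start: datetime, end: datetime) -> List[str]:
        """Walk only the days in [start, end], then filter by timestamp."""
        low, high = int(start.timestamp()), int(end.timestamp())
        found: List[Entry] = []
        day = start.date()
        while day <= end.date():
            for entry in self._entries(day.year, day.month, day.day):
                if low <= entry[0] <= high:
                    found.append(entry)
            day += timedelta(days=1)
        return [photo_id for _, photo_id in sorted(found)]
```

Photos outside the walked days are never touched, which is where the saving over a scan of every photo's timestamp comes from.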
.NET & REST interface
• Custom headers to REST Cypher endpoint (filtered by HAProxy)
• To route to multiple write servers
• Sticky session per user
• Custom additions to .NET Neo4jClient
• Managing JSON result sets
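A sketch of what a Cypher request carrying a routing header could look like. It assumes Neo4j's transactional HTTP endpoint (/db/data/transaction/commit); the slides do not say which REST endpoint was used. The header name X-User-Id is hypothetical, standing in for whatever header the proxy inspects. Nothing is sent over the network here.

```python
import json
from typing import Any, Dict, Tuple

# Assumption: the transactional Cypher endpoint of the Neo4j HTTP API.
CYPHER_ENDPOINT = "/db/data/transaction/commit"
# Hypothetical header a proxy could use for per-user sticky routing.
USER_HEADER = "X-User-Id"


def build_cypher_request(
    user_id: str, statement: str, parameters: Dict[str, Any]
) -> Tuple[str, Dict[str, str], bytes]:
    """Return (path, headers, body) for a single parameterized Cypher statement."""
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json",
        USER_HEADER: user_id,
    }
    payload = {"statements": [{"statement": statement, "parameters": parameters}]}
    return CYPHER_ENDPOINT, headers, json.dumps(payload).encode("utf-8")
```

Keeping the user identifier in a header rather than in the body lets a proxy route the request without parsing JSON.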
Graph design considerations
• Property scan
• (User)<-[:BelongsTo]-(Photo)
• More photos
• Property search => full-graph-scan
• Differentiating property
• Create node
• No path/clustered indexes… (yet)
• Making changes to the schema….
• For 550+ million nodes
Graph design improvements
Property search
match (u:User { Id: "001" })<-[:BelongsTo]-(p:Photo)
where p.Favourite = true
return p
=> 2812 db hits
Node/Relationship search
match (u:User { Id: "001" })-[:HasFavourites]-(f:Favourites)<-[:IsFavourite]-(p:Photo)
return p
=> 13 db hits
• dbms.logs.query.* (don’t forget to enable parameter logging)
• Our alternative: Integrate with Kibana / Elasticsearch
• https://neo4j.com/docs/operations-manual/current/reference/
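The gap between the property search and the node/relationship search on this slide can be illustrated with a toy model that counts how many photos each strategy has to inspect. The counts are not Neo4j db hits, and the data layout is an assumption made only for this illustration.

```python
from typing import Dict, List, Tuple


def favourites_by_property(photos: List[Dict[str, object]]) -> Tuple[List[str], int]:
    """Inspect every photo of the user and test its Favourite flag."""
    inspected = 0
    result: List[str] = []
    for photo in photos:
        inspected += 1
        if photo.get("Favourite"):
            result.append(str(photo["Id"]))
    return result, inspected


def favourites_by_relationship(favourite_ids: List[str]) -> Tuple[List[str], int]:
    """Follow only the links attached to a dedicated Favourites node."""
    return list(favourite_ids), len(favourite_ids)
```

The first function's work grows with the user's total number of photos, the second only with the number of favourites.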
The future
The future
• Neo4j 3.x
• Bolt
• Datamining
• Procedures / APOC
That’s a wrap

Editor's Notes

  1. Ruben
  2. Ruben
  3. Ruben
  4. Ruben At Albelli we want to inspire people to relive and share life’s moments by easily creating beautiful personalized photo products. Vision: To brighten up the world by bringing people’s moments to life. Albumprinter is a Cimpress company. The best-known Cimpress brand here in the US is Vistaprint. I’m sure you all know it. Albumprinter is based in Amsterdam, The Netherlands. We have multiple consumer brands to serve the European market. Albumprinter acquired FotoKnudsen in June 2014.
  5. Ruben Goal: Deliver well-organized, easy-to-use and secure storage for all your images. Built by a team of 5 (1 designer, 1 frontend developer, 1 quality engineer, and Wouter and myself focusing on the backend).
  6. Ruben Launched June of this year Available on all devices
  7. Ruben Photos are automatically grouped together into events
  8. Ruben: Easy to share photos with friends or publicly if you want Privately via invites
  9. Ruben: The photos can be used to create any product like a photo book, calendar or wall decor
  10. Ruben: The photos can be used to create any product like a photo book, calendar or wall decor
  11. Wouter
  12. Wouter Not uploading duplicates
  13. Wouter
  14. Wouter
  15. Wouter In Neo4j we only store the metadata. The actual photos are stored in Amazon Simple Storage Service (S3).
  16. Wouter
  17. Ruben
  18. Ruben
  19. Ruben
  20. Ruben
  21. Ruben For all those photos this resulted in: More than 1 billion nodes 4.1 billion properties 2.6 billion relations Total store size of 863 GB
  22. Ruben I know it’s really ambitious to explain CQRS within 2 slides. But I would still like to explain why and how it could work with Neo4j. Event sourcing. Double update to DB and cache. In our case we used a cache update/flush on certain rules. Pro: Less work; the database is too large for the cache. Con: Not always a reliable cache source.
  23. Wouter
  24. Wouter Neo4j at its core is very capable of handling CQRS interfaces, since you’re not updating a table but (parts of) the graph. Due to its ACID nature it should also be able to make sure there are no race conditions. But since this architecture allows you to scale out massively, that does not always match the capabilities of an ACID DB, especially in cases where writes occur more often than reads. Make sure the read is consistent. In our situation, CQRS is extra complex since we have an ordered crawler (5+ steps) which also does the writes. But the crawler(s) and query API are still allowed to do reads. https://www.infoq.com/news/2015/05/cqrs-advantages http://udidahan.com/2011/04/22/when-to-avoid-cqrs/ http://udidahan.com/2009/12/09/clarified-cqrs/ http://udidahan.com/2010/08/31/race-conditions-dont-exist/ See also the consistent read solution. In cases where we don’t need a consistent read we can use the cache.
  25. Wouter Reads vastly outnumber writes in our application, as in many applications. Split on consistency, not read vs. write. Track the user’s last write time for read-after-write consistency. Monitor and tune slave lag via push/pull configs. Stick slaves by user for read-after-read consistency. https://neo4j.com/blog/advanced-neo4j-fiftythree-reading-writing-scaling/ Credits to Aseem Kishore and his team at FiftyThree for sharing this at the conference last year.
  26. Wouter Mainly during the importing of photos { "description" : "org.neo4j.kernel.info.LockInfo", "type" : "org.neo4j.kernel.info.LockInfo", "value" : [{ "name" : "description", "description" : "description", "value" : "ExclusiveLock[\nClient[1] waits for []]" }, { "name" : "resourceId", "description" : "resourceId", "value" : "2612184871" }, { "name" : "resourceType", "description" : "resourceType", "value" : "RELATIONSHIP" } ] }
  27. Wouter
  28. Wouter
  29. Wouter
  30. Wouter
  31. Wouter
  32. Wouter DB hits increase when the number of photos increases if you do the property search
  33. Wouter?
  34. Wouter?
  35. Wouter?