SlideShare a Scribd company logo
1 of 19
Jukka Zitting  |  Senior Developer Repository performance tuning
Agenda Performance tuning steps Repository internals Basic content access Batch processing Clustering Query performance Full text indexing Questions and answers 2
Performance tuning steps Step 1: Identify the symptom Create a test case that consistently measures current performance Define the performance target if current level unacceptable Make sure that the test case and the target performance are really relevant Step 2: Identify the cause Main suspects: Hardware, Repository, Application, Client Revise the test case until the problem no longer occurs;for example: Selenium, JMeter, JUnit, Iometer Step 3: Identify/implement possible solutions Change content, configuration, code or upgrade hardware Step 4: Verify results If target not reached, iterate the process or revise the goal 3
Repository internals 4 Data Store Persistence Manager Query Index Cluster Journal
Data Store Content-addressed storage for large binary properties Arbitrarily sized binary streams Addressed by MD5 hash String properties not included, use UTF-8 to map to binary Fast delivery of binary content Read directly from disk Can also be read in ranges Improved write throughput Multiple uploads can proceed concurrently (within hardware limits) Cheap copies Garbage collection used to reclaim disk space Logically shared by the entire cluster 5 Data Store
Cluster Journal Journal of all persisted changes in the repository Content changes Namespace, nodetype registrations, etc. Used to keep all cluster nodes in sync Observation events to all cluster nodes (see JackrabbitEvent.isExternal) Search index updates Internal cache invalidation Old events need to be discarded eventually No notable performance impact, just extra disk space Keep events for the longest possible time a node can be offline without getting completely recreated Logically shared by the entire cluster Writes synchronized over the entire cluster 6 Cluster Journal
Persistence Manager Identifier-addressed storage for nodes and properties Each node has a UUID, even if not mix:referenceable Essentially a key-value store, even when backed by a RDBMS Also keeps track of node references Bundles as units of content Bundle = UUID, type, properties, child node references, etc. Only large binaries stored elsewhere in the data store Designed for balanced content hierarchies, avoid too many child nodes Atomic updates A save() call persists the entire transient space as a single atomic operation One PM per workspace (and one for the shared version store) Logically (often also physically) shared across a cluster 7 Persistence Manager
Query Index Inverse index based on Apache Lucene Flexible mapping from terms to node identifiers Special handling for the path structure Mostly synchronous index updates Long full text extraction tasks handled in background Other cluster nodes will update their indexes at next cluster sync  Everything indexed by default Indexing configuration for tweaking functionality, performance and disk usage One index per workspace (and one for the shared version store) Not shared across a cluster, indexes are local to each cluster node See http://wiki.apache.org/jackrabbit/Search#Search_Configuration 8 Query Index
Agenda Performance tuning steps Repository internals Basic content access Batch processing Clustering Query performance Indexing configuration Questions and answers 9
Basic content access Very fast access by path and ID Underlying storage addressed by ID, but path traversal is in any case needed for ACL checks Relevant caches: Path to ID map (internal structure, not configurable) Item state caches (automatically balanced, configurable for special cases) Bundle cache (default fairly low, increase for large deployments) Also some PM-specific options (TarPM index, etc.) Caches optimized for a reasonably sized active working set typical web access pattern: handful of key resources and a long tail of less frequently accessed content, few writes Performance hit especially when updating nodes with lots of child nodes FineGrainedISMLocking for concurrent, non-overlapping writes 10
Example: Bundle cache configuration 11 <!-- In …/repository/worspaces/${wsp.name}/workspace.xml --> <Workspace …>   <PersistenceManager class=“…">   <paramname="bundleCacheSize" value="8"/>   </PersistenceManager> </Workspace>
Batch processing Two issues: read and write Reading lots of content Tree traversal the best approach, but will flood caches Schedule for off-peak times Add explicit delay (used by the garbage collectors) Use a dedicated cluster node for batch processing Writing lots of content (including deleting large subtrees) The entire transient space is kept in memory and committed atomically Split the operation to smaller pieces Save after every ~1k nodes Leverage the data store if possible 12
Clustering Good for horizontally scaling reads Practically zero overhead on read access Not so good for heavy concurrent writes Exclusive lock over the whole cluster Direct all writes to a single master node Leverage the data store Note the cluster sync interval for query consistency, etc. Session.refresh() can be used to force a cluster sync 13
Query performance What’s really fast? Constraints on properties, node types, full text Typically O(n) where n is the number of results, vs. the total number of nodes  What’s pretty fast? Path constraints What needs some planning? Constraints on the child axis Sorting, limit/offset  Joins What’s not yet available? Aggregate queries (COUNT, SUM, DISTINCT, etc.) Faceting 14
Join engine 15 SELECT a.* FROM [nt:unstructured] AS a JOIN [nt:unstructured] AS b   <PersistenceManager class=“…">   <paramname="bundleCacheSize" value="8"/>   </PersistenceManager> </Workspace>
Indexing configuration Default configuration Index all non-binary properties Index binary jcr:data properties (think nt:file/nt:resource) Full text extraction support for all major document formats Full text extraction from images, packages, etc. is explicitly disabled CQ5 / WEM comes with default aggregate indexing rules for cq:Pages, etc. Why change the configuration? Reduce the index size (by default almost as large as the PM) Enable features like aggregate indexes Assign boost values for selected properties to improve search result relevance 16
Indexing configuration How to change the configuration? indexing_configuration.xml file in the workspace directory Referenced by the indexingConfiguration option in the workspace.xml file See http://wiki.apache.org/jackrabbit/IndexingConfiguration Example: 17 <?xml version="1.0"?><!DOCTYPE configuration SYSTEM "http://jackrabbit.apache.org/dtd/indexing-configuration-1.0.dtd"><configuration xmlns:jcr="http://www.jcp.org/jcr/1.0" xmlns:nt="http://www.jcp.org/jcr/nt/1.0">   <aggregateprimaryType="nt:file">    <include>jcr:content</include>  </aggregate> </configuration>
Question and Answers 18
Repository performance tuning

More Related Content

What's hot

中小規模サービスのApacheチューニング
中小規模サービスのApacheチューニング中小規模サービスのApacheチューニング
中小規模サービスのApacheチューニング勲 國府田
 
AWS Networking – Advanced Concepts and new capabilities | AWS Summit Tel Aviv...
AWS Networking – Advanced Concepts and new capabilities | AWS Summit Tel Aviv...AWS Networking – Advanced Concepts and new capabilities | AWS Summit Tel Aviv...
AWS Networking – Advanced Concepts and new capabilities | AWS Summit Tel Aviv...Amazon Web Services
 
쿠버네티스 기반 PaaS 솔루션 - Playce Kube를 소개합니다.
쿠버네티스 기반 PaaS 솔루션 - Playce Kube를 소개합니다.쿠버네티스 기반 PaaS 솔루션 - Playce Kube를 소개합니다.
쿠버네티스 기반 PaaS 솔루션 - Playce Kube를 소개합니다.Open Source Consulting
 
AWS Black Belt Online Seminar AWS Key Management Service (KMS)
AWS Black Belt Online Seminar AWS Key Management Service (KMS) AWS Black Belt Online Seminar AWS Key Management Service (KMS)
AWS Black Belt Online Seminar AWS Key Management Service (KMS) Amazon Web Services Japan
 
Access Control Models: Controlling Resource Authorization
Access Control Models: Controlling Resource AuthorizationAccess Control Models: Controlling Resource Authorization
Access Control Models: Controlling Resource AuthorizationMark Niebergall
 
AWS Black Belt Techシリーズ AWS Direct Connect
AWS Black Belt Techシリーズ AWS Direct ConnectAWS Black Belt Techシリーズ AWS Direct Connect
AWS Black Belt Techシリーズ AWS Direct ConnectAmazon Web Services Japan
 
AWS Lambda and the Serverless Cloud
AWS Lambda and the Serverless CloudAWS Lambda and the Serverless Cloud
AWS Lambda and the Serverless CloudAmazon Web Services
 
Manage your Windows Infrastructure with Puppet Bolt - August 26 - 2020
Manage your Windows Infrastructure with Puppet Bolt - August 26 - 2020Manage your Windows Infrastructure with Puppet Bolt - August 26 - 2020
Manage your Windows Infrastructure with Puppet Bolt - August 26 - 2020Puppet
 
Amazon Aurora - Auroraの止まらない進化とその中身
Amazon Aurora - Auroraの止まらない進化とその中身Amazon Aurora - Auroraの止まらない進化とその中身
Amazon Aurora - Auroraの止まらない進化とその中身Amazon Web Services Japan
 
Deep Dive on Amazon EC2 Systems Manager
Deep Dive on Amazon EC2 Systems ManagerDeep Dive on Amazon EC2 Systems Manager
Deep Dive on Amazon EC2 Systems ManagerAmazon Web Services
 
AWS Step Functionsを使ったバックアップシステム
AWS Step Functionsを使ったバックアップシステムAWS Step Functionsを使ったバックアップシステム
AWS Step Functionsを使ったバックアップシステムAkihiro Kamiyama
 
더욱 진화하는 AWS 네트워크 보안 - 신은수 AWS 시큐리티 스페셜리스트 솔루션즈 아키텍트 :: AWS Summit Seoul 2021
더욱 진화하는 AWS 네트워크 보안 - 신은수 AWS 시큐리티 스페셜리스트 솔루션즈 아키텍트 :: AWS Summit Seoul 2021더욱 진화하는 AWS 네트워크 보안 - 신은수 AWS 시큐리티 스페셜리스트 솔루션즈 아키텍트 :: AWS Summit Seoul 2021
더욱 진화하는 AWS 네트워크 보안 - 신은수 AWS 시큐리티 스페셜리스트 솔루션즈 아키텍트 :: AWS Summit Seoul 2021Amazon Web Services Korea
 
EMR 플랫폼 기반의 Spark 워크로드 실행 최적화 방안 - 정세웅, AWS 솔루션즈 아키텍트:: AWS Summit Online Ko...
EMR 플랫폼 기반의 Spark 워크로드 실행 최적화 방안 - 정세웅, AWS 솔루션즈 아키텍트::  AWS Summit Online Ko...EMR 플랫폼 기반의 Spark 워크로드 실행 최적화 방안 - 정세웅, AWS 솔루션즈 아키텍트::  AWS Summit Online Ko...
EMR 플랫폼 기반의 Spark 워크로드 실행 최적화 방안 - 정세웅, AWS 솔루션즈 아키텍트:: AWS Summit Online Ko...Amazon Web Services Korea
 
20190814 AWS Black Belt Online Seminar AWS Serverless Application Model
20190814 AWS Black Belt Online Seminar AWS Serverless Application Model  20190814 AWS Black Belt Online Seminar AWS Serverless Application Model
20190814 AWS Black Belt Online Seminar AWS Serverless Application Model Amazon Web Services Japan
 
Spring Boot+Kafka: the New Enterprise Platform
Spring Boot+Kafka: the New Enterprise PlatformSpring Boot+Kafka: the New Enterprise Platform
Spring Boot+Kafka: the New Enterprise PlatformVMware Tanzu
 
Introduction to AMQP Messaging with RabbitMQ
Introduction to AMQP Messaging with RabbitMQIntroduction to AMQP Messaging with RabbitMQ
Introduction to AMQP Messaging with RabbitMQDmitriy Samovskiy
 
ABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS GlueABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS GlueAmazon Web Services
 
20190226 AWS Black Belt Online Seminar Amazon WorkSpaces
20190226 AWS Black Belt Online Seminar Amazon WorkSpaces20190226 AWS Black Belt Online Seminar Amazon WorkSpaces
20190226 AWS Black Belt Online Seminar Amazon WorkSpacesAmazon Web Services Japan
 

What's hot (20)

中小規模サービスのApacheチューニング
中小規模サービスのApacheチューニング中小規模サービスのApacheチューニング
中小規模サービスのApacheチューニング
 
AWS Networking – Advanced Concepts and new capabilities | AWS Summit Tel Aviv...
AWS Networking – Advanced Concepts and new capabilities | AWS Summit Tel Aviv...AWS Networking – Advanced Concepts and new capabilities | AWS Summit Tel Aviv...
AWS Networking – Advanced Concepts and new capabilities | AWS Summit Tel Aviv...
 
쿠버네티스 기반 PaaS 솔루션 - Playce Kube를 소개합니다.
쿠버네티스 기반 PaaS 솔루션 - Playce Kube를 소개합니다.쿠버네티스 기반 PaaS 솔루션 - Playce Kube를 소개합니다.
쿠버네티스 기반 PaaS 솔루션 - Playce Kube를 소개합니다.
 
AWS Black Belt Online Seminar AWS Key Management Service (KMS)
AWS Black Belt Online Seminar AWS Key Management Service (KMS) AWS Black Belt Online Seminar AWS Key Management Service (KMS)
AWS Black Belt Online Seminar AWS Key Management Service (KMS)
 
Access Control Models: Controlling Resource Authorization
Access Control Models: Controlling Resource AuthorizationAccess Control Models: Controlling Resource Authorization
Access Control Models: Controlling Resource Authorization
 
AWS Black Belt Techシリーズ AWS Direct Connect
AWS Black Belt Techシリーズ AWS Direct ConnectAWS Black Belt Techシリーズ AWS Direct Connect
AWS Black Belt Techシリーズ AWS Direct Connect
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
AWS Lambda and the Serverless Cloud
AWS Lambda and the Serverless CloudAWS Lambda and the Serverless Cloud
AWS Lambda and the Serverless Cloud
 
Manage your Windows Infrastructure with Puppet Bolt - August 26 - 2020
Manage your Windows Infrastructure with Puppet Bolt - August 26 - 2020Manage your Windows Infrastructure with Puppet Bolt - August 26 - 2020
Manage your Windows Infrastructure with Puppet Bolt - August 26 - 2020
 
Amazon Aurora - Auroraの止まらない進化とその中身
Amazon Aurora - Auroraの止まらない進化とその中身Amazon Aurora - Auroraの止まらない進化とその中身
Amazon Aurora - Auroraの止まらない進化とその中身
 
Deep Dive on Amazon EC2 Systems Manager
Deep Dive on Amazon EC2 Systems ManagerDeep Dive on Amazon EC2 Systems Manager
Deep Dive on Amazon EC2 Systems Manager
 
AWS Step Functionsを使ったバックアップシステム
AWS Step Functionsを使ったバックアップシステムAWS Step Functionsを使ったバックアップシステム
AWS Step Functionsを使ったバックアップシステム
 
더욱 진화하는 AWS 네트워크 보안 - 신은수 AWS 시큐리티 스페셜리스트 솔루션즈 아키텍트 :: AWS Summit Seoul 2021
더욱 진화하는 AWS 네트워크 보안 - 신은수 AWS 시큐리티 스페셜리스트 솔루션즈 아키텍트 :: AWS Summit Seoul 2021더욱 진화하는 AWS 네트워크 보안 - 신은수 AWS 시큐리티 스페셜리스트 솔루션즈 아키텍트 :: AWS Summit Seoul 2021
더욱 진화하는 AWS 네트워크 보안 - 신은수 AWS 시큐리티 스페셜리스트 솔루션즈 아키텍트 :: AWS Summit Seoul 2021
 
EMR 플랫폼 기반의 Spark 워크로드 실행 최적화 방안 - 정세웅, AWS 솔루션즈 아키텍트:: AWS Summit Online Ko...
EMR 플랫폼 기반의 Spark 워크로드 실행 최적화 방안 - 정세웅, AWS 솔루션즈 아키텍트::  AWS Summit Online Ko...EMR 플랫폼 기반의 Spark 워크로드 실행 최적화 방안 - 정세웅, AWS 솔루션즈 아키텍트::  AWS Summit Online Ko...
EMR 플랫폼 기반의 Spark 워크로드 실행 최적화 방안 - 정세웅, AWS 솔루션즈 아키텍트:: AWS Summit Online Ko...
 
20190814 AWS Black Belt Online Seminar AWS Serverless Application Model
20190814 AWS Black Belt Online Seminar AWS Serverless Application Model  20190814 AWS Black Belt Online Seminar AWS Serverless Application Model
20190814 AWS Black Belt Online Seminar AWS Serverless Application Model
 
Spring Boot+Kafka: the New Enterprise Platform
Spring Boot+Kafka: the New Enterprise PlatformSpring Boot+Kafka: the New Enterprise Platform
Spring Boot+Kafka: the New Enterprise Platform
 
Introduction to AMQP Messaging with RabbitMQ
Introduction to AMQP Messaging with RabbitMQIntroduction to AMQP Messaging with RabbitMQ
Introduction to AMQP Messaging with RabbitMQ
 
ABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS GlueABD315_Serverless ETL with AWS Glue
ABD315_Serverless ETL with AWS Glue
 
20190226 AWS Black Belt Online Seminar Amazon WorkSpaces
20190226 AWS Black Belt Online Seminar Amazon WorkSpaces20190226 AWS Black Belt Online Seminar Amazon WorkSpaces
20190226 AWS Black Belt Online Seminar Amazon WorkSpaces
 
Security Architectures on AWS
Security Architectures on AWSSecurity Architectures on AWS
Security Architectures on AWS
 

Viewers also liked

Apache Jackrabbit @ Swiss Open Source Awards 2011
Apache Jackrabbit @ Swiss Open Source Awards 2011Apache Jackrabbit @ Swiss Open Source Awards 2011
Apache Jackrabbit @ Swiss Open Source Awards 2011Jukka Zitting
 
OSGifying the repository
OSGifying the repositoryOSGifying the repository
OSGifying the repositoryJukka Zitting
 
Oak, the architecture of Apache Jackrabbit 3
Oak, the architecture of Apache Jackrabbit 3Oak, the architecture of Apache Jackrabbit 3
Oak, the architecture of Apache Jackrabbit 3Jukka Zitting
 
MicroKernel & NodeStore
MicroKernel & NodeStoreMicroKernel & NodeStore
MicroKernel & NodeStoreJukka Zitting
 
The return of the hierarchical model
The return of the hierarchical modelThe return of the hierarchical model
The return of the hierarchical modelJukka Zitting
 
Open source masterclass - Life in the Apache Incubator
Open source masterclass - Life in the Apache IncubatorOpen source masterclass - Life in the Apache Incubator
Open source masterclass - Life in the Apache IncubatorJukka Zitting
 
/path/to/content - the Apache Jackrabbit content repository
/path/to/content - the Apache Jackrabbit content repository/path/to/content - the Apache Jackrabbit content repository
/path/to/content - the Apache Jackrabbit content repositoryJukka Zitting
 
Apache development with GitHub and Travis CI
Apache development with GitHub and Travis CIApache development with GitHub and Travis CI
Apache development with GitHub and Travis CIJukka Zitting
 
Content extraction with apache tika
Content extraction with apache tikaContent extraction with apache tika
Content extraction with apache tikaJukka Zitting
 
Content Management With Apache Jackrabbit
Content Management With Apache JackrabbitContent Management With Apache Jackrabbit
Content Management With Apache JackrabbitJukka Zitting
 
The new repository in AEM 6
The new repository in AEM 6The new repository in AEM 6
The new repository in AEM 6Jukka Zitting
 
Enterprise Manager: Write powerful scripts with EMCLI
Enterprise Manager: Write powerful scripts with EMCLIEnterprise Manager: Write powerful scripts with EMCLI
Enterprise Manager: Write powerful scripts with EMCLIGokhan Atil
 
JCR, Sling or AEM? Which API should I use and when?
JCR, Sling or AEM? Which API should I use and when?JCR, Sling or AEM? Which API should I use and when?
JCR, Sling or AEM? Which API should I use and when?connectwebex
 
Oracle Enterprise Manager Cloud Control 13c for DBAs
Oracle Enterprise Manager Cloud Control 13c for DBAsOracle Enterprise Manager Cloud Control 13c for DBAs
Oracle Enterprise Manager Cloud Control 13c for DBAsGokhan Atil
 
新浪云平台的经验和教训
新浪云平台的经验和教训新浪云平台的经验和教训
新浪云平台的经验和教训easychen
 
Shakespeare revealed 02.ppt
Shakespeare revealed 02.pptShakespeare revealed 02.ppt
Shakespeare revealed 02.pptrwakefor
 
Digital thinking
Digital thinkingDigital thinking
Digital thinkingTony Ryan
 
Open Cultuur Data Masterclass #3 - Open State - Lex Slaghuis
Open Cultuur Data Masterclass #3 - Open State - Lex SlaghuisOpen Cultuur Data Masterclass #3 - Open State - Lex Slaghuis
Open Cultuur Data Masterclass #3 - Open State - Lex SlaghuisKennisland
 

Viewers also liked (20)

Apache Jackrabbit @ Swiss Open Source Awards 2011
Apache Jackrabbit @ Swiss Open Source Awards 2011Apache Jackrabbit @ Swiss Open Source Awards 2011
Apache Jackrabbit @ Swiss Open Source Awards 2011
 
OSGifying the repository
OSGifying the repositoryOSGifying the repository
OSGifying the repository
 
Oak, the architecture of Apache Jackrabbit 3
Oak, the architecture of Apache Jackrabbit 3Oak, the architecture of Apache Jackrabbit 3
Oak, the architecture of Apache Jackrabbit 3
 
MicroKernel & NodeStore
MicroKernel & NodeStoreMicroKernel & NodeStore
MicroKernel & NodeStore
 
The return of the hierarchical model
The return of the hierarchical modelThe return of the hierarchical model
The return of the hierarchical model
 
Open source masterclass - Life in the Apache Incubator
Open source masterclass - Life in the Apache IncubatorOpen source masterclass - Life in the Apache Incubator
Open source masterclass - Life in the Apache Incubator
 
/path/to/content - the Apache Jackrabbit content repository
/path/to/content - the Apache Jackrabbit content repository/path/to/content - the Apache Jackrabbit content repository
/path/to/content - the Apache Jackrabbit content repository
 
Apache development with GitHub and Travis CI
Apache development with GitHub and Travis CIApache development with GitHub and Travis CI
Apache development with GitHub and Travis CI
 
Content extraction with apache tika
Content extraction with apache tikaContent extraction with apache tika
Content extraction with apache tika
 
Content Management With Apache Jackrabbit
Content Management With Apache JackrabbitContent Management With Apache Jackrabbit
Content Management With Apache Jackrabbit
 
The new repository in AEM 6
The new repository in AEM 6The new repository in AEM 6
The new repository in AEM 6
 
Enterprise Manager: Write powerful scripts with EMCLI
Enterprise Manager: Write powerful scripts with EMCLIEnterprise Manager: Write powerful scripts with EMCLI
Enterprise Manager: Write powerful scripts with EMCLI
 
JCR, Sling or AEM? Which API should I use and when?
JCR, Sling or AEM? Which API should I use and when?JCR, Sling or AEM? Which API should I use and when?
JCR, Sling or AEM? Which API should I use and when?
 
Oracle Enterprise Manager Cloud Control 13c for DBAs
Oracle Enterprise Manager Cloud Control 13c for DBAsOracle Enterprise Manager Cloud Control 13c for DBAs
Oracle Enterprise Manager Cloud Control 13c for DBAs
 
新浪云平台的经验和教训
新浪云平台的经验和教训新浪云平台的经验和教训
新浪云平台的经验和教训
 
Good Luck
Good LuckGood Luck
Good Luck
 
Shakespeare revealed 02.ppt
Shakespeare revealed 02.pptShakespeare revealed 02.ppt
Shakespeare revealed 02.ppt
 
Marek
MarekMarek
Marek
 
Digital thinking
Digital thinkingDigital thinking
Digital thinking
 
Open Cultuur Data Masterclass #3 - Open State - Lex Slaghuis
Open Cultuur Data Masterclass #3 - Open State - Lex SlaghuisOpen Cultuur Data Masterclass #3 - Open State - Lex Slaghuis
Open Cultuur Data Masterclass #3 - Open State - Lex Slaghuis
 

Similar to Repository performance tuning

Overview of MongoDB and Other Non-Relational Databases
Overview of MongoDB and Other Non-Relational DatabasesOverview of MongoDB and Other Non-Relational Databases
Overview of MongoDB and Other Non-Relational DatabasesAndrew Kandels
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...javier ramirez
 
Performance and predictability
Performance and predictabilityPerformance and predictability
Performance and predictabilityRichardWarburton
 
NoSQL Introduction, Theory, Implementations
NoSQL Introduction, Theory, ImplementationsNoSQL Introduction, Theory, Implementations
NoSQL Introduction, Theory, ImplementationsFirat Atagun
 
IntelliJ IDEA Architecture and Performance
IntelliJ IDEA Architecture and PerformanceIntelliJ IDEA Architecture and Performance
IntelliJ IDEA Architecture and Performanceintelliyole
 
Optimizing your java applications for multi core hardware
Optimizing your java applications for multi core hardwareOptimizing your java applications for multi core hardware
Optimizing your java applications for multi core hardwareIndicThreads
 
Planning for-high-performance-web-application
Planning for-high-performance-web-applicationPlanning for-high-performance-web-application
Planning for-high-performance-web-applicationNguyễn Duy Nhân
 
Apache ignite as in-memory computing platform
Apache ignite as in-memory computing platformApache ignite as in-memory computing platform
Apache ignite as in-memory computing platformSurinder Mehra
 
Unit-4 swapping.pptx
Unit-4 swapping.pptxUnit-4 swapping.pptx
Unit-4 swapping.pptxItechAnand1
 
Performance and predictability
Performance and predictabilityPerformance and predictability
Performance and predictabilityRichardWarburton
 
Climbing the beanstalk
Climbing the beanstalkClimbing the beanstalk
Climbing the beanstalkgordonyorke
 
Drupal Backend Performance and Scalability
Drupal Backend Performance and ScalabilityDrupal Backend Performance and Scalability
Drupal Backend Performance and ScalabilityAshok Modi
 
Ch9 OS
Ch9 OSCh9 OS
Ch9 OSC.U
 
FOWA Scaling The Lamp Stack Workshop
FOWA Scaling The Lamp Stack WorkshopFOWA Scaling The Lamp Stack Workshop
FOWA Scaling The Lamp Stack Workshopdlieberman
 
Main memory os - prashant odhavani- 160920107003
Main memory   os - prashant odhavani- 160920107003Main memory   os - prashant odhavani- 160920107003
Main memory os - prashant odhavani- 160920107003Prashant odhavani
 

Similar to Repository performance tuning (20)

Overview of MongoDB and Other Non-Relational Databases
Overview of MongoDB and Other Non-Relational DatabasesOverview of MongoDB and Other Non-Relational Databases
Overview of MongoDB and Other Non-Relational Databases
 
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
Cómo se diseña una base de datos que pueda ingerir más de cuatro millones de ...
 
Performance and predictability
Performance and predictabilityPerformance and predictability
Performance and predictability
 
Ch8
Ch8Ch8
Ch8
 
NoSQL Introduction, Theory, Implementations
NoSQL Introduction, Theory, ImplementationsNoSQL Introduction, Theory, Implementations
NoSQL Introduction, Theory, Implementations
 
IntelliJ IDEA Architecture and Performance
IntelliJ IDEA Architecture and PerformanceIntelliJ IDEA Architecture and Performance
IntelliJ IDEA Architecture and Performance
 
Optimizing your java applications for multi core hardware
Optimizing your java applications for multi core hardwareOptimizing your java applications for multi core hardware
Optimizing your java applications for multi core hardware
 
Planning for-high-performance-web-application
Planning for-high-performance-web-applicationPlanning for-high-performance-web-application
Planning for-high-performance-web-application
 
Apache ignite as in-memory computing platform
Apache ignite as in-memory computing platformApache ignite as in-memory computing platform
Apache ignite as in-memory computing platform
 
Unit-4 swapping.pptx
Unit-4 swapping.pptxUnit-4 swapping.pptx
Unit-4 swapping.pptx
 
Performance and predictability
Performance and predictabilityPerformance and predictability
Performance and predictability
 
Climbing the beanstalk
Climbing the beanstalkClimbing the beanstalk
Climbing the beanstalk
 
Drupal Backend Performance and Scalability
Drupal Backend Performance and ScalabilityDrupal Backend Performance and Scalability
Drupal Backend Performance and Scalability
 
tittle
tittletittle
tittle
 
OSCh9
OSCh9OSCh9
OSCh9
 
Ch9 OS
Ch9 OSCh9 OS
Ch9 OS
 
OS_Ch9
OS_Ch9OS_Ch9
OS_Ch9
 
Chapter 8 - Main Memory
Chapter 8 - Main MemoryChapter 8 - Main Memory
Chapter 8 - Main Memory
 
FOWA Scaling The Lamp Stack Workshop
FOWA Scaling The Lamp Stack WorkshopFOWA Scaling The Lamp Stack Workshop
FOWA Scaling The Lamp Stack Workshop
 
Main memory os - prashant odhavani- 160920107003
Main memory   os - prashant odhavani- 160920107003Main memory   os - prashant odhavani- 160920107003
Main memory os - prashant odhavani- 160920107003
 

More from Jukka Zitting

Text and metadata extraction with Apache Tika
Text and metadata extraction with Apache TikaText and metadata extraction with Apache Tika
Text and metadata extraction with Apache TikaJukka Zitting
 
Mime Magic With Apache Tika
Mime Magic With Apache TikaMime Magic With Apache Tika
Mime Magic With Apache TikaJukka Zitting
 
Content Storage With Apache Jackrabbit
Content Storage With Apache JackrabbitContent Storage With Apache Jackrabbit
Content Storage With Apache JackrabbitJukka Zitting
 
Introduction to JCR and Apache Jackrabbi
Introduction to JCR and Apache JackrabbiIntroduction to JCR and Apache Jackrabbi
Introduction to JCR and Apache JackrabbiJukka Zitting
 
File System On Steroids
File System On SteroidsFile System On Steroids
File System On SteroidsJukka Zitting
 
Mime Magic With Apache Tika
Mime Magic With Apache TikaMime Magic With Apache Tika
Mime Magic With Apache TikaJukka Zitting
 
Design and architecture of Jackrabbit
Design and architecture of JackrabbitDesign and architecture of Jackrabbit
Design and architecture of JackrabbitJukka Zitting
 

More from Jukka Zitting (9)

Text and metadata extraction with Apache Tika
Text and metadata extraction with Apache TikaText and metadata extraction with Apache Tika
Text and metadata extraction with Apache Tika
 
Mime Magic With Apache Tika
Mime Magic With Apache TikaMime Magic With Apache Tika
Mime Magic With Apache Tika
 
NoSQL Oakland
NoSQL OaklandNoSQL Oakland
NoSQL Oakland
 
Content Storage With Apache Jackrabbit
Content Storage With Apache JackrabbitContent Storage With Apache Jackrabbit
Content Storage With Apache Jackrabbit
 
Introduction to JCR and Apache Jackrabbi
Introduction to JCR and Apache JackrabbiIntroduction to JCR and Apache Jackrabbi
Introduction to JCR and Apache Jackrabbi
 
File System On Steroids
File System On SteroidsFile System On Steroids
File System On Steroids
 
Mime Magic With Apache Tika
Mime Magic With Apache TikaMime Magic With Apache Tika
Mime Magic With Apache Tika
 
Design and architecture of Jackrabbit
Design and architecture of JackrabbitDesign and architecture of Jackrabbit
Design and architecture of Jackrabbit
 
Apache Tika
Apache TikaApache Tika
Apache Tika
 

Recently uploaded

Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Alkin Tezuysal
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfIngrid Airi González
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxLoriGlavin3
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 

Recently uploaded (20)

Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
Unleashing Real-time Insights with ClickHouse_ Navigating the Landscape in 20...
 
Generative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdfGenerative Artificial Intelligence: How generative AI works.pdf
Generative Artificial Intelligence: How generative AI works.pdf
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptxThe Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
The Role of FIDO in a Cyber Secure Netherlands: FIDO Paris Seminar.pptx
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 

Repository performance tuning

  • 1. Jukka Zitting | Senior Developer Repository performance tuning
  • 2. Agenda Performance tuning steps Repository internals Basic content access Batch processing Clustering Query performance Full text indexing Questions and answers 2
  • 3. Performance tuning steps Step 1: Identify the symptom Create a test case that consistently measures current performance Define the performance target if current level unacceptable Make sure that the test case and the target performance are really relevant Step 2: Identify the cause Main suspects: Hardware, Repository, Application, Client Revise the test case until the problem no longer occurs;for example: Selenium, JMeter, JUnit, Iometer Step 3: Identify/implement possible solutions Change content, configuration, code or upgrade hardware Step 4: Verify results If target not reached, iterate the process or revise the goal 3
  • 4. Repository internals 4 Data Store Persistence Manager Query Index Cluster Journal
  • 5. Data Store Content-addressed storage for large binary properties Arbitrarily sized binary streams Addressed by MD5 hash String properties not included, use UTF-8 to map to binary Fast delivery of binary content Read directly from disk Can also be read in ranges Improved write throughput Multiple uploads can proceed concurrently (within hardware limits) Cheap copies Garbage collection used to reclaim disk space Logically shared by the entire cluster 5 Data Store
  • 6. Cluster Journal Journal of all persisted changes in the repository Content changes Namespace, nodetype registrations, etc. Used to keep all cluster nodes in sync Observation events to all cluster nodes (see JackrabbitEvent.isExternal) Search index updates Internal cache invalidation Old events need to be discarded eventually No notable performance impact, just extra disk space Keep events for the longest possible time a node can be offline without getting completely recreated Logically shared by the entire cluster Writes synchronized over the entire cluster 6 Cluster Journal
  • 7. Persistence Manager Identifier-addressed storage for nodes and properties Each node has a UUID, even if not mix:referenceable Essentially a key-value store, even when backed by a RDBMS Also keeps track of node references Bundles as units of content Bundle = UUID, type, properties, child node references, etc. Only large binaries stored elsewhere in the data store Designed for balanced content hierarchies, avoid too many child nodes Atomic updates A save() call persists the entire transient space as a single atomic operation One PM per workspace (and one for the shared version store) Logically (often also physically) shared across a cluster 7 Persistence Manager
  • 8. Query Index Inverse index based on Apache Lucene Flexible mapping from terms to node identifiers Special handling for the path structure Mostly synchronous index updates Long full text extraction tasks handled in background Other cluster nodes will update their indexes at next cluster sync Everything indexed by default Indexing configuration for tweaking functionality, performance and disk usage One index per workspace (and one for the shared version store) Not shared across a cluster, indexes are local to each cluster node See http://wiki.apache.org/jackrabbit/Search#Search_Configuration 8 Query Index
  • 9. Agenda Performance tuning steps Repository internals Basic content access Batch processing Clustering Query performance Indexing configuration Questions and answers 9
  • 10. Basic content access Very fast access by path and ID Underlying storage addressed by ID, but path traversal is in any case needed for ACL checks Relevant caches: Path to ID map (internal structure, not configurable) Item state caches (automatically balanced, configurable for special cases) Bundle cache (default fairly low, increase for large deployments) Also some PM-specific options (TarPM index, etc.) Caches optimized for a reasonably sized active working set typical web access pattern: handful of key resources and a long tail of less frequently accessed content, few writes Performance hit especially when updating nodes with lots of child nodes FineGrainedISMLocking for concurrent, non-overlapping writes 10
  • 11. Example: Bundle cache configuration 11 <!-- In …/repository/worspaces/${wsp.name}/workspace.xml --> <Workspace …> <PersistenceManager class=“…"> <paramname="bundleCacheSize" value="8"/> </PersistenceManager> </Workspace>
  • 12. Batch processing Two issues: read and write Reading lots of content Tree traversal the best approach, but will flood caches Schedule for off-peak times Add explicit delay (used by the garbage collectors) Use a dedicated cluster node for batch processing Writing lots of content (including deleting large subtrees) The entire transient space is kept in memory and committed atomically Split the operation to smaller pieces Save after every ~1k nodes Leverage the data store if possible 12
  • 13. Clustering Good for horizontally scaling reads Practically zero overhead on read access Not so good for heavy concurrent writes Exclusive lock over the whole cluster Direct all writes to a single master node Leverage the data store Note the cluster sync interval for query consistency, etc. Session.refresh() can be used to force a cluster sync 13
  • 14. Query performance What’s really fast? Constraints on properties, node types, full text Typically O(n) where n is the number of results, vs. the total number of nodes What’s pretty fast? Path constraints What needs some planning? Constraints on the child axis Sorting, limit/offset Joins What’s not yet available? Aggregate queries (COUNT, SUM, DISTINCT, etc.) Faceting 14
  • 15. Join engine 15 SELECT a.* FROM [nt:unstructured] AS a JOIN [nt:unstructured] AS b <PersistenceManager class=“…"> <paramname="bundleCacheSize" value="8"/> </PersistenceManager> </Workspace>
  • 16. Indexing configuration Default configuration Index all non-binary properties Index binary jcr:data properties (think nt:file/nt:resource) Full text extraction support for all major document formats Full text extraction from images, packages, etc. is explicitly disabled CQ5 / WEM comes with default aggregate indexing rules for cq:Pages, etc. Why change the configuration? Reduce the index size (by default almost as large as the PM) Enable features like aggregate indexes Assign boost values for selected properties to improve search result relevance 16
  • 17. Indexing configuration How to change the configuration? indexing_configuration.xml file in the workspace directory Referenced by the indexingConfiguration option in the workspace.xml file See http://wiki.apache.org/jackrabbit/IndexingConfiguration Example: 17 <?xml version="1.0"?><!DOCTYPE configuration SYSTEM "http://jackrabbit.apache.org/dtd/indexing-configuration-1.0.dtd"><configuration xmlns:jcr="http://www.jcp.org/jcr/1.0" xmlns:nt="http://www.jcp.org/jcr/nt/1.0"> <aggregateprimaryType="nt:file"> <include>jcr:content</include> </aggregate> </configuration>